HILDA 2017: Workshop on Human-In-the-Loop Data Analytics

Introduction

Any data management system needs to work together with people: their needs determine the goals for the system, they must provide the input, and they need to work effectively with the output. Data management systems will work much better when they take account of the cognitive and physiological characteristics of the people involved. Recent technology trends (such as touch screens, motion detection, and voice recognition) are widening the possibilities for users to interact with systems, and many information-provision industries are shifting to personalized processing to better target their services to the users’ wishes. HILDA is a workshop that will allow researchers and practitioners to exchange ideas and results relating to how data management can be done with awareness of the people who form part of the processes. A sample of topics that are in the spirit of this workshop includes, but is not limited to: novel query interfaces, interactive query refinement, data exploration and analysis, data visualization, human-assisted data integration and cleaning, perception-aware data processing, database systems designed for highly interactive use cases, empirical studies of database use, and crowd-powered data infrastructure.

HILDA intends to be a forum where people from varied communities engage with one another's ideas. We are keen to have submissions that present initial ideas and visions, just as much as reports on early results, or reflections on completed projects. The workshop will focus on discussion and interaction, rather than static presentations of what is in the paper.

Submission

Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum. Papers must follow the ACM Proceedings Format. Papers submitted can be between four and six pages in length, including references and appendix.

Submissions will be handled through EasyChair.

Keynote Speakers

Ben Shneiderman, University of Maryland/College Park
Mike Cafarella, University of Michigan

Proceedings

All HILDA papers are available through this TOC, which provides free access to the papers maintained in the ACM DL for a one-year period from the conference start date.

Important Dates

Workshop Date: May 14, 2017
Submissions due: March 10, 2017, 11:59PM US EDT
~~March 3, 2017 11:59PM US EDT~~
Notification of outcome: April 2, 2017, 11:59PM US EDT
~~March 26, 2017 11:59PM US EDT~~
Camera-ready due: April 9, 2017 11:59PM US EDT

Related Workshops

Program Chairs

Carsten Binnig (Brown University, co-chair)
Joseph M. Hellerstein (University of California, Berkeley, co-chair)
Aditya Parameswaran (University of Illinois, co-chair)

Program Committee

Abraham Bernstein, University of Zurich
Adriane Chapman, MITRE McLean
Anil Bahuman, Reliance Industries
Anushka Anand, Tableau Software
Beth Trushkowsky, Harvey Mudd College
Brian Lim, NUS Singapore
Carl-Christian Kanne, Platfora
Chris Re, Stanford University
Dafna Shahaf, Hebrew University Jerusalem
Danyel Fisher, Microsoft Research
Dana Groff, MongoDB
Eugene Wu, Columbia University
Giorgio Caviglia, Trifacta Inc
Guoliang Li, Tsinghua University
Harish Doraiswamy, NYU Data Science Center
James Terwilliger, Microsoft Research
Jessica Hullman, University of Washington
Martin Kersten, CWI
Olga Papaemmanouil, Brandeis University
Oliver Kennedy, University at Buffalo
Patrick Olivier, Newcastle University
Remco Chang, Tufts University
Rick Cole, Tableau Software
Stratos Idreos, Harvard University
Sudeepa Roy, Duke University
Tim Kraska, Brown University
Tiziana Catarci, Sapienza Università di Roma
Yunyao Li, IBM Research

Steering Committee

Alan Fekete (University of Sydney)
Laura Haas (IBM Research)
Arnab Nandi (The Ohio State University)

Keynotes

A Two-Handed Future for Human-In-The-Loop Data Analytics — Michael Cafarella, University of Michigan

Abstract: The amount of research using human-in-the-loop methods has exploded over the last few years. Moreover, it has done so across many different subfields of computing, including data analytics, general-purpose data management, human-computer interaction, and others. Almost as impressive as the number of projects is the sheer breadth of methods that fall under the human-in-the-loop label. By drawing on human-in-the-loop examples from data analytics, program synthesis, and information extraction, I will propose that humans in these projects usually serve one of two roles: to specify correctness, or to specify a search space. Further, it may be possible to build generic and reusable human-in-the-loop software primitives that enable these two goals.
Bio: Michael Cafarella is an Associate Professor of Computer Science and Engineering at the University of Michigan. His research interests include databases, information extraction, data integration, and data mining. Mike received his PhD from the University of Washington, Seattle, in 2009 with advisors Oren Etzioni and Dan Suciu. He received the NSF CAREER award in 2011 and a Sloan Research Fellowship in 2016. In addition to his academic work, he costarted (with Doug Cutting) the Hadoop open-source project, which is widely used at Facebook, Twitter, and elsewhere. In 2015, he cofounded (with Chris Re and Feng Niu) Lattice Data, Inc.

Information Visualization for Knowledge Discovery: Big Insights from Big Data — Ben Shneiderman, University of Maryland

Abstract: Interactive information visualization tools provide researchers with remarkable capabilities to support discovery from Big Data resources. Users can begin with an overview, zoom in on areas of interest, filter out unwanted items, and then click for details-on-demand. The Big Data initiatives and commercial success stories, plus widespread use by prominent sites such as the New York Times, have made visualization a key technology. The central theme is the integration of statistics with visualization to support user discovery. Our work focuses on temporal event sequences such as those found in electronic health records (www.cs.umd.edu/hcil/eventflow), and social network data such as Twitter discussion patterns (www.codeplex.com/nodexl). The talk closes with 8 Golden Rules for Big Data.
Bio: Ben Shneiderman (http://www.cs.umd.edu/~ben) is a Distinguished University Professor in the Department of Computer Science, Founding Director (1983-2000) of the Human-Computer Interaction Laboratory (http://www.cs.umd.edu/hcil/), and a Member of the UM Institute for Advanced Computer Studies (UMIACS) at the University of Maryland. He is a Fellow of the AAAS, ACM, IEEE, and NAI, and a Member of the National Academy of Engineering, in recognition of his pioneering contributions to human-computer interaction and information visualization. His contributions include the direct manipulation concept, clickable highlighted web-links, touchscreen keyboards, dynamic query sliders, development of treemaps, novel network visualizations for NodeXL, and temporal event sequence analysis for electronic health records. Shneiderman is the co-author with Catherine Plaisant of Designing the User Interface: Strategies for Effective Human-Computer Interaction (6th ed., 2016) http://www.awl.com/DTUI/. With Stu Card and Jock Mackinlay, he co-authored Readings in Information Visualization: Using Vision to Think (1999). He co-authored Analyzing Social Media Networks with NodeXL (www.codeplex.com/nodexl) (Morgan Kaufmann) with Derek Hansen and Marc Smith. Shneiderman’s latest book is The New ABCs of Research: Achieving Breakthrough Collaborations (Oxford, 2016) (www.cs.umd.edu/hcil/newabcs).

Program / Schedule (Room: Continental A)

9:00-9:10 Opening

9:10-10:00 Keynote: Michael Cafarella

10:00-10:30 Session 1: "Assisted Data Discovery" Session Chair: Eugene Wu

Yannis Katsis, Nikos Koulouris, Yannis Papakonstantinou and Kevin Patrick.
Assisting Discovery in Public Health (Distinguished Long Talk)
Yue Guo, Carsten Binnig and Tim Kraska.
What you see is not what you get! Detecting Simpson's Paradoxes during Data Exploration

10:30-11:00 Break

11:00-12:00 Session 2: "Simplifying Machine Learning" Session Chair: Yannis Papakonstantinou

Ce Zhang, Wentao Wu and Tian Li.
An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision (Distinguished Long Talk)
Sanjay Krishnan and Eugene Wu.
PALM: Machine Learning Explanations For Iterative Debugging
Paroma Varma, Dan Iter, Christopher De Sa and Christopher Ré.
Flipper: A Systematic Approach to Debugging Training Sets
Paolo Tamagnini, Josua Krause, Aritra Dasgupta and Enrico Bertini.
Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations

12:00-12:30 Session 3: "Tools and Workflows" Session Chair: Alan Fekete

Hui Miao, Amit Chavan and Amol Deshpande.
ProvDB: Lifecycle Management of Collaborative Analysis Workflows (Distinguished Long Talk)
Alexandr A. Kalinin, Selvam Palanimalai and Ivo D. Dinov.
SOCRAT Platform Design: A Web Architecture for Interactive Visual Analytics Applications

12:30-14:00 Lunch

14:00-14:50 Keynote: Ben Shneiderman

14:50-15:30 Session 4: "User Interfaces and Intent" Session Chair: Amol Deshpande

Dominik Moritz and Danyel Fisher.
What Users Don't Expect about Exploratory Data Analysis on AQP Systems (Distinguished Long Talk)
Haoci Zhang, Thibault Sellam and Eugene Wu.
Precision Interfaces
Dharmil Chandarana, Vraj Shah, Arun Kumar and Lawrence Saul.
SpeakQL: Towards Speech-driven Multi-modal Querying

15:30-16:00 Break

16:00-16:50 Session 5: "Collaboration and Feedback" Session Chair: Arun Kumar

Anhai Doan, Adel Ardalan, Jeffrey Ballard, Yash Govind, Pradap Konda, Han Li, Sidharth Mudgal, Erik Paulson, Paul Suganthan G.C. and Haojun Zhang.
Human-in-the-Loop Challenges for Entity Matching: A Midterm Report (Distinguished Long Talk)
Nurzety Azuan, Suzanne Embury and Norman Paton.
Observing the Data Scientist: Using Manual Corrections As Implicit Feedback
John Wenskovitch and Chris North.
Observation-Level Interaction with Clustering and Dimension Reduction Algorithms
Ben McCamish, Arash Termehchy and Behrouz Touri.
A Game-theoretic Approach to Data Interaction: A Progress Report

16:50-18:00 Poster / Demo Session (All Papers)

19:30-21:00 Dinner in small groups (Organization: https://goo.gl/rxrcM1, Meet: 19:15 in the hotel lobby)

Contact

For questions, email us at 2017@hilda.io.

Join us on Twitter.
