Workshop on Human-In-the-Loop Data Analytics
Co-located with SIGMOD 2017 (14 May 2017, Chicago)


Any data management system needs to work together with people, whose needs determine the goals for the system, who must provide the input, and who need to work effectively with the output. Data management systems will work much better when they take account of the cognitive and physiological characteristics of the people involved. Recent technology trends (such as touch screens, motion detection, and voice recognition) are widening the possibilities for users to interact with systems, and many information-provision industries are shifting to personalized processing to better target their services to the users’ wishes. HILDA is a workshop that will allow researchers and practitioners to exchange ideas and results relating to how data management can be done with awareness of the people who form part of the processes. A sample of topics that are in the spirit of this workshop includes, but is not limited to: novel query interfaces, interactive query refinement, data exploration and analysis, data visualization, human-assisted data integration and cleaning, perception-aware data processing, database systems designed for highly interactive use cases, empirical studies of database use, and crowd-powered data infrastructure.

HILDA intends to be a forum where people from varied communities engage with one another's ideas. We are keen to have submissions that present initial ideas and visions, just as much as reports on early results, or reflections on completed projects. The workshop will focus on discussion and interaction, rather than static presentations of what is in the paper.


Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum. Papers must follow the ACM Proceedings Format. Papers submitted can be between four and six pages in length, including references and appendix.

Submissions will be handled through EasyChair.

Keynote Speakers

  • Ben Shneiderman, University of Maryland/College Park
  • Mike Cafarella, University of Michigan


  • All HILDA papers are available through this TOC, which provides free access to the papers maintained in the ACM DL for a one-year period from the conference start date.

  • Important Dates

    • Workshop Date: May 14, 2017
    • Submissions due: March 10, 2017 11:59PM US EDT
      March 3, 2017 11:59PM US EDT
    • Notification of outcome: April 2, 2017 11:59PM US EDT
      March 26, 2017 11:59PM US EDT
    • Camera-ready due: April 9, 2017 11:59PM US EDT


Program Chairs

  • Carsten Binnig (Brown University, co-chair)
  • Joseph M. Hellerstein (University of California, Berkeley, co-chair)
  • Aditya Parameswaran (University of Illinois, co-chair)

Program Committee

  • Abraham Bernstein, University of Zurich
  • Adriane Chapman, MITRE McLean
  • Anil Bahuman, Reliance Industries
  • Anushka Anand, Tableau Software
  • Beth Trushkowsky, Harvey Mudd College
  • Brian Lim, NUS Singapore
  • Carl-Christian Kanne, Platfora
  • Chris Re, Stanford University
  • Dafna Shahaf, Hebrew University Jerusalem
  • Danyel Fisher, Microsoft Research
  • Dana Groff, MongoDB
  • Eugene Wu, Columbia University
  • Giorgio Caviglia, Trifacta Inc
  • Guoliang Li, Tsinghua University
  • Harish Doraiswamy, NYU Data Science Center
  • James Terwilliger, Microsoft Research
  • Jessica Hullman, University of Washington
  • Martin Kersten, CWI
  • Olga Papaemmanouil, Brandeis University
  • Oliver Kennedy, University at Buffalo
  • Patrick Olivier, Newcastle University
  • Remco Chang, Tufts University
  • Rick Cole, Tableau Software
  • Stratos Idreos, Harvard University
  • Sudeepa Roy, Duke University
  • Tim Kraska, Brown University
  • Tiziana Catarci, Sapienza Università di Roma
  • Yunyao Li, IBM Research

Steering Committee

  • Alan Fekete (University of Sydney)
  • Laura Haas (IBM Research)
  • Arnab Nandi (The Ohio State University)


A Two-Handed Future for Human-In-The-Loop Data Analytics — Michael Cafarella, University of Michigan

  • Abstract: The amount of research using human-in-the-loop methods has exploded over the last few years. Moreover, it has done so across many different subfields of computing, including data analytics, general-purpose data management, human-computer interaction, and others. Almost as impressive as the number of projects is the sheer breadth of methods that fall under the human-in-the-loop label. By drawing on human-in-the-loop examples from data analytics, program synthesis, and information extraction, I will propose that humans in these projects usually serve one of two roles: to specify correctness, or to specify a search space. Further, it may be possible to build generic and reusable human-in-the-loop software primitives that enable these two goals.
  • Bio: Michael Cafarella is an Associate Professor of Computer Science and Engineering at the University of Michigan. His research interests include databases, information extraction, data integration, and data mining. Mike received his PhD from the University of Washington, Seattle, in 2009 with advisors Oren Etzioni and Dan Suciu. He received the NSF CAREER award in 2011 and a Sloan Research Fellowship in 2016. In addition to his academic work, he costarted (with Doug Cutting) the Hadoop open-source project, which is widely used at Facebook, Twitter, and elsewhere. In 2015, he cofounded (with Chris Re and Feng Niu) Lattice Data, Inc.

Information Visualization for Knowledge Discovery: Big Insights from Big Data — Ben Shneiderman, University of Maryland

  • Abstract: Interactive information visualization tools provide researchers with remarkable capabilities to support discovery from Big Data resources. Users can begin with an overview, zoom in on areas of interest, filter out unwanted items, and then click for details-on-demand. The Big Data initiatives and commercial success stories, plus widespread use by prominent sites such as the New York Times, have made visualization a key technology. The central theme is the integration of statistics with visualization to support user discovery. Our work focuses on temporal event sequences such as those found in electronic health records, and social network data such as Twitter discussion patterns. The talk closes with 8 Golden Rules for Big Data.
  • Bio: Ben Shneiderman is a Distinguished University Professor in the Department of Computer Science, Founding Director (1983-2000) of the Human-Computer Interaction Laboratory, and a Member of the UM Institute for Advanced Computer Studies (UMIACS) at the University of Maryland. He is a Fellow of the AAAS, ACM, IEEE, and NAI, and a Member of the National Academy of Engineering, in recognition of his pioneering contributions to human-computer interaction and information visualization. His contributions include the direct manipulation concept, clickable highlighted web-links, touchscreen keyboards, dynamic query sliders, development of treemaps, novel network visualizations for NodeXL, and temporal event sequence analysis for electronic health records. Shneiderman is the co-author with Catherine Plaisant of Designing the User Interface: Strategies for Effective Human-Computer Interaction (6th ed., 2016). With Stu Card and Jock Mackinlay, he co-authored Readings in Information Visualization: Using Vision to Think (1999). He co-authored Analyzing Social Media Networks with NodeXL (Morgan Kaufmann) with Derek Hansen and Marc Smith. Shneiderman’s latest book is The New ABCs of Research: Achieving Breakthrough Collaborations (Oxford, 2016).

Program / Schedule (Room: Continental A)

9:00-9:10       Opening


Keynote: Michael Cafarella

10:00-10:30       Session 1: "Assisted Data Discovery" Session Chair: Eugene Wu

  • Yannis Katsis, Nikos Koulouris, Yannis Papakonstantinou and Kevin Patrick.
    Assisting Discovery in Public Health (Distinguished Long Talk)
  • Yue Guo, Carsten Binnig and Tim Kraska.
    What you see is not what you get! Detecting Simpson's Paradoxes during Data Exploration

10:30-11:00       Break

11:00-12:00       Session 2: "Simplifying Machine Learning" Session Chair: Yannis Papakonstantinou

  • Ce Zhang, Wentao Wu and Tian Li.
    An Overreaction to the Broken Machine Learning Abstraction: The Vision (Distinguished Long Talk)
  • Sanjay Krishnan and Eugene Wu.
    PALM: Machine Learning Explanations For Iterative Debugging
  • Paroma Varma, Dan Iter, Christopher De Sa and Christopher Ré.
    Flipper: A Systematic Approach to Debugging Training Sets
  • Paolo Tamagnini, Josua Krause, Aritra Dasgupta and Enrico Bertini.
    Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations

12:00-12:30       Session 3: "Tools and Workflows" Session Chair: Alan Fekete

  • Hui Miao, Amit Chavan and Amol Deshpande.
    ProvDB: Lifecycle Management of Collaborative Analysis Workflows (Distinguished Long Talk)
  • Alexandr A. Kalinin, Selvam Palanimalai and Ivo D. Dinov.
    SOCRAT Platform Design: A Web Architecture for Interactive Visual Analytics Applications

12:30-14:00       Lunch


Keynote: Ben Shneiderman

14:50-15:30       Session 4: "User Interfaces and Intent" Session Chair: Amol Deshpande

  • Dominik Moritz and Danyel Fisher.
    What Users Don't Expect about Exploratory Data Analysis on AQP Systems (Distinguished Long Talk)
  • Haoci Zhang, Thibault Sellam and Eugene Wu.
    Precision Interfaces
  • Dharmil Chandarana, Vraj Shah, Arun Kumar and Lawrence Saul.
    SpeakQL: Towards Speech-driven Multi-modal Querying

15:30-16:00       Break

16:00-16:50       Session 5: "Collaboration and Feedback" Session Chair: Arun Kumar

  • Anhai Doan, Adel Ardalan, Jeffrey Ballard, Yash Govind, Pradap Konda, Han Li, Sidharth Mudgal, Erik Paulson, Paul Suganthan G.C. and Haojun Zhang.
    Human-in-the-Loop Challenges for Entity Matching: A Midterm Report (Distinguished Long Talk)
  • Nurzety Azuan, Suzanne Embury and Norman Paton.
    Observing the Data Scientist: Using Manual Corrections As Implicit Feedback
  • John Wenskovitch and Chris North.
    Observation-Level Interaction with Clustering and Dimension Reduction Algorithms
  • Ben McCamish, Arash Termehchy and Behrouz Touri.
    A Game-theoretic Approach to Data Interaction: A Progress Report

16:50-18:00       Poster / Demo Session (All Papers)

19:30-21:00       Dinner in small groups (Meet: 19:15 in the hotel lobby)



For questions, email us at
