Workshop on Human-In-the-Loop Data Analytics
June 26, 2016 | Co-located with SIGMOD 2016 in San Francisco, CA

Trifacta Location: 575 Market, San Francisco


Any data management system needs to work together with people: their needs determine the goals for the system, they must provide its input, and they need to work effectively with its output. Data management systems will work much better when they take into account the cognitive and physiological characteristics of the people involved. Recent technology trends (such as touch screens, motion detection, and voice recognition) are widening the possibilities for users to interact with systems, and many information-provision industries are shifting to personalized processing to better target their services to users’ wishes. HILDA is a new workshop that will allow researchers and practitioners to exchange ideas and results on how data management can be done with awareness of the people who form part of the process. Topics in the spirit of this workshop include, but are not limited to: novel query interfaces, interactive query refinement, data exploration and analysis, data visualization, human-assisted data integration and cleaning, perception-aware data processing, database systems designed for highly interactive use cases, empirical studies of database use, and crowd-powered data infrastructure.

HILDA intends to be a forum where people from varied communities engage with one another's ideas. We welcome submissions that present initial ideas and visions just as much as reports on early results or reflections on completed projects. The workshop will focus on discussion and interaction rather than on static presentations of what is in the paper.

Proceedings: ACM Digital Library (available June 26)

Registration: via the SIGMOD website

Program / Schedule: now available

Student Travel Scholarships

Keynote Speakers

Important Dates

  • Workshop date: June 26, 2016
  • Submissions due: April 25, 2016 11:59PM US EDT
  • Notification of outcome: May 10, 2016 11:59PM US EDT
  • Camera-ready due: May 31, 2016 11:59PM US EDT


Organizing & Program Committee

  • Alan Fekete (University of Sydney, co-chair)
  • Arnab Nandi (The Ohio State University, co-chair)
  • Carsten Binnig (Brown University, co-chair)
  • Anil Bahuman (Reliance Industries)
  • Abraham Bernstein (University of Zurich)
  • Adam Marcus (Unlimited Labs)
  • Aditya Parameswaram (University of Illinois)
  • Carl-Christian Kanne (Platfora)
  • Chris Ré (Stanford University)
  • Christopher Nguyen (Arimo Inc.)
  • Dana Groff (MongoDB, Inc.)
  • Eugene Wu (Columbia University)
  • Giorgio Caviglia (Trifacta Inc.)
  • James Terwilliger (Microsoft Research)
  • Jill Freyne (CSIRO)
  • Jun Yang (Duke University)
  • Michael Cafarella (University of Michigan, Ann Arbor)
  • Olga Papaemmanouil (Brandeis University)
  • Oliver Kennedy (University at Buffalo)
  • Patrick Olivier (Newcastle University)
  • Remco Chang (Tufts University)
  • Rick Cole (Tableau Software)
  • Steven Drucker (Microsoft Research)
  • Stratos Idreos (Harvard University)
  • Tim Kraska (Brown University)
  • Yunyao Li (IBM Research)


Keynote Speakers

Keeping People “In the Loop” While Accelerating Discovery from Data — Laura Haas, IBM Research

Data analytics is a complex, multi-stage, iterative process, done by humans, for humans, and typically involving a team of people with diverse expertise. From observational studies and interviews, we know that these teams struggle with a number of tasks, including finding appropriate data, analytic tools, and models; working with a broad range of tools to transform and analyze data; keeping track of their results and how they were derived; and locating other experts who can assist them in each of these tasks.

At the IBM Research Accelerated Discovery Lab, our mission is to help data scientists get insights and discoveries from data faster and with better results than they can today. To do that, we are creating an environment that provides much of what they need, from hardware to analytics software to contextual data and expertise. But addressing data scientists’ key challenges requires a platform for collaboration and sharing, a platform that integrates with the user’s tools and environment to support all aspects of the analytics process. In this talk, I will draw on studies of data scientists in both business and scientific contexts to illustrate the challenge and motivate a key aspect of our solution: an integration hub for information about people, data, and tools, manifested as a rich metadata graph. The graph allows us to track provenance, support social networks and contextual search, and provide recommendations for data sets, models, tools and people of interest. I will also describe LabBook, a conversational user experience we have built leveraging the graph, and show how it is used by a team of data scientists. We have found that conversation as a metaphor for human-in-the-loop systems offers opportunities for mixed-initiative cooperation and collaboration among people and systems.

Vega: Declarative Interactive Data Visualization — Jeffrey Heer, University of Washington

From specifying queries to understanding query results, visualization tools have become a common and powerful way to interface with database systems. At the same time, the growing scale and complexity of data are pushing visualization system builders to turn to data management techniques. A convergence of the two fields is both timely and needed. Drawing on both of these traditions, the Vega project is developing high-level declarative languages for integrated data transformation, visual encoding, and interaction design. I will first introduce the Vega and Vega-Lite languages and show how they can be used to create customized interactive graphics. I will then turn to ongoing research exploring the potential of Vega to enable a new generation of visual analysis tools.

Program / Schedule

8:15-8:30       Opening

8:30-9:30       Keynote: Laura Haas

9:30-10:10       Session "User Interfaces" (Long Talks) Session Chair: Anil Bahuman

  • Thibault Sellam and Martin Kersten.
    Have a Chat with Clustine, Conversational Engine to Query Large Tables
  • Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe and Jeffrey Heer.
    Towards A General-Purpose Query Language for Visualization Recommendation

10:10-10:40       Break

10:40-12:00       Session "Interactive Analytics" (Long Talks) Session Chair: Stratos Idreos

  • Minsuk Kahng, Dezhi Fang and Duen Horng Chau.
    Visual Exploration of Machine Learning Results using Data Cube Analysis
  • Daniel Alabi and Eugene Wu.
    PFunk-H: Approximate Query Processing using Perceptual Models
  • Andrew Crotty, Alexander Galakatos, Emanuel Zgraggen, Carsten Binnig and Tim Kraska.
    The Case for Interactive Data Exploration Accelerators (IDEAs)
  • Niranjan Kamat, Eugene Wu, and Arnab Nandi.
    TrendQuery: A System for Interactive Exploration of Trends

12:00-13:30       Lunch

13:30-14:10       Session "Data Cleaning and Extraction" (Long Talks) Session Chair: Rick Cole

  • Sanjay Krishnan, Daniel Haas, Eugene Wu and Michael Franklin.
    Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
  • Henry Ehrenberg, Jaeho Shin, Alex Ratner, Jason Fries and Christopher Ré.
    Data Programming with DDLite: Putting Humans in a Different Part of the Loop

14:10-15:00       Short Talks

  • Danyel Fisher.
    Big Data Exploration Requires Collaboration Between Visualization and Data Infrastructures
  • Jessica Zeitz Self, Radha Krishnan Vinayagam, J.T. Fry and Chris North.
    Bridging the Gap between User Intention and Model Parameters for Data Analytics
  • Muhammad El-Hindi, Zheguang Zhao, Carsten Binnig and Tim Kraska.
    VisTrees: Fast Indexes for Interactive Data Exploration
  • Linus Karsai, Alan Fekete, Paolo Missier and Judy Kay.
    Clustering Provenance
  • Juliana Freire, Boris Glavic, Oliver Kennedy and Heiko Mueller.
    The Exception that Improves the Rule
  • Luis Tari, Varish Mulwad and Anna von Reden.
    Interactive Online Learning for Clinical Entity Recognition
  • Manasi Vartak, Harihar Subramanyam, Wei-En Lee, Srinidhi Viswanathan, Saadiyah Husnoo, Samuel Madden and Matei Zaharia.
    ModelDB: A System for Machine Learning Model Management
  • Yifan Wu, Joseph Hellerstein and Eugene Wu.
    A DeVIL-ish Approach to Inconsistency in Interactive Visualizations

15:00-15:30       Break

15:30-16:30       Keynote: Jeff Heer

16:45-17:45       Poster / Demo Session (All Papers)

18:30-21:00       Trifacta Open House (Reception)



For questions, email us at
