HILDA brings together researchers and practitioners to exchange ideas and results on human-data interaction. It explores how data management and analysis can be made more effective when taking into account the people who design and build these processes as well as those who are impacted by their results.
In HILDA 2022, we implemented a mentoring program (inspired by workshops such as PLATEAU) and are continuing it this year. Our focus is on promising, early-stage research, and a core component of the program is that each paper is assigned a mentor. More details on the process are below.
The theme for this edition of the workshop is HILDA and Large Language Models (LLMs). However, the workshop is not limited to this theme, and other topics are also of interest. We encourage research on guidelines and best practices for effective human-LLM collaboration. We also encourage research that questions the role of humans in traditional data pipelines with the emergence of LLMs.
Program
The schedule is not finalized and may change. Please check it again closer to the event.
8:30 | Opening Remarks |
8:35 | Keynote by Tim Kraska: ML and Generative AI is reshaping the entire data service industry, but what should academia do? |
Session Chair: Kexin Rong | |
9:20 | Transparent Data Preprocessing for Machine Learning |
9:35 | Towards Extending XAI for Full Data Science Pipelines |
9:50 | Guided Querying over Videos using Autocompletion Suggestions |
10:05 | Break |
LLM in the Industry Panel | |
Moderator: Behrooz Omidvar-Tehrani | |
10:30 | Introduction |
10:35 | Presentation |
11:00 | Discussion, Q&A |
12:00 | Lunch Break |
Session Chair: Roee Shraga | |
14:00 | Keynote by Renée Miller: Semantic Benchmark Generation: Can LLMs Generate Better Benchmarks than Humans? |
14:45 | (Short Paper) “It Took Longer than I was Expecting:” Why is Dataset Search Still so Hard? |
14:55 | (Short Paper) Key Insights from a Feature Discovery Use-Case Study |
15:05 | Drag, Drop, Merge: A Tool for Streamlining Integration of Longitudinal Survey Instruments |
15:20 | More of that, please: Domain Adaptation of Information Extraction through Examples & Feedback |
15:35 | Break |
Session Chair: Kexin Rong | |
16:00 | A Diagram Unifying ER and Data Flow Notation For Data Integration and Transformations For Data Science Collaborations |
16:15 | CopycHats: Question Sequencing with Artificial Agents |
16:30 | LLMs as an Interactive Database Interface for Designing Large Queries |
16:45 | Pipe(line) Dreams: Fully Automated End-to-End Analysis and Visualization |
17:00 | Cocoon: Semantic Table Profiling Using Large Language Models |
17:15 | Causal Dataset Discovery with Large Language Models |
17:30 | Closing Remarks |
HILDA 2024 Keynote Talks
The program features invited keynote talks by Tim Kraska and Renée Miller on the challenges of human-data interaction.
HILDA 2024 Industry Panel
This year, we will also be pioneering an industry panel designed to foster meaningful discussions between industry leaders and researchers on the optimization and application of Large Language Models (LLMs) in the tech industry. Our panelists include:
- Xin Luna Dong
- Hadas Kotek
- Tim Kraska
- Fatma Ozcan
- Raghu Ramakrishnan
What to submit
We encourage both standard research papers and more unusual works, for instance papers that describe in-progress work, report on experiences, question accepted wisdom, raise open problems, or propose speculative new approaches. A HILDA submission should describe work or perspectives that will lead to interesting discussions at the workshop or that the authors want feedback on.
We welcome work that proposes innovations in design to improve the way people can work with data management systems, as well as work that studies empirically how humans interact with existing systems. We welcome research that comes from the traditions of the database systems community, as well as reports on industry activities and research on data topics from communities that study people and organizations. Topics in the spirit of this workshop include, but are not limited to:
- novel query interfaces,
- interactive query refinement,
- data exploration and analysis,
- data visualization,
- human-assisted data integration and cleaning,
- perception-aware data processing,
- database systems designed for highly interactive use cases,
- empirical studies of database use,
- evaluating and ensuring fairness in data-driven decision-making processes,
- understanding the outcomes of processes through provenance and explanations,
- interactive debugging of complex data systems,
- crowd-powered data infrastructure, etc.
Submissions can also examine any of the above topics from an application or domain perspective.
HILDA is a forum where people from multiple communities engage with one another's ideas. We are keen to have submissions that present initial ideas and visions, just as much as reports on early results, or reflections on completed projects.
The workshop will focus on discussion and interaction, rather than static presentations of what is in the paper.
Review and Mentorship Process
HILDA reviews are single blind. All submitted papers will be reviewed by at least three reviewers who will determine the fit of the work for HILDA's unique mentorship process this year, the quality of the work, and its potential for future research.
Every accepted paper will be assigned a mentor who will engage with the authors, providing constructive feedback through one-on-one virtual discussions. We hope that the authors will work closely with their mentors to improve the substance and direction of their work.
Authors and mentors can withdraw without repercussions due to unforeseen conflicts. In such situations, the program chairs will try to find another suitable mentor.
Submission
Authors are invited to submit papers between four and six pages in length, excluding references, using the standard SIGMOD paper formatting template. Submissions should reflect the current state of the research work and also include a section on the limitations and challenges on which the authors wish to receive feedback from their mentors and the HILDA community.
We are following the SIGMOD 2024 submission format, i.e., the 2-column ACM Proceedings Format, using either the sample-sigconf.tex or Interim layout.docx template provided at https://www.acm.org/publications/proceedings-template for LaTeX (version 2e) or Word, respectively. If you plan to use ACM's official Overleaf template, please use the 2-column template available at https://www.overleaf.com/latex/templates/association-for-computing-machinery-acm-sig-proceedings-template/bmvfhcdnxfty.
Submission website: https://cmt3.research.microsoft.com/HILDA2024
Proceedings
- We will link the accepted papers from the program on this page and publish them for a year through the ACM DL.
Important Dates
- Workshop Date: June 14, 2024
- Submission (extended): April 15, 2024 AOE (originally April 7, 2024)
- Notification of outcome: May 7, 2024 (Tentative)
- Camera-ready due: May 30, 2024 (before the workshop)
Workshop Chairs
- Jean-Daniel Fekete (Inria & Université Paris-Saclay)
- Behrooz Omidvar-Tehrani (AWS AI Labs)
- Kexin Rong (Georgia Institute of Technology)
- Roee Shraga (Worcester Polytechnic Institute)
Mentors
- Amir Gilad (The Hebrew University)
- Amit Somech (Bar-Ilan University)
- Arvind Satyanarayan (MIT CSAIL)
- Bar Genossar (Technion - Israel Institute of Technology)
- Brit Youngmann (Technion - Israel Institute of Technology)
- Dixin Tang (University of Texas, Austin)
- Fatemeh Nargesian (University of Rochester)
- Giuseppe Santucci (University of Rome "La Sapienza")
- Iddo Drori (Boston University and Columbia University)
- Aamod Khatiwada (Northeastern)
- Grace Fan (Northeastern)
- Oliver A Kennedy (University at Buffalo, SUNY)
- Senjuti Basu Roy (New Jersey Institute of Technology)
- Slava Novgorodov (Tel Aviv University)
- Tiziana Catarci (University of Rome "La Sapienza")
- Vidya Setlur (Tableau Research)
- Yannis Katsis (IBM Research)
- Zhengjie Miao (Simon Fraser University)
Steering Committee
- Carsten Binnig (TU Darmstadt)
- Juliana Freire (New York University)
- Aditya Parameswaran (University of California, Berkeley)
- Arnab Nandi (The Ohio State University)
Contact
For questions, please email the workshop chairs directly.
Follow us
Join us on Twitter.