Visual Interactive Data Exploration and Machine Learning

Interactive Data Exploration and Machine Learning

This research area of our lab focuses on data management techniques that better support human-in-the-loop data exploration workloads over large amounts of data.

You can find a list of projects in this research area below. In one project, among others, we collaborated with the data management labs of Brown University and MIT on a system called Vizdom that allows users to visually compose and execute complex analytical workflows on interactive whiteboards (e.g., a Microsoft Surface device). The backend of the system extends state-of-the-art approximate query processing techniques to leverage perceptual effects in order to run the computations at interactive speeds. This work received a best demo award at the VLDB 2015 conference.


We present a new system for interactive text summarization called Sherlock. Automatically producing textual summaries is an important step toward understanding a collection of multiple topic-related documents, with real-world applications in journalism, medicine, and many other fields. We integrate a new approximate summarization model into Sherlock that can guarantee interactive speeds even for large text collections, keeping the user engaged in the process.



Large state-of-the-art corpora for training neural networks to create abstractive summaries are mostly limited to the news genre, as it is expensive to acquire human-written summaries for other types of text at a large scale. We present a novel automatic corpus construction approach to tackle this issue as well as three new large open-licensed summarization corpora based on our approach that can be used for training abstractive summarization models.



We build a new system for interactive analytics through pen and touch called Vizdom. Vizdom’s frontend allows users to visually compose complex workflows of ML and statistics operators on an interactive whiteboard, and the backend leverages recent advances in workflow compilation techniques to run these computations at interactive speeds.