Our extended abstract "ASET: Ad-hoc Structured Exploration of Text Collections" was accepted to AIDB 2021

We propose a new way to explore textual content in a structured manner

2021/08/13

In this paper, we propose a new system called ASET that allows users to perform structured explorations of text collections in an ad-hoc manner. The main idea of ASET is a new two-phase approach: it first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers, and then uses embeddings to match these extractions to a structured table definition requested by the user. In our evaluation, we show that ASET is thus able to extract high-quality structured data from real-world text collections without the need to design extraction pipelines upfront.
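
To make the two-phase idea more concrete, here is a minimal sketch in Python. It is not ASET's implementation: the regex-based `EXTRACTORS`, the character-trigram `embed` function, and all function and attribute names are hypothetical stand-ins chosen so the example runs with the standard library only. A real system would plug in existing extractors (such as a named entity recognizer) and learned embeddings instead.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for real extractors such as a named entity recognizer.
EXTRACTORS = {
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
    "number": r"\b\d+\b",
    "person name": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",
}


def extract_nuggets(text):
    """Phase 1: over-extract candidate nuggets as (label, value) pairs."""
    return [
        (label, match.group())
        for label, pattern in EXTRACTORS.items()
        for match in re.finditer(pattern, text)
    ]


def embed(text):
    """Toy embedding: character-trigram counts (assumption, not a learned model)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_row(nuggets, attributes, min_score=0.1):
    """Phase 2: fill each requested attribute with the most similar nugget.

    Similarity is computed between the attribute name and the extractor label;
    attributes without a sufficiently similar nugget stay empty (None).
    """
    row = {}
    for attribute in attributes:
        target = embed(attribute)
        scored = [(cosine(target, embed(label)), value) for label, value in nuggets]
        best_score, best_value = max(scored, default=(0.0, None))
        row[attribute] = best_value if best_score >= min_score else None
    return row


doc = "Flight incident on 2021-08-13: pilot Jane Doe reported 3 injured passengers."
nuggets = extract_nuggets(doc)  # a superset: also contains numbers nobody asked for
print(match_row(nuggets, ["incident date", "pilot name"]))
```

In this toy setting, the first phase deliberately extracts more than is needed (for example, the digits inside the date are also picked up as numbers), and the second phase selects only the nuggets that fit the columns the user asks for. The table definition can therefore be changed at query time without rerunning or redesigning the extraction step.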

We will present this work at AIDB on Friday, August 20, at 11:40 (UTC+2).

AIDB 2021 is co-hosted with VLDB 2021.

Authors:

  • Benjamin Hättasch (TU Darmstadt)
  • Jan-Micha Bodensohn (TU Darmstadt)
  • Carsten Binnig (TU Darmstadt)