INCEpTION

INCEpTION: Towards an Infrastructure for the Distributed Exploration and Annotation of Large Corpora and Knowledge Bases

Motivation

The annotation of specific semantic phenomena often requires compiling task-specific corpora and creating or extending task-specific knowledge bases. At present, researchers need a broad range of skills and tools to address such semantic annotation tasks.

In the recently funded INCEpTION project, UKP Lab at TU Darmstadt aims to build a web-based annotation platform that integrates all of these tasks in one place. The following sections briefly outline the tasks involved.

Corpus Extraction

Corpus extraction is the act of extracting a task-specific corpus from a larger background corpus by means of querying. Automatic assistive features are to support this task, e.g. interactively selecting relevant units and automatically finding similar units (i.e. getting more samples from a seed set), or reducing a large result set to a smaller but diverse one by clustering similar results. The query mechanism is to be able to incorporate information from the knowledge base (see Knowledge Management below).
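
The two assistive features named above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the bag-of-words Jaccard similarity, and the simple threshold-based "leader" clustering are all assumptions chosen to keep the example short and free of dependencies.

```python
def tokens(text):
    """Lower-cased bag of words; a stand-in for a real similarity model."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_by_seeds(seeds, candidates):
    """Order candidates by their best similarity to any seed unit."""
    seed_tokens = [tokens(s) for s in seeds]

    def score(candidate):
        cand = tokens(candidate)
        return max((jaccard(cand, s) for s in seed_tokens), default=0.0)

    return sorted(candidates, key=score, reverse=True)


def cluster_representatives(results, threshold=0.5):
    """Reduce a result set to one representative per cluster.

    A result starts a new cluster unless it is at least `threshold`
    similar to the representative of an existing cluster.
    """
    leaders = []
    for result in results:
        cand = tokens(result)
        if all(jaccard(cand, tokens(leader)) < threshold for leader in leaders):
            leaders.append(result)
    return leaders


results = [
    "the bank raised interest rates",
    "the bank raised its interest rates",
    "we sat on the river bank",
    "river bank erosion is a problem",
]
ranked = rank_by_seeds(["we sat on the river bank"], results)
diverse = cluster_representatives(results, threshold=0.5)
```

Here `ranked` puts the units closest to the seed first, and `diverse` drops the near-duplicate second sentence while keeping one unit per cluster. A real system would likely replace the token overlap with a learned similarity measure.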

Knowledge Management

Knowledge management describes the ability to model knowledge to be extracted from or connected to text. The knowledge base enables the creation of cross-document relations. We aim specifically at structured knowledge, i.e. not just flat or hierarchical tagsets, but entity classes and entities that may have properties and may also be linked to each other.
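
To make the difference between a tagset and structured knowledge concrete, the following sketch models entity classes, entities with properties, and links between entities. All class, field, and relation names (`EntityClass`, `Entity`, `KnowledgeBase`, `part_of`) are hypothetical and chosen for illustration only; they do not describe the platform's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityClass:
    """A class of entities; `parent` allows a simple class hierarchy."""
    name: str
    parent: Optional["EntityClass"] = None


@dataclass
class Entity:
    """An entity with free-form properties and typed links to other entities."""
    identifier: str
    label: str
    entity_class: EntityClass
    properties: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (relation, target identifier)


class KnowledgeBase:
    """A tiny in-memory store of entities, keyed by identifier."""

    def __init__(self):
        self.entities = {}

    def add(self, entity):
        self.entities[entity.identifier] = entity
        return entity

    def link(self, source_id, relation, target_id):
        """Link two entities that are already in the knowledge base."""
        if target_id not in self.entities:
            raise KeyError(target_id)
        self.entities[source_id].links.append((relation, target_id))

    def neighbours(self, identifier, relation):
        """Entities reachable from `identifier` via `relation`."""
        return [
            self.entities[target]
            for rel, target in self.entities[identifier].links
            if rel == relation
        ]


organization = EntityClass("Organization")
research_group = EntityClass("ResearchGroup", parent=organization)

kb = KnowledgeBase()
kb.add(Entity("tud", "TU Darmstadt", organization, {"type": "university"}))
kb.add(Entity("ukp", "UKP Lab", research_group))
kb.link("ukp", "part_of", "tud")
```

A flat tagset could only label a mention as "Organization"; here the entity additionally carries properties and a `part_of` link that can be followed with `kb.neighbours("ukp", "part_of")`.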

Text Annotation

Text annotation is the ability to perform text-level annotations. This task interacts closely with the knowledge base (see above) by anchoring statements about entities and their properties in the text, i.e. by providing textual evidence for statements about entities that are already in the knowledge base, or by deriving statements about entities from the text.
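
Anchoring a statement in text can be pictured as attaching a character span in a document to a knowledge base statement. The sketch below assumes a hypothetical `Statement` triple and `Evidence` record and locates mentions by plain substring search; in the platform itself an annotator would select the span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A knowledge base statement as a subject-predicate-object triple."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Evidence:
    """A character span in a document that supports a statement."""
    document_id: str
    begin: int
    end: int
    statement: Statement


def anchor(document_id, text, mention, statement):
    """Anchor `statement` at the first occurrence of `mention` in `text`."""
    begin = text.find(mention)
    if begin < 0:
        raise ValueError(f"mention not found in {document_id}")
    return Evidence(document_id, begin, begin + len(mention), statement)


def evidence_for(statement, evidences):
    """Collect the evidence for one statement across all documents."""
    return [e for e in evidences if e.statement == statement]


documents = {
    "doc1": "UKP Lab at TU Darmstadt builds an annotation platform.",
    "doc2": "The platform is developed by UKP Lab at TU Darmstadt.",
}
part_of = Statement("UKP Lab", "part_of", "TU Darmstadt")
mention = "UKP Lab at TU Darmstadt"
evidences = [anchor(doc_id, text, mention, part_of) for doc_id, text in documents.items()]
```

Because both pieces of evidence point to the same statement, `evidence_for(part_of, evidences)` returns spans from two documents, which is one way to read the cross-document relations mentioned under Knowledge Management.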

Synergies and Assistive Support

For all three of the steps above, we intend to include assistive mechanisms (usually based on machine learning), e.g. to help with classification and clustering during the corpus extraction step, to suggest applicable annotations in the text annotation mode, or to detect redundantly defined knowledge statements and suggest that users link or merge them. This list is indicative and is to be refined during the course of the project in accordance with user needs.

People

  • Dr. Richard Eckart de Castilho
  • Prof. Dr. Iryna Gurevych
  • Jan-Christoph Klie
  • Ute Winchenbach

Former Project Members

  • Beto Boullosa
  • Michael Bugert
  • Naveen Kumar

Funding

INCEpTION is funded by the German Research Foundation under grants № EC 503/1-1 and GU 798/21-1.

Publications

Schulz, Claudia; Sailer, Michael; Kiesewetter, Jan; Bauer, Elisabeth; Fischer, Frank; Fischer, Martin R.; Gurevych, Iryna (2018):
Automatic Recommendations for Data Coding: a use case from medical and teacher education.
In: Proceedings of the 14th eScience IEEE International Conference, Amsterdam, Netherlands, 29 October - 1 November 2018, DOI: 10.1109/eScience.2018.00100,
[Online edition: https://fileserver.ukp.informatik.tu-darmstadt.de/UKP_Webpag...],
[Conference publication]

Eckart de Castilho, Richard; Klie, Jan-Christoph; Kumar, Naveen; Boullosa, Beto; Gurevych, Iryna (2018):
INCEpTION - Corpus-based Data Science from Scratch.
In: Digital Infrastructures for Research (DI4R) 2018, Lisbon, Portugal, 9-11 October 2018,
[Online edition: https://fileserver.ukp.informatik.tu-darmstadt.de/UKP_Webpag...],
[Conference publication]

Eckart de Castilho, Richard; Klie, Jan-Christoph; Kumar, Naveen; Boullosa, Beto; Gurevych, Iryna (2018):
Linking Text and Knowledge using the INCEpTION annotation platform.
In: Proceedings of the 14th eScience IEEE International Conference, Amsterdam, Netherlands, 29 October - 1 November 2018, DOI: 10.1109/eScience.2018.00077,
[Online edition: https://ieeexplore.ieee.org/document/8588696],
[Conference publication]

Boullosa, Beto; Eckart de Castilho, Richard; Kumar, Naveen; Klie, Jan-Christoph; Gurevych, Iryna (2018):
Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform.
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Demo Papers, Brussels, Belgium, 31 October - 4 November 2018,
[Online edition: http://www.aclweb.org/anthology/D18-2022],
[Conference publication]

Klie, Jan-Christoph (2018):
INCEpTION: Interactive Machine-assisted Annotation.
In: Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, Bertinoro, Italy, 28-31 August 2018,
[Online edition: http://ceur-ws.org/Vol-2167/short8.pdf],
[Conference publication]

Klie, Jan-Christoph; Bugert, Michael; Boullosa, Beto; Eckart de Castilho, Richard; Gurevych, Iryna (2018):
The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation.
In: Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018): System Demonstrations, Association for Computational Linguistics, Santa Fe, USA, 20-26 August 2018,
[Online edition: http://aclweb.org/anthology/C18-2002],
[Conference publication]

Boullosa, Beto; Eckart de Castilho, Richard; Geyken, Alexander; Lemnitzer, Lothar; Gurevych, Iryna (2017):
A tool for extracting sense-disambiguated example sentences through user feedback.
In: Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Valencia, Spain,
[Online edition: http://aclweb.org/anthology/E17-3018],
[Conference publication]
