INCEpTION

(Funding Period: 2017 - 2022)

Towards an Infrastructure for the Distributed Exploration and Annotation of Large Corpora and Knowledge Bases

Motivation

The annotation of specific semantic phenomena often requires compiling task-specific corpora and creating or extending task-specific knowledge bases. Presently, researchers need a broad range of skills and tools to address such semantic annotation tasks.

In the recently funded INCEpTION project, the UKP Lab at TU Darmstadt aims to build a joint web-based annotation platform that integrates all of these related tasks. The following sections briefly outline the tasks involved.

Corpus Extraction

Corpus extraction is the act of extracting a task-specific corpus from a larger background corpus by means of queries. This task shall be supported by automatic assistive features, e.g. interactively selecting relevant units and automatically finding similar ones (i.e. obtaining more samples from a seed set), or reducing a large result set to a smaller but diverse one by clustering similar results. The query mechanism shall be able to incorporate information from the knowledge base (see Knowledge Management below).
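The two assistive steps named above (growing a seed set with similar units, and shrinking a large result set to a diverse one by clustering) can be illustrated with a minimal sketch. It assumes that units are short text snippets compared by bag-of-words cosine similarity; the function names and the similarity threshold are hypothetical, and simple single-pass (leader) clustering stands in for whatever clustering method the platform actually uses.

```python
import math
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_from_seeds(seeds: List[str], candidates: List[str], k: int) -> List[str]:
    """Hypothetical helper: rank candidate units by their best similarity
    to any seed unit and return the top k (more samples from a seed set)."""
    if not seeds:
        return []
    seed_vecs = [vectorize(s) for s in seeds]
    scored = [
        (max(cosine(vectorize(c), s) for s in seed_vecs), c)
        for c in candidates
        if c not in seeds
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def diverse_subset(results: List[str], threshold: float = 0.5) -> List[str]:
    """Hypothetical helper using single-pass leader clustering: a result that is
    at least `threshold`-similar to an existing cluster leader joins that cluster,
    otherwise it starts a new one. One representative (the leader) is kept per cluster."""
    leaders: List[str] = []
    for r in results:
        v = vectorize(r)
        if all(cosine(v, vectorize(leader)) < threshold for leader in leaders):
            leaders.append(r)
    return leaders
```

For example, given the results "the bank raised interest rates", "the central bank raised rates again", "we sat on the river bank" and "the river bank flooded", `diverse_subset` keeps one financial and one river sentence, while `expand_from_seeds` with the first sentence as seed ranks "the central bank raised rates again" highest.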

Knowledge Management

Knowledge management is the ability to model knowledge that is extracted from or connected to text. The knowledge base enables the creation of cross-document relations. We aim specifically at structured knowledge, i.e. not just flat or hierarchical tagsets, but entity classes and entities that can have properties and can be linked to each other.
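The difference between a flat tagset and structured knowledge can be sketched as a small data model: entity classes that may specialize other classes, entities with properties, and typed links between entities. This is a minimal illustration under our own assumptions; the class and method names are hypothetical and do not describe INCEpTION's actual knowledge base implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EntityClass:
    name: str
    superclass: Optional["EntityClass"] = None  # classes can form a hierarchy

    def is_a(self, other: "EntityClass") -> bool:
        """True if this class is `other` or a (transitive) subclass of it."""
        cls: Optional[EntityClass] = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.superclass
        return False


@dataclass
class Entity:
    label: str
    cls: EntityClass
    properties: Dict[str, str] = field(default_factory=dict)  # literal-valued properties


@dataclass
class KnowledgeBase:
    entities: Dict[str, Entity] = field(default_factory=dict)
    links: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)

    def add(self, entity: Entity) -> None:
        self.entities[entity.label] = entity

    def link(self, subject: str, relation: str, obj: str) -> None:
        """Link two existing entities; unknown entities are rejected."""
        if subject not in self.entities or obj not in self.entities:
            raise KeyError(f"unknown entity in link: {subject!r} -> {obj!r}")
        self.links.append((subject, relation, obj))

    def instances_of(self, cls: EntityClass) -> List[str]:
        """Labels of all entities whose class is `cls` or one of its subclasses."""
        return [e.label for e in self.entities.values() if e.cls.is_a(cls)]

    def related(self, subject: str, relation: str) -> List[str]:
        return [o for s, r, o in self.links if s == subject and r == relation]


# Usage: two organizations of different (sub)classes, linked to each other.
organization = EntityClass("Organization")
university = EntityClass("University", organization)
group = EntityClass("ResearchGroup", organization)

kb = KnowledgeBase()
kb.add(Entity("TU Darmstadt", university, {"city": "Darmstadt"}))
kb.add(Entity("UKP Lab", group))
kb.link("UKP Lab", "partOf", "TU Darmstadt")
```

Querying `kb.instances_of(organization)` returns both entities because class membership follows the hierarchy, which a flat tagset cannot express.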

Text Annotation

Text annotation is the ability to perform text-level annotations. This task interacts closely with the knowledge base (see above) by anchoring statements about entities and their properties in the text, i.e. providing textual evidence for statements about entities that are already in the knowledge base, or deriving new statements about entities from the text.
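Both directions of anchoring described above can be sketched with annotations that link a character span to a (subject, property, value) statement. This is a hypothetical model for illustration only: the `Anchor` type, the triple format and the helper names are assumptions, not INCEpTION's data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Statement = Tuple[str, str, str]  # hypothetical (subject, property, value) triple


@dataclass(frozen=True)
class Anchor:
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    statement: Statement


def anchor(text: str, surface: str, statement: Statement) -> Anchor:
    """Anchor `statement` at the first occurrence of `surface` in `text`."""
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} not found in text")
    return Anchor(begin, begin + len(surface), statement)


def evidence_for(text: str, anchors: Iterable[Anchor], statement: Statement) -> List[str]:
    """Textual evidence for a statement: all text spans anchored to it."""
    return [text[a.begin:a.end] for a in anchors if a.statement == statement]


def derive_statements(anchors: Iterable[Anchor], kb: Set[Statement]) -> List[Statement]:
    """Statements supported by the text that are not yet in the knowledge base."""
    new: List[Statement] = []
    for a in anchors:
        if a.statement not in kb and a.statement not in new:
            new.append(a.statement)
    return new
```

Given the sentence "INCEpTION is developed by the UKP Lab at TU Darmstadt." and a knowledge base that already contains `("INCEpTION", "developedBy", "UKP Lab")`, anchoring that statement to "developed by the UKP Lab" provides evidence for it, while an anchor on "UKP Lab at TU Darmstadt" for `("UKP Lab", "partOf", "TU Darmstadt")` yields a new candidate statement for the knowledge base.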

Synergies and Assistive Support

For all three of the steps above, we intend to include assistive mechanisms (usually based on machine learning), e.g. to help with classification and clustering during the corpus extraction step, to suggest applicable annotations in the text annotation mode, or to detect redundantly defined knowledge statements and suggest that users link or merge them. This list is indicative and will be refined during the project in accordance with user needs.

People

  • Dr. Richard Eckart de Castilho
  • Prof. Dr. Iryna Gurevych
  • Jan-Christoph Klie
  • Ute Winchenbach

Former Project Members

  • Beto Boullosa
  • Michael Bugert
  • Naveen Kumar

Funding

INCEpTION is funded by the German Research Foundation under grants № EC 503/1-1 and GU 798/21-1.

Publications

Scheunemann, Christoph ; Naumann, Julian ; Eichler, Max ; Stowe, Kevin ; Gurevych, Iryna (2020):
Data Collection and Annotation Pipeline for Social Good Projects.
AI for Social Good - AAAI Fall Symposium 2020, virtual Conference, 13.-14.11.2020, [Conference or Workshop Item]

Wu, Mingzhu ; Moosavi, Nafise Sadat ; Rücklé, Andreas ; Gurevych, Iryna (2020):
Improving QA Generalization by Concurrent Modeling of Multiple Biases.
In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 839-853,
Association for Computational Linguistics, Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), virtual Conference, 16.-20.11.2020, DOI: 10.18653/v1/2020.findings-emnlp.74,
[Conference or Workshop Item]

Rücklé, Andreas ; Pfeiffer, Jonas ; Gurevych, Iryna (2020):
MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale.
In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2471-2486,
Association for Computational Linguistics, 2020 Conference on Empirical Methods in Natural Language Processing, virtual Conference, 16.-20.11.2020, ISBN 978-1-952148-60-6,
DOI: 10.18653/v1/2020.emnlp-main.194,
[Conference or Workshop Item]

Klie, Jan-Christoph ; Eckart de Castilho, Richard ; Gurevych, Iryna (2020):
From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains.
pp. 6982-6993, The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), virtual Conference, 05.-10.07.2020, [Conference or Workshop Item]

Eckart de Castilho, Richard ; Ide, Nancy ; Kim, Jin-Dong ; Klie, Jan-Christoph ; Suderman, Keith (2019):
Towards cross-platform interoperability for machine-assisted annotation.
In: Genomics & Informatics, 17 (2), p. e19, Genomics Inform, DOI: 10.5808/GI.2019.17.2.e19,
[Article]

Schulz, Claudia ; Sailer, Michael ; Kiesewetter, Jan ; Bauer, Elisabeth ; Fischer, Frank ; Fischer, Martin R. ; Gurevych, Iryna (2018):
Automatic Recommendations for Data Coding: a use case from medical and teacher education.
In: Proceedings of the 14th eScience IEEE International Conference, pp. 364-365,
Amsterdam, Netherlands, 29.10.2018--01.11.2018, DOI: 10.1109/eScience.2018.00100,
[Conference or Workshop Item]

Eckart de Castilho, Richard ; Klie, Jan-Christoph ; Kumar, Naveen ; Boullosa, Beto ; Gurevych, Iryna (2018):
INCEpTION - Corpus-based Data Science from Scratch.
Digital Infrastructures for Research (DI4R) 2018, Lisbon, Portugal, 09.-11.10.2018, [Conference or Workshop Item]

Eckart de Castilho, Richard ; Klie, Jan-Christoph ; Kumar, Naveen ; Boullosa, Beto ; Gurevych, Iryna (2018):
Linking Text and Knowledge using the INCEpTION annotation platform.
In: Proceedings of the 14th eScience IEEE International Conference, pp. 327-328,
The 14th eScience IEEE International Conference, Amsterdam, Netherlands, 29.10.2018--01.11.2018, DOI: 10.1109/eScience.2018.00077,
[Conference or Workshop Item]

Boullosa, Beto ; Eckart de Castilho, Richard ; Kumar, Naveen ; Klie, Jan-Christoph ; Gurevych, Iryna (2018):
Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform.
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 127-132,
The 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31.10.2018--04.11.2018, [Conference or Workshop Item]

Klie, Jan-Christoph (2018):
INCEpTION: Interactive Machine-assisted Annotation.
In: Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, p. 105,
First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, Bertinoro, Italy, 28.08.2018--31.08.2018, [Conference or Workshop Item]

Klie, Jan-Christoph ; Bugert, Michael ; Boullosa, Beto ; Eckart de Castilho, Richard ; Gurevych, Iryna (2018):
The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation.
In: Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pp. 5-9,
Association for Computational Linguistics, The 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, USA, 20.08.2018--26.08.2018, [Conference or Workshop Item]

Boullosa, Beto ; Eckart de Castilho, Richard ; Geyken, Alexander ; Lemnitzer, Lothar ; Gurevych, Iryna (2017):
A tool for extracting sense-disambiguated example sentences through user feedback.
In: Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 69-72,
Association for Computational Linguistics, Valencia, Spain, [Conference or Workshop Item]