Guiding Theme A1: Entity Linking/Cross-document Coreference Resolution
The task of entity linking disambiguates mentions of entities by connecting them to concepts in a knowledge base derived from Wikipedia (such as YAGO or DBpedia). The related task of cross-document coreference resolution does not link mentions to concepts but determines whether mentions in different documents refer to the same entity. This guiding theme addresses both tasks together, since they are related and may support each other. The projects aim to solve them with graph- or network-based methods, or with machine learning and deep learning techniques that model both tasks jointly.
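The difference between the two tasks can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-entry knowledge base `TOY_KB`, the word-overlap scoring, the Jaccard threshold, and the function names are hypothetical and do not describe the systems developed in this guiding theme.

```python
# Hypothetical toy knowledge base: entry id -> descriptive words.
TOY_KB = {
    "Paris_(France)": {"france", "capital", "seine", "city"},
    "Paris_Hilton": {"hilton", "celebrity", "hotel", "heiress"},
}


def link(context_words):
    """Entity linking: pick the KB entry that shares most words with the context."""
    scores = {entry: len(desc & context_words) for entry, desc in TOY_KB.items()}
    best = max(scores, key=scores.get)
    # Return None when no entry shares any word with the context.
    return best if scores[best] > 0 else None


def cross_doc_clusters(mentions, threshold=0.2):
    """Cross-document coreference: group mentions by context similarity, without a KB."""
    clusters = []
    for doc_id, context in mentions:
        for cluster in clusters:
            rep = cluster["context"]
            jaccard = len(rep & context) / len(rep | context)
            if jaccard >= threshold:
                cluster["docs"].append(doc_id)
                cluster["context"] = rep | context
                break
        else:
            # No sufficiently similar cluster exists: start a new one.
            clusters.append({"docs": [doc_id], "context": set(context)})
    return [c["docs"] for c in clusters]


# Three mentions of "Paris" in three documents, each with its context words.
mentions = [
    ("doc1", {"paris", "capital", "france", "visit"}),
    ("doc2", {"paris", "hilton", "hotel", "party"}),
    ("doc3", {"paris", "france", "seine", "river"}),
]

linked = {doc_id: link(context) for doc_id, context in mentions}
clusters = cross_doc_clusters(mentions)
```

Entity linking yields a knowledge base entry per mention, whereas cross-document coreference resolution only yields groups of mentions. In this toy setting the two outputs agree, which hints at why the tasks may support each other.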
Research results of the first Ph.D. cohort
The work of the first phase on guiding theme A1 deals with entity linking (EL), the task of linking mentions of entities (“who”, “when”, “where”) to their corresponding entries in a knowledge base derived from Wikipedia. EL is a complex task that requires analyzing different kinds of textual information: entity mentions, their local context, and their global context (Heinzerling et al., 2015). To better understand the errors made by our initial system and by state-of-the-art systems, we implemented a tool for visual error analysis (Heinzerling and Strube, 2015). Based on this analysis, we developed models that aim to better exploit each of these types of information: using subword units to better understand entity mentions (Heinzerling and Strube, 2018), modeling local context via selectional preferences (Heinzerling et al., 2017a), and modeling global context with geographic and temporal coherence (Heinzerling et al., 2017b).
Ongoing project of the second Ph.D. cohort
For the second phase, we consider it essential to handle languages other than English, given the vast amount of text written in other languages. A thorough analysis of how several methods can be extended to a multilingual setting is therefore part of this research. The expected outcome is a system that performs concept clustering and disambiguation on documents from heterogeneous sources, genres, and languages. The approach will be evaluated on multiple data sets from different sources (e.g., news, web) and in multiple languages.
Moreover, fine-grained entity typing has gained relevance because of its importance in context-sensitive and entity-focused downstream tasks such as relation extraction, coreference resolution, and question answering. This subtask is closely related to entity linking, and improvements in typing could have a substantial impact on our main goal. For these reasons, this line of research will also be explored using neural network models together with hierarchical type embeddings.
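As a rough illustration of what a hierarchy-aware type representation can look like, the sketch below represents each fine-grained type by the set of nodes on its path in a type hierarchy, so that types sharing ancestors receive similar vectors. The type inventory, the path notation, and the multi-hot encoding are assumptions chosen for simplicity. A neural model would typically learn dense vectors instead, and this is not the model pursued in the project.

```python
import math


def ancestors(type_path):
    """Expand '/person/artist' into ['/person', '/person/artist']."""
    parts = [p for p in type_path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def build_index(type_paths):
    """Assign one vector dimension to every node that occurs in the hierarchy."""
    nodes = sorted({node for path in type_paths for node in ancestors(path)})
    return {node: i for i, node in enumerate(nodes)}


def embed(type_path, index):
    """Multi-hot vector: 1.0 for the type itself and for each of its ancestors."""
    vec = [0.0] * len(index)
    for node in ancestors(type_path):
        vec[index[node]] = 1.0
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical type inventory, for illustration only.
TYPES = ["/person/artist/musician", "/person/artist/actor", "/location/city"]
index = build_index(TYPES)

musician = embed("/person/artist/musician", index)
actor = embed("/person/artist/actor", index)
city = embed("/location/city", index)

sibling_similarity = cosine(musician, actor)   # two of three path nodes shared
unrelated_similarity = cosine(musician, city)  # no path nodes shared
```

With this encoding, sibling types such as musician and actor are closer to each other than to an unrelated type such as city, which is the property a hierarchical type embedding is meant to capture.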
Several collaborations arise within the AIPHES group as well. Entity linking can support the identification of complex event structures in discourse and the study of different aspects of opinion and sentiment analysis, which are the goals of guiding themes A2 and A3, respectively. Since documents are redundant at the entity level, entity linking may also serve as a preparatory step for multi-document summarization, either by recognizing document overlap or by introducing lexical semantic chains, topics that concern Area B.
- PI: Prof. Dr. Michael Strube
- First Cohort PhD student: Benjamin Heinzerling
- Second Cohort PhD student: Federico López
- Angela Fahrni and Michael Strube (2014). A latent variable model for discourse-aware concept and entity disambiguation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26-30 April 2014, pages 491-500.
- Angela Fahrni, Benjamin Heinzerling, Thierry Göckel, and Michael Strube (2014). HITS' monolingual and cross-lingual entity linking system at TAC. In Proceedings of the Text Analysis Conference, National Institute of Standards and Technology, Gaithersburg, Maryland, USA, 18-19 November 2013.
- Markus Zopf, Teresa Botschen, Tobias Falke, Benjamin Heinzerling, Ana Marasovic, Todor Mihaylov, Avinesh P. V. S., Eneldo Loza Mencía, Johannes Fürnkranz, and Anette Frank (2018). What's Important in a Text? An Extensive Evaluation of Linguistic Annotations for Summarization.
- Benjamin Heinzerling, Nafise Sadat Moosavi, and Michael Strube (2017). Revisiting Selectional Preferences for Coreference Resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. [Online-Edition: http://aclweb.org/anthology/D17-1138]
- Benjamin Heinzerling and Michael Strube (2017). BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. In CoRR. [Online-Edition: http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf]
- Benjamin Heinzerling, Michael Strube, and Chin-Yew Lin (2017). Trust, but verify! Better entity linking through automatic verification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 3-7 April 2017. [Online-Edition: http://aclweb.org/anthology/E17-1078]
- Benjamin Heinzerling and Michael Strube (2015). HITS at TAC KBP 2015: Entity discovery and linking, and event nugget detection. In Proceedings of the Text Analysis Conference. [Online-Edition: https://tac.nist.gov/publications/2015/participant.papers/TA...]
- Benjamin Heinzerling and Michael Strube (2015). Visual Error Analysis for Entity Linking. In Proceedings of ACL-IJCNLP 2015 System Demonstrations. [Online-Edition: http://www.aclweb.org/anthology/P15-4007]