Guiding Theme C3: Deep Learning embeddings for adaptive language processing

The focus of this guiding theme is deep learning embeddings: low-dimensional, continuous representations of inputs, e.g. of words and phrases, obtained with deep neural networks. One of their benefits is that they capture both semantic and syntactic regularities of words and phrases in a compact way. Embeddings are particularly useful as features for adaptive language processing because they can be learned in an unsupervised fashion from large amounts of domain-specific text data, as well as from lexical knowledge bases such as WordNet. Moreover, since embeddings can also be derived for other modalities, such as images, they enable linking data from different modalities, for example images and text.
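As an illustration of how such low-dimensional continuous representations arise from raw text, the following sketch builds word vectors from a toy co-occurrence matrix via truncated SVD. The corpus, the dimensionality, and all names are illustrative choices for this sketch, not part of the systems described on this page.

```python
import numpy as np

# Toy corpus; in practice, embeddings are trained on large amounts of
# domain-specific text. Corpus and dimensionality are illustrative only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Build a vocabulary and a symmetric word-word co-occurrence matrix
# using a context window of one word to the left and right.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sentence):
                C[index[word], index[sentence[j]]] += 1

# A truncated SVD compresses the sparse counts into low-dimensional,
# continuous word vectors -- the essence of an embedding.
U, S, _ = np.linalg.svd(C, full_matrices=False)
dim = 2
embeddings = U[:, :dim] * S[:dim]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that occur in similar contexts ("cat", "dog") tend to
# receive similar vectors in the compressed space.
sim_cat_dog = cosine(embeddings[index["cat"]], embeddings[index["dog"]])
```

Neural methods such as word2vec replace the explicit count matrix and SVD with a trained prediction objective, but the outcome is the same kind of dense vector space in which distributional similarity becomes geometric proximity.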

Research results of the first Ph.D. cohort

Guiding theme C3 aims at developing and examining deep learning embeddings for semantic language processing. In the context of AIPHES, it is of interest to develop language processing tools that adapt to different sources and contexts. Such processing requires features, which traditionally are hand-crafted at great expense or, more recently, are learned in an unsupervised fashion from huge text sources, knowledge bases, or even images.

In particular, we focus on two steps involving embeddings: first, learning embeddings for semantic tasks jointly from text sources, knowledge bases, and visual data; second, applying these embeddings to semantic role labeling (SRL) and further applications such as summarization systems. We chose the main challenge in FrameNet SRL, frame identification, as the task for studying the quality of embeddings for language understanding. We developed a frame identification system based on word representations (Hartmann et al., 2017). This system proved to be on par with state-of-the-art systems while outperforming them on out-of-domain data. Next, we extended our basic frame identification system to multimodal scenarios (Botschen et al., 2018), building on insights from human language understanding, which is grounded in several modalities. Furthermore, we evaluated our multimodal frame identification approach on multilingual data in order to investigate its strengths in different application scenarios. We found that multimodal (textual and visual) embeddings enhance the performance on English data, achieving new state-of-the-art results. Moreover, we compared different embedding methods on the task of knowledge base completion (Botschen et al., 2017) to study a different setting for representing knowledge. For the case of FrameNet, we asked whether the relations between two frames are reflected in textual embeddings for frames, and whether textual or knowledge-based embeddings are more helpful for predicting new relations between frames. Here, we found a clear advantage of knowledge-based embeddings.
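To make the embedding-based view of frame identification concrete, here is a minimal sketch that treats the task as nearest-neighbour search in a shared vector space: a predicate in context is represented by its word vector plus a down-weighted context average, and the closest frame embedding wins. The word vectors, frame names, weighting, and the helper `identify_frame` are hypothetical stand-ins; the published system (Hartmann et al., 2017) uses pretrained embeddings and a trained classifier, not the random vectors used here for self-containment.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50

# Hypothetical "pretrained" word embeddings; in a real system these are
# learned unsupervisedly from large corpora, not sampled at random.
word_vecs = {w: rng.normal(size=DIM)
             for w in ["she", "bought", "a", "book", "he", "sold", "the", "car"]}

# One embedding per candidate frame, here derived from a typical lexical
# unit plus a small perturbation (a stand-in for averaging over all
# lexical units that can evoke the frame).
frame_vecs = {
    "Commerce_buy":  word_vecs["bought"] + 0.1 * rng.normal(size=DIM),
    "Commerce_sell": word_vecs["sold"] + 0.1 * rng.normal(size=DIM),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_frame(sentence, predicate):
    # Represent the predicate in context: its own vector plus a
    # down-weighted average of the remaining context words.
    context = [word_vecs[w] for w in sentence if w != predicate]
    rep = word_vecs[predicate] + 0.2 * np.mean(context, axis=0)
    # Pick the frame whose embedding is most similar to this representation.
    return max(frame_vecs, key=lambda f: cosine(frame_vecs[f], rep))

print(identify_frame(["she", "bought", "a", "book"], "bought"))  # Commerce_buy
```

The design choice worth noting is that predicate and frames live in the same space, so out-of-domain predicates can still be matched to frames by geometric proximity rather than by lexicon lookup alone.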

Ongoing project of the second Ph.D. cohort

One of the open challenges in multi-modal representation learning (for NLP tasks) is the combination of complementary information and the selection of relevant information from different modalities (Botschen et al., 2018). In the first phase, incorporating visual information was found to be beneficial for the tasks of Frame Identification and Knowledge Base Completion.

In the second phase, we focus on developing more powerful and meaningful multi-modal representations of images and texts. We aim at developing methods for such joint representations by learning the semantic similarity between the two modalities. Joint visual-semantic embeddings are learnt by mapping the representations of each modality to a shared low-dimensional space with deep neural networks. The quality of joint representations is crucial to the performance of multi-modal models, which find application in tasks such as cross-modal retrieval and image captioning.
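A common way to learn such a shared space is a hinge-based triplet ranking loss over matched and mismatched image-text pairs in a batch. The sketch below, with placeholder dimensions, random features, and single linear projections standing in for deep networks, shows only the forward loss computation; in training, the projections would be optimized by gradient descent so that matched pairs score higher than all mismatched pairs by a margin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy batch of paired image and caption features (e.g. CNN activations
# and averaged word embeddings); dimensions are illustrative placeholders.
B, D_IMG, D_TXT, D_JOINT = 4, 16, 12, 8
images = rng.normal(size=(B, D_IMG))
texts = rng.normal(size=(B, D_TXT))

# Each modality is mapped into the shared space by its own projection
# (a single linear layer stands in for a deeper network here).
W_img = rng.normal(scale=0.1, size=(D_IMG, D_JOINT))
W_txt = rng.normal(scale=0.1, size=(D_TXT, D_JOINT))

def l2_normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def triplet_ranking_loss(img_feats, txt_feats, margin=0.2):
    """Hinge-based ranking loss over all in-batch negatives: each matched
    image-text pair should outscore mismatched pairs by `margin`."""
    vi = l2_normalize(img_feats @ W_img)
    vt = l2_normalize(txt_feats @ W_txt)
    sims = vi @ vt.T                    # B x B cosine similarity matrix
    pos = np.diag(sims)                 # matched pairs on the diagonal
    # Penalize negatives that come within `margin` of the matched pair,
    # in both retrieval directions (image->text and text->image).
    cost_i2t = np.maximum(0.0, margin + sims - pos[:, None])
    cost_t2i = np.maximum(0.0, margin + sims - pos[None, :])
    np.fill_diagonal(cost_i2t, 0.0)
    np.fill_diagonal(cost_t2i, 0.0)
    return float(cost_i2t.sum() + cost_t2i.sum())

loss = triplet_ranking_loss(images, texts)
```

Because both modalities are normalized into one space, the same embeddings support both retrieval directions mentioned above, which is what makes them reusable for cross-modal retrieval and image captioning.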

We aim to address the following key challenges for learning robust multi-modal representations with the goal of obtaining embeddings that can generalize across tasks and datasets: (i) Identifying and developing supervised methods that leverage paired cross-domain information for robust joint representations, and (ii) combining supervised and unsupervised learning for modeling joint representations from limited paired multi-modal data.

The multi-modal embeddings obtained from the methods we develop can serve as input to the preference learning methods investigated in guiding theme C2 for improving constraint-based ranking. The developed embeddings can, moreover, be combined with the knowledge-based signals from guiding theme A2 for event analysis.


  • PI (Second Cohort): Prof. Dr. Stefan Roth
  • PI (First Cohort): Prof. Dr. Iryna Gurevych
  • First Cohort PhD student: Teresa Botschen
  • Second Cohort PhD student: Shweta Mahajan

Publications

Zopf, Markus ; Botschen, Teresa ; Falke, Tobias ; Heinzerling, Benjamin ; Marasovic, Ana ; Mihaylov, Todor ; P. V. S., Avinesh ; Loza Mencía, Eneldo ; Fürnkranz, Johannes ; Frank, Anette (2018):
What's Important in a Text? An Extensive Evaluation of Linguistic Annotations for Summarization.
pp. 272-277, [Conference publication]

Botschen, Teresa ; Sorokin, Daniil ; Gurevych, Iryna (2018):
Frame- and Entity-Based Knowledge for Common-Sense Argumentative Reasoning.
In: Proceedings of the 5th Workshop on Argument Mining held in conjunction with EMNLP 2018 (Short Papers), pp. 90-96,
Brussels, Belgium, 31.10.2018, [Conference publication]

Botschen, Teresa ; Beinborn, Lisa ; Gurevych, Iryna (2018):
Multimodal Grounding for Language Processing.
In: Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), pp. 2325-2339,
Santa Fe, USA, 20.08.2018-26.08.2018, [Conference publication]

Botschen, Teresa ; Gurevych, Iryna ; Klie, Jan-Christoph ; Sergieh, Hatem Mousselly ; Roth, Stefan (2018):
Multimodal Frame Identification with Multilingual Evaluation.
In: Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1481-1491,
Association for Computational Linguistics, New Orleans, USA, [Conference publication]

Botschen, Teresa ; Mousselly-Sergieh, Hatem ; Gurevych, Iryna (2017):
Experimental study of multimodal representations for Frame Identification - How to find the right multimodal representations for this task?
In: Language-Learning-Logic Workshop (3L 2017),
London, UK, [Conference publication]

Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna (2017):
Learning to Score System Summaries for Better Content Selection Evaluation.
In: Proceedings of the EMNLP workshop "New Frontiers in Summarization", pp. 74-84,
Association for Computational Linguistics, Copenhagen, Denmark, September 2017, [Conference publication]

Botschen, Teresa ; Mousselly-Sergieh, Hatem ; Gurevych, Iryna (2017):
Prediction of Frame-to-Frame Relations in the FrameNet Hierarchy with Frame Embeddings.
In: Proceedings of the 2nd Workshop on Representation Learning for NLP (RepL4NLP, held in conjunction with ACL 2017), pp. 146-156,
Vancouver, Canada, [Conference publication]

Hartmann, Silvana ; Kuznetsov, Ilia ; Martin, Teresa ; Gurevych, Iryna (2017):
Out-of-domain FrameNet Semantic Role Labeling.
In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), pp. 471-482,
Association for Computational Linguistics, Valencia, Spain, [Conference publication]

Martin, Teresa ; Botschen, Fiete ; Nagesh, Ajay ; McCallum, Andrew (2016):
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks.
In: Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 held in conjunction with NAACL 2016, pp. 92-96,
San Diego, USA, [Conference publication]
