Language Technology for Digital Humanities

Lexical-Semantic Resources and Algorithms


The research area Lexical-Semantic Resources and Algorithms is concerned with the analysis, design, and application of lexical-semantic resources (LSRs) for natural language processing. At the core of our work is a multi-year effort to develop UBY, a large-scale, sense-linked unified resource. UBY contains multiple expert-built and collaboratively constructed LSRs for English and German. We are also interested in UBY’s applications to semantic processing tasks, such as Word Sense Disambiguation or Semantic Role Labeling, and to end-user applications like Question Answering. Another important topic is the use of LSRs in the domain of Digital Humanities.

Current Projects

  • UBY: UBY is a large-scale lexical-semantic resource based on the ISO standard Lexical Markup Framework (LMF), combining a wide range of information from expert-constructed and collaboratively constructed resources for English and German. It is being further developed as part of CEDIFOR, in cooperation with the research area Text Mining & Analytics. Most UBY-related software is available as open source on GitHub.
  • QA-EduInf: Community-based Question Answering for Educational Information: The project aims at using natural language processing techniques to analyze educational information and answer user questions on various educational topics. Since a large portion of users' questions have already been asked by other people in community question answering forums and answered by educational experts or crowds, we use the available question and answer archives to answer these questions and minimize human effort in searching through educational information. The project consists of different components including question classification, question and answer retrieval, answer quality assessment, and summarization.
  • Information Consolidation: A New Paradigm in Knowledge Search (DIP Project): The DIP project – an international cooperation with Bar-Ilan University and Israel Institute of Technology – aims at the next big step in information access technology. The goal is to support users in identifying and assimilating the large set of relevant statements found within multitudes of documents which are usually retrieved by the current search technologies. Novel methods for statement extraction, information consolidation, and inferring relations represent the core research areas within this project.

Past Projects

  • Educational Web 2.0 (EduWeb): In the EduWeb project, we seek to implement our vision of technology-enhanced education for the 21st century. A vast amount of content is produced by many people every day, but despite their interconnection through the World Wide Web, their efforts are often isolated from each other. To overcome this problem, the UKP Lab will provide and explore new algorithms to simplify tedious, recurring tasks and to improve coordination with the community.
  • Integrating Collaborative and Linguistic Resources for Word Sense Disambiguation and Semantic Role Labeling (InCoRe): In the InCoRe project, we address the lack of coverage typically associated with lexical-semantic resources. The first major goal of this project is the integration of various expert-built and collaboratively created lexical-semantic resources into a large-scale resource of unprecedented coverage and quality. The second major goal is to scale natural language processing technologies that use lexical-semantic resources, specifically word sense disambiguation and semantic role labeling, to real-life applications based on the developed resource.
  • LOEWE Digital Humanities: This project deals with the analysis of contemporary corpora. At UKP, we are particularly researching the development and application of the linked lexical resource UBY in the context of humanities applications requiring structured semantic knowledge.
  • QA-EL: The project investigates novel applications of dynamic lexical-semantic resources (such as Wikipedia and other Web 2.0 sources) for information search in eLearning.
  • Semantic Information Retrieval 1,2 (SIR): This project systematically investigates the possible usage of semantic and lexical relationships between words or concepts for improving the information retrieval process. The main focus is on semantic relatedness measures using different knowledge sources (e.g. WordNet, GermaNet, or Wikipedia).

Data and Tools


  • UBY – the resource: Database dumps and related data
  • DKPro Uby: A Java framework for creating and accessing sense-linked lexical resources in accordance with the UBY-LMF lexicon model

Sense Alignments contained in UBY

Other Resources and Datasets

Other APIs and Tools

  • JOWKL: A Java-API for accessing the resource OmegaWiki
  • JWPL: A Java-API for accessing Wikipedia, as well as its revision history
  • JWKTL: A Java-API for accessing Wiktionary
  • DKPro-WSD: An open-source library for performing Word Sense Disambiguation

Completed PhD Theses