
Language Technology for Digital Humanities

Overview

Under the heading of Language Technology for Digital Humanities, UKP Lab conducts projects at the boundary between Natural Language Processing and Computer Science on the one hand, and Humanities, Social Sciences, and Educational Research on the other. In particular, we work on making digital analysis methods more accessible to text-based humanities, implementing tools to explore and annotate text corpora, and contributing to the infrastructures supporting Digital Humanities. Our research interests in this area include:

  • Creating user-friendly tools to explore and annotate text corpora
  • Analyzing corpora at the semantic level, e.g. opinion mining or identifying metaphoric language
  • Processing and analyzing historical texts
  • Interoperability with Digital Humanities infrastructures such as DARIAH and CLARIN

Current Projects

  • CEDIFOR: In this context, we aim to foster interdisciplinary work between Computer Science and Digital Humanities by providing know-how and research infrastructures for text analytics to humanities researchers in the Rhein-Main area, supporting them in their investigation of novel research questions. This project is conducted in collaboration with the Goethe University Frankfurt am Main and the German Institute for International Educational Research (DIPF).
  • CLARIN F-AG7 KP 3: In association with the CLARIN project, we are building the flexible, web-based annotation tool WebAnno and applying it to the annotation of non-standard varieties of German at the semantic level. This work is done in collaboration with the Language Technology Group in Darmstadt and with researchers from the University of Heidelberg.
  • DKPro: At UKP, we believe in supporting reproducible NLP research through re-usable and freely available software components. To this end, UKP created the award-winning DKPro repository of open-source software covering many aspects of NLP, from pre-processing, lexical resources, and machine learning to semantic analysis. As DKPro grows and gains popularity, it is evolving into a community project in which UKP collaborates, for example, with researchers from the University of Duisburg-Essen.
  • OpenMinTeD: OpenMinTeD aspires to enable the creation of an infrastructure that fosters and facilitates the discovery and use of text mining technologies and interoperable services. It examines several use cases identified by experts from different scientific areas, ranging from generic scholarly communication to literature related to life sciences, food and agriculture, and social sciences and humanities.
  • Processing of audiovisual content: The amount of audiovisual content is constantly increasing, especially in the educational domain, making tasks like transcription and visual analysis very cumbersome for humanities researchers. This project aims to create technology that facilitates the integration of manual and automatic analysis of audiovisual content.

Past Projects

  • CLARIN F-AG7 KP 1: In association with the CLARIN project, we developed the flexible web-based annotation tool WebAnno. The tool supports visual annotation of multiple linguistic layers, including custom-defined layers. It is interoperable with CLARIN infrastructures such as WebLicht. The tool has been developed in close cooperation with the CLARIN F-AG7 KP 2 project, which defines “best practices” for linguistic annotation on several language layers for different annotator status groups. This work has been done in collaboration with the Language Technology Group in Darmstadt.
  • DARIAH-DE I: The mission of the EU-ESFRI-Project DARIAH-EU is to enhance and support digitally-enabled research across the arts and humanities. In the first phase of the German contribution DARIAH-DE, UKP investigated possibilities of using the emerging DARIAH infrastructure by means of the use-case of setting up a digital archive and by means of integrating DARIAH and TextGrid services.
  • DARIAH-DE II: The mission of the EU-ESFRI-Project DARIAH-EU is to enhance and support digitally-enabled research across the arts and humanities. In the context of the second phase of the German contribution DARIAH-DE, UKP collaborates closely with researchers from the Julius Maximilians University of Würzburg to automatically detect and analyze narrative structures in German. These techniques are applied to a corpus of around 2,000 novels written over the last centuries.
  • LOEWE Research Center “Digital Humanities” TP 2.2 “Text as an Instance”: In this project, UKP collaborated very closely with linguists and computational linguists on the comparative analysis of non-canonical grammatical constructions in German and English. Because such constructions are infrequent and ambiguous, a dedicated analysis process and supporting tools had to be developed for annotation. The result is the CSniper annotation tool, which combines collaborative search and annotation in a user-friendly tool. This project has been conducted with researchers from the Department of Linguistics and Literature in Darmstadt as well as from the Goethe University in Frankfurt am Main.
  • LOEWE Research Center “Digital Humanities” TP 2.3 “Text as a Process”: This project analyzed the linguistic properties of collaboratively created text in the Web 2.0. For more details, please refer to the respective section in the Text Analytics area description.
  • Welt der Kinder: The digital humanities project “Welt der Kinder” is designed as a test model for future similar projects in historical sciences. Through very close cooperation between historians, information scientists, and computer scientists, it aims to gain new insights into the way the world was conveyed to children in the period from 1850 until 1918, a time of accelerated production of knowledge that was equally dominated by globalization and nationalisation.

Completed PhD Theses

  • Dr. Oliver Ferschke
  • The Quality of Content in Open Online Collaboration Platforms: Approaches to NLP-supported Information Quality Management in Wikipedia
  • Technische Universität Darmstadt, 2014.
  • Reviewer: Prof. Dr. Iryna Gurevych
  • Co-reviewers: Prof. Dr. Hinrich Schütze (LMU München), Assoc. Prof. Carolyn P. Rosé (CMU Pittsburgh)
  • tubiblio.ulb.tu-darmstadt.de/65952/

  • Dr. Richard Eckart de Castilho
  • Natural Language Processing: Integration of Automatic and Manual Analysis
  • Technische Universität Darmstadt, 2014.
  • Reviewer: Prof. Dr. Iryna Gurevych
  • Co-reviewers: Prof. Dr. Andreas Henrich (Otto-Friedrich-Universität Bamberg), Prof. Christopher D. Manning, PhD. (Stanford University)
  • tubiblio.ulb.tu-darmstadt.de/67629/

  • Dr. Johannes Daxenberger
  • The Writing Process in Online Mass Collaboration: NLP-Supported Approaches to Analyzing Collaborative Revision and User Interaction
  • Technische Universität Darmstadt, 2016.
  • Reviewer: Prof. Dr. Iryna Gurevych
  • Co-reviewers: Karsten Weihe (TU Darmstadt) and Ofer Arazy (University of Alberta)
  • tubiblio.ulb.tu-darmstadt.de/77229/

Software

Coordinator

  • Dr. Richard Eckart de Castilho