The UKP lab is active in many research projects funded by various agencies such as the German Research Foundation, the Federal Ministry of Education and Research, the Hessian Ministry of Higher Education, Research, Science and the Arts, and the European Union.
In the realm of software development, code generation systems have emerged as a game-changing technology that aims to revolutionize how we write software. Developers worldwide leverage code generation systems such as Copilot to accelerate coding.
The use of AI is becoming more common in security applications, yet the security of the applied algorithms themselves is often limited: trained models are, for example, vulnerable to targeted attacks and pose risks of privacy loss. The research area SenPAI in ATHENE addresses the security of AI algorithms and systems, as well as ML-based applications in the field of cybersecurity.
Large scale health-related crises such as the ongoing Covid-19 pandemic spread fear and uncertainties across society, providing fertile soil for fake news and conspiracy theories. The current "infodemic" shows how social networks further amplify such misinformation. Changing policies, new scientific discoveries, and constantly evolving misinformation and conspiracy theories pose a severe problem to manual fact-checking.
Peer review lies at the center of academic quality control. Yet, in many fields of science, researchers pre-publish their article drafts on preprint servers such as arXiv or bioRxiv to disseminate their research findings and bypass the typically lengthy reviewing process. However, the resulting vast body of gray literature, i.e., research articles of unverified quality, poses a great challenge to day-to-day research work by putting the burden of quality assessment on the reader. This issue becomes particularly pressing in crisis situations such as the COVID-19 pandemic, where the societal consequences of flawed scientific work can be tremendous and the number of published preprint articles within a field may explode.
The CDR-CAT project develops methods for the detection and automatic assessment of the dimensions of Corporate Digital Responsibility (CDR) for providers of digital products or services (especially SMEs). CDR extends the principles of corporate social responsibility (CSR) to include assessment criteria and recommendations for action that respond explicitly to advancing digitization and the associated increased requirements for responsible data handling.
CEDIFOR (Centre for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences) is a Digital Humanities Centre established in 2014. We aim to help bridge the gap between research in the Humanities and computer-based methods, and to help researchers master the characteristic problems that arise in this process. We provide methodological expertise for advising researchers from the Humanities, Social, and Educational Sciences on adopting computer-based methods in their research.
By 2050, roughly two-thirds of the world population is expected to live in urban areas. Sustainable growth in the number and size of cities is only possible through efficiency gains in (critical) infrastructures such as energy, transportation, logistics, and water.
Natural language processing (NLP) currently fails to support the analysis of fine-grained relationships between texts – intertextual relationships. Modeling such relationships is a crucial milestone for AI, as it would allow analysing the origin and evolution of texts and ideas, and enable new applications of AI to text-based collaboration, from education to business. Funded by the European Research Council, the InterText project is developing the first-ever framework for exploring intertextuality in NLP. InterText will develop conceptual and applied models and datasets for the study of inline commentary, implicit linking, and document versioning. The models will be evaluated in two case studies: academic peer review and conspiracy theory debunking.
The revolutionary opportunities opened up by eXtended Reality (XR) technologies will only materialize if concepts, techniques, and tools are provided to ensure the social acceptance of XR systems. By this we mean that an XR system should not just be innovative and functionally complex, but should also provide an experience that satisfies the goals and needs of the user, complies with the social context in which the system is used, and is transparent, safe, secure, explainable, and trusted by the user.
Dictionaries are an essential resource in many domains of research, education, and natural language processing (NLP). One crucial component of dictionaries is example sentences, which illustrate real-world use cases of a lemma. However, finding good example sentences in large corpora imposes a heavy workload on lexicographers. In this project, we develop a novel system that eases the work of lexicographers by interactively assessing the quality and diversity of dictionary examples.
Towards an Infrastructure for the Distributed Exploration and Annotation of Large Corpora and Knowledge Bases
The formation of opinion on the measures to combat the COVID-19 pandemic resembles that in previous crises (e.g., the "refugee crisis" of 2015): under the impression of an impending crisis, politicians, mass media, and citizens quickly reach a consensus on the measures to be taken. However, this consensus increasingly dissolves as the crisis progresses, which polarizes society and makes it much more difficult for those responsible to solve the problem. The aim of this project is to investigate the opinions of these actors on the measures to combat the COVID-19 pandemic in an interdisciplinary collaboration with communication scholars at JGU Mainz.
Peer review is the core of modern academic quality control. Reviewing scientific manuscripts requires effort and expertise, and growing publication rates across research fields make traditional essay-based modes of peer reviewing hard to sustain. The PEER project investigates document-centered and machine-assisted alternatives to traditional peer review.
The number of published scientific articles has grown exponentially over the last few decades. As a result, it is impossible for researchers to keep up with all published articles, and it has become increasingly difficult to find all information relevant to their research. NLP is a key technology for helping researchers deal with this information overload. In this project, we investigate several foundational technologies for question answering (QA) over scientific information in different types of data, such as tables and text.
Automatic question answering (QA) facilitates the extraction and identification of relevant knowledge in large data sources that would otherwise be hard for humans to find. Furthermore, numerous other natural language processing (NLP) tasks can be formulated as QA, which positions QA as one of the most prominent NLP tasks. Due to the rapid progress in the field, researchers are confronted with situations where state-of-the-art models are outdated just a few months after publication. In this project, we aim to provide researchers with an extensible QA platform for exploring, comparing, and combining state-of-the-art QA approaches, and to aid the development of novel approaches by standardizing access to available data and model sources.
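To illustrate the idea of casting other NLP tasks as QA, the following sketch (purely illustrative; the task wording, the `QAInstance` structure, and the `sentiment_as_qa` helper are our own assumptions, not part of the project's platform) rewrites a sentiment-classification input as a multiple-choice QA instance:

```python
from dataclasses import dataclass, field

@dataclass
class QAInstance:
    """A generic multiple-choice QA instance: question, context, answer options."""
    question: str
    context: str
    options: list = field(default_factory=list)

def sentiment_as_qa(text: str) -> QAInstance:
    # Reformulate sentiment classification as QA: the input text becomes
    # the context, and the class labels become the answer options.
    return QAInstance(
        question="What is the sentiment of this review?",
        context=text,
        options=["positive", "negative", "neutral"],
    )

instance = sentiment_as_qa("Great phone, but the battery dies quickly.")
```

Any QA model that scores answer options against a question and context can then handle the task without task-specific architecture changes, which is what makes the QA framing attractive as a unifying interface.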
Production processes managed via a shop-floor management system require substantial effort to document properly during manufacturing and may profit substantially from AI-driven solutions. This project investigates methods to automatically extract and utilize relevant information from unstructured texts, such as chat logs, to streamline error-handling workflows in a factory environment.
This project aims to privatize texts with formal guarantees, using differential privacy (DP), while simultaneously preserving their utility for research. This would allow the scientific community to analyze texts in any form without breaching the privacy of their creators.
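The formal guarantee behind DP can be sketched with its classic building block, the Laplace mechanism. This is a minimal illustration of the general principle only, not the project's method for text privatization (the function names are our own):

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two independent Exp(1) draws follows a
    # Laplace(0, 1) distribution; rescale it to the desired scale.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Laplace mechanism: release a counting query with epsilon-DP.

    Adding or removing one individual's record changes the count by at
    most `sensitivity`, so noise drawn with scale sensitivity/epsilon
    guarantees epsilon-differential privacy for the released value.
    """
    return true_count + laplace_noise(sensitivity / epsilon)
```

The same accounting of "sensitivity" and privacy budget epsilon underlies DP methods for text, where the perturbation is applied to richer representations than a single count.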