Open Theses
  • Master Thesis

    Despite the increasing capabilities of state-of-the-art dialogue models such as GPT-4 or Vicuna, they still tend to lose context and generate harmful, biased or toxic content. Much work has recently been devoted to understanding such behavior and how to address it, which has led to a variety of error taxonomies tailored to specific use cases. At UKP, we recently developed one of the first broadly applicable error taxonomies for dialogue systems, as well as a response type taxonomy that covers subsequent user reactions. To continuously improve and extend these taxonomies, we are looking for a thesis student to develop a model-independent, automatic approach for detecting errors and user reactions.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Dominic Petrak, M.Sc.

    Announcement as PDF

  • Bachelor Thesis

    Free-text human feedback, i.e., a user describing to the model what they did not like about the last generated utterance, is one of the most important sources of knowledge to iteratively improve language generation models, such as GPT or LLaMA, but datasets for research are scarce. For this reason, we used synthetic data generation methods to create a task-oriented dialogue dataset annotated with artificial free-text human feedback. However, since synthetic data is subject to various limitations, such as hallucinations or biases, we are looking for a thesis student to research and apply state-of-the-art data augmentation and cleansing methods to improve the quality of our data.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Dominic Petrak, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Most popular pretrained language models (PLMs) used in NLP can only process short inputs, which limits their use in many practical scenarios involving longer documents. To counteract this limitation, specialized architectures have been developed to process longer sequences. Recently, approaches such as SLED [1,2], which use short-text models to independently encode overlapping chunks of the input text, have been proposed as a substitute for long-text architectures. However, one major issue with this approach is that the individually encoded chunks carry no information about their relative position in the document. The goal of this thesis is to extend the SLED approach with various methods of infusing structural information, such as relative positional encodings or section embeddings [3].
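
    To make the chunking problem concrete, here is a minimal sketch of splitting a document into overlapping chunks. It is an illustration under our own assumptions (plain token-id lists, a fixed chunk length and overlap, and each overlap region split in half so that every token's encoding is taken from exactly one chunk), not the SLED reference implementation; `overlapping_chunks` and `Chunk` are hypothetical names. The `start` offset recorded per chunk is exactly the document-level position an independent chunk encoder never sees, and `global_positions()` shows one way it could be fed back in, e.g., as input to a relative positional encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    tokens: List[int]  # token ids covered by this chunk
    start: int         # global index of the chunk's first token in the document
    keep_from: int     # local index of the first token whose encoding is kept
    keep_to: int       # local index (exclusive) after the last kept token

    def global_positions(self) -> List[int]:
        """Document-level positions of every token in this chunk."""
        return [self.start + j for j in range(len(self.tokens))]


def overlapping_chunks(tokens: List[int], chunk_len: int, overlap: int) -> List[Chunk]:
    """Split `tokens` into overlapping windows; each token is 'kept' by exactly one chunk."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("require 0 <= overlap < chunk_len")
    n, stride = len(tokens), chunk_len - overlap

    # Window start positions: advance by `stride` until a window reaches the end.
    starts: List[int] = []
    start = 0
    while n > 0:
        starts.append(start)
        if start + chunk_len >= n:
            break
        start += stride

    chunks: List[Chunk] = []
    for i, s in enumerate(starts):
        e = min(s + chunk_len, n)
        # Split each overlap region in half between the two neighbouring chunks.
        lo = 0 if i == 0 else s + overlap // 2
        hi = n if i == len(starts) - 1 else starts[i + 1] + overlap // 2
        chunks.append(Chunk(tokens[s:e], s, lo - s, hi - s))
    return chunks
```

    For example, with `chunk_len=4` and `overlap=2`, a 10-token document is split into chunks starting at positions 0, 2, 4 and 6, and concatenating the kept slices `c.tokens[c.keep_from:c.keep_to]` recovers the original sequence.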

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Martin Tutek, PhD

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Recently, dialog systems have grown in popularity, in line with the increasing success of neural language generators. However, many problems in language generation persist, such as hallucinated outputs that contain incorrect or unverifiable claims. One way of alleviating this issue is to ground the outputs in relevant knowledge [1], for example document bases [2], structured knowledge bases [3] or images [4]. The area offers plenty of open questions regarding models, decoding algorithms and evaluation methods.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Nico Daheim, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Anti-science arguments are a growing concern in today’s society. Vaccine hesitancy and climate change skepticism are examples of such arguments that have spread rapidly. In this project, we aim to develop an efficient rebuttal mechanism to tackle these arguments. We will explore the idea of ‘attitude roots’, which psychologists have studied as the underlying worldviews or opinions that help these anti-science arguments persist.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Sukannya Purkayastha, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Most recent state-of-the-art NLP models, which are based on deep neural networks, are inherently non-interpretable. Several explainability methods based on attention weights and gradients have been used to make these models interpretable. However, these methods do not always agree with human judgement and are often difficult to evaluate. In this thesis, our goal is to analyze the faithfulness of these explainability methods and to develop a novel method for evaluating them that aligns with human judgement.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Rachneet Sachdeva, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Measuring sentence similarity [1] is a classic topic in natural language processing (NLP). Semantic Textual Similarity (STS) [2] is a well-studied task that measures the semantic equivalence of sentence pairs by predicting similarity scores, while interpretable STS (iSTS) [3] aims to explain why and how two sentences are similar or different by supplementing STS with an explanation. Previous work on STS and iSTS analyzes sentence pairs in isolation, without access to the document-level context. The proposed thesis topic is based on the core idea that the meaning of a sentence should be defined by its context, and that sentence similarity could therefore be better determined and explained by taking context into account. This thesis will construct a document revision dataset containing alignments between sentence pairs, each annotated with an alignment type and a similarity score. On this dataset, we will train an iSTS system based on advanced sentence Transformer models such as [4] that, given a pair of sentences and their respective contexts, explains what is similar and what is different in the form of graded and typed sentence alignments. By systematically comparing systems with and without knowledge of the document context, this thesis will answer the question of whether it is beneficial to measure sentence similarity in context.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Qian Ruan, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Unverified or false information is quickly propagated and amplified via social networks. If untrue, this information can lead to bad decisions and harmful real-world consequences. To support human fact-checkers in the time-consuming verification process, numerous automated NLP fact-checking systems and datasets have been proposed. These systems typically compare textual claims with highly credible evidence documents to infer the claim’s veracity. However, recent work found problems with this setup when creating datasets with more realistic claims: such claims may not be specific enough to be verified, or the available evidence may be inconclusive.

    The goal of this thesis is to better understand the challenges when applying automated fact-checking on realistic claims, and to identify ways to close this gap.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Max Glockner, M.Sc.

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Commonsense Question Answering (CommonsenseQA) is the task of selecting the best answer to a question based on common sense. A recent study [1] shows that first prompting large pre-trained language models (PLMs) with knowledge-eliciting demonstrations to generate knowledge, and then incorporating this generated knowledge into QA systems, can significantly improve the ability of QA models to answer commonsense questions. However, this approach has the following drawbacks: 1) the demonstration examples are hand-picked, 2) the demonstrations are dataset-specific, and 3) the quality of the generated knowledge and the final model performance are significantly impacted by how the demonstrations are created. The goal of this thesis is to explore better, dataset-agnostic ways to generate knowledge for CommonsenseQA using prompts.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Cecilia Liu, M.Sc.

    Announcement as PDF

  • Master Thesis

    Fine-tuning a pre-trained language model (PLM) on downstream tasks has been a standard approach in NLP. However, PLMs suffer from catastrophic forgetting when adapted to a sequence of tasks. In real-world scenarios, data is often collected as a stream, particularly in the multimodal (image and text) multilingual setting, where new visual concepts can emerge and new languages can be added later on, which can result in catastrophic forgetting. In this thesis, we aim to 1) determine whether cross-lingual multimodal retrieval suffers from catastrophic forgetting, 2) understand how representations change in the continual learning setting, 3) assess whether current continual learning methods alleviate catastrophic forgetting in cross-lingual multimodal retrieval, and 4) design new methods to alleviate catastrophic forgetting in this setup.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Cecilia Liu, M.Sc.

    Announcement as PDF

  • Master Thesis

    Peer review is the core mechanism of scientific quality control: a study is evaluated by multiple anonymous researchers – peers – who independently decide if the work is methodologically sound, novel, and meets the quality standards of the field. Despite its many advantages, peer review is prone to bias, strategic and heuristic behaviour, and it is not uncommon for scientifically valid work to be rejected and for spurious findings to be accepted and published. To study the quality of peer reviews, the ACL-2018 conference asked the authors of submissions to rate the quality of the reviewing feedback they received, resulting in a unique dataset of reviews coupled with review quality scores. Yet, since this evaluation of peer reviews was done by the very authors whose work was evaluated, the review quality scores are themselves biased. This thesis will explore review quality assessment from a theoretical perspective, perform an in-depth analysis of the ACL-2018 dataset, and develop prototype models for automatic review quality assessment using state-of-the-art NLP.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Nils Dycke, M.Sc.

    Announcement as PDF

  • Task-oriented dialogue systems are designed to help users achieve predefined goals or tasks such as restaurant reservations or navigation inquiries. They often use a pipeline approach that employs multiple modules to perform natural language understanding, dialogue action decision making and response generation. Conventional task-oriented dialogue systems train these modules independently, which can lead to error propagation when the full dialog context is not provided to subsequent modules. To address this limitation of the conventional pipeline, recent work has explored large pretrained sequence-to-sequence models for end-to-end task-oriented dialogue systems [1,2]. Despite the efforts of recent studies, several challenges remain, including generating coherent and consistent responses, mitigating inappropriate responses, better strategies for few-shot learning, learning new knowledge or dialogue skills, and better evaluation metrics. This project aims to investigate different approaches for alleviating these challenges.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Thy Thy Tran, PhD

    Announcement as PDF

  • Mental health disorders are among the most common illnesses. Thus, in recent years, mental health has become a more prominent problem domain in NLP. In this interdisciplinary research, human language is examined as a tool to better understand emotional and mental states and to reduce emotional suffering. The research directions are manifold, ranging from the detection of mental illnesses or suicide risk in social media and the analysis of psychiatric or psychotherapeutic dialogues to the development of online therapeutic dialogue systems.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Tobias Mayer, PhD

    Announcement as PDF

  • The outcome of a (medical) treatment can be predicted from various kinds of information. Depending on the treatment, this information can come in different data modalities, e.g., audio or video recordings, textual records, or biomarkers. To train an automatic prediction model, it can be beneficial to combine data from different modalities. However, it is important to study the effects of different modality combinations. In particular, this investigation will focus on the role of textual and audio data as additional diagnostic indicators in therapy.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Tobias Mayer, PhD

    Announcement as PDF

  • A very important step in conducting research is reviewing the existing literature. This allows researchers to develop novel ideas, identify current gaps and eventually produce impactful research. However, the number of publications grows rapidly and is far too large for humans to study in detail. To overcome this challenge, innovative tools that automate parts of the literature review process have to be developed. One such technology is question answering (QA), where the answer to a question is automatically produced based on scientific background knowledge. This task is challenging due to scarce data, the complex nature of the questions and underlying texts, and the length of the documents.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Tim Baumgärtner, M.Sc.

    Announcement as PDF

  • Although increasingly complex models like GPT-3 and BERT continue to set new state-of-the-art results in many natural language processing tasks, training such models requires vast amounts of data and resources. Increasing the complexity and data even further poses a fundamental problem due to the limits of currently available hardware and, moreover, is often only possible for large tech companies. The goal of this thesis is to explore and evaluate various approaches that specifically target efficient model training in low-resource scenarios. By investigating approaches from meta-learning [1], curriculum learning [2] and active learning [3] on a wide range of NLP tasks, our goal is to better understand the mechanisms of efficiently training deep neural networks.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Ji-Ung Lee, M.Sc.

    Announcement as PDF

  • Master Thesis

    Applications of open-domain conversational agents are becoming widespread. However, training such agents to generate high-quality responses is still a big challenge, as the quality of responses depends on various factors. Recent methods train agents directly on gold responses from training sets. These methods have been shown to generate low-quality responses during evaluation. In this thesis, we propose to train a function that quantifies the quality of generated responses using a deep preference learning method. We then use this function as a reward estimator in a reinforcement learning model to train agents.
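
    One common way to learn such a quality function from preferences is a pairwise logistic (Bradley-Terry-style) objective: the function should score the preferred response of each pair higher than the dispreferred one. The sketch below only illustrates that idea under simplifying assumptions and is not the method used in this thesis: responses are assumed to be already mapped to small feature vectors, the scorer is linear rather than a deep network, and `score`, `pairwise_loss`, `train` and the toy pairs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, x: Vector) -> float:
    """Hypothetical linear quality function r(x) = w . x over response features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(w: Vector, better: Vector, worse: Vector) -> float:
    """Logistic pairwise loss: -log sigmoid(r(better) - r(worse))."""
    margin = score(w, better) - score(w, worse)
    return math.log1p(math.exp(-margin))


def train(pairs: List[Tuple[Vector, Vector]], dim: int,
          lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain SGD on the pairwise loss; each pair is (preferred, dispreferred)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = score(w, better) - score(w, worse)
            # dLoss/dmargin = -sigmoid(-margin) = -1 / (1 + exp(margin)),
            # so a gradient-descent step moves w towards (better - worse).
            coeff = 1.0 / (1.0 + math.exp(margin))
            for k in range(dim):
                w[k] += lr * coeff * (better[k] - worse[k])
    return w


# Toy (hypothetical) preference pairs: (features of preferred, features of worse response).
pairs = [([1.0, 0.2], [0.1, 0.3]),
         ([0.8, 0.5], [0.2, 0.4]),
         ([0.9, 0.1], [0.3, 0.2])]
w = train(pairs, dim=2)
```

    In the setup described above, a trained scorer like this would then provide the scalar reward for reinforcement learning, e.g., in a policy-gradient update of the agent; that step is not shown here.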

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Dr. Mohsen Mesgar

    Announcement as PDF

  • Master Thesis

    Applications of conversational agents are becoming widespread. However, training such agents to generate high-quality responses is still a big challenge, as the quality of responses depends on various factors. One of these factors is coherence. In this thesis, we build upon one of our existing models to measure the coherence of a response to its preceding dialog utterances using BERT-based language models.

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Dr. Mohsen Mesgar

    Announcement as PDF

  • Bachelor Thesis, Master Thesis

    Modern science revolves around publications. The worldwide acceleration of research and the democratization of scientific publishing have led to an unprecedented increase in publication volumes. Today anyone can put a paper on arXiv and get it cited, but how do we know if it actually is good research worth building upon?

    Supervisors: Prof.‘in Dr. Iryna Gurevych, Dr. rer. nat. Ilia Kuznetsov

    Announcement as PDF