Open Theses

If you are interested in writing a thesis with us, please send your CV, your grade transcripts (including the Bachelor's transcript if you are a Master's student), a motivation letter and at least three potential topics from the list below to thesis@ukp.tu-darmstadt.de. Due to the current volume of applications, we are temporarily suspending the acceptance of new applications and prioritizing those already in queue.

  • Master Thesis

    This is a broader topic that aims to propose a privacy-preserving strategy for generating synthetic text data with LLMs while preserving the distribution of the sensitive source data. The thesis touches on topics such as memorization, text anonymization and differential privacy. The student should ideally start by benchmarking existing synthetic data generation techniques. The work then involves implementing differential privacy mechanisms with LLMs and evaluating the privacy-utility tradeoff. More structured research questions can be discussed depending on the student's interests.
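    As a rough starting point, the core noising step behind DP training of LLMs (per-example gradient clipping plus calibrated Gaussian noise, as in DP-SGD) can be sketched as below. The function name and the simplified noise scale are illustrative assumptions; real experiments would use a library with proper privacy accounting (e.g. Opacus).

```python
import math
import random

def clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.0):
    """Simplified sketch of the DP-SGD noising step (hypothetical
    helper, no privacy accounting): clip each per-example gradient
    to `clip_norm`, average, then add calibrated Gaussian noise."""
    clipped = []
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if too large
        clipped.append([x * scale for x in g])
    dim = len(grads[0])
    avg = [sum(g[i] for g in clipped) / len(clipped) for i in range(dim)]
    # Noise standard deviation is tied to the clipping norm, so the
    # mechanism's sensitivity is bounded regardless of the raw gradients.
    sigma = noise_multiplier * clip_norm / len(clipped)
    return [x + random.gauss(0.0, sigma) for x in avg]
```

    With `noise_multiplier=0` the function reduces to plain clipped averaging, which is a convenient sanity check when benchmarking the utility side of the tradeoff.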

    Supervisors: Prof.'in Dr. Iryna Gurevych, Anmol Goel, M.Sc.

  • Bachelor Thesis

    LLMs are increasingly being used for emotional support and therapy. However, current LLMs also show undesirable behaviours such as sycophancy, where a model tailors its responses to follow a human user's view even when that view is not objectively correct. This is especially problematic in settings like mental health. The student will investigate sycophancy in medical LLMs, first benchmarking its prevalence in state-of-the-art LLMs, then generating synthetic preference data to reduce sycophancy and evaluating the resulting model.
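    One simple way to quantify sycophancy during benchmarking is a flip rate: how often the model abandons its neutral answer and adopts the user's stated view. The helper below is a hypothetical evaluation sketch (not an established benchmark metric); it assumes answers have already been extracted as labels from model outputs with and without a stated user opinion.

```python
def sycophancy_rate(neutral_answers, opinionated_answers, user_opinions):
    """Fraction of items where the model disagreed with the user's
    view in the neutral setting but switched to it once the user
    stated that view. Hypothetical metric sketch."""
    assert len(neutral_answers) == len(opinionated_answers) == len(user_opinions)
    flips = sum(
        1
        for base, cond, opinion in zip(neutral_answers, opinionated_answers, user_opinions)
        if base != opinion and cond == opinion  # answer flipped toward the user
    )
    return flips / len(neutral_answers)
```
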

    Supervisors: Prof.'in Dr. Iryna Gurevych, Anmol Goel, M.Sc.

  • Bachelor Thesis

    The goal is to train toy language models on various formal languages across the Chomsky hierarchy and characterize the training dynamics. Phenomena such as grokking, double descent (with respect to gradient updates, training data, and parameter count), and the formation of circuits will be investigated. By default, Transformer-based models will be the key focus; however, state-space models and traditional RNNs will be experimented with as well.
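    For illustration, training data for one rung of the hierarchy, the context-free Dyck language of balanced brackets, could be sampled as below. The sampler and its 0.5 branching probability are illustrative assumptions, not a prescribed part of the topic.

```python
import random

def dyck_word(n_pairs, rng=random):
    """Sample a balanced bracket string with `n_pairs` pairs (the
    Dyck language, a standard context-free probe task). Sketch of a
    training-data generator for toy language models."""
    out, open_count, remaining = [], 0, n_pairs
    while remaining > 0 or open_count > 0:
        # Open a bracket if we must (nothing to close) or by coin flip.
        if remaining > 0 and (open_count == 0 or rng.random() < 0.5):
            out.append("(")
            open_count += 1
            remaining -= 1
        else:
            out.append(")")
            open_count -= 1
    return "".join(out)

def is_balanced(s):
    """Check Dyck membership: depth never goes negative, ends at 0."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0
```

    Analogous generators for regular (e.g. parity) and context-sensitive (e.g. a^n b^n c^n) languages would complete the hierarchy sweep.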

    Supervisors: Prof.'in Dr. Iryna Gurevych, Subhabrata Dutta, PhD

  • Bachelor Thesis

    The goal is to investigate how LLMs represent document structure. There will be two kinds of experiments: 1. checking how well LLMs represent document structure by asking them to retrieve specific sections of documents or to identify the relations between certain segments; 2. checking whether adding structural markers (such as markdown) to the document input improves performance on long-document downstream tasks.
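    For the second kind of experiment, a structured document could be serialized with explicit markdown heading markers before being passed to the LLM. This minimal sketch assumes (as an illustrative input format) that sections arrive as (level, title, body) tuples:

```python
def to_markdown(sections):
    """Render a structured document as markdown so that heading
    levels are explicit in the LLM input. `sections` is a list of
    (level, title, body) tuples; format is an illustrative assumption."""
    lines = []
    for level, title, body in sections:
        lines.append("#" * level + " " + title)  # e.g. level 2 -> "## ..."
        lines.append(body)
    return "\n\n".join(lines)
```

    The control condition would concatenate the same bodies without the `#` markers, isolating the effect of structural cues.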

    Supervisors: Prof.'in Dr. Iryna Gurevych, Jan Buchmann, M.Sc.

  • Master Thesis

    This is a broader topic comprising multiple concrete research questions. The key idea revolves around using Pearl's structural causal models as text generation processes and exploring the potential of contemporary LMs for this purpose. The topic covers artificial data generation and causal estimation using LLMs.
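    As a toy illustration of a structural causal model used as a text generation process, the sketch below samples exogenous causes (topic, sentiment) and renders the text as a deterministic function of them; the variable names and vocabulary are invented purely for illustration, and in the thesis the rendering function would be an LLM rather than a template.

```python
import random

def sample_review(rng=random):
    """Tiny SCM as a text generation process (illustrative sketch):
    exogenous variables `topic` and `sentiment` cause the surface
    text via a deterministic structural function."""
    topic = rng.choice(["the food", "the service"])
    sentiment = rng.choice(["positive", "negative"])
    # Structural function: sentiment determines word choice.
    adjective = {"positive": "excellent", "negative": "disappointing"}[sentiment]
    text = "I found {} to be {}.".format(topic, adjective)
    return {"topic": topic, "sentiment": sentiment, "text": text}
```

    Because the causal graph is known by construction, data sampled this way gives ground truth for evaluating causal estimation with LLMs.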

    Supervisors: Prof.'in Dr. Iryna Gurevych, Nils Dycke, M.Sc.

  • Master Thesis

    This thesis aims to leverage prior findings on the early period of training and out-of-distribution (OOD) generalization to improve training efficiency and to design efficient architectures. The student should have a strong math/ML background.

    Supervisors: Prof.'in Dr. Iryna Gurevych, Cecilia Liu, M.Sc.

  • Master Thesis

    Language models represent words and sentences as long vectors capturing their meaning, so-called “embeddings”. This representation has proven successful for various natural language tasks. However, encoding these high-dimensional embeddings onto a quantum computer is challenging and resource-intensive, often requiring a large number of qubits and complex gate operations. This project aims to develop new methodologies for representing information on quantum devices, focusing on achieving effective performance in natural language tasks while reducing hardware costs, such as minimizing the number of gates and qubits needed. By optimizing these representations, the goal is to make quantum computing more practical and efficient for language-related applications. This thesis requires a background in quantum computing / quantum information theory.
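    For context, one standard scheme, amplitude encoding, stores a d-dimensional embedding in the amplitudes of ceil(log2 d) qubits after padding and normalization, which is exactly where the resource costs mentioned above arise. The sketch below covers only the classical pre-processing; no quantum backend or particular circuit construction is assumed.

```python
import math

def amplitude_encoding_spec(embedding):
    """Classical pre-processing for amplitude encoding (sketch):
    pad a vector to the next power of two, normalize it to unit
    norm, and report how many qubits the state would need."""
    d = len(embedding)
    n_qubits = max(1, math.ceil(math.log2(d)))       # qubits scale logarithmically in d
    padded = list(embedding) + [0.0] * (2 ** n_qubits - d)
    norm = math.sqrt(sum(x * x for x in padded))
    amplitudes = [x / norm for x in padded]          # quantum states must be unit-norm
    return n_qubits, amplitudes
```

    The qubit count is only half the story: preparing an arbitrary amplitude-encoded state generally needs a number of gates exponential in the qubit count, which motivates the search for cheaper representations.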

    Supervisors: Prof.'in Dr. Iryna Gurevych, Federico Tiblias, M.Sc.

  • Bachelor Thesis

    Our latest work has shown promising results for a quantum-native text classification algorithm. The model exploits a novel encoding procedure to represent inputs and has been tested on a sentiment analysis task. In this project, we expand the scope by evaluating the model on different tasks, seeking ways to extend it beyond binary classification, and possibly evaluating its performance on quantum hardware. This thesis requires a background in quantum computing / quantum information theory.

    Supervisors: Prof.'in Dr. Iryna Gurevych, Federico Tiblias, M.Sc.

  • Bachelor Thesis

    The thesis explores methodologies for identifying text modifications made by large language models (LLMs). This research aims to develop robust algorithms and tools to distinguish human-authored content from revisions or completions generated by AI, ensuring the integrity and originality of textual documents in various applications.
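    A natural first step is locating where a revised text diverges from the original, before attempting to attribute each edit to a human or an LLM. The sketch below uses a word-level diff from Python's standard library; it is an illustrative assumption about preprocessing, not the method the thesis will develop.

```python
import difflib

def modified_spans(original, revised):
    """Locate word-level spans where `revised` differs from
    `original` (sketch). Returns (operation, old_span, new_span)
    triples; classifying who made each edit comes later."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replaced, inserted, or deleted spans
            spans.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans
```
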

    Supervisors: Prof.'in Dr. Iryna Gurevych, Qian Ruan, M.Sc.