Multimodal Grounded Learning

The Multimodal Grounded Learning lab was founded in 2023 by Prof. Anna Rohrbach and is part of the Department of Computer Science at TU Darmstadt. Together with the Multimodal Reliable AI lab, we form the Multimodal AI Lab. The lab aims to develop multimodal AI models that can communicate with humans and, importantly, are grounded in reality.

We are interested in a variety of problems, such as image and video description, visual grounding, text-to-image synthesis, multimodal fact-checking, and beyond. To learn about our previous work, please see Prof. Anna Rohrbach’s Google Scholar page.

Research Project Areas

Multimodal Fact-Checking

Disinformation is arguably one of the biggest threats to society and democracy. Multimodal fact-checking aims to verify claims (e.g., from social media posts) that include images or videos in addition to text. The fact-checking process involves interpreting the claim, retrieving textual and visual evidence from the open web, and reasoning over that evidence to determine whether the claim holds. We are pushing the boundaries of current multimodal LLM approaches to make fact-checking effective and scalable.

Multimodal Deepfake Detection

Deepfake technology poses a significant risk to societal trust and democracy. Multimodal deepfake detection focuses on identifying falsified content (e.g., videos, audio, and images) in which visual and/or auditory elements have been manipulated. The detection process involves analyzing the suspicious media, extracting relevant features from its visual and audio signals, and applying detection models to determine whether it is authentic. Our goal is to improve current multimodal approaches to deepfake detection, making them more accurate and scalable in order to counter the growing threat of synthetic media.