Multimodal Artificial Intelligence

Integrated Course

This course introduces Multimodal Artificial Intelligence, a research area at the intersection of Computer Vision, Natural Language Processing, and Deep Learning. We will cover approaches to modeling multiple input and output modalities, with an emphasis on text, images, and video, ranging from early methods to today's state-of-the-art systems.

Course Topics

Multimodal Architectures: e.g. Joint Embedding Models, Multimodal Transformers, Neural Modular Approaches
Applications such as Image and Video Description, Visual Question Answering, Text-to-Image Synthesis, Vision and Language Navigation, Multimodal Dialog
Multimodal Generative Models
Foundational Multimodal Large Language Models (LLMs): open issues such as Bias, Compositionality, Explainability, and Scaling Laws
Emergent Topics in Multimodal AI

Organization

Course type	Integrated Course
Course materials (Moodle)	Multimodal Artificial Intelligence 2025
Registration and detailed info (TUCan)	20-00-1193 – Multimodal Artificial Intelligence
Last offered	Summer 2024
Next offering	Summer 2025
Lecturer(s)	Prof. Dr. Anna Rohrbach, Prof. Dr. Marcus Rohrbach
Assistants	Hector Garcia Rodriguez, Jonas Grebe
Exam	TBA
CP (Credit Points)	6
Language	English
Recommended prerequisites	At least one of the following is recommended: an introductory course in AI or Deep Learning, a related course in Computer Vision or Natural Language Processing, or one of the offered practical courses.