Course Topics
- Multimodal Architectures: e.g. Joint Embedding Models, Multimodal Transformers, Neural Modular Approaches – Applications such as Image and Video Description, Visual Question Answering, Text-to-Image Synthesis, Vision and Language Navigation, Multimodal Dialog
- Multimodal Generative Models
- Foundational Multimodal Large Language Models (LLMs): open issues such as Bias, Compositionality, Explainability, and Scaling Laws
- Emergent Topics in Multimodal AI
Organization
| Course type | Integrated Course | 
| Course materials (Moodle) | Multimodal Artificial Intelligence 2025 | 
| Registration and detailed info (TUCan) | 20-00-1193 – Multimodal Artificial Intelligence | 
| Last offered | Summer 2024 | 
| Next offering | Summer 2025 | 
| Lecturer(s) | Prof. Dr. Anna Rohrbach, Prof. Dr. Marcus Rohrbach | 
| Assistants | Hector Garcia Rodriguez, Jonas Grebe | 
| Exam | TBA | 
| CP (Credit Points) | 6 | 
| Language | English | 
| Recommended prerequisites | At least one of the following is recommended: an introductory course in AI or Deep Learning, a related course in Computer Vision or Natural Language Processing, or one of the offered practical courses. | 
 
