Course Topics
- Multimodal Architectures (e.g. Joint Embedding Models, Multimodal Transformers, Neural Modular Approaches) and Applications such as Image and Video Description, Visual Question Answering, Text-to-Image Synthesis, Vision and Language Navigation, Multimodal Dialog
- Multimodal Generative Models
- Foundational Multimodal Large Language Models (LLMs): open issues such as Bias, Compositionality, Explainability, and Scaling Laws
- Emergent Topics in Multimodal AI
Organization
| Course type | Integrated Course |
| Course materials (Moodle) | Multimodal Artificial Intelligence 2025 |
| Registration and detailed info (TUCan) | 20-00-1193 – Multimodal Artificial Intelligence |
| Last offered | Summer 2024 |
| Next offering | Summer 2025 |
| Lecturer(s) | Prof. Dr. Anna Rohrbach, Prof. Dr. Marcus Rohrbach |
| Assistants | Hector Garcia Rodriguez, Jonas Grebe |
| Exam | TBA |
| CP (Credit Points) | 6 |
| Language | English |
| Recommended prerequisites | At least one of the following is recommended: an introductory course in AI or Deep Learning, a related course in Computer Vision or Natural Language Processing, or one of the offered practical courses. |