Multimodal Artificial Intelligence
Integrated Course

This course introduces Multimodal Artificial Intelligence, a research area at the intersection of Computer Vision, Natural Language Processing, and Deep Learning. We will cover approaches to modeling multiple input and output modalities, with an emphasis on text, images, and video, ranging from early methods to today's state-of-the-art systems.

Course Topics

  • Multimodal Architectures, e.g. Joint Embedding Models, Multimodal Transformers, Neural Modular Approaches
  • Applications such as Image and Video Description, Visual Question Answering, Text-to-Image Synthesis, Vision and Language Navigation, Multimodal Dialog
  • Multimodal Generative Models
  • Foundational Multimodal Large Language Models (LLMs): open issues such as Bias, Compositionality, Explainability, and Scaling Laws
  • Emergent Topics in Multimodal AI

Organization

Course type: Integrated Course
Course materials (Moodle): Multimodal Artificial Intelligence 2025
Registration and detailed info (TUCan): 20-00-1193 – Multimodal Artificial Intelligence
Last offered: Summer 2024
Next offering: Summer 2025
Lecturer(s): Prof. Dr. Anna Rohrbach, Prof. Dr. Marcus Rohrbach
Assistants: Hector Garcia Rodriguez, Jonas Grebe
Exam: TBA
CP (Credit Points): 6
Language: English
Recommended prerequisites: At least one of the following is recommended: an introductory course in AI or Deep Learning, a related course in Computer Vision or Natural Language Processing, or one of the practical courses on offer.