Course Topics
- Multimodal Architectures: e.g. Joint Embedding Models, Multimodal Transformers, Neural Modular Approaches; applications such as Image and Video Description, Visual Question Answering, Text-to-Image Synthesis, Vision-and-Language Navigation, Multimodal Dialog
- Multimodal Generative Models
- Foundational Multimodal Large Language Models (LLMs): open issues such as Bias, Compositionality, Explainability, and Scaling Laws
- Emergent Topics in Multimodal AI