Multi-Modal Commonsense Reasoning
When humans reason about the world, we draw on information from multiple modalities (vision, sound, text, smell, etc.) to reach conclusions. These conclusions often rest on implicit knowledge gained from past experience, and this kind of inference can be regarded as commonsense reasoning. This thesis aims to understand what commonsense reasoning entails and how we can train an AI system that is able to perform true reasoning.