Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

By Paul Pu Liang et al.
Published on Oct. 10, 2022

Table of Contents

1 INTRODUCTION
2 FOUNDATIONAL PRINCIPLES IN MULTIMODAL RESEARCH
2.1 Principle 1: Modalities are Heterogeneous
2.2 Principle 2: Modalities are Connected
2.3 Principle 3: Modalities Interact

Summary

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning by integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. This paper provides an overview of the computational and theoretical foundations of multimodal machine learning and defines its key principles. It discusses the challenges posed by the heterogeneity of data sources and by the interconnections found between modalities. It then proposes a taxonomy of six core technical challenges (representation, alignment, reasoning, generation, transference, and quantification) that covers both historical and recent trends in multimodal learning.