Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
By Zhiqiu Lin et al.
Published on Aug. 3, 2023
Table of Contents
1. Introduction
2. Related Works
3. Cross-Modal Adaptation
4. Vision-Language Adaptation
Summary
The document argues that cross-modal few-shot learning builds better classifiers by leveraging multimodal information. It introduces a cross-modal adaptation method that treats data from other modalities as additional few-shot training samples: in the vision-language setting, each textual class label serves as an extra training example alongside the few labeled images. This approach achieves state-of-the-art results, enhances existing adaptation methods, and extends to the audio modality. The paper also reviews related work, presents a mathematical formalization of cross-modal learning, and explores vision-language adaptation in a multimodal setting.
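Concretely, the vision-language recipe amounts to linear probing over a shared embedding space in which class-name embeddings act as extra one-shot samples. The sketch below illustrates this idea with OpenAI's CLIP; the encoder choice ("ViT-B/32"), prompt template, optimizer, hyperparameters, and function names are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of cross-modal linear probing with CLIP.
# Assumptions (not from the paper): ViT-B/32 backbone, "a photo of a {c}"
# prompt, AdamW with lr=1e-3, 100 epochs.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def cross_modal_features(few_shot_images, class_names):
    """Embed few-shot images and class names into CLIP's shared space.

    few_shot_images: batch of images already preprocessed with `preprocess`.
    class_names: list of label strings, one per class.
    """
    with torch.no_grad():
        img_feats = model.encode_image(few_shot_images.to(device)).float()
        tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
        txt_feats = model.encode_text(tokens).float()
    # L2-normalize so both modalities live on the same unit hypersphere.
    return F.normalize(img_feats, dim=-1), F.normalize(txt_feats, dim=-1)

def train_cross_modal_probe(img_feats, image_labels, txt_feats, num_classes,
                            epochs=100):
    """Fit one linear classifier on image AND text embeddings.

    Each text embedding is treated as an additional one-shot training
    sample whose label is its own class index.
    """
    feats = torch.cat([img_feats, txt_feats], dim=0)
    labels = torch.cat([image_labels.to(device),
                        torch.arange(num_classes, device=device)], dim=0)
    classifier = torch.nn.Linear(feats.shape[1], num_classes).to(device)
    opt = torch.optim.AdamW(classifier.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(classifier(feats), labels)
        loss.backward()
        opt.step()
    return classifier
```

At test time, images are embedded and normalized the same way and scored with the learned classifier; the key design choice is that the text embeddings enter the training set exactly like image samples, which is what lets the textual labels improve a purely visual classifier.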