Large-Scale Multi-Modal Pre-Trained Models: A Comprehensive Survey

By Xiao Wang et al.
Published on April 10, 2024

Table of Contents

1 Introduction
2 Background
3 Multi-Modal Pre-training
- Task Definition and Key Challenges
- Advantages of MM-PTMs
- Pre-training

Summary

This paper provides a comprehensive survey of large-scale multi-modal pre-trained models (MM-PTMs). It discusses the urgent demand for generalized deep models, motivated by the success of single-modality models such as BERT, ViT, and GPT in their respective domains. The survey covers the background of multi-modal pre-training; the key challenges of acquiring and cleaning multi-modal data, designing network architectures, and defining pre-training objectives; and the importance of large-scale computing power. It also highlights the advantages of multi-modal pre-trained models in addressing practical application scenarios and extracting features shared across modalities. Overall, the paper aims to provide new insights and help researchers track cutting-edge developments in multi-modal pre-training.
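To make the idea of a multi-modal pre-training objective more concrete, the sketch below shows a CLIP-style image-text contrastive loss, one common way such models learn features shared across modalities. This is an illustrative sketch only; the function name, tensor shapes, and temperature value are assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a CLIP-style image-text contrastive objective,
# one of the pre-training objectives commonly discussed in MM-PTM surveys.
# All names, dimensions, and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(image_features: torch.Tensor,
                     text_features: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize both modalities so the dot product becomes cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = image_features @ text_features.t() / temperature

    # Matched pairs lie on the diagonal, so the target index for row i is i.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy losses.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 8, 512
    img = torch.randn(batch, dim)
    txt = torch.randn(batch, dim)
    print(contrastive_loss(img, txt).item())
```

Pulling paired image and text embeddings together while pushing mismatched pairs apart is one way these models extract a common representation across modalities.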