Multimodal Deep Learning

By Matthias Aßenmacher et al.
Published on Jan. 12, 2023

Table of Contents

Preface
Foreword
1 Introduction
1.1 Introduction to Multimodal Deep Learning
1.2 Outline of the Booklet
2 Introducing the modalities
2.1 State-of-the-art in NLP
2.2 State-of-the-art in Computer Vision
2.3 Resources and Benchmarks for NLP, CV and multimodal tasks
3 Multimodal architectures
3.1 Image2Text
3.2 Text2Image
3.3 Images supporting Language Models
3.4 Text supporting Vision Models
3.5 Models for both modalities
4 Further Topics
4.1 Including Further Modalities
4.2 Structured + Unstructured Data
4.3 Multipurpose Models
4.4 Generative Art
5 Conclusion
6 Epilogue
7 Acknowledgements

Summary

The book 'Multimodal Deep Learning' surveys state-of-the-art approaches in Natural Language Processing (NLP) and Computer Vision (CV), with a focus on models that combine the two. It first introduces each modality, along with resources and benchmarks for NLP, CV, and multimodal tasks. It then covers multimodal architectures: Image2Text, Text2Image, images supporting language models, text supporting vision models, and models for both modalities. Further topics include additional modalities, combining structured and unstructured data, multipurpose models, and generative art. The book emphasizes collaboration and aims to build a comprehensive understanding of how different modalities can be combined in deep learning models.