Table of Contents
Preface
Foreword
1 Introduction
1.1 Introduction to Multimodal Deep Learning
1.2 Outline of the Booklet
2 Introducing the modalities
2.1 State-of-the-art in NLP
2.2 State-of-the-art in Computer Vision
2.3 Resources and Benchmarks for NLP, CV and multimodal tasks
3 Multimodal architectures
3.1 Image2Text
3.2 Text2Image
3.3 Images supporting Language Models
3.4 Text supporting Vision Models
3.5 Models for both modalities
4 Further Topics
4.1 Including Further Modalities
4.2 Structured + Unstructured Data
4.3 Multipurpose Models
4.4 Generative Art
5 Conclusion
6 Epilogue
7 Acknowledgements
Summary
The book 'Multimodal Deep Learning' surveys state-of-the-art approaches in Natural Language Processing (NLP) and Computer Vision (CV), with a focus on how the two are combined in multimodal deep learning. After an introduction to the field, it reviews each modality on its own, together with resources and benchmarks for NLP, CV, and multimodal tasks. It then covers multimodal architectures: Image2Text, Text2Image, images supporting language models, text supporting vision models, and models for both modalities. Further topics include additional modalities, combining structured and unstructured data, multipurpose models, and generative art. The book emphasizes collaboration and aims to give a comprehensive understanding of how different modalities can be combined in deep learning models.