OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
By Peng Wang et al.
Published on June 1, 2022
Table of Contents
ABSTRACT
1. Introduction
2. Related Work
3. OFA
3.1 I/O & Architecture
3.2 Tasks & Modalities
3.3 Pretraining Datasets
3.4 Training & Inference
3.5 Scaling Models
Summary
In this work, the authors propose OFA, a task-agnostic and modality-agnostic framework designed for task comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks within a simple sequence-to-sequence learning framework. The model architecture is based on the Transformer and handles tasks such as visual grounding, image captioning, and language modeling by expressing each task's inputs and outputs as sequences. Pretrained on only 20M publicly available image-text pairs, OFA achieves state-of-the-art performance across a variety of tasks. The authors emphasize that unifying architectures, tasks, and modalities leads to better generalization and stronger performance on downstream tasks.
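The core idea of the unification can be illustrated with a small sketch: every task, cross-modal or unimodal, is mapped to an (instruction, target) pair of sequences that a single encoder-decoder model consumes and produces. The function name, instruction phrasings, and location-token format below are illustrative assumptions, not the authors' actual API; OFA's real pipeline additionally encodes image patches rather than a literal `[image]` placeholder.

```python
# Hypothetical sketch of OFA-style task unification: each task becomes an
# (instruction, target) pair of token sequences for one seq2seq model.
# Names and formats here are assumptions for illustration only.

def build_io_pair(task: str, **inputs) -> tuple[str, str]:
    """Map a task to a (source instruction, target sequence) pair."""
    if task == "caption":
        # Image captioning: instruction plus image -> caption text.
        return ("what does the image describe? [image]", inputs["caption"])
    if task == "language_modeling":
        # Unimodal text task: no image input at all.
        return (inputs["prefix"], inputs["continuation"])
    if task == "grounding":
        # Visual grounding: the target region is emitted as discretized
        # location tokens, so boxes become ordinary vocabulary items.
        x0, y0, x1, y1 = inputs["box"]  # normalized coordinates in [0, 1]
        target = " ".join(f"<bin_{int(v * 1000)}>" for v in (x0, y0, x1, y1))
        src = f'which region does the text "{inputs["text"]}" describe? [image]'
        return (src, target)
    raise ValueError(f"unknown task: {task}")

src, tgt = build_io_pair("caption", caption="a dog runs on the grass")
```

Because every target, including bounding boxes, is a plain token sequence, a single cross-entropy objective and a single decoder suffice for all tasks; this is what makes the framework task- and modality-agnostic.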