Table of Contents
1. Introduction
2. Related Work
3. Deformable Attention Transformer
3.1. Preliminaries
3.2. Deformable Attention
3.3. Model Architectures
Summary
Transformers have recently shown superior performance on various vision tasks. Their large receptive field endows Transformer models with higher representation power than their CNN counterparts. However, attending densely to all positions is computationally costly and can let irrelevant regions influence the features, so a novel deformable self-attention module is proposed, in which the positions of the key and value pairs are selected in a data-dependent way. Built on this module, the Deformable Attention Transformer serves as a backbone for image classification and dense prediction tasks, achieving improved results on standard benchmarks.
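The core idea can be illustrated with a minimal, single-head sketch: a shared set of reference points on a uniform grid is shifted by learned offsets, deformed key/value features are gathered by bilinear sampling, and standard attention is computed against them. This is a simplification, not the paper's implementation: the offset sub-network here is a single linear layer on the mean query feature (DAT uses a small convolutional network), and multi-head projection, relative position bias, and batching are omitted. All function and weight names (`deformable_attention`, `W_off`, etc.) are illustrative.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample a (H, W, C) feature map at fractional coords (ys, xs)."""
    H, W, _ = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[:, None]
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx       * feat[y0, x1]
            + wy       * (1 - wx) * feat[y1, x0]
            + wy       * wx       * feat[y1, x1])

def deformable_attention(x, Wq, Wk, Wv, W_off, n_ref=4):
    """Single-head deformable attention over a (H, W, C) feature map (sketch)."""
    H, W, C = x.shape
    q = x.reshape(-1, C) @ Wq                     # a query at every position
    # Uniform grid of reference points, shared by all queries.
    ry = np.linspace(0, H - 1, n_ref)
    rx = np.linspace(0, W - 1, n_ref)
    ref = np.stack(np.meshgrid(ry, rx, indexing="ij"), -1).reshape(-1, 2)
    # Data-dependent offsets (here: one linear layer on the mean query,
    # tanh-bounded; a simplification of the paper's conv offset network).
    off = np.tanh(q.mean(0) @ W_off).reshape(-1, 2)
    ys, xs = ref[:, 0] + off[:, 0], ref[:, 1] + off[:, 1]
    sampled = bilinear_sample(x, ys, xs)          # deformed key/value features
    k = sampled @ Wk
    v = sampled @ Wv
    logits = q @ k.T / np.sqrt(C)                 # scaled dot-product attention
    attn = np.exp(logits - logits.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)
    return (attn @ v).reshape(H, W, -1)

# Usage on a random feature map.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
W_off = rng.standard_normal((C, 4 * 4 * 2))       # 2 offset coords per reference
out = deformable_attention(x, Wq, Wk, Wv, W_off)  # shape (8, 8, 16)
```

The key contrast with dense self-attention is that every query attends to the same small deformed set of sampled points rather than to all H×W positions, which is what reduces the attention cost.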