Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

By Ze Liu et al.
Published on Aug. 17, 2021

Table of Contents

1. Introduction
2. Related Work
3. Method
3.1. Overall Architecture
3.2. Shifted Window based Self-Attention

Summary

This paper presents Swin Transformer, a new vision Transformer that serves as a general-purpose backbone for computer vision. To address the challenges of adapting Transformers from language to vision, it proposes a hierarchical Transformer whose representation is computed with shifted windows. The hierarchical architecture models visual entities at various scales and has linear computational complexity with respect to image size, because self-attention is computed within local, non-overlapping windows rather than globally. Shifting the window partition between consecutive layers introduces cross-window connections, which improves modeling power while keeping computation efficient. The paper also introduces an efficient batch computation approach for self-attention under the shifted partition, based on cyclically shifting the feature map. Swin Transformer demonstrates strong performance across vision tasks, including image classification, object detection, and semantic segmentation.
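The window partitioning and cyclic shift described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's official implementation; the function names, the 8×8 feature map, and the window size 4 are all hypothetical choices for the example.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Self-attention is then computed inside each window, so the attention
    cost grows linearly with H*W instead of quadratically (sketch of the
    paper's windowing idea, not the official implementation).
    """
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def cyclic_shift(x, win):
    """Roll the map by win//2 so the next layer's windows straddle the
    previous window boundaries; np.roll keeps the map shape intact, which
    is what enables the paper's efficient batched attention computation."""
    return np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))

H = W = 8; win = 4; C = 3
x = np.arange(H * W * C, dtype=np.float32).reshape(H, W, C)
windows = window_partition(x, win)                     # regular partition
shifted = window_partition(cyclic_shift(x, win), win)  # shifted partition
print(windows.shape, shifted.shape)  # both (4, 4, 4, 3): four 4x4 windows
```

Because the cyclic shift is just a roll of the tensor, both the regular and the shifted partitions yield the same number of equally sized windows, so they can be batched through the same attention kernel; windows that mix content from opposite edges are handled with an attention mask in the actual model.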