Global Self-Attention Networks for Image Recognition
By Shen Zhuoran et al.
Published on Oct. 14, 2020
Table of Contents
1. Introduction
2. Related Works
2.1 Auxiliary Visual Attention
2.2 Backbone Visual Attention
3. Global Self-Attention Network
3.1 Global Self-Attention Module
3.1.1 Content Attention Layer
3.1.2 Positional Attention Layer
3.2 GSA Networks
3.3 Justifications
Summary
The paper introduces the global self-attention (GSA) module for image recognition. To address the limitations of existing attention mechanisms, the module combines a content attention layer with a positional attention layer. The content attention layer applies a direct global attention operation, so every pixel can attend to every other pixel, while the positional attention layer uses an axial attention mechanism. A GSA network uses GSA modules in place of convolutions, which lets it model long-range pixel interactions. Experiments on the CIFAR-100 and ImageNet datasets show GSA networks outperforming comparable convolution-based networks, and the two attention layers are reported to improve significantly over existing attention methods.
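
To make the content attention idea more concrete, here is a minimal NumPy sketch of one way a global attention operation over all pixels can be computed without materializing a pixel-by-pixel attention matrix. It assumes a linear-complexity formulation in which the keys are softmax-normalized over pixel positions and aggregated with the values before being combined with the queries. The function names, tensor shapes, and random inputs are illustrative assumptions, not the authors' implementation, and the positional (axial) layer is not shown.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def content_attention(q, k, v):
    """Global content attention over N flattened pixels (illustrative sketch).

    q, k: (N, d_k) queries and keys; v: (N, d_out) values.
    Returns an (N, d_out) array.
    """
    # Assumption: normalize each key channel over all N pixel positions.
    k_norm = softmax(k, axis=0)
    # Aggregate values into a small (d_k, d_out) global context first,
    # so no N x N attention matrix is ever formed.
    context = k_norm.T @ v
    # Each pixel reads from the global context through its query.
    return q @ context

# Hypothetical 8x8 feature map, flattened to N = 64 pixels.
rng = np.random.default_rng(0)
H, W, d_k, d_out = 8, 8, 4, 6
N = H * W
q = rng.standard_normal((N, d_k))
k = rng.standard_normal((N, d_k))
v = rng.standard_normal((N, d_out))

out = content_attention(q, k, v)

# Equivalent but costlier ordering that builds the explicit N x N matrix.
explicit = (q @ softmax(k, axis=0).T) @ v
```

Both orderings give the same result by associativity of matrix multiplication. The first costs on the order of N * d_k * d_out operations, while the explicit form scales with N squared, which is what makes attending over every pixel of an image practical in this sketch.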