Star-Transformer

By Qipeng Guo et al.

Table of Contents

Abstract
1 Introduction
2 Related Work
3 Model
3.1 Architecture
3.2 Implementation
3.3 Output
4 Comparison to the standard Transformer
5 Experiments
5.1 Masked Summation

Summary

Star-Transformer is a lightweight alternative to the standard Transformer for natural language processing tasks. It replaces the fully connected attention structure with a star-shaped topology: ring connections link each token to its immediate neighbors, while radial connections link every token to a shared relay node, reducing the number of connections from quadratic to linear in the sequence length. Because any two non-adjacent tokens can still communicate through the relay node, the model preserves the ability to capture long-range dependencies. Experiments show significant improvements over the standard Transformer on modestly sized datasets across a range of NLP tasks.
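The topology can be illustrated with a short sketch. The code below is a minimal, single-head toy version of one Star-Transformer update round, assuming PyTorch: it omits the paper's learned projections, multi-head attention, and layer normalization, and fixes the ring window to one neighbor on each side with wrap-around. The names `attend` and `star_update` are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of one Star-Transformer message-passing round (assumed PyTorch).
# Simplifications: single head, no learned projections or LayerNorm,
# ring window of one neighbor on each side with wrap-around.
import torch
import torch.nn.functional as F

def attend(query, keys):
    """Single-head scaled dot-product attention.

    query: (d,), keys: (k, d) -> weighted sum over keys, shape (d,).
    """
    scores = keys @ query / keys.shape[-1] ** 0.5  # (k,)
    return F.softmax(scores, dim=0) @ keys         # (d,)

def star_update(h, s):
    """One round of star-topology updates.

    h: (n, d) satellite (token) states; s: (d,) relay state.
    Ring connections: each token attends to its immediate neighbors.
    Radial connections: each token also attends to the shared relay.
    """
    n, _ = h.shape
    new_h = torch.empty_like(h)
    for i in range(n):
        # Context = left neighbor, self, right neighbor (ring) + relay (radial).
        ctx = torch.stack([h[(i - 1) % n], h[i], h[(i + 1) % n], s])
        new_h[i] = attend(h[i], ctx)
    # The relay attends over all satellites plus itself, so any two tokens
    # are connected by a two-hop path while total cost stays linear in n.
    new_s = attend(s, torch.cat([new_h, s.unsqueeze(0)]))
    return new_h, new_s

# Usage: 8 tokens of dimension 16, relay initialized to the mean, 3 rounds.
h = torch.randn(8, 16)
s = h.mean(dim=0)
for _ in range(3):
    h, s = star_update(h, s)
print(h.shape, s.shape)  # torch.Size([8, 16]) torch.Size([16])
```

Each round touches each token's fixed-size local context once and the relay once, which is where the linear (rather than quadratic) scaling in sequence length comes from.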