Summary
Star-Transformer is a lightweight alternative to the standard Transformer for natural language processing tasks. It replaces the fully connected self-attention topology with a star-shaped one: a shared relay node is linked to every token node by radial connections, while ring connections link adjacent token nodes. This cuts the number of connections from quadratic to linear in the sequence length, yet preserves the ability to handle long-range dependencies, since any two tokens can still exchange information in two hops through the relay node. Experiments show significant improvements over the standard Transformer on modestly sized datasets across a variety of NLP tasks.
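To make the topology concrete, below is a minimal sketch of one Star-Transformer update round in PyTorch. It is illustrative only: it uses single-head scaled dot-product attention without learned projections, omits the paper's multi-head attention and layer normalization, and treats the ring as cyclic at the sequence boundaries (an assumption); the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def attention(query, keys, values):
    """Single-head scaled dot-product attention.

    query: (d,), keys/values: (m, d) -> returns (d,).
    """
    scores = keys @ query / (query.shape[-1] ** 0.5)   # (m,)
    weights = F.softmax(scores, dim=-1)                # (m,)
    return weights @ values                            # (d,)

def star_transformer_step(h, s, e):
    """One update round over the star topology (illustrative sketch).

    h: (n, d) satellite (token) node states
    s: (d,)   relay node state
    e: (n, d) original token embeddings
    """
    n, _ = h.shape
    new_h = torch.empty_like(h)
    # Ring + radial connections: each satellite attends only to a
    # constant-size local context, so this pass is linear in n.
    for i in range(n):
        context = torch.stack([
            h[(i - 1) % n],  # left neighbor on the ring (cyclic: an assumption)
            h[i],            # the node's own previous state
            h[(i + 1) % n],  # right neighbor on the ring
            e[i],            # the token's embedding
            s,               # the shared relay node (radial connection)
        ])
        new_h[i] = F.relu(attention(h[i], context, context))
    # Radial connections, relay side: the relay attends to every updated
    # satellite, giving any two tokens a two-hop path through it.
    relay_context = torch.cat([s.unsqueeze(0), new_h], dim=0)
    new_s = F.relu(attention(s, relay_context, relay_context))
    return new_h, new_s

# Toy usage: 6 tokens with 16-dimensional states; as in the paper,
# satellites start from the embeddings and the relay from their mean.
emb = torch.randn(6, 16)
h, s = star_transformer_step(emb.clone(), emb.mean(dim=0), emb)
```

Because each of the n satellite nodes attends to a constant-size context and only the relay attends to all satellites, one round costs O(n·d) rather than the O(n²·d) of full self-attention, which is where the efficiency gain comes from.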