Simplifying Transformer Blocks

By Bobby He et al.
Published on Nov. 3, 2023

Table of Contents

1. Introduction
2. Related Work
3. Preliminaries
4. Simplifying Transformer Blocks

Summary

The paper simplifies transformer blocks by removing components that turn out to be non-essential, such as skip connections, value parameters, projection parameters, and the sequential arrangement of sub-blocks (running the attention and MLP sub-blocks in parallel instead). The goal is to make transformer architectures faster and more efficient to train. The authors support this approach with both experimental findings and theoretical insights. With these components removed, the simplified transformers perform comparably to standard transformers while using fewer parameters and achieving higher training throughput. The study highlights the value of combining signal propagation theory with empirical observations when designing efficient transformer blocks.
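The claim about a smaller parameter count can be made concrete with some back-of-the-envelope arithmetic. The sketch below illustrates the idea under stated assumptions and does not reproduce the authors' own accounting. It counts only the weight matrices of a single block, assumes square d×d query, key, value and output projections plus an MLP with hidden width 4d, and models the simplification as deleting the value (W_V) and projection (W_O) matrices. The function names `block_params` and `relative_saving` are hypothetical.

```python
def block_params(d_model: int, mlp_ratio: int = 4, keep_value_proj: bool = True) -> int:
    """Count weight-matrix parameters in one transformer block.

    Illustrative assumptions: biases, normalisation gains and embeddings are
    ignored, and the MLP hidden width is mlp_ratio * d_model.
    """
    # W_Q and W_K are always kept; W_V and W_O are dropped in the simplified block.
    attn_mats = 4 if keep_value_proj else 2
    attention = attn_mats * d_model * d_model
    # MLP: one up-projection (d -> r*d) and one down-projection (r*d -> d).
    mlp = 2 * d_model * (mlp_ratio * d_model)
    return attention + mlp


def relative_saving(d_model: int, mlp_ratio: int = 4) -> float:
    """Fraction of per-block weights removed by dropping W_V and W_O."""
    standard = block_params(d_model, mlp_ratio, keep_value_proj=True)
    simplified = block_params(d_model, mlp_ratio, keep_value_proj=False)
    return 1 - simplified / standard


if __name__ == "__main__":
    d = 768
    print(block_params(d))                         # 12 * 768**2 = 7077888
    print(block_params(d, keep_value_proj=False))  # 10 * 768**2 = 5898240
    print(f"{relative_saving(d):.1%}")             # 16.7%
```

Under these assumptions the per-block saving is 2/12, or about 16.7%, regardless of the model width. Whole-model savings would be smaller, because embeddings and any other parameters outside the blocks are left unchanged.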