Table of Contents
1. Introduction
2. Locality-Sensitive Hashing Attention
3. Multi-Round LSH Attention
4. Causal Masking for Shared-QK Attention
5. Analysis on a Synthetic Task
6. Reversible Transformer
Summary
The Reformer model introduces two techniques to improve the efficiency of Transformers: replacing standard dot-product attention with locality-sensitive hashing (LSH) attention, which cuts the attention cost from quadratic to O(L log L) in the sequence length, and using reversible residual layers, which let activations be recomputed during the backward pass instead of stored. Together these changes let Reformer perform on par with Transformer models while being far more memory-efficient and much faster on long sequences. The paper walks through LSH attention, multi-round LSH attention, causal masking for shared-QK attention, and the reversible Transformer that reduce the memory and time complexity of Transformer models.
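To make the two ideas concrete, here is a minimal NumPy sketch of the angular LSH hashing the paper uses to bucket queries, together with a reversible residual block; the helper names (`lsh_bucket`, `reversible_forward`, `reversible_inverse`) and the toy residual functions are illustrative, not part of the Reformer code.

```python
import numpy as np

def lsh_bucket(x, n_buckets, rng):
    """Angular LSH as described in the Reformer paper: project the vectors
    onto a random matrix R with n_buckets/2 columns and take
    argmax([xR; -xR]) as the bucket id, so nearby vectors tend to share a bucket."""
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))
    xr = x @ r                                   # (seq_len, n_buckets/2)
    return np.argmax(np.concatenate([xr, -xr], axis=-1), axis=-1)

def reversible_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2, f, g):
    """Recover the block inputs from its outputs, so intermediate
    activations do not need to be stored for backpropagation."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Toy usage: bucket 8 query vectors, then round-trip a reversible block.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
print(lsh_bucket(q, n_buckets=4, rng=rng))       # one bucket id per position

f = lambda h: np.tanh(h)                         # stand-ins for attention / feed-forward
g = lambda h: 0.5 * h
y1, y2 = reversible_forward(q, q.copy(), f, g)
x1, x2 = reversible_inverse(y1, y2, f, g)
print(np.allclose(x1, q), np.allclose(x2, q))    # True True
```

In the full model, the two residual functions are the attention layer and the feed-forward layer, and it is this invertibility that lets Reformer avoid storing per-layer activations during training.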