Reformer: The Efficient Transformer

By Nikita Kitaev et al.
Published on Feb. 18, 2020

Table of Contents

1. Introduction
2. Locality-Sensitive Hashing Attention
3. Multi-Round LSH Attention
4. Causal Masking for Shared-QK Attention
5. Analysis on a Synthetic Task
6. Reversible Transformer

Summary

The Reformer introduces two techniques that improve the efficiency of Transformers: replacing dot-product attention with locality-sensitive hashing (LSH) attention, which reduces the attention complexity from O(L^2) to O(L log L) in the sequence length L, and using reversible residual layers instead of standard residuals, so that activations need to be stored only once during training rather than once per layer. With these changes the Reformer matches the performance of standard Transformer models while being more memory-efficient and much faster on long sequences. The sections below walk through LSH attention, multi-round LSH attention, causal masking for shared-QK attention, an analysis on a synthetic task, and the reversible Transformer.
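To make the two ideas concrete, here is a minimal NumPy sketch, not the authors' implementation: `lsh_bucket` assigns vectors to buckets with the random-rotation (angular) hashing scheme described in the paper, and `reversible_block` / `reversible_block_inverse` show why a reversible residual layer lets inputs be recomputed from outputs instead of stored. All function and variable names here are illustrative assumptions.

```python
import numpy as np


def lsh_bucket(x, n_buckets, rng):
    """Assign each row of x to one of n_buckets buckets via angular LSH:
    h(x) = argmax([xR; -xR]) for a random projection matrix R."""
    d_model = x.shape[-1]
    r = rng.normal(size=(d_model, n_buckets // 2))      # random rotations
    rotated = x @ r                                     # (seq_len, n_buckets // 2)
    rotated = np.concatenate([rotated, -rotated], axis=-1)
    return np.argmax(rotated, axis=-1)                  # bucket id per position


def reversible_block(x1, x2, f, g):
    """Forward pass of a reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def reversible_block_inverse(y1, y2, f, g):
    """Recover the block's inputs from its outputs, so activations need not be cached."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # LSH bucketing: similar (same-direction) queries/keys tend to land in the
    # same bucket, so attention only needs to be computed within each bucket.
    queries = rng.normal(size=(1024, 64))
    buckets = lsh_bucket(queries, n_buckets=16, rng=rng)
    print("bucket ids for first 8 positions:", buckets[:8])

    # Reversible residuals: the inputs are exactly reconstructable from the outputs.
    f = lambda t: np.tanh(t)       # stand-ins for the attention / feed-forward sublayers
    g = lambda t: 0.5 * t
    x1, x2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    y1, y2 = reversible_block(x1, x2, f, g)
    r1, r2 = reversible_block_inverse(y1, y2, f, g)
    print("max reconstruction error:", max(np.abs(r1 - x1).max(), np.abs(r2 - x2).max()))
```

In the full model, F and G are the attention and feed-forward sublayers rather than the toy functions used here, and the LSH buckets are sorted and chunked before attention is applied within them.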