Embedding Recycling for Language Models

By Jon Saad-Falcon et al.

Table of Contents

1. Introduction
2. Related Work
3. Methods
4. Experimental Setup
5. Results
5.1 Standard Fine-tuning
5.2 Adapters

Summary

The document discusses embedding recycling (ER), a technique for reducing the computational cost of training and inference in language models by caching and reusing the activations of a model's early layers. It explores layer recycling with two training approaches: standard fine-tuning and parameter-efficient adapters. The experiments evaluate the effectiveness of embedding recycling across a range of tasks, datasets, and transformer models. The results show that reduced models running on cached embeddings perform comparably to fully fine-tuned models on text classification and named-entity recognition tasks, while fully fine-tuned models retain an advantage on question-answering tasks.
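To make the mechanism concrete, here is a minimal sketch of layer recycling in Python, assuming a HuggingFace BERT-style encoder whose transformer blocks are exposed as `model.encoder.layer` (true for the BERT and RoBERTa implementations). The cut point `CACHE_LAYER` and the helper names are illustrative choices, not taken from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
CACHE_LAYER = 6  # illustrative cut point: recycle the first 6 layers' work

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# --- One-time pass: compute and cache the lower-layer activations ---
@torch.no_grad()
def cache_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding layer output, so index
    # CACHE_LAYER gives the output of the CACHE_LAYER-th block.
    return outputs.hidden_states[CACHE_LAYER], inputs["attention_mask"]

# --- Later passes: run only the remaining upper layers ---
def run_upper_layers(cached_hidden, attention_mask):
    # Expand the padding mask the way the encoder blocks expect.
    ext_mask = model.get_extended_attention_mask(
        attention_mask, cached_hidden.shape[:2]
    )
    hidden = cached_hidden
    for layer in model.encoder.layer[CACHE_LAYER:]:
        hidden = layer(hidden, attention_mask=ext_mask)[0]
    return hidden

texts = ["Embedding recycling caches early-layer activations."]
cached, mask = cache_embeddings(texts)
recycled_output = run_upper_layers(cached, mask)
print(recycled_output.shape)  # (batch, seq_len, hidden_size)
```

In a full ER setup the cached activations would be computed once and stored, so that subsequent fine-tuning or inference runs only pay for the upper layers; this sketch keeps everything in memory for brevity.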