Dissecting Recall of Factual Associations in Auto-Regressive Language Models

By Mor Geva et al.

Table of Contents

Abstract
1 Introduction
2 Background and Notation
3 Experimental Setup
4 Overview: Experiments & Findings
5 Localizing Information Flow via Attention Knockout
6 Intermediate Subject Representations
7 Attribute Extraction
8 Conclusion

Summary

Transformer-based language models (LMs) capture factual knowledge in their parameters. This paper investigates how factual associations are stored and extracted internally by analyzing the flow of information through the model during inference. The analysis reveals a three-step internal mechanism for attribute extraction: (1) the subject representation is enriched with subject-related information by early MLP sublayers, (2) the relation propagates to the last token of the prompt, and (3) attention heads in the upper layers extract the attribute from the enriched subject representation. These findings shed light on knowledge localization and model editing in LMs.
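As a rough illustration of the attention-knockout idea used to localize information flow, the sketch below blocks the attention edges from a chosen query position (e.g., the last token) to a set of key positions (e.g., the subject tokens) by masking their pre-softmax scores. The tensor shapes, function names, and toy values are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of "attention knockout": setting selected attention scores
# to -inf before the softmax so that no information flows from the blocked
# key positions to the chosen query position.
import torch

def knockout_attention(scores: torch.Tensor, query_pos: int, blocked_key_pos: list) -> torch.Tensor:
    """scores: raw attention scores of shape (heads, seq_len, seq_len)."""
    scores = scores.clone()
    # A score of -inf becomes exactly zero after softmax, severing the edge.
    scores[:, query_pos, blocked_key_pos] = float("-inf")
    return scores

# Toy example: 2 heads, 5-token sequence; block the last token (index 4)
# from attending to hypothetical subject positions 1 and 2.
scores = torch.randn(2, 5, 5)
blocked = knockout_attention(scores, query_pos=4, blocked_key_pos=[1, 2])
weights = torch.softmax(blocked, dim=-1)
print(weights[:, 4, [1, 2]])  # zeros: the knocked-out edges carry no weight
```

Comparing the model's prediction with and without such knockouts at different layers is what lets the analysis attribute specific steps (e.g., attribute extraction at the last position) to specific parts of the computation.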