Extending Context Window of Large Language Models via Position Interpolation

By Shouyuan Chen et al.

Table of Contents

1 INTRODUCTION
2 METHOD
2.1 BACKGROUND: ROTARY POSITION EMBEDDING (ROPE)
2.2 DIRECT EXTRAPOLATION
2.3 PROPOSED APPROACH: POSITION INTERPOLATION (PI)
3 EXPERIMENTS
3.1 SETUP
3.2 LONG SEQUENCE LANGUAGE MODELING
4 CONCLUSION

Summary

The paper introduces Position Interpolation (PI) to extend the context window of RoPE-based large language models. Instead of extrapolating to position indices beyond those seen during training, which can cause catastrophic increases in attention scores, PI linearly down-scales input position indices so that they fall within the original context window. Theoretical and experimental results show that PI extends context windows effectively with only a brief period of fine-tuning, while leaving the model architecture unchanged. Models extended with PI perform strongly on tasks that require long contexts, and experiments demonstrate extension of LLaMA context windows up to 32768 tokens.
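To make the down-scaling step concrete, below is a minimal sketch of applying Position Interpolation to rotary position embeddings. It assumes a toy head dimension of 128 and an original window of 2048 tokens; the function names (`rope_frequencies`, `interpolated_positions`, `apply_rope`) are illustrative and not taken from the paper's code.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies: theta_i = base^(-2i/d)."""
    return base ** (-torch.arange(0, head_dim, 2).float() / head_dim)

def interpolated_positions(seq_len: int, original_ctx: int = 2048) -> torch.Tensor:
    """Position Interpolation: rescale positions m -> m * (L / L') so the
    largest index never exceeds the original context window L."""
    positions = torch.arange(seq_len).float()
    scale = min(1.0, original_ctx / seq_len)  # identity if within the original window
    return positions * scale

def apply_rope(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, head_dim) using the
    (possibly interpolated) position indices."""
    freqs = rope_frequencies(x.shape[-1])          # (head_dim/2,)
    angles = positions[:, None] * freqs[None, :]   # (seq_len, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

# Example: running an originally 2048-token model on an 8192-token sequence.
q = torch.randn(8192, 128)                              # toy query vectors
pos = interpolated_positions(8192, original_ctx=2048)   # indices now span [0, 2048)
q_rot = apply_rope(q, pos)
```

Because the rescaled indices stay inside the range the model was trained on, the attention scores remain in a familiar regime, which is why only a short fine-tuning run is needed to adapt the model to the new window.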