LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
By Parishad BehnamGhader et al.
Published on April 9, 2024
Table of Contents
1 Introduction
2 LLM2Vec
2.1 Three simple ingredients
2.2 Transforming decoder-only LLMs with LLM2Vec
3 LLM2Vec-transformed models are strong unsupervised text embedders
3.1 Evaluation on word-level tasks
3.2 Evaluation on sequence-level tasks
4 How does LLM2Vec affect a model?
4.1 LLM2Vec helps models to capture information from future tokens
Summary
Large decoder-only language models (LLMs) are the state-of-the-art models on most NLP tasks, yet the community has been slow to adopt them for text embedding. LLM2Vec is an unsupervised approach that transforms any decoder-only LLM into a strong text encoder in three simple steps: (1) enabling bidirectional attention, (2) training with masked next token prediction, and (3) unsupervised contrastive learning. LLM2Vec-transformed models outperform encoder-only models on word-level tasks. When LLM2Vec is combined with supervised contrastive learning, it achieves state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB) among models trained only on publicly available data. The paper supports these results with an analysis of how LLM2Vec changes a model, including how it helps the model capture information from future tokens.
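The first step, enabling bidirectional attention, can be illustrated with a toy example. The sketch below is not the authors' implementation. It is a minimal pure-Python illustration that assumes a 4-token sequence with made-up, all-zero attention scores, and the helper names attention_mask and masked_softmax are hypothetical. It shows only how replacing the causal mask with an all-ones mask lets each position attend to later tokens.

```python
import math

def attention_mask(seq_len, bidirectional):
    """Return a seq_len x seq_len 0/1 mask; mask[i][j] == 1 means position i may attend to j."""
    return [
        [1 if (bidirectional or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

def masked_softmax(scores, mask):
    """Row-wise softmax over scores, giving zero weight to masked-out positions."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        allowed = [s for s, m in zip(score_row, mask_row) if m]
        top = max(allowed)  # subtract the row max for numerical stability
        exps = [math.exp(s - top) if m else 0.0 for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy, made-up scores (assumption): all zeros, so allowed positions share weight equally.
seq_len = 4
scores = [[0.0] * seq_len for _ in range(seq_len)]

# Decoder-only default: each token sees itself and earlier tokens only.
causal = masked_softmax(scores, attention_mask(seq_len, bidirectional=False))

# Bidirectional variant: every token sees the whole sequence.
bidirectional = masked_softmax(scores, attention_mask(seq_len, bidirectional=True))

print("causal, first token:       ", causal[0])
print("bidirectional, first token:", bidirectional[0])
```

Under the causal mask the first token can attend only to itself, while under the bidirectional mask it spreads its attention over all four positions. Only the mask changes in this sketch. In the paper, the model is then adapted to the new mask through masked next token prediction and unsupervised contrastive learning, which the sketch does not cover.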