Mistral 7B

By Albert Q. Jiang et al.
Published on Oct. 10, 2023

Table of Contents

1 Introduction
2 Architectural details
3 Results
4 Instruction Finetuning Model
5 Adding guardrails for front-facing applications
6 Conclusion
Acknowledgements
References

Summary

Mistral 7B is a 7-billion-parameter language model designed for both strong performance and efficient inference. It outperforms Llama 2 13B on all benchmarks the authors evaluated, and Llama 1 34B on reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle long sequences at reduced inference cost. Mistral 7B is released under the Apache 2.0 license and is designed to be easy to fine-tune for different tasks. A fine-tuned instruction-following variant, Mistral 7B - Instruct, outperforms the Llama 2 13B chat model on both human and automated benchmarks. The document also discusses system prompts for enforcing guardrails and the model's content moderation capabilities. The authors conclude that the work opens up new perspectives on compressing knowledge effectively in language models.
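To make the sliding window idea concrete, here is a minimal sketch of the attention mask it implies. It is an illustration only, not the authors' implementation: the function name `sliding_window_mask` is hypothetical, and the exact window boundary (the current token plus the `window - 1` tokens before it) is an assumption made for this example.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Assumption for this sketch: attention is causal (j <= i) and limited to
    the most recent `window` positions, counting position i itself.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Small example: 6 tokens, window of 3.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row contains at most `window` allowed positions, so the per-token attention cost stays bounded as the sequence grows, whereas under full causal attention it grows with the position.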