Table of Contents
Abstract
1 Introduction
2 Technical specifications
2.1 Architecture
2.2 Training data
2.3 Training details
2.4 Filtered web data
3 Benchmark results
4 Addressing Toxicity and Biases
Summary
This paper continues the investigation into the power of Transformer-based language models, focusing on common-sense reasoning in natural language. The new 1.3-billion-parameter model, named phi-1.5, achieves performance comparable to that of substantially larger models, particularly on reasoning tasks. The model was trained on a dataset of 30 billion tokens consisting mostly of synthetic, 'textbook-like' data. The study also evaluates how filtered web data and synthetic data each affect model performance across a range of benchmarks. Finally, the paper addresses the challenges of toxicity and bias in language models, showing how reliance on synthetic data can help mitigate these issues.
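To make the reasoning-focused evaluation described above concrete, the following is a minimal sketch of prompting a phi-1.5-style model with a common-sense question. It assumes the checkpoint is published on the Hugging Face Hub under the identifier microsoft/phi-1_5; the model ID, prompt, and generation settings are illustrative and are not taken from the paper itself.

```python
# Minimal sketch: querying a phi-1.5-style model with a common-sense reasoning prompt.
# Assumption: the checkpoint is available on the Hugging Face Hub as "microsoft/phi-1_5".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # illustrative identifier, not specified in this summary
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "If I leave a glass of water in the freezer overnight, what will I find in the morning?"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the output deterministic for a simple reasoning check.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A prompt like this exercises exactly the kind of everyday reasoning the benchmarks in Section 3 measure, without requiring any task-specific fine-tuning.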