Table of Contents
1. Abstract
2. Introduction
3. Data Collection
4. Domains
5. Dataset Splits
6. Dataset Analysis
Summary
The GPQA paper presents a challenging dataset of multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are designed to be Google-proof, meaning they are hard to answer even with access to web search, and they are difficult for both experts and AI systems. The paper motivates the dataset by the need for scalable oversight methods for evaluating AI systems. Questions are grouped into three domains: biology, physics, and chemistry. Two curated subsets, GPQA main and GPQA Diamond, are selected using the analysis of question objectivity and difficulty, and are recommended for experiments.