GPQA: A Graduate-Level Google-Proof Q&A Benchmark

By David Rein et al.
Published on Nov. 20, 2023

Table of Contents

1. Abstract
2. Introduction
3. Data Collection
4. Domains
5. Dataset Splits
6. Dataset Analysis

Summary

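The GPQA document presents a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are designed to be "Google-proof": experts who have or are pursuing PhDs in the relevant fields reach 65% accuracy, while highly skilled non-expert validators reach only 34%, despite spending over 30 minutes on average with unrestricted web access. The questions are also difficult for state-of-the-art AI systems, with the strongest GPT-4-based baseline reaching 39% accuracy. The authors motivate the dataset as a testbed for scalable oversight, that is, methods that let humans reliably supervise AI outputs even on questions that are hard for the supervisors themselves. Based on analyses of question objectivity and difficulty, the document recommends two curated subsets for experiments: GPQA main and GPQA Diamond.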
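To make the idea of curating subsets by objectivity and difficulty concrete, here is a minimal, hypothetical sketch. The `Question` record, its field names, and the two threshold parameters are illustrative assumptions, not the paper's code or its exact inclusion criteria. Under these assumptions, a question counts as objective when a large enough share of expert validators reach the writer's intended answer, and as difficult when most non-expert validators miss it.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record of validator outcomes for one question."""
    qid: str
    domain: str                     # "biology", "physics", or "chemistry"
    expert_correct: list[bool]      # one flag per expert validator
    nonexpert_correct: list[bool]   # one flag per non-expert validator


def is_objective(q: Question, min_expert_agreement: float) -> bool:
    # Assumed objectivity proxy: enough experts chose the intended answer.
    if not q.expert_correct:
        return False
    return sum(q.expert_correct) / len(q.expert_correct) >= min_expert_agreement


def is_difficult(q: Question, max_nonexpert_accuracy: float) -> bool:
    # Assumed difficulty proxy: non-experts mostly failed to find the answer.
    if not q.nonexpert_correct:
        return False
    return sum(q.nonexpert_correct) / len(q.nonexpert_correct) < max_nonexpert_accuracy


def select_subset(questions: list[Question],
                  min_expert_agreement: float,
                  max_nonexpert_accuracy: float) -> list[Question]:
    # Keep only questions that pass both the objectivity and difficulty checks.
    return [
        q for q in questions
        if is_objective(q, min_expert_agreement)
        and is_difficult(q, max_nonexpert_accuracy)
    ]


# Toy data invented for illustration only (not from the GPQA dataset).
questions = [
    Question("q1", "physics", [True, True], [False, False, True]),
    Question("q2", "chemistry", [True, False], [False, False, False]),
    Question("q3", "biology", [True, True], [True, True, False]),
]

strict = select_subset(questions, min_expert_agreement=1.0, max_nonexpert_accuracy=0.5)
loose = select_subset(questions, min_expert_agreement=0.5, max_nonexpert_accuracy=0.5)
print([q.qid for q in strict])  # ['q1']
print([q.qid for q in loose])   # ['q1', 'q2']
```

Raising `min_expert_agreement` yields a smaller, stricter subset, which is loosely analogous to GPQA Diamond being a stricter selection drawn from GPQA main.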