GDPval
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
AI-focused · Public · Neither
- Key Variables
- Task completion quality relative to human experts; cost comparison; time savings; deliverable accuracy across professional occupations
- AI/Tech Tracking
- Directly measures AI performance on economically valuable professional tasks; tracks frontier model progress on real work products including documents, slides, diagrams, and spreadsheets
- Access Details
- 220-task gold subset publicly available; automated grader available at evals.openai.com
- Notes
- Tasks designed by industry professionals averaging 14 years of experience; represents a shift toward benchmarks that measure economic productivity rather than abstract capabilities; frontier models are approaching expert-level quality on many tasks
- Specific Type
- AI benchmarking
- Dataset Type
- Cross-sectional
- Institution
- OpenAI
- Institution Type
- AI Lab
- Level of Focus
- Task capability; Occupation
- Most Granular Level
- Individual professional task level
- Perspective
- Neither
- Time Coverage
- 2025
- Frequency
- One-time release
- Sample Size
- 1,320 tasks across 44 occupations in 9 GDP sectors
- Geographic Detail
- US-focused (GDP sectors)
- Occupational Classification
- BLS Work Activities mapped to 44 occupations
- Industrial Classification
- 9 largest US GDP sectors
Key Papers
Patwardhan et al. (2025)