APEX: AI Productivity Index

AI-focused; Public; Neither
Specific Type
AI benchmarking
Dataset Type
Cross-sectional
Institution
Mercor
Institution Type
Private Data Provider
Level of Focus
Task capability; Occupation
Most Granular Level
Individual professional task level
Perspective
Neither
Time Coverage
2025-present
Frequency
Static benchmark with periodic updates
Sample Size
400 test cases (v1-extended) across 4 professional domains, created by 76 domain experts
Geographic Detail
Global
Occupational Classification
Professional domains (investment banking, management consulting, law, primary medical care)
Key Variables
AI task completion quality; expert-level comparison; productivity measurement across professional domains
AI/Tech Tracking
Evaluates frontier AI models on extended professional tasks that require sustained reasoning and domain expertise; graded by domain experts
Access Details
Leaderboard and results publicly available
Notes
Focuses on tasks requiring significant expert time (1-8 hours), distinguishing it from benchmarks that test quick-answer capabilities; represents the trend toward economically grounded AI evaluation

Key Papers

Vidgen et al. (2025)