
MMLU

Massive Multitask Language Understanding (MMLU)

AI-focused · Public · Neither
Specific Type
AI benchmarking
Dataset Type
Cross-sectional
Institution
UC Berkeley; Columbia University; University of Chicago
Institution Type
Academia
Level of Focus
Task capability; Professional knowledge domains
Most Granular Level
Subject-specific question level
Perspective
Neither
Time Coverage
2020-present
Frequency
Static benchmark with extensions
Sample Size
15,908 questions across 57 subjects
Geographic Detail
Global
Occupational Classification
Not specified
Industrial Classification
Not specified
Other Classification
Academic subject classification
Key Variables
Professional knowledge accuracy; reasoning capability; domain expertise across 57 subjects
AI/Tech Tracking
Professional-level knowledge across law; medicine; economics; STEM; humanities
Access Details
Available through Hugging Face and Papers with Code
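MMLU items are four-option multiple-choice questions scored by accuracy. A minimal sketch of the record format and scoring, assuming the common layout used in the Hugging Face distribution (a `question`, a list of four `choices`, and the index of the correct choice):

```python
# Hypothetical MMLU-style records for illustration; real items come from the
# benchmark's 57 subject files.
records = [
    {"question": "What is 2 + 2?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Capital of France?", "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": 0},
]

def accuracy(predictions, records):
    """Fraction of items where the predicted choice index matches the answer key."""
    correct = sum(p == r["answer"] for p, r in zip(predictions, records))
    return correct / len(records)

print(accuracy([1, 0], records))  # -> 1.0
```

In practice, model responses are mapped to a choice index (e.g. by matching an output letter A-D) before scoring.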
Notes
Covers elementary to professional-level knowledge; includes legal reasoning; medical knowledge; financial understanding

Key Papers

Hendrycks et al. (2021), "Measuring Massive Multitask Language Understanding," ICLR