- Key Variables
- Professional knowledge accuracy; reasoning capability; domain expertise across 57 subjects
- AI/Tech Tracking
- Professional-level knowledge across law; medicine; economics; STEM; humanities
- Access Details
- Available through Hugging Face and Papers with Code
- Notes
- Covers elementary- to professional-level knowledge; includes legal reasoning, medical knowledge, and financial understanding
Massive Multitask Language Understanding (MMLU)
AI-focused · Public · Neither
- Specific Type
- AI benchmarking
- Dataset Type
- Cross-sectional
- Institution
- UC Berkeley; University of Chicago; New York University
- Institution Type
- Academia
- Level of Focus
- Task capability; Professional knowledge domains
- Most Granular Level
- Subject-specific question level
- Perspective
- Neither
- Time Coverage
- 2020-present
- Frequency
- Static benchmark with extensions
- Sample Size
- 15,908 questions across 57 subjects
- Geographic Detail
- Global
- Occupational Classification
- Not specified
- Industrial Classification
- Not specified
- Other Classification
- Academic subject classification
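Since the catalog notes access through Hugging Face, a minimal sketch of scoring a model against MMLU-style multiple-choice records may be useful. The field names (`question`, `choices`, `answer`) are assumed from the common `cais/mmlu` layout on Hugging Face and may differ in other mirrors; the toy records and the stand-in predictor below are purely illustrative.

```python
# Hypothetical sketch: computing accuracy on MMLU-style records.
# Each record is assumed to hold a question, four answer choices,
# and the integer index of the correct choice.

from typing import Callable


def mmlu_accuracy(records: list[dict],
                  predict: Callable[[str, list[str]], int]) -> float:
    """Fraction of records where the model picks the correct choice index."""
    correct = sum(
        1 for r in records
        if predict(r["question"], r["choices"]) == r["answer"]
    )
    return correct / len(records)


# Toy records in the assumed format, with a stand-in "model"
# that always picks choice 0.
sample = [
    {"question": "2 + 2 = ?",
     "choices": ["4", "3", "5", "22"], "answer": 0},
    {"question": "Capital of France?",
     "choices": ["Lyon", "Paris", "Nice", "Lille"], "answer": 1},
]
print(mmlu_accuracy(sample, lambda q, choices: 0))  # → 0.5
```

In practice the records would come from a loader such as `datasets.load_dataset`, and `predict` would wrap a real model; the scoring loop itself is unchanged.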
Key Papers
Hendrycks et al. (2021), "Measuring Massive Multitask Language Understanding," ICLR.