- Key Variables
- Professional knowledge accuracy; reasoning capability; domain expertise across 57 subjects
- AI/Tech Tracking
- Professional-level knowledge across law; medicine; economics; STEM; humanities
- Access Details
- Available through Hugging Face and Papers with Code
- Notes
- Covers elementary- to professional-level knowledge; includes legal reasoning, medical knowledge, and financial understanding
Massive Multitask Language Understanding (MMLU)
AI-focused · Public · Neither
- Specific Type
- AI benchmarking
- Dataset Type
- Cross-sectional
- Institution
- UC Berkeley; University of Chicago; New York University
- Institution Type
- Academia
- Level of Focus
- Task capability; Professional knowledge domains
- Most Granular Level
- Subject-specific question level
- Perspective
- Neither
- Time Coverage
- 2020-present
- Frequency
- Static benchmark with extensions
- Sample Size
- 15,908 questions across 57 subjects
- Geographic Detail
- Global
- Occupational Classification
- Not specified
- Industrial Classification
- Not specified
- Other Classification
- Academic subject classification
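Since the catalog notes access through Hugging Face, a minimal sketch of scoring a model against MMLU-style multiple-choice records may be useful. The field names (`question`, `choices`, `answer`) are assumed from the common `cais/mmlu` layout on Hugging Face and may differ in other mirrors; the toy records and the stand-in predictor below are purely illustrative.

```python
# Hypothetical sketch: computing accuracy on MMLU-style records.
# Each record is assumed to hold a question, four answer choices,
# and the integer index of the correct choice.

from typing import Callable


def mmlu_accuracy(records: list[dict],
                  predict: Callable[[str, list[str]], int]) -> float:
    """Fraction of records where the model picks the correct choice index."""
    correct = sum(
        1 for r in records
        if predict(r["question"], r["choices"]) == r["answer"]
    )
    return correct / len(records)


# Toy records in the assumed format, with a stand-in "model"
# that always picks choice 0.
sample = [
    {"question": "2 + 2 = ?",
     "choices": ["4", "3", "5", "22"], "answer": 0},
    {"question": "Capital of France?",
     "choices": ["Lyon", "Paris", "Nice", "Lille"], "answer": 1},
]
print(mmlu_accuracy(sample, lambda q, choices: 0))  # → 0.5
```

In practice the records would come from a loader such as `datasets.load_dataset`, and `predict` would wrap a real model; the scoring loop itself is unchanged.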
Key Papers
Hendrycks et al. (2021), "Measuring Massive Multitask Language Understanding," ICLR.