LegalBench
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
AI-focused · Public · Neither
- Key Variables: Legal reasoning accuracy; IRAC framework performance; issue-spotting; rule application
- AI/Tech Tracking: Legal reasoning capability across multiple legal domains
- Access Details: Available on GitHub and Hugging Face
- Notes: Collaboratively built by 40+ legal professionals; covers all major legal reasoning types
- Specific Type: AI benchmarking
- Dataset Type: Cross-sectional
- Institution: Stanford University; University of Chicago; Harvard
- Institution Type: Academia
- Level of Focus: Task capability
- Most Granular Level: Legal reasoning task level
- Perspective: Neither
- Time Coverage: 2023-present
- Frequency: Static benchmark with periodic updates
- Sample Size: 162 legal reasoning tasks
- Geographic Detail: Global
- Occupational Classification: Not specified
- Industrial Classification: Not specified
- Other Classification: Legal domain classification (6 reasoning types)
Key Papers
Guha et al. (2023), NeurIPS