This site is a work in progress and has not been widely shared; content may contain errors. Annotations are a mix of human- and AI-generated, and all are being verified. Feedback is welcome.

LegalBench

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

AI-focused · Public · Neither
Specific Type
AI benchmarking
Dataset Type
Cross-sectional
Institution
Stanford University; University of Chicago; Harvard University
Institution Type
Academia
Level of Focus
Task capability
Most Granular Level
Legal reasoning task level
Perspective
Neither
Time Coverage
2023-present
Frequency
Static benchmark with periodic updates
Sample Size
162 legal reasoning tasks
Geographic Detail
Global
Occupational Classification
Not specified
Industrial Classification
Not specified
Other Classification
Legal domain classification (6 reasoning types)
Key Variables
Legal reasoning accuracy; IRAC framework performance; issue-spotting; rule application
AI/Tech Tracking
Legal reasoning capability across multiple legal domains
Access Details
Available on GitHub and Hugging Face
Notes
Collaboratively built by 40+ legal professionals; covers six types of legal reasoning (issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical understanding)

Key Papers

Guha et al. (2023), NeurIPS Datasets and Benchmarks Track