This site is a work in progress and has not been widely shared; content may contain errors. Annotations are a mix of human- and AI-generated, and all are being verified. Feedback is welcome.

LegalBench

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

AI-focused · Public · Neither
Specific Type
AI benchmarking
Dataset Type
Cross-sectional
Institution
Stanford University; University of Chicago; Harvard University
Institution Type
Academia
Level of Focus
Task capability
Most Granular Level
Legal reasoning task level
Perspective
Neither
Time Coverage
2023-present
Frequency
Static benchmark with periodic updates
Sample Size
162 legal reasoning tasks
Geographic Detail
Global
Occupational Classification
Not specified
Industrial Classification
Not specified
Other Classification
Legal domain classification (6 reasoning types)
Key Variables
Legal reasoning accuracy; IRAC framework performance; issue-spotting; rule application
AI/Tech Tracking
Legal reasoning capability across multiple legal domains
Access Details
Available on GitHub and Hugging Face
Notes
Collaboratively built by 40+ legal professionals; covers six types of legal reasoning (issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical understanding)

Key Papers

Guha et al. (2023), NeurIPS Datasets and Benchmarks Track