
HumanEval

HumanEval: Evaluating Large Language Models Trained on Code

AI-focused · Public · Neither
Specific Type
AI benchmarking
Dataset Type
Cross-sectional
Institution
OpenAI
Institution Type
AI Lab
Level of Focus
Task capability
Most Granular Level
Individual programming problem level
Perspective
Neither
Time Coverage
2021-present
Frequency
Static benchmark with extensions
Sample Size
164 programming problems
Geographic Detail
Global
Occupational Classification
Not specified
Industrial Classification
Not specified
Other Classification
Programming task classification
Key Variables
Code generation accuracy; functional correctness; programming capability
AI/Tech Tracking
Python programming capability; basic algorithmic reasoning
Access Details
Available on GitHub and Papers with Code
Notes
Function-level code generation; evaluated for functional correctness via the pass@k metric; largely saturated by current frontier models
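The pass@k metric noted above estimates the probability that at least one of k sampled completions passes the unit tests. The HumanEval paper gives an unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn and c of them pass; a minimal sketch of the numerically stable product form:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of samples that passed the unit tests
    k: budget of samples considered
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    # Stable product form avoids overflow in the binomial coefficients.
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# With 1 passing sample out of 2, pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Per-problem scores are then averaged over all 164 problems to produce the benchmark result.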