WildChat
WildChat: 1M ChatGPT Interaction Logs in the Wild
Tags: AI-focused, Public, Worker-side
- Specific Type: AI usage "in the wild"
- Dataset Type: Cross-sectional
- Institution: Allen Institute for AI
- Institution Type: Academia
- Level of Focus: Individual conversations
- Most Granular Level: Conversation level, with demographic data
- Perspective: Worker-side
- Time Coverage: 2023
- Frequency: One-time static snapshot
- Sample Size: 1M conversations, over 2.5M interaction turns, 204K unique IP addresses
- Geographic Detail: State and country level
- Occupational Classification: Not specified
- Industrial Classification: Not specified
- Other Classification: Geographic (state, country); language tags
- Key Variables: Conversation content, user demographics, detected language, toxicity flags, timestamps
- AI/Tech Tracking: ChatGPT/GPT-4 usage patterns
- Access Details: Available on Hugging Face under the ODC-BY license
- Notes: Multilingual; includes some toxic content; intended primarily for research and model training
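Because the data is released as one record per conversation, a typical first step is aggregating those records by geography, language, and toxicity flag. The sketch below is a minimal illustration under stated assumptions: the Hugging Face dataset ID `allenai/WildChat-1M` and the per-record field names `country`, `language`, `toxic`, and `turn` come from our reading of the dataset card, not from this page, and should be checked against the card before use. `summarize` is a hypothetical helper, and the toy records are invented purely for illustration; they are not real WildChat data.

```python
# Minimal sketch: aggregate WildChat-style conversation records.
# ASSUMPTION: each record is a dict with "country", "language", "toxic",
# and "turn" keys (field names assumed from the Hugging Face dataset card).
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize(records: Iterable[Mapping[str, Any]]) -> dict:
    """Hypothetical helper: counts per country/language, toxic share, mean turns."""
    countries = Counter()
    languages = Counter()
    n = toxic = turns = 0
    for rec in records:
        n += 1
        # Missing or empty geography/language values are bucketed as "Unknown".
        countries[rec.get("country") or "Unknown"] += 1
        languages[rec.get("language") or "Unknown"] += 1
        if rec.get("toxic"):
            toxic += 1
        turns += int(rec.get("turn") or 0)
    return {
        "conversations": n,
        "by_country": dict(countries),
        "by_language": dict(languages),
        "toxic_share": toxic / n if n else 0.0,
        "mean_turns": turns / n if n else 0.0,
    }


if __name__ == "__main__":
    # Toy, hand-made records for illustration only (NOT real WildChat data).
    toy = [
        {"country": "United States", "language": "English", "toxic": False, "turn": 3},
        {"country": "China", "language": "Chinese", "toxic": False, "turn": 1},
        {"country": "United States", "language": "English", "toxic": True, "turn": 2},
    ]
    print(summarize(toy))

    # With network access, a sample of the real data could be streamed instead
    # (dataset ID is an assumption; verify on Hugging Face):
    # from datasets import load_dataset
    # ds = load_dataset("allenai/WildChat-1M", split="train", streaming=True)
    # print(summarize(ds.take(1000)))
```

Streaming avoids downloading the full corpus, but `take(1000)` returns only the first records in file order, so shares computed this way describe that sample, not the whole dataset. At full scale, the catalog's own figures imply roughly 2.5 turns per conversation (2.5M turns / 1M conversations).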
Key Papers
How People Use ChatGPT
Chatterji, Cunningham, Deming, Hitzig, Ong, Shan, Wadman (2025)
Who Is Using AI to Code? Global Diffusion and Impact of Generative AI
Daniotti, Wachs, Feng, Neffke (2026)
Clio: Privacy-Preserving Insights into Real-World AI Use
Tamkin, McCain, Handa, Durmus, Lovitt, Rathi, Huang, Mountfield, Hong, Ritchie, Stern, Clarke, Goldberg, Sumers, Mueller, McEachen, Mitchell, Carter, Clark, Kaplan, Ganguli (2024)
WildChat: 1M ChatGPT Interaction Logs in the Wild
Zhao, Ren, Hessel, Cardie, Choi, Deng (2024)
WildChat-50m: A Deep Dive Into the Role of Synthetic Data in Post-Training
Feuer, Hegde (2025)