State of AI: An Empirical 100 Trillion Token Study with OpenRouter
Aubakirova, Atallah, Clark, Summerville, Midha
2026, arXiv preprint
Adoption / usage; Computer Science / AI
LLM / Generative AI; Software / coding; Platforms / gig economy; Human-AI collaboration
Abstract: The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella "Glass Slipper" effect. These findings underscore that the way developers and end-users engage with LLMs "in the wild" is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.
Summary: Aubakirova et al. analyze over 100 trillion tokens of anonymized metadata from the OpenRouter LLM inference platform (2023-2025) using descriptive statistics and automated content classification to characterize real-world patterns in model adoption, task categories, geographic distribution, and user retention across open source and proprietary LLMs.
Main Finding: Open source models now account for approximately 30% of LLM token usage (up from negligible in 2024), with roleplay (52%) and programming (15-20%) dominating OSS workloads; reasoning models represent over 50% of all usage by late 2025; programming tasks drive the largest prompt token growth (averaging 20K+ tokens); early user cohorts exhibit persistent retention ("Cinderella Glass Slipper effect") while later cohorts show high churn; Asia's share of usage grew from 13% to 31% over the period.
Primary Datasets
OpenRouter API usage logs (100 trillion tokens)
Secondary Datasets
Google Cloud Natural Language API (classifyText) for content categorization
Key Methods
Large-scale descriptive analysis of API platform usage logs; token volume analysis by model, category, geography, and time; cohort retention analysis; automated content classification using the Google Cloud Natural Language API on a 0.25% sample of prompts
Sample Period
2023-2025
Geographic Coverage
International
Sample Size
Over 100 trillion tokens; billions of prompt-completion pairs; detailed analyses focus on November 2024 - November 2025 (13 months); category analyses cover May - November 2025
Level of Analysis
Individual, Task, Region
Occupation Classification
None
Industry Classification
None
Notes: arXiv:2601.10088. Analyzes 100 trillion tokens of real-world AI usage across models via OpenRouter platform.
[Claude classification]: arXiv:2601.10088. Platform-specific observational study of OpenRouter API usage. Content classification via Google Cloud Natural Language API applied to 0.25% opt-in sample of prompts. Study focuses on model adoption patterns, task categorization, and user retention rather than economic outcomes. Introduces "Cinderella Glass Slipper" framework for understanding persistent user cohorts. Cost metrics reflect blended effective rates including caching, not list prices. BYOK (bring-your-own-key) activity excluded. Category-level analyses limited to May-November 2025 due to data availability.
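Illustrative sketch: the cohort retention analysis listed under Key Methods can be pictured with a small Python example. This is not the authors' pipeline; the function name `cohort_retention`, the `(user_id, month_index)` event format, and the definition of retention (share of a cohort active in a later month) are assumptions made here for illustration only.

```python
from collections import defaultdict


def cohort_retention(events):
    """Compute per-cohort retention from (user_id, month_index) activity events.

    Hypothetical helper: a user's cohort is the first month in which they
    appear, and retention at offset k is the fraction of that cohort active
    k months after the cohort month.
    """
    events = list(events)  # allow generators; the events are scanned twice

    # Pass 1: the cohort of each user is the earliest month they were active.
    first_seen = {}
    for user, month in events:
        if user not in first_seen or month < first_seen[user]:
            first_seen[user] = month

    cohort_sizes = defaultdict(int)
    for month in first_seen.values():
        cohort_sizes[month] += 1

    # Pass 2: collect the distinct users active at each (cohort, offset).
    active = defaultdict(set)
    for user, month in events:
        cohort = first_seen[user]
        active[(cohort, month - cohort)].add(user)

    retention = defaultdict(dict)
    for (cohort, offset), users in active.items():
        retention[cohort][offset] = len(users) / cohort_sizes[cohort]

    # Sort for stable, readable output.
    return {c: dict(sorted(r.items())) for c, r in sorted(retention.items())}


if __name__ == "__main__":
    sample = [("a", 0), ("b", 0), ("a", 1), ("c", 1), ("a", 2), ("c", 2)]
    for cohort, curve in cohort_retention(sample).items():
        print(cohort, curve)
```

Under this definition, a cohort showing the persistent engagement described in the record would have a retention curve that flattens at a comparatively high level, while a high-churn cohort would decay toward zero.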