State of AI: An Empirical 100 Trillion Token Study with OpenRouter

Aubakirova, Atallah, Clark, Summerville, Midha

2026, arXiv preprint

Adoption / usage; Computer Science / AI

LLM / Generative AI; Software / coding; Platforms / gig economy; Human-AI collaboration

Abstract

The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella "Glass Slipper" effect. These findings underscore that the way developers and end-users engage with LLMs "in the wild" is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.

Summary

Aubakirova et al. analyze over 100 trillion tokens of anonymized metadata from the OpenRouter LLM inference platform (2023-2025) using descriptive statistics and automated content classification to characterize real-world patterns in model adoption, task categories, geographic distribution, and user retention across open source and proprietary LLMs.

Main Finding

Open source models now account for approximately 30% of LLM token usage (up from negligible in 2024), with roleplay (52%) and programming (15-20%) dominating OSS workloads; reasoning models represent over 50% of all usage by late 2025; programming tasks drive the largest prompt token growth (averaging 20K+ tokens); early user cohorts exhibit persistent retention ("Cinderella Glass Slipper effect") while later cohorts show high churn; Asia's share of usage grew from 13% to 31% over the period.

Primary Datasets

OpenRouter

AI-focused

OpenRouter API usage logs (100 trillion tokens)

Secondary Datasets

Google Cloud Natural Language API (classifyText) for content categorization

Key Methods: Large-scale descriptive analysis of API platform usage logs; token volume analysis by model, category, geography, and time; cohort retention analysis; automated content classification using Google Cloud Natural Language API on 0.25% sample of prompts
Sample Period: 2023-2025
Geographic Coverage: International
Sample Size: Over 100 trillion tokens; billions of prompt-completion pairs; detailed analyses focus on November 2024 - November 2025 (13 months); category analyses cover May - November 2025
Level of Analysis: Individual, Task, Region
Occupation Classification: None
Industry Classification: None
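
The cohort retention analysis listed under Key Methods can be illustrated with a minimal sketch. This record does not state how the paper defines cohorts or activity, so the sketch assumes a hypothetical schema of (user_id, date) activity events, assigns each user to the month of their first observed activity, and reports the fraction of each cohort active in each later month. The function names and toy data are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month count so months can be subtracted."""
    return d.year * 12 + d.month


def cohort_retention(events):
    """Compute monthly retention per first-use cohort.

    events: iterable of (user_id, date) activity records (hypothetical schema).
    Returns {cohort_month_index: {months_since_first_use: fraction_active}}.
    """
    # Collect the set of months in which each user was active.
    active = defaultdict(set)
    for user, day in events:
        active[user].add(month_index(day))

    # Assumption: a user's cohort is the month of their first observed activity.
    cohorts = defaultdict(list)
    for user, months in active.items():
        cohorts[min(months)].append(user)

    retention = {}
    for start, users in cohorts.items():
        counts = defaultdict(int)
        for user in users:
            for m in active[user]:
                counts[m - start] += 1
        # Normalize by cohort size to get the share still active at each offset.
        retention[start] = {k: counts[k] / len(users) for k in sorted(counts)}
    return retention


# Toy data, invented for illustration only.
events = [
    ("a", date(2025, 1, 5)), ("a", date(2025, 2, 9)), ("a", date(2025, 3, 1)),
    ("b", date(2025, 1, 20)),
    ("c", date(2025, 2, 2)), ("c", date(2025, 3, 15)),
]
result = cohort_retention(events)
```

In this toy example the January cohort (users a and b) drops to half its size after the first month, while the February cohort (user c) stays fully active. Retention curves of this kind are what allow early and later cohorts to be compared.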

Notes

arXiv:2601.10088. Analyzes 100 trillion tokens of real-world AI usage across models via OpenRouter platform. [Claude classification]: Platform-specific observational study of OpenRouter API usage. Content classification via Google Cloud Natural Language API applied to 0.25% opt-in sample of prompts. Study focuses on model adoption patterns, task categorization, and user retention rather than economic outcomes. Introduces "Cinderella Glass Slipper" framework for understanding persistent user cohorts. Cost metrics reflect blended effective rates including caching, not list prices. BYOK (bring-your-own-key) activity excluded. Category-level analyses limited to May-November 2025 due to data availability.