This site is a work in progress and has not been widely shared. Content may contain errors. Feedback is welcome.

Tomás Aguirre · Centre for the Governance of AI

The Empirical Economics of AI: What’s the State of the Art of Data Collection?

A broad mapping of empirical sources for understanding AI’s economic impact—and what is still missing.

Introduction

AI capabilities are advancing rapidly, and with them, concerns about economic disruption. Business leaders have predicted a “white-collar bloodbath,” and 53–75% of the US public believes AI will increase unemployment, with about one third worried about having their own job automated (Hatz et al., 2025). Governments are responding: the US AI Action Plan calls for “a serious workforce response to help workers navigate that transition,” while international bodies from the OECD to the ILO are developing their own monitoring frameworks. Most evidence to date points to relatively limited aggregate effects, with studies generally finding modest productivity gains that have not led to substantial displacement (Chandar, 2025; Humlum and Vestergaard, 2025; Hampole et al., 2025). Yet this generally muted picture may mask heterogeneous effects—Brynjolfsson et al. (2025) document a 13% relative decline in entry-level employment in highly exposed occupations, though Frank et al. (2026) show that AI-exposed occupations were already deteriorating before ChatGPT’s release, complicating causal attribution.

Making sense of AI’s economic implications requires data. And there is no shortage of efforts: government surveys track firm-level adoption, AI companies release task-level usage logs, researchers run controlled experiments, and economists develop exposure indices to predict which occupations are most affected. Yet the overall landscape of information being collected remains fragmented and difficult to navigate. Datasets use different taxonomies, operate at different levels of aggregation, and measure different facets of the same underlying phenomenon. Valuable efforts to systematize this landscape exist—such as Crane et al. (2025)—but they remain limited in scope.

This project provides a broad mapping of the empirical sources available for understanding AI’s economic impact. Rather than focusing narrowly on one country or one type of data, the goal is to catalog the full range of empirically-minded projects: adoption surveys, company-side usage data, AI benchmarks, economics-minded evaluations like exposure indices, experimental productivity studies, worker resilience measures such as expertise and adaptive capacity, labor market datasets, and indirect proxies. The aim is twofold: first, to give researchers a bird’s-eye view of what exists and how the pieces fit together; and second, to identify the gaps where better data collection would be most valuable. As a future goal, I plan to move toward data harmonization—linking sources across taxonomies to enable more systematic analysis, building on Manning and Aguirre (2026).

This site serves as the project’s companion directory: a searchable catalog of the datasets and papers identified in the research.

What empirical sources do we have?

The empirical landscape for studying AI’s economic impact is broad but fragmented. It spans at least seven categories of data, each providing visibility into a different dimension of the question. Adoption surveys tell us how widely AI is being used and by whom. Company-side usage logs reveal what people actually do with AI at the task level. Exposure indices project which occupations face the greatest potential disruption. Benchmarks track whether AI systems can perform economically valuable work. Controlled experiments provide causal estimates of productivity effects. Traditional labor market data allows researchers to detect whether AI adoption is showing up in employment, wages, and hiring patterns. And indirect proxies—job postings, patent filings, search trends—offer real-time coverage where surveys lag. Each source has blind spots, but in combination they trace where AI is entering the economy and at what cost.

Adoption surveys

Several organizations directly survey whether and how people and firms are using AI. On the government side, the US Census Bureau’s Business Trends and Outlook Survey (BTOS) surveys approximately 165,000 firms every two weeks and reports that 5% of companies used AI in the past two weeks, rising to 20% when weighted by firm employment (Bonney et al., 2024). The Annual Business Survey (ABS) provides broader coverage across roughly 850,000 firms. In academia, the Real-Time Population Survey (Bick, Blandin, and Deming, 2025) has tracked individual workers’ AI usage since June 2024 with a pooled sample of roughly 10,000 respondents—revealing, among other things, that workers report substantially higher AI usage rates than their employers do.

Independent organizations add further data points. Pew Research’s 2025 survey of 10,113 workers found that approximately 16% report AI exposure at work (Lin and Parker, 2025). McKinsey, Gallup, and Morning Consult conduct their own regular surveys, consistently documenting roughly twofold increases in reported AI usage from 2022 to 2024. While valuable, these surveys generally use their own categorization schemes rather than standard occupational taxonomies, making cross-survey comparison difficult. Nonetheless, adoption surveys remain the most direct way to measure how widely AI has diffused across workers and firms—and the persistent gap between worker-reported and employer-reported usage rates is itself an important finding, indicating that much AI adoption is happening informally and outside employer monitoring systems.

Company-side usage data

AI companies are in a unique position to observe how people actually use their products. Three major efforts provide direct, task-level usage data that can be mapped to occupations. Anthropic’s Economic Index (Handa et al., 2025) analyzes millions of Claude conversations, filtering for economically relevant activities and distinguishing between augmentative and automative AI use. OpenAI’s study (Chatterji et al., 2025) provides the first large-scale analysis of real ChatGPT usage, classifying conversations by task type and finding that non-work usage accounts for the majority of interactions. Microsoft’s Copilot research (Tomlinson et al., 2025) examines the occupational implications of generative AI through actual Bing Copilot usage patterns. These datasets are valuable because they capture revealed behavior—what people actually do with AI—rather than self-reported survey responses. Their main limitation is that each reflects only a single platform’s user base.
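
To make the output of these pipelines concrete, here is a minimal Python sketch of the aggregation step: conversation-level task labels (produced upstream by a model-based classifier, as in Handa et al., 2025) are rolled up into occupation-level usage shares. The task labels, occupation mapping, and counts below are invented for illustration.

from collections import Counter

# Conversation-level task labels, as produced by an upstream classifier.
conversation_tasks = ["debug code", "draft contract", "debug code", "write tests"]

# Hypothetical mapping from tasks to occupations (O*NET links tasks to SOC codes).
task_to_occupation = {
    "debug code": "Software Developers",
    "write tests": "Software Developers",
    "draft contract": "Lawyers",
}

counts = Counter(task_to_occupation[task] for task in conversation_tasks)
total = sum(counts.values())
usage_shares = {occ: round(n / total, 2) for occ, n in counts.items()}
print(usage_shares)  # {'Software Developers': 0.75, 'Lawyers': 0.25}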

Exposure measures

A distinct strand of research estimates which occupations or tasks are most exposed to AI by mapping AI capabilities to occupational task descriptions. Early efforts include Brynjolfsson et al. (2018), who applied a machine learning rubric to individual tasks in O*NET to assess their suitability for machine learning, and Webb (2019), who matched patent text to occupational task descriptions to identify jobs most exposed to AI-driven automation. Felten et al. (2021) took a different approach, linking measured AI progress on specific applications—such as image recognition or translation—to the occupational abilities those applications require. Eloundou et al. (2023), “GPTs are GPTs,” adapted this framework to large language models, combining human expert and GPT-4 assessments to classify occupational exposure—an effort that has become foundational for the subsequent literature on LLM-specific exposure. More recently, Hampole et al. (2025) extend these approaches by constructing a panel that tracks exposure over time as AI capabilities evolve, and the ILO has released a global occupational exposure index. For a review of these approaches and their relative merits, see Manning (2024). These measures serve as forward-looking indicators of where AI might have the largest effects and are widely used in empirical studies correlating exposure with labor market outcomes.
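
To illustrate the aggregation step these indices share, the sketch below computes occupation-level exposure as an importance-weighted average of task-level scores, in the spirit of the task-based approaches above. The SOC codes, weights, and exposure values are invented and do not come from any published index.

from collections import defaultdict

# (SOC code, task id, O*NET-style importance weight, task-level exposure in [0, 1])
task_scores = [
    ("15-1252", "t1", 0.9, 0.8),  # software developers: write and review code
    ("15-1252", "t2", 0.6, 0.5),  # software developers: meet with stakeholders
    ("53-3032", "t1", 1.0, 0.1),  # heavy truck drivers: operate vehicle
]

def occupation_exposure(rows):
    """Importance-weighted mean of task-level exposure scores, per SOC code."""
    num, den = defaultdict(float), defaultdict(float)
    for soc, _task, weight, exposure in rows:
        num[soc] += weight * exposure
        den[soc] += weight
    return {soc: round(num[soc] / den[soc], 2) for soc in num}

print(occupation_exposure(task_scores))  # {'15-1252': 0.68, '53-3032': 0.1}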

AI benchmarks and capability tracking

AI benchmarks have increasingly moved toward measuring economic productivity rather than abstract capabilities. Earlier benchmarks like MMLU test broad knowledge, but newer ones target real-world professional tasks. SWE-bench evaluates whether models can resolve actual GitHub issues from production codebases—a direct proxy for software engineering work. LegalBench and MedQA test professional-level reasoning in law and medicine. This trend has accelerated with purpose-built economic evaluations. APEX (the AI Productivity Index) evaluates frontier models on tasks requiring 1–8 hours of expert time across investment banking, management consulting, law, and primary medical care. OpenAI’s GDPval covers 1,320 tasks across 44 occupations in the nine largest sectors of US GDP, with tasks designed by industry professionals averaging 14 years of experience. These benchmarks directly measure whether models can produce work products that have market value.

Broader capability tracking efforts, including the AI Index and Epoch AI, provide complementary data on compute trends, model scaling, and the overall trajectory of AI development.

Experimental evidence

A growing body of controlled experiments measures the productivity effects of AI access in specific settings. These studies randomly assign AI tools to workers and measure the impact on output quality, speed, or both. Examples span consulting (Dell’Acqua et al., 2023), customer support (Brynjolfsson, Li, and Raymond, 2023), writing (Noy and Zhang, 2023), and software engineering (Peng et al., 2023), among others. A recurring finding across these studies is that AI access compresses the productivity distribution: lower-performing workers tend to gain the most, while top performers see smaller or sometimes negative effects when they over-rely on AI suggestions. Time savings on affected tasks can be substantial, and quality effects depend on the task and the user’s baseline expertise. The experimental literature provides the strongest causal evidence available on AI’s productivity effects, but each study covers a single occupation, firm, or task type, and it remains an open question how well these results generalize to broader populations and settings.
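
For readers unfamiliar with the underlying estimation, the sketch below simulates a stylized version of such an experiment and recovers the effect of AI access with ordinary least squares, including a treatment-by-baseline-skill interaction of the kind used to test whether gains concentrate among lower performers. All data and coefficients are simulated; nothing here reproduces the cited studies.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
ai_access = rng.integers(0, 2, n)    # randomized treatment indicator
baseline = rng.normal(0, 1, n)       # pre-period productivity
# Simulated outcome: a positive average effect that shrinks with baseline skill.
output = (1.0 + 0.4 * ai_access + 0.8 * baseline
          - 0.2 * ai_access * baseline + rng.normal(0, 1, n))

# OLS with an interaction term to detect heterogeneous effects by baseline skill.
X = np.column_stack([np.ones(n), ai_access, baseline, ai_access * baseline])
beta, *_ = np.linalg.lstsq(X, output, rcond=None)
print(dict(zip(["const", "ai_access", "baseline", "ai_x_baseline"], beta.round(2))))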

Labor market and economic data

Understanding AI’s economic impact also requires baseline data on labor markets themselves. Core labor force surveys—the Current Population Survey (CPS), American Community Survey (ACS), and Survey of Income and Program Participation (SIPP)—provide employment, wage, and demographic data at the occupational level. Firm surveys like the Job Openings and Labor Turnover Survey (JOLTS), the Quarterly Census of Employment and Wages (QCEW), and the Occupational Employment and Wage Statistics (OEWS) add hiring, separations, and wage data at the industry and occupation level. Educational data from IPEDS tracks enrollment and completion trends in AI-relevant fields. Researchers are also increasingly leveraging private-sector payroll data—from processors like ADP and Gusto—which offer high-frequency, granular views of hiring, separations, and wage changes that government surveys cannot match in timeliness. Brynjolfsson, Chandar, and Chen (2025), for example, use ADP payroll data covering millions of workers to document a 13% entry-level employment decline in AI-exposed occupations. Large-scale resume data offers a complementary view: Hosseini Maasoum and Lichtinger (2025) analyze 62 million resumes and find that junior employment falls 7.7% in firms adopting generative AI, with senior workers unaffected—a pattern they term “seniority-biased technological change.” These datasets do not directly measure AI usage, but they are essential for linking adoption and exposure patterns to actual labor market outcomes. For a broader review of how the freelancing and platform economy literature has documented early shifts in demand following the release of generative AI tools, see Teutloff et al. (2025).

Proxies and supplemental sources

Several indirect measures help triangulate AI adoption trends. Job posting platforms—Burning Glass/Lightcast, Indeed—track AI skill demands and hiring patterns in real time across millions of postings. LinkedIn’s Economic Graph provides a complementary view, capturing not only job postings but also shifts in the skills that workers list on their profiles, hiring flows between firms and industries, and the diffusion of AI-related skills across occupations and geographies. Google Trends data captures search interest in AI-related terms. Patent data from the USPTO includes AI and machine learning classifications for innovation tracking. H-1B visa applications reveal concentrated demand for AI talent. Occupational information systems like O*NET, which catalogs skills, work activities, and task descriptions for the US workforce, and skills taxonomies like ESCO for Europe, provide the infrastructure that many exposure measures and adoption studies rely on.

Supplemental occupation-level measures

Beyond AI exposure and adoption, several occupation-level measures help characterize how workers might be affected by AI-driven change. Task routineness indices—building on Autor, Levy, and Murnane (2003)—classify occupations by the degree to which their tasks follow predictable, codifiable patterns, which has long been a predictor of automation vulnerability. Experience and tenure data from surveys like the CPS and SIPP capture how long workers have been in their current roles, which may shape their ability to transition. Adaptive capacity measures, such as those developed in Manning and Aguirre (2026), combine skill transferability, wealth, and demographic factors to estimate how well positioned workers in different occupations are to adjust to displacement. These measures complement exposure indices by shifting the question from “which jobs are most exposed?” to “which workers are least equipped to adapt?”—a distinction that matters for policy design.
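
As a stylized illustration of how a composite occupation-level measure can be built, the sketch below standardizes a few components and averages them with equal weights. The components, values, and weighting are placeholders, not the actual construction in Manning and Aguirre (2026).

import statistics

# Occupation: (skill transferability, wealth proxy, share of workers under 40).
# All values are placeholders.
occupations = {
    "Customer Service Reps": (0.4, 0.3, 0.6),
    "Software Developers":   (0.8, 0.7, 0.7),
    "Paralegals":            (0.6, 0.4, 0.5),
}

def zscores(values):
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Standardize each component across occupations, then average with equal weights.
components = list(zip(*occupations.values()))
standardized = list(zip(*[zscores(list(c)) for c in components]))
adaptive_capacity = {occ: sum(z) / len(z) for occ, z in zip(occupations, standardized)}
print(adaptive_capacity)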

What is missing? Recommendations for better data collection

Despite this growing ecosystem of data sources, the current landscape has significant gaps. Several stand out:

  • There is no systematic panel tracking AI adoption at the occupation level over time using standard occupational codes, making it difficult to construct reliable trend lines.
  • Displacement is inherently hard to observe: workers who lose jobs do not appear in employer surveys, and attrition, hiring freezes, and role restructuring are difficult to distinguish from normal turnover in aggregate data.
  • Task-level usage data—the most granular source available—comes from only a handful of platforms, each with its own user base and classification methodology.
  • International coverage remains sparse; most high-quality data comes from the United States, and it is unclear how patterns generalize to countries with different labor market institutions.
  • The link between forward-looking exposure predictions and observed labor market outcomes has received surprisingly little empirical validation.
  • Firm-level adoption data and worker-level outcome data are rarely linked, making it difficult to trace the causal chain from AI deployment to changes in employment, wages, or task composition within specific workplaces.

The recommendations below are organized by stakeholder group and aim to be concrete—proposing specific survey questions, reporting formats, or coordination mechanisms—rather than general calls for “more data.”

For governments

Strategic coordination. In the US, efforts could be coordinated through the Department of Labor’s AI Workforce Research Hub, aligning with the AI Action Plan’s goals. Following the precedent of earlier survey modules on personal computer and internet adoption, AI questions could be added to baseline surveys with extensive coverage (ACS, SIPP) and to more frequent surveys (CPS). Internationally, the OECD and ILO could help develop common survey modules that allow cross-country comparison.

Priority questions. The monthly CPS is perhaps the most important venue in the US. Even given its tight constraints on questionnaire space, one or two core tracking questions would be highly valuable. A basic usage question—“In the past month, did you use AI tools for work tasks?”—paired with an impact question—“Has AI use changed your job responsibilities?”—would provide a monthly national pulse on AI in the workforce. Annual supplements could go deeper: mapping AI usage to specific occupational activities, asking about productivity perceptions, and tracking spending on AI subscriptions.

Frequency over detail. For a fast-moving technology like AI, frequency of measurement matters more than survey depth. Monthly data from even simple questions is more useful for tracking rapid adoption changes than detailed annual supplements.

For AI labs

AI labs are in a unique position to understand how people use AI at the task level and to communicate capability advancements to external evaluators. Several practices would increase the value of their data contributions. Providing access to aggregated data at task and occupation levels—aiming for the least aggregated format that still preserves privacy—maximizes downstream utility. Transparency about methodological choices matters: how conversations are classified, how “augmentative” versus “automative” use is defined, and how sensitive results are to these operationalizations. Automated periodic data releases would enable consistent monitoring and trend-building over time. Finally, transparency about the universe of deployment—for example, reporting the share of total inference tokens covered by the analysis—would help researchers assess representativeness.
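
As one hypothetical example of what a privacy-preserving aggregate release could look like, the sketch below reports conversation counts at the occupation-by-task level and suppresses cells below a minimum size. The field names and threshold are illustrative assumptions, not any lab’s actual release format.

# Aggregate usage cells with small-cell suppression before public release.
MIN_CELL_SIZE = 100  # illustrative threshold

raw_cells = [
    {"occupation": "Software Developers", "task": "debug code", "conversations": 12840},
    {"occupation": "Paralegals", "task": "summarize filings", "conversations": 37},
]

release = [
    cell if cell["conversations"] >= MIN_CELL_SIZE
    else {**cell, "conversations": None}  # suppressed to protect small cells
    for cell in raw_cells
]
print(release)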

For independent surveys and researchers

The most impactful improvement for independent surveys would be reporting data using standard occupational and industry codes—SOC, NAICS, or ISCO—or mapping results to O*NET skill categories. Even aggregated, broad-level mappings are better than creating entirely new taxonomies. Cross-survey methodological consistency matters for building time series. To the extent possible, survey sponsors could publicly communicate their intent to continue running surveys periodically; one-time surveys have limited value for tracking a fast-changing phenomenon.
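
A minimal sketch of what such reporting could look like in practice, using an illustrative (not official) crosswalk from survey-specific categories to SOC major groups and made-up usage rates:

# Map survey-specific categories to SOC major groups before publishing results.
survey_to_soc = {
    "Tech workers": "15-0000 Computer and Mathematical Occupations",
    "Office staff": "43-0000 Office and Administrative Support Occupations",
    "Healthcare":   "29-0000 Healthcare Practitioners and Technical Occupations",
}

# Hypothetical share of respondents in each category reporting AI use at work.
survey_results = {"Tech workers": 0.54, "Office staff": 0.31, "Healthcare": 0.18}

for category, share in survey_results.items():
    print(f"{survey_to_soc[category]}: {share:.0%}")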

For the experimental literature, replication and extension across occupations would be particularly valuable. The current evidence is concentrated in a handful of settings—consulting, customer support, writing, coding—and broader coverage would help establish which findings generalize.

Looking ahead

Despite growing concerns about AI’s economic implications and an expanding research literature, the empirical picture remains incomplete. Current evidence points to relatively modest aggregate effects, though this muted picture may mask heterogeneous impacts across experience levels and occupations. Several factors suggest the landscape could shift quickly: AI capabilities are improving rapidly, adoption rates are accelerating, and the lag between technology deployment and observable labor market effects may mean that displacement dynamics are not yet fully visible.

There is, however, meaningful progress researchers can make with existing data. Cross-referencing adoption surveys with exposure indices can reveal whether the occupations predicted to be most affected are actually the ones where workers report the highest AI usage. Linking company-side usage data to occupational taxonomies allows researchers to compare revealed usage patterns against theoretical predictions. Combining experimental estimates of task-level productivity gains with labor market data on employment trends can help distinguish whether AI is augmenting workers, displacing them, or both. And tracking adoption patterns over time across multiple surveys—even imperfectly—can begin to establish whether we are in a period of gradual diffusion or accelerating transformation. This project aims to map the full landscape of available sources so that researchers can identify where evidence converges, where it conflicts, and where it is simply absent.
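
As a minimal example of the first of these exercises, the sketch below merges invented occupation-level adoption rates with invented exposure scores on shared SOC codes and checks how often the two rank orderings agree. A real analysis would use full crosswalked datasets and standard rank-correlation measures rather than this toy comparison.

# Invented occupation-level adoption rates and exposure scores, keyed by SOC code.
adoption = {"15-1252": 0.62, "23-2011": 0.35, "53-3032": 0.04}  # reported AI usage
exposure = {"15-1252": 0.78, "23-2011": 0.81, "53-3032": 0.10}  # predicted exposure

common = sorted(set(adoption) & set(exposure))

def ranks(values):
    """Rank positions (0 = lowest) for a list of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order):
        out[i] = r
    return out

a_ranks = ranks([adoption[soc] for soc in common])
e_ranks = ranks([exposure[soc] for soc in common])
agreement = sum(a == e for a, e in zip(a_ranks, e_ranks)) / len(common)
print(f"Occupations where adoption rank matches exposure rank: {agreement:.0%}")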

As a next step, I plan to move toward data harmonization: linking sources across taxonomies (SOC, NAICS, O*NET, ISCO) to enable more systematic cross-source analysis. This companion site—which catalogs the datasets and papers identified in the research—will be updated as new sources become available. Suggestions for additional datasets or papers can be submitted through the site’s submission form.

How to Cite

If you find this mapping useful, please cite it as:

APA

Aguirre, T., & Manning, S. (2026). The empirical economics of AI: What’s the state of the art of data collection? Centre for the Governance of AI. Retrieved from https://empirical-economics-of-ai.vercel.app

BibTeX

@misc{aguirremanning2026empirical,
  author    = {Aguirre, Tom\'as and Manning, Sam},
  title     = {The Empirical Economics of {AI}: What's the State of the Art of Data Collection?},
  year      = {2026},
  note      = {Centre for the Governance of AI},
  url       = {https://empirical-economics-of-ai.vercel.app}
}

References

  • Autor, D. H., Levy, F., & Murnane, R. J. (2003). “The skill content of recent technological change: An empirical exploration.” Quarterly Journal of Economics, 118(4), 1279–1333.
  • Bick, A., Blandin, A., & Deming, D. J. (2025). “The rapid adoption of generative AI.” NBER Working Paper No. 32966.
  • Bonney, K. et al. (2024). “Tracking firm use of AI in real time.” US Census Bureau.
  • Brynjolfsson, E., Chandar, B., & Chen, J. (2025). “Canaries in the coal mine: Six facts about the recent employment effects of AI.” Stanford Digital Economy Lab.
  • Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). “Generative AI at work.” NBER Working Paper No. 31161.
  • Brynjolfsson, E., Mitchell, T., & Rock, D. (2018). “What can machines learn, and what does it mean for occupations and the economy?” AEA Papers and Proceedings, 108, 43–47.
  • Chandar, B. (2025). “Tracking employment changes in AI-exposed jobs.” Stanford University working paper.
  • Chatterji, A., Cunningham, T., Deming, D. J., Hitzig, Z., Ong, C., Shan, C. Y., & Wadman, K. (2025). “How people use ChatGPT.” NBER Working Paper No. 34255.
  • Crane, L., Green, M., & Soto, P. (2025). “Measuring AI uptake in the workplace.” FEDS Notes, Board of Governors of the Federal Reserve System.
  • Dell’Acqua, F. et al. (2023). “Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality.” Harvard Business School Working Paper No. 24-013.
  • Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). “GPTs are GPTs: An early look at the labor market impact potential of large language models.” arXiv:2303.10130.
  • Felten, E., Raj, M., & Seamans, R. (2021). “Occupational, industry, and geographic exposure to artificial intelligence.” Strategic Management Journal, 42(12), 2195–2217.
  • Hampole, M., Papanikolaou, D., Schmidt, L., & Seegmiller, B. (2025). “Artificial intelligence and the labor market.” NBER Working Paper No. 33509.
  • Handa, K., Tamkin, A., McCain, M. et al. (2025). “Which economic tasks are performed with AI? Evidence from millions of Claude conversations.” arXiv:2503.04761.
  • Hatz, L. et al. (2025). “Public perceptions of AI and employment.” Working paper.
  • Hosseini Maasoum, A., & Lichtinger, J. (2025). “Generative AI as seniority-biased technological change: Evidence from U.S. resume and job posting data.” SSRN.
  • Humlum, A., & Vestergaard, E. (2025). “Large language models, small labor market effects.” Working paper.
  • Lin, L., & Parker, K. (2025). “U.S. workers are more worried than hopeful about future AI use in the workplace.” Pew Research Center.
  • Manning, S. (2024). “Predicting AI’s impact on work.” Centre for the Governance of AI.
  • Manning, S., & Aguirre, T. (2026). “How adaptable are American workers to AI-induced job displacement?” NBER Working Paper No. 34705.
  • Noy, S., & Zhang, W. (2023). “Experimental evidence on the productivity effects of generative artificial intelligence.” Science, 381(6654), 187–192.
  • Peng, S. et al. (2023). “The impact of AI on developer productivity: Evidence from GitHub Copilot.” arXiv:2302.06590.
  • Teutloff, J., Giering, O., Kirchner, M., & Maertens, A. (2025). “Winners and losers of generative AI: Early evidence of shifts in freelancer demand.” Journal of Economic Behavior & Organization, 106845.
  • Tomlinson, K. et al. (2025). “Working with AI: Measuring the occupational implications of generative AI.” arXiv:2507.07935.
  • Webb, M. (2019). “The impact of artificial intelligence on the labor market.” Stanford University working paper.

Project by Tomás Aguirre, mentored by Sam Manning. Centre for the Governance of AI.