This site is a work in progress and has not been widely shared; content may contain errors. Some annotations were human-generated and some AI-generated, and all are being verified. Feedback is welcome.
Back to papers

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

Eloundou, Manning, Mishkin, Rock

2023 · Science · 532 citations
Exposure / measurement · Interdisciplinary
LLM / Generative AI · AI Exposure · Writing / content · Software / coding
Abstract

We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.

Summary

Eloundou et al. develop a novel task-level exposure rubric and apply it to O*NET occupational data, using both human expert annotation and GPT-4 classification, to measure the potential labor market impact of large language models and LLM-powered software in the US economy.

Main Finding

Approximately 80% of US workers have at least 10% of their tasks exposed to LLMs, and 19% have at least 50% of their tasks exposed when LLM-powered software is considered. Higher-wage occupations face greater exposure (contrary to prior ML exposure measures); programming and writing skills are positively associated with exposure, while science and critical-thinking skills are negatively associated; 28–40% of the variance in LLM exposure is unexplained by prior technology exposure measures.

Primary Datasets

O*NET task descriptions; BLS OEWS

Secondary Datasets

GPT-4 annotations

Key Methods
Novel exposure rubric applied to O*NET tasks and Detailed Work Activities; human annotation by OpenAI alignment team members; GPT-4 self-annotation with prompt engineering; task-level scores aggregated to occupations using core/supplemental weighting; OLS regressions on skill importance and prior exposure measures; comparison with prior automation/AI exposure metrics
Sample Period
2023
Geographic Coverage
US
Sample Size
1,016 occupations covering approximately 154.2M US workers; 19,265 tasks and 2,087 Detailed Work Activities from O*NET
Level of Analysis
Task, Occupation, Industry
Occupation Classification
O*NET-SOC
Industry Classification
NAICS (4-digit)
Notes
Published in Science (2024) as doi:10.1126/science.adj0817; originally arXiv:2303.10130 (Aug 2023). [Claude classification]: Foundational LLM exposure measure, widely cited. Three variants: α (direct LLM only), β (E1 + 0.5*E2, with complementary software), ζ (E1 + E2, upper bound). GPT-4 and human annotators show ~80.8% agreement on α. R² of 60–73% vs. prior exposure measures (Webb, Felten, SML, Frey-Osborne), leaving 28–40% of variance unique to LLM exposure. Authors from OpenAI and the University of Pennsylvania. The paper argues LLMs meet the criteria for general-purpose technology status. It does not predict the adoption timeline or actual labor market outcomes, only the technical feasibility of task exposure. Image capabilities (E3) were coded separately but combined with E2 for analysis. Includes a comparison with productivity growth data showing weak correlation between recent productivity gains and LLM exposure, suggesting potential to avoid exacerbating cost disease.
Includes comparison with productivity growth data showing weak correlation between recent productivity gains and LLM exposure, suggesting potential to avoid exacerbating cost disease. [Claude classification]: Published in Science (2024) as doi:10.1126/science.adj0817; originally arXiv:2303.10130 (Aug 2023). Foundational LLM exposure measure widely cited. Three variants: α (direct LLM only), β (E1 + 0.5*E2, with complementary software), ζ (E1 + E2, upper bound). GPT-4 and human annotators show ~80.8% agreement on α. R² of 60-73% vs. prior exposure measures (Webb, Felten, SML, Frey-Osborne), with 28-40% unexplained variance unique to LLM exposure. Authors from OpenAI and University of Pennsylvania. Paper argues LLMs meet criteria for general-purpose technology status. Does not make predictions about adoption timeline or actual labor market outcomes, only technical feasibility of task exposure. Image capabilities (E3) coded separately but combined with E2 for analysis. Includes comparison with productivity growth data showing weak correlation between recent productivity gains and LLM exposure, suggesting potential to avoid exacerbating cost disease.
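The three exposure variants named above can be sketched as simple aggregations over task-level labels. This is an illustrative reconstruction, not the paper's code: `exposure_scores`, the parameter names, and the example shares are assumptions; only the α/β/ζ formulas (α = E1, β = E1 + 0.5·E2, ζ = E1 + E2) come from the annotation.

```python
def exposure_scores(e1: float, e2: float) -> dict:
    """Given an occupation's share of tasks labeled E1 (directly exposed
    to an LLM) and E2 (exposed via LLM-powered software), return the three
    aggregate exposure measures described in the paper.

    Illustrative sketch only; names and inputs are hypothetical.
    """
    return {
        "alpha": e1,             # α: direct LLM exposure only (lower bound)
        "beta": e1 + 0.5 * e2,   # β: E1 plus half-weighted complementary software
        "zeta": e1 + e2,         # ζ: E1 + E2 (upper bound)
    }

# Hypothetical occupation with 15% of tasks E1 and 35% E2:
scores = exposure_scores(e1=0.15, e2=0.35)
```

By construction α ≤ β ≤ ζ for non-negative task shares, which matches the paper's framing of β as an intermediate case between direct-only and full software-augmented exposure.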