This site is undergoing review. Some annotations were human-generated, some AI-generated — all are being verified.

Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality

Dell’Acqua, McFowland, Mollick, Lifshitz‐Assaf, Kellogg, Rajendran, Krayer, Candelon, Lakhani

2023 · Harvard Business School Working Paper Series · 599 citations

Experimental evidence · Interdisciplinary · Causal

LLM / Generative AI · Writing / content · Creative work · Human-AI collaboration · Augmentation vs. substitution · Training / upskilling

DOI: 10.2139/ssrn.4573321

Summary

Dell'Acqua and colleagues conduct a pre-registered field experiment in which 758 Boston Consulting Group consultants were randomly assigned to one of three conditions (no AI access, GPT-4 access, or GPT-4 access plus training). Participants worked on realistic consulting tasks that fell either inside or outside the AI capability frontier, allowing the authors to test how generative AI affects knowledge-worker productivity and quality.

Main Finding

For tasks inside the AI capability frontier, consultants using GPT-4 completed 12.2% more tasks on average, completed them 25.1% more quickly, and produced results rated more than 40% higher in quality than the control group. Gains were largest for weaker performers: consultants below the average performance threshold improved by 43% relative to their own baseline scores, versus 17% for those above it. For the task outside the frontier, however, AI users were 19 percentage points less likely to reach the correct solution, despite producing recommendations rated higher in quality.
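The main finding mixes two kinds of differences: relative changes ("12.2% more tasks") and absolute differences between rates ("19 percentage points"). The minimal Python sketch below uses hypothetical numbers, not the study's data, to show how each is computed and why they are not interchangeable; the function names are illustrative.

```python
# Illustrative only: all numbers below are hypothetical, not data from the paper.

def relative_change_pct(treated: float, control: float) -> float:
    """Percent change of the treated mean relative to the control mean."""
    if control == 0:
        raise ValueError("control mean must be nonzero")
    return (treated - control) / control * 100.0


def point_difference(rate_a: float, rate_b: float) -> float:
    """Difference between two rates (given as fractions), in percentage points."""
    return (rate_a - rate_b) * 100.0


# Hypothetical task counts: a control mean of 10 tasks and a treated mean of 11.
print(round(relative_change_pct(11.0, 10.0), 1))  # 10.0 (% more tasks)

# Hypothetical correctness rates: 60% vs 80% correct is a 20-point gap,
# but a 25% relative drop from the 80% baseline.
print(round(point_difference(0.60, 0.80), 1))     # -20.0 (percentage points)
print(round(relative_change_pct(0.60, 0.80), 1))  # -25.0 (% relative change)
```

The same gap can therefore sound larger or smaller depending on which measure is reported, which is why the entry states the outside-frontier result in percentage points.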

Primary Datasets

Dell'Acqua BCG Consultants

AI-focused

Boston Consulting Group experimental data (758 consultants, approximately 7% of BCG's individual-contributor consultants worldwide); proprietary task-completion data, AI interaction logs, and human and GPT-4 quality evaluations

Secondary Datasets

Psychological assessments (Big 5 personality, innovativeness, creativity, paradox mindset); demographic and tenure data; GPT-4 interaction logs (all prompts and responses)

Key Methods: Pre-registered randomized field experiment with three conditions (no AI, GPT-4 access, GPT-4 plus training) testing 18 tasks inside the AI frontier and 1 task outside; human and GPT-4 evaluation of outputs; analysis of prompting behaviors and AI interaction patterns
Sample Period: 2023
Geographic Coverage: International
Sample Size: 758 consultants completing experimental tasks; 385 in inside-frontier experiment (creative product innovation), 373 in outside-frontier experiment (business problem-solving)
Level of Analysis: Individual, Task
Occupation Classification: None
Industry Classification: None
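The core of the design is random assignment of each participant to one of the three conditions listed under Key Methods. A minimal Python sketch of balanced random assignment follows; the condition labels, the seed, and the shuffle-then-deal allocation are illustrative assumptions, not the randomization procedure documented in the paper.

```python
import random

# Hypothetical labels for the three arms listed under Key Methods.
CONDITIONS = ("no_ai", "gpt4", "gpt4_plus_training")


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the three arms.

    Arm sizes differ by at most one. This is a generic sketch; the study's
    actual randomization procedure is not described in this entry.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


# Hypothetical participant IDs; 385 matches the inside-frontier sample size above.
assignment = assign_conditions(range(385), seed=42)
counts = {c: sum(1 for v in assignment.values() if v == c) for c in CONDITIONS}
print(counts)  # {'no_ai': 129, 'gpt4': 128, 'gpt4_plus_training': 128}
```

Shuffling before dealing keeps each participant's arm random while guaranteeing near-equal arm sizes, which simple coin-flip assignment would not.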

Notes

Harvard Business School Working Paper No. 24-013. [Claude classification]: This is a landmark field experiment on the effects of LLMs on high-skill knowledge work. The paper introduces the concept of a 'jagged technological frontier', along which AI capabilities are uneven across tasks. It identifies two distinct patterns of human-AI integration: 'Centaur' behavior (a strategic division of labor between human and AI) and 'Cyborg' behavior (complete integration of human and AI workflows). The experiment used actual BCG consultants (7% of individual contributors globally, n=758) performing realistic job tasks. The paper also documents reduced idea diversity with AI use, measured via the semantic similarity of outputs. GPT-4 served both as the experimental treatment and as an evaluator of outputs. Participants' performance carried office recognition and career implications, which was intended to encourage genuine engagement.
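The note mentions measuring idea diversity through the semantic similarity of outputs. One common way to operationalize this, assumed here rather than taken from the paper, is to embed each output as a vector and average the cosine similarity over all pairs: higher averages mean less diverse ideas. The Python sketch below takes the embeddings as given (the embedding model is left unspecified) and uses toy vectors for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of outputs.

    Higher values mean the outputs are more alike, i.e. lower idea diversity.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 3-dimensional "embeddings"; real sentence embeddings are far larger.
varied = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
similar = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.9, 0.1, 0.1]]
print(mean_pairwise_similarity(varied))         # 0.0
print(mean_pairwise_similarity(similar) > 0.9)  # True
```

Comparing this statistic between the AI and no-AI groups would then indicate whether AI use made outputs more homogeneous.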