Generative AI Enhances Team Performance and Reduces Need for Traditional Teams

Li, Zhou, Mikel-Hong

2024, Tsinghua University Working Paper, 9 citations
Experimental evidence, Management / Organizational Behavior, Causal
LLM / Generative AI, Human-AI collaboration, Decision-making, Collective intelligence / teams, Augmentation vs. substitution, Writing / content
Summary

Li, Zhou, and Mikel-Hong conduct a preregistered randomized controlled experiment with 435 participants in 122 teams, plus a second experiment with 139 individuals, to examine how integrating generative AI (ChatGPT 4.0) affects team performance on professional content generation and strategy development tasks.

Main Finding

AI-assisted teams outperformed human-only teams on quality, novelty, and usefulness (explaining an additional 2-4% of variance), but teams with multiple AIs showed no advantage over single-AI teams. Individual-AI pairs matched human-only teams when given equal time but still underperformed AI-assisted teams, suggesting that collaborative dynamics remain important despite AI's augmentation potential.

Primary Datasets

Experimental data collected via Prolific platform; IMDB-WIKI facial image dataset used for stimuli

Secondary Datasets

Raven's Progressive Matrices test (14 questions for IQ measurement)

Key Methods
Preregistered randomized controlled experiment with 435 participants assigned to 122 teams (human-only, single-AI, or multiple-AI conditions) performing two professional tasks; second experiment with 139 individual-AI pairs; OLS regression analysis with team-level controls, Bayesian regression for null effects, and interaction analysis of human-AI engagement patterns
Sample Period
2024
Geographic Coverage
US (Prolific participants restricted to US residents)
Sample Size
Experiment I: 435 participants in 122 teams completing 2 tasks (244 team-task observations); Experiment II: 139 individuals completing 2 tasks (278 individual-task observations)
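The observation counts in the Sample Size field follow from multiplying each experiment's units by the two tasks. The short sketch below reproduces that arithmetic. The variable names are illustrative, and the mean team size is derived here from the reported totals rather than stated in the entry.

```python
# Counts as reported in the Sample Size field.
TASKS_PER_UNIT = 2          # each team or individual completed two tasks
participants_exp1 = 435     # Experiment I participants
teams_exp1 = 122            # Experiment I teams
individuals_exp2 = 139      # Experiment II individuals

# Unit-by-task observations for each experiment.
team_task_obs = teams_exp1 * TASKS_PER_UNIT
individual_task_obs = individuals_exp2 * TASKS_PER_UNIT

# Derived quantity (not stated in the entry): average participants per team.
mean_team_size = participants_exp1 / teams_exp1

print(team_task_obs)             # 244
print(individual_task_obs)       # 278
print(round(mean_team_size, 2))  # 3.57
```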
Level of Analysis
Individual, Firm
Occupation Classification
None
Industry Classification
None
Replication Package
Partial
Notes
Tsinghua University Working Paper 2405.17924
[Claude classification]: Uses ORIV (Obviously Related Instrumental Variables) methodology in robustness checks to address measurement error. Preregistered study. Task is age classification from photographs using IMDB-WIKI dataset. AI predictions come from Caffe deep learning model. Incentivized using binarized scoring rule.
[Claude classification]: Preregistered study (OSF: 5su8c). Uses GPT-4.0 API to assess quality of human input to AI. Human judges blind to conditions rated outputs on quality, novelty, and usefulness (Cronbach's alpha 0.68-0.77). Coarsened Exact Matching used for robustness checks. Key finding: centralized AI usage (one or few team members engaging deeply) more effective than distributed engagement in multiple-AI teams. Teams with higher IQ, familiarity, and size benefited more from multiple AIs. AI integration improved team potency and satisfaction but not coordination or information elaboration.
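The notes report Cronbach's alpha of 0.68-0.77 for the human judges' ratings. As a reminder of what that statistic measures, here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of per-column variances / variance of row totals). It assumes each row is one rated output and each column is one judge. That layout, the function name `cronbach_alpha`, and the ratings are all hypothetical, and none of them come from the paper.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: list of rows, one row per rated output, one score per judge.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("need at least two columns (judges)")
    columns = list(zip(*ratings))
    # Sum of the variances of each judge's scores.
    item_variance_sum = sum(variance(col) for col in columns)
    # Variance of the per-output total score across judges.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: five outputs scored by three judges on a 1-5 scale.
hypothetical_ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(hypothetical_ratings)
print(round(alpha, 3))
```

Values closer to 1 indicate that the columns move together. The reported range of 0.68-0.77 would sit at moderate agreement under this reading.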