Generative AI Enhances Team Performance and Reduces Need for Traditional Teams

Li, Zhou, Mikel-Hong

2024, Tsinghua University Working Paper, 9 citations

Experimental evidence, Management / Organizational Behavior, Causal

LLM / Generative AI, Human-AI collaboration, Decision-making, Collective intelligence / teams, Augmentation vs. substitution, Writing / content

DOI: 10.2139/ssrn.4844976

Summary

Li, Zhou, and Mikel-Hong conduct a preregistered randomized controlled experiment with 435 participants in 122 teams, plus a second experiment with 139 individuals, to examine how integrating generative AI (ChatGPT 4.0) affects performance on professional content generation and strategy development tasks.

Main Finding

AI-assisted teams outperformed human-only teams on quality, novelty, and usefulness, with AI assistance explaining an additional 2-4% of variance. Teams with multiple AIs showed no advantage over single-AI teams. Individual-AI pairs matched human-only teams when given equal time but still underperformed AI-assisted teams, suggesting that collaborative dynamics remain important despite AI's augmentation potential.

Primary Datasets

Experimental data collected via Prolific platform; IMDB-WIKI facial image dataset used for stimuli

Secondary Datasets

Raven's Progressive Matrices test (14 questions for IQ measurement)

Key Methods: Preregistered randomized controlled experiment with 435 participants assigned to 122 teams (human-only, single-AI, or multiple-AI conditions) performing two professional tasks; second experiment with 139 individual-AI pairs; OLS regression analysis with team-level controls, Bayesian regression for null effects, and interaction analysis of human-AI engagement patterns
Sample Period: 2024
Geographic Coverage: US (Prolific participants restricted to US residents)
Sample Size: Experiment I: 435 participants in 122 teams completing 2 tasks (244 team-task observations); Experiment II: 139 individuals completing 2 tasks (278 individual-task observations)
Level of Analysis: Individual, Firm
Occupation Classification: None
Industry Classification: None
Replication Package: Partial

Notes

Tsinghua University Working Paper 2405.17924

[Claude classification]: Uses ORIV (Obviously Related Instrumental Variables) methodology in robustness checks to address measurement error. Preregistered study. Task is age classification from photographs using IMDB-WIKI dataset. AI predictions come from Caffe deep learning model. Incentivized using binarized scoring rule.

[Claude classification]: Preregistered study (OSF: 5su8c). Uses GPT-4.0 API to assess quality of human input to AI. Human judges blind to conditions rated outputs on quality, novelty, and usefulness (Cronbach's alpha 0.68-0.77). Coarsened Exact Matching used for robustness checks. Key finding: centralized AI usage (one or few team members engaging deeply) more effective than distributed engagement in multiple-AI teams. Teams with higher IQ, familiarity, and size benefited more from multiple AIs. AI integration improved team potency and satisfaction but not coordination or information elaboration.
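The Cronbach's alpha values quoted above summarize how consistently the ratings agree. As a rough illustration of how such a coefficient is computed, the sketch below applies the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals). It assumes each judge is treated as one "item"; the record does not say whether the paper computed alpha across judges or across rating items. The function name `cronbach_alpha` and the ratings are hypothetical.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(ratings_by_judge):
    """Cronbach's alpha for a list of per-judge rating lists.

    Each inner list holds one judge's ratings of the same outputs,
    in the same order.
    """
    k = len(ratings_by_judge)
    if k < 2:
        raise ValueError("need at least two judges")
    sum_item_variances = sum(variance(r) for r in ratings_by_judge)
    # Total score each output received across all judges.
    totals = [sum(scores) for scores in zip(*ratings_by_judge)]
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical ratings of five outputs by three judges (not study data).
judge_a = [3, 4, 5, 2, 4]
judge_b = [2, 4, 5, 3, 4]
judge_c = [3, 5, 4, 2, 5]

alpha = cronbach_alpha([judge_a, judge_b, judge_c])
print(f"alpha = {alpha:.3f}")
```

When all judges give identical ratings the coefficient equals 1, and it falls as their ratings diverge.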