SSRN Electronic Journal
[Claude classification]: Conceptual/theoretical paper presenting a multilayer network framework for human-AI collective intelligence. Reviews real-world applications using the Supermind Design database (938 cases across 12 application areas). Integrates perspectives from complex systems theory, network science, and multiple disciplines. Does not conduct original empirical analysis.
[Claude classification]: Post-registered as AEARCTR-0014530. Three separate field experiments with different designs: Microsoft (8 months, 50% treatment), Accenture (4 months, 61% treatment), Anonymous Company (2 months staggered rollout). Imperfect compliance required IV approach. Statistical power challenges due to large outcome variance and high fraction of zero-output weeks. Weighted IV estimator places more weight on periods with larger treatment-control adoption differences. Additional abandoned Accenture experiment discussed in appendix (42% layoff, missing usage data). Outcomes: pull requests (primary), commits, builds, build success rate. Microsoft data includes tenure and seniority allowing heterogeneity analysis.
The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers
Cui, Demirer, Jaffe, Musolff, Peng, Salz
2024 | Working paper | 43 citations
Experimental evidence | Causal
LLM / Generative AI | Software / coding | Junior / entry-level | Human-AI collaboration | Augmentation vs. substitution
Cui et al. analyze three randomized controlled trials at Microsoft, Accenture, and an anonymous Fortune 100 company, covering 4,867 software developers, to estimate the causal effect of access to GitHub Copilot (an AI coding assistant) on developer productivity in real workplace settings.
Using GitHub Copilot causes a 26.08% (SE: 10.3%) increase in weekly completed tasks among software developers. Gains are significantly larger for less experienced developers (shorter tenure, more junior positions), who also had higher adoption rates.
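For a rough sense of how precise the headline estimate is, the reported point estimate and standard error can be turned into a normal-approximation 95% confidence interval. This is a back-of-the-envelope sketch, assuming the SE is expressed in percentage points and the estimator is approximately normal; the helper name `normal_ci` is hypothetical, and the paper's own inference may differ.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    # Critical value for the requested coverage (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Values taken from the main finding above (assumed to be percentage points).
estimate_pct = 26.08
se_pct = 10.3

low, high = normal_ci(estimate_pct, se_pct)
print(f"Approximate 95% CI: {low:.1f}% to {high:.1f}%")
```

Under these assumptions the interval runs from roughly 6% to 46%, which is consistent with the statistical power challenges mentioned in the notes.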
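- Primary Datasets
- Developer-week outcome data (pull requests, commits, builds, build success rate) from the three field experiments at Microsoft, Accenture, and the anonymous company
- Secondary Datasets
- None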
- Key Methods
- Randomized controlled trials (field experiments) with instrumental variable regression using experimental assignment to instrument for actual Copilot usage; weighted IV approach that weights periods by treatment-control adoption differences; developer and week fixed effects
- Sample Period
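- Microsoft: 8 months; Accenture: 4 months; Anonymous Company: 2 months (staggered rollout); calendar dates not given in this record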
- Geographic Coverage
- Not specified in this record
- Sample Size
- 4,867 software developers across three experiments (Microsoft: 1,521; Accenture: 316; Anonymous Company: 3,030); developer-week observations
- Level of Analysis
- Individual
- Occupation Classification
- None
- Industry Classification
- None
- Replication Package
- Partial
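
The Key Methods entry describes using random assignment as an instrument for actual Copilot usage, with more weight on periods where the treatment-control adoption gap is larger. The sketch below illustrates that idea on made-up data. It is not the authors' estimator: the function names, the toy rows, and the specific weighting (each period's Wald ratio weighted by its squared adoption gap) are assumptions for illustration, and the developer and week fixed effects are omitted.

```python
from collections import defaultdict
from statistics import mean


def group_diff(values, assigned):
    """Mean of `values` among assigned developers minus mean among controls."""
    treated = [v for v, z in zip(values, assigned) if z == 1]
    control = [v for v, z in zip(values, assigned) if z == 0]
    return mean(treated) - mean(control)


def wald_iv(outcome, usage, assigned):
    """Pooled Wald/IV estimate: effect of assignment on the outcome
    divided by the effect of assignment on usage (the adoption gap)."""
    return group_diff(outcome, assigned) / group_diff(usage, assigned)


def weighted_iv(rows):
    """rows: (period, assigned, usage, outcome) tuples.

    Combines per-period Wald ratios, weighting each period by its squared
    adoption gap. This weighting is an illustrative assumption.
    """
    by_period = defaultdict(list)
    for period, z, d, y in rows:
        by_period[period].append((z, d, y))
    num = den = 0.0
    for obs in by_period.values():
        z, d, y = zip(*obs)
        gap = group_diff(d, z)  # first stage: adoption gap in this period
        itt = group_diff(y, z)  # reduced form: outcome gap in this period
        num += gap * itt
        den += gap * gap
    return num / den


# Hypothetical toy data: each use of the tool adds 2 tasks to a baseline of 4.
# Period 1 has a small adoption gap, period 2 a larger one.
rows = []
for period, treated_use, control_use in [
    (1, [1, 0, 0, 0], [0, 0, 0, 0]),
    (2, [1, 1, 1, 0], [1, 0, 0, 0]),
]:
    for d in treated_use:
        rows.append((period, 1, d, 4 + 2 * d))
    for d in control_use:
        rows.append((period, 0, d, 4 + 2 * d))

pooled = wald_iv([r[3] for r in rows], [r[2] for r in rows], [r[1] for r in rows])
weighted = weighted_iv(rows)
print(pooled, weighted)
```

In this noiseless toy example both estimators recover the built-in effect of 2 tasks per user. With noisy outcomes, down-weighting periods with small adoption gaps keeps weakly identified periods from dominating the estimate, which is the motivation the notes give for the weighted approach.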