The Impact of Large Language Models on Open-Source Innovation: Evidence from GitHub Copilot
Yeverechyahu, Mayya, Oestreicher-Singer
2024 · Working paper · 7 citations
Observational labor market · Interdisciplinary · Causal
LLM / Generative AI · Software / coding · Augmentation vs. substitution · Human-AI collaboration
Summary
Yeverechyahu, Mayya, and Oestreicher-Singer exploit the fact that GitHub Copilot supported only some programming languages at launch. Using a difference-in-differences design, they study how LLMs affect the volume and type of open-source innovation, comparing Python and Rust packages (treated) with R and Haskell packages (control) from October 2019 to December 2022.
Main Finding
GitHub Copilot increases overall open-source contributions by 37% for Python (vs. R) and 54% for Rust (vs. Haskell). The effect is disproportionately larger for iterative innovation (maintenance commits) than for capability innovation (new feature development), particularly in high-activity projects with rich contextual information.
Primary Datasets
GitHub API data on commits and repository activity for Python, R, Rust, and Haskell packages; PyPI, CRAN, Hackage, and Crates.io for version release data
- Key Methods
- Difference-in-differences with propensity score matching, comparing languages supported by GitHub Copilot (Python, Rust) with unsupported languages (R, Haskell); synthetic difference-in-differences as a robustness check; LLM-based classification of commit types
- Sample Period
- 2019-2022
- Geographic Coverage
- Global (GitHub open-source projects)
- Sample Size
- Over 1.1 million commits across 1,187 matched Python/R packages and 1,373 matched Rust/Haskell packages from October 2019 to December 2022
- Level of Analysis
- Task, Firm
- Occupation Classification
- None
- Industry Classification
- None
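The difference-in-differences comparison listed under Key Methods can be illustrated with a minimal Python sketch. It assumes a hypothetical toy panel in which each row is a package-period carrying a treatment flag (Copilot-supported language), a post-launch flag, and an outcome measured as log(1 + commits). The data, the log outcome, and the names `cell_mean`, `did_estimate`, and `log_effect_to_percent` are illustrative assumptions, not the authors' specification. The sketch computes only the plain 2×2 estimate from cell means; it omits the propensity score matching, fixed effects, and synthetic difference-in-differences steps the paper reports.

```python
"""Minimal 2x2 difference-in-differences sketch on hypothetical toy data.

Treated group: packages in a Copilot-supported language (e.g. Python).
Control group: packages in an unsupported language (e.g. R).
Outcome: log(1 + commits) per package-period. The log outcome and all
numbers below are illustrative assumptions, not the paper's data.
"""
import math
from statistics import mean

# Each record: (treated, post, log_commits). Hypothetical values only.
panel = [
    (1, 0, 2.0), (1, 0, 2.2), (1, 1, 2.6), (1, 1, 2.8),
    (0, 0, 1.9), (0, 0, 2.1), (0, 1, 2.1), (0, 1, 2.3),
]


def cell_mean(rows, treated, post):
    """Average outcome for one (group, period) cell."""
    return mean(y for t, p, y in rows if t == treated and p == post)


def did_estimate(rows):
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change


def log_effect_to_percent(beta):
    """Convert a log-point effect into a percentage change: 100 * (e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


beta = did_estimate(panel)
print(f"DiD estimate (log points): {beta:.3f}")
print(f"Implied change: {log_effect_to_percent(beta):.1f}%")
```

On this toy panel the treated group rises by 0.6 log points and the control group by 0.2, so the estimate is 0.4 log points, or roughly a 49% relative increase. Subtracting the control group's change is what removes common time trends (such as a general rise in open-source activity) that would otherwise be attributed to Copilot.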
Notes
arXiv:2409.08379
[Claude classification]: Uses an LLM (GPT-4o) as a methodological tool to classify commit types; exploits GitHub Copilot's selective language support at launch (October 2021) as a natural experiment; distinguishes capability innovation (new features) from iterative innovation (maintenance/refinement); the study period ends in December 2022, before ChatGPT could affect the control group