
Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?

Horton

2023 · NBER Working Paper 31122 · 234 citations
AI capability / benchmarking
LLM / Generative AI · Human-AI collaboration · Decision-making · Augmentation vs. substitution
Abstract

Newly-developed large language models (LLM), because of how they are trained and designed, are implicit computational models of humans: a homo silicus. LLMs can be used like economists use homo economicus: they can be given endowments, information, preferences, and so on, and then their behavior can be explored in scenarios via simulation. Experiments using this approach, derived from Charness and Rabin (2002), Kahneman, Knetsch and Thaler (1986), and Samuelson and Zeckhauser (1988) show qualitatively similar results to the original, but it is also easy to try variations for fresh insights. LLMs could allow researchers to pilot studies via simulation first, searching for novel social science insights to test in the real world.

Summary

Horton uses computational simulations with GPT-3 to demonstrate that large language models can qualitatively replicate findings from classic behavioral economics experiments (dictator games, fairness judgments, status quo bias, and labor substitution), proposing LLMs as "homo silicus" agents for piloting studies.

Main Finding

GPT-3 text-davinci-003 successfully replicates qualitative patterns from classic experiments: it exhibits social preferences in dictator games when appropriately endowed, shows political variation in fairness judgments (the 82% share finding price gouging unfair matches the original study), displays status quo bias in budget allocation, and demonstrates labor-labor substitution under minimum wages.

Primary Datasets

GPT-3 API responses (text-davinci-003, text-ada-001, text-babbage-001, text-currie-001)

Secondary Datasets

None

Key Methods
Computational simulation with GPT-3 API calls; agents endowed with different preferences, political views, and beliefs; systematic variation of prompts and scenarios; comparison of AI responses to original human experimental results
Sample Period
2023
Geographic Coverage
Not applicable (computational simulation)
Sample Size
Varies by experiment: 500 observations (100 agents × 5 scenarios) for status quo bias; 360 observations for minimum wage simulation; multiple API calls per scenario across experiments
Level of Analysis
Individual
Occupation Classification
None
Industry Classification
None
Replication Package
Yes
Notes
NBER WP 31122; published in EC'24. Demonstrates LLMs can replicate classic economic experiments (endowment effects, status quo bias, fairness norms), proposing 'homo silicus' as a complement to homo economicus for piloting studies via simulation. [Claude classification]: Published at EC'24 (ACM Conference on Economics and Computation). This is a methodological/conceptual paper proposing LLMs as 'homo silicus' - computational models of humans that can be used to pilot studies via simulation. The experiments are computational simulations using GPT-3, not experiments with human subjects. Only the most advanced GPT-3 model (text-davinci-003) successfully changes behavior based on endowed preferences; earlier models fail this test. The paper demonstrates qualitative replication of classic experiments but emphasizes that results from AI experiments require empirical confirmation with real humans. Cost: approximately $50 total for all experiments. Regression used only in minimum wage simulation (Table 1) to show effects on hired worker characteristics.
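
The design described under Key Methods and Sample Size (agents endowed with different preferences, crossed with scenarios, 100 agents × 5 scenarios = 500 observations for status quo bias) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the `Agent` class, `build_prompt`, `run_simulation`, the endowment and scenario wording, and the prompt layout are all hypothetical, and `stub_model` is a placeholder standing in for a real GPT-3 API call so the sketch runs offline.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Agent:
    """A simulated respondent (hypothetical structure, not from the paper)."""
    agent_id: int
    endowment: str  # preference or belief statement prepended to the prompt


def build_prompt(agent: Agent, scenario: str) -> str:
    # Assumed prompt layout: endowment first, then the scenario, then a cue.
    return f"{agent.endowment}\n\n{scenario}\n\nYour choice:"


def run_simulation(
    agents: List[Agent],
    scenarios: List[str],
    ask_model: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Cross every agent with every scenario and record one response each."""
    rows = []
    for agent, (scenario_id, scenario) in product(agents, enumerate(scenarios)):
        rows.append(
            {
                "agent_id": agent.agent_id,
                "scenario_id": scenario_id,
                "endowment": agent.endowment,
                "response": ask_model(build_prompt(agent, scenario)),
            }
        )
    return rows


def stub_model(prompt: str) -> str:
    # Placeholder for an LLM call; always answers "A" so the sketch is
    # deterministic. Swap in a real API client to run an actual simulation.
    return "A"


# Illustrative endowment texts (assumed wording, not the paper's prompts).
endowments = [
    "You only care about fairness between players.",
    "You only care about your own payoff.",
]
agents = [Agent(i, endowments[i % len(endowments)]) for i in range(100)]
scenarios = [f"Scenario {k}: choose option A or option B." for k in range(5)]

observations = run_simulation(agents, scenarios, stub_model)
print(len(observations))  # 100 agents x 5 scenarios = 500
```

Passing the model as a function argument keeps the agent-by-scenario grid separate from whichever API or model version is used, which makes it straightforward to rerun the same grid across several models.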