This folder provides replication materials for "Technology and Labor Markets: Past, Present, and Future; Evidence from Two Centuries of Innovation"
by Liu, Papanikolaou, Schmidt, and Seegmiller, for the Fall 2025 Brookings Papers on Economic Activity.

------------------Need to install------------------

Matlab: 
shadedErrorBar (download from https://www.mathworks.com/matlabcentral/fileexchange/26311-raacampbell-shadederrorbar and place the shadedErrorBar.m file in the directory Code/)

Stata: 
reghdfe (See https://scorreia.com/software/reghdfe/ for install instructions) 
estout (ssc install estout) 
ivreg2 (ssc install ivreg2) 
ivreghdfe (https://github.com/sergiocorreia/ivreghdfe/issues/50 may help with troubleshooting) 
mat2txt (ssc install mat2txt) 
gtools (ssc install gtools) 
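The Stata packages above can be installed in one pass. The lines below are a minimal sketch, not part of the replication package: they assume an internet connection, that reghdfe is installed from SSC together with its ftools dependency (one of the routes described on the reghdfe page), and that ivreghdfe is installed from its GitHub repository.

```stata
* Sketch: install the user-written packages listed above (requires internet access).
ssc install estout, replace
ssc install ivreg2, replace
ssc install mat2txt, replace
ssc install gtools, replace

* Assumption: reghdfe is taken from SSC; ftools is a dependency of reghdfe.
ssc install ftools, replace
ssc install reghdfe, replace

* Assumption: ivreghdfe is taken from the author's GitHub repository.
net install ivreghdfe, from("https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/") replace

* Report where each command is installed; a missing package prints an error.
foreach pkg in reghdfe estout ivreg2 ivreghdfe mat2txt gtools {
    capture noisily which `pkg'
}
```

If ivreghdfe fails after installation, the GitHub issue linked above may help with troubleshooting.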

------------------Building dataset for analysis------------------

1. Execute the Stata script “Code/PrepareData.do”. This will generate the intermediate data needed for the rest of the replication.
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.

------------------Replicating empirical results------------------

1. Execute the Stata script “Code/Regressions_Baseline.do”. This will generate the results found in Tables 1, 4, 5, A2, A3, A4, A5, A6, A7, A8, and A9 and in Figure 2.
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.

2. Execute the Stata script “Code/Regressions_Spillover.do”. This will generate the results found in Tables 2, A10, A11, A12, and A13.
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.

3. Execute the Matlab script “Code/EstimatedCoeffsbyDecade.m”. This will generate the results found in Figures A2 and A3. Note that step 3 must be executed after steps 1 and 2.

4. Execute the Stata script “Code/Regressions_Heterogeneity_Decomp.do”. This will generate the results found in Table 3, Figure 4, Figure A4, Figure A5, and Figure A6. Results are stored separately for each panel in the given table/figure.
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.
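The Stata steps above (data preparation plus the three regression scripts) can be chained from a single driver do-file. The following is only a sketch under stated assumptions: the driver itself and the local macro projdir are hypothetical and not part of the replication package, the path is a placeholder, and line 2 of each script is assumed to have been edited already as described in the notes.

```stata
* Hypothetical driver do-file; save it and run it as a whole (locals do not
* survive line-by-line execution).
* projdir is a placeholder for the main project directory.
local projdir "/path/to/main/project/directory"

* Build the intermediate data first.
do "`projdir'/Code/PrepareData.do"

* Baseline and spillover regressions (steps 1 and 2).
do "`projdir'/Code/Regressions_Baseline.do"
do "`projdir'/Code/Regressions_Spillover.do"

* Step 3 runs in Matlab, not Stata: execute Code/EstimatedCoeffsbyDecade.m
* once the two scripts above have finished.

* Heterogeneity and decomposition results (step 4).
do "`projdir'/Code/Regressions_Heterogeneity_Decomp.do"
```

Full paths are used in each call so that the driver still works if one of the scripts changes the working directory.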



------------------Replicating model results------------------

Model code is found in Code/Model. Model results in the paper can be replicated as follows: 

1. To estimate the model, first execute the Matlab script “Code/Model/run_ge_aggregates_with_products.m”. The rest of the Matlab scripts are helper functions that will be run automatically. This script reads in raw input data found in the Data/ModelData directory. 
	a. NOTE: This code was last run in Matlab R2023b and requires the Matlab econometrics, statistics, and parallel computing toolboxes.
	b. NOTE: You should change the variable “targetN” (line 12) to the number of cores you want to parallelize over. The default is 7.

2. Next, execute the Stata script “Code/Model/Table6OutputSimulationResults.do”. This will generate a csv file with the Table 6 output, as well as files used to generate the plots in Figure 5.
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.


3. Last, execute the Stata script “Code/Model/Figure5ExportSimulationsForPlots.do”. This will output the data for the model bar plots in Figure 5. 
	a. NOTE: You should modify line 2 of this script to refer to the appropriate file path for the main project directory.




------------------Input data description------------------

The input data are located in the folder Data/Raw for the empirical part and Data/ModelData for the model part.
1. Data/Raw includes the following:
	a. Folder Crosswalks: Contains crosswalks between different occupational coding schemes and their labels.
	b. Folder IPUMS: Files ending with “Emp” contain employment weights for different occupational coding schemes (as noted in the file name) and decades (and by age if the file name includes “SpecificAge”).
	   Files including “AvgAge” contain average age information for each decade and occupational coding scheme.
	   Files including “MaleOnly” are versions of the employment weights computed using only male labor.
	c. Folders GPT_4oSearch, GPT_4o, and Llama: Contain exposure measures generated by the corresponding LLMs.
	   Files starting with “OccYearExp” contain mean and concentrated exposure measures (s_95 and d_95) for each decade and occupational coding scheme.
	   Files starting with “CategoryExposure” contain exposure measures for each task type.
	   Files starting with “IndAdjuster” contain predicted levels of patenting for each industry and decade.
	d. Three additional files: edscores_by_occcode_year.dta (occupational education score and education quintile by occupation and year, taken from IPUMS data); occ_female_share.dta (occupational female share and female share quintile by occupation and year, taken from IPUMS data); and patent_count_by_decade.csv (number of patents linked to each industry code by decade).

2. Data/ModelData includes:

	a. AI_product_patent_ratio_by_naics.csv: Column "AI_ratio" is the predicted flow of new AI patents over the next 10 years divided by the stock of existing AI patents, at the 2-digit NAICS level (as in appendix equation A.35).

	b. broad_occ_educ_female_by_soc: Broad occupation categories, female shares, and education scores by SOC occupation, taken from 2023 IPUMS data.

	c. occupation_distribution_consolidated.csv: Occupation-industry (6-digit SOC code by 2-digit NAICS industry) employment distribution calculated from 2024 BLS OES data (excluding occupations with less than a 0.1% share of an industry's BLS employment).

	d. occupation_exposed_task_shares_3cat: Occupational mean and concentration of Eloundou et al. task exposures for SOC occupations as implied by their "beta" measure.

	e. occupation_exposed_task_shares_3cat_cognitive: Occupational mean and concentration of task exposures for SOC occupations as implied by their share of exposed cognitive tasks.

	f. occupation_exposed_tasks_3cat: Eloundou et al. task exposures for SOC occupations as implied by their "beta" measure.

	g. occupation_exposed_tasks_3cat_cognitive: Occupational task exposures for SOC occupations as implied by their share of exposed cognitive tasks.

	h. TargetCoeffs: Target coefficients that the calibration of the model's nu is set to match (the 10-year raw, non-standardized OLS coefficient estimates in the paper).
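
To check that the input data are in place before running the scripts, a few of the files described above can be loaded and inspected in Stata. This is a minimal sketch, not part of the replication package: it assumes the working directory is the main project directory and that the named files sit directly inside Data/Raw and Data/ModelData as described.

```stata
* Sketch only: assumes the working directory is the main project directory.

* Education scores and quintiles by occupation and year.
use "Data/Raw/edscores_by_occcode_year.dta", clear
describe

* Female shares and quintiles by occupation and year.
use "Data/Raw/occ_female_share.dta", clear
describe

* Patent counts by industry code and decade.
import delimited "Data/Raw/patent_count_by_decade.csv", varnames(1) clear
describe

* case(preserve) keeps the column name AI_ratio as written in the csv header;
* by default import delimited would convert it to lowercase.
import delimited "Data/ModelData/AI_product_patent_ratio_by_naics.csv", varnames(1) case(preserve) clear
summarize AI_ratio
```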


