This directory provides all the files needed to produce the figures and tables in Jensen, Kaplan, Naidu, and Wilse-Samson (2013), "Political Polarization and the Dynamics of Political Language: Evidence From 130 Years of Partisan Speech," Brookings Papers on Economic Activity. It also corrects some minor errors in the final draft (see Corrigenda below).
The raw counts and predictions produced in R are in the rawdata/Routput_replication directory; other files are in rawdata. All figures and tables are written to Pics_Tables.

Order of operations:
0) (Optional) Run loadingsValidation_replication.r in R from the base directory. Note that this file takes a long time to run (at least as of 2012). It writes the correlations to rawdata/Routput, whereas the output from the run used to generate the paper is in rawdata/Routput_replication. This is the only R program; everything else is done in Stata. If you do want to use the output from this step, you will have to make some modifications to paths in combine_Trigrams_replication.do. Note that the percent-correctly-predicted graph will be different, as a different random sample will be used.
1) Run combine_Trigrams_replication.do. This assembles the data from files in rawdata/Routput_replication and produces Figure 1 and Tables 1 and 2 (with correct standard errors; the ones in the paper are incorrect).
1') (Optional) If you ran 0), you will need to run combine_Trigrams_replication_fresh.do to assemble the data from files in rawdata/Routput. This produces Figure 1 (which will differ from that in the paper) and Tables 1 and 2 (which will be the same as those produced in step 1).
2) maketimeseries_replication.do: produces the time-series data and the first part of the panel data.
3) topics_tax_replication.do: produces Figure 2.
4) analysis_time_series_replication.do: produces all the time-series figures (Figures 3-6) and Tables 3 and 4.
5) phrase_panel_construct_replication.do: produces the full panel data.
6) phrase_panel_analysis_replication.do: generates Tables 5, 6, and 7. Table 5 (the summary statistics) as printed in the paper comes from an early version of the analysis that was mistakenly included. This file generates the correct version.
Files in Appendix folder:
7) The file dwnominate_replicate produces Figure A1 in the Appendix from files in Routput_replication.
8) analysis_time_series_freqcut_replication.do and analysis_time_series_tstatcut_replication.do generate the time-series graphs B1 and B2 in the Appendix.
9) The files phrase_panel_construct_appendix_replication.do and phrase_panel_analysis_appendix_replication.do produce Tables B1-B4 in the Appendix.
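
The order of operations above can be sketched as a small driver script. This is only an illustration and is not part of the replication package. It assumes a console Stata executable named `stata` on the PATH that accepts the batch form `stata -b do <file>` (the executable name and flag differ across platforms and Stata editions), that the main do-files sit in the base directory, and that the appendix do-files are run from a folder literally named `Appendix`. Step 0 (the optional R run), step 1', and step 7 are left out; step 7 is omitted because the file name dwnominate_replicate is given above without an extension.

```python
import subprocess
from pathlib import Path

# Steps 1-6 for the main paper, in the order listed above.
MAIN_STEPS = [
    "combine_Trigrams_replication.do",
    "maketimeseries_replication.do",
    "topics_tax_replication.do",
    "analysis_time_series_replication.do",
    "phrase_panel_construct_replication.do",
    "phrase_panel_analysis_replication.do",
]

# Steps 8-9 for the appendix. Step 7 is omitted (no file extension given).
APPENDIX_STEPS = [
    "analysis_time_series_freqcut_replication.do",
    "analysis_time_series_tstatcut_replication.do",
    "phrase_panel_construct_appendix_replication.do",
    "phrase_panel_analysis_appendix_replication.do",
]


def build_commands(stata_exe="stata", appendix_dir="Appendix"):
    """Return (working_dir, argv) pairs in run order.

    stata_exe and appendix_dir are assumptions; adjust them to your setup.
    """
    cmds = [(".", [stata_exe, "-b", "do", name]) for name in MAIN_STEPS]
    cmds += [(appendix_dir, [stata_exe, "-b", "do", name]) for name in APPENDIX_STEPS]
    return cmds


def run_all(base_dir=".", dry_run=True, **kwargs):
    """Print each command; launch it only when dry_run is False."""
    for rel_dir, argv in build_commands(**kwargs):
        cwd = Path(base_dir) / rel_dir
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True only catches a failed launch or nonzero exit code.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

By default the script only prints the commands in order; call `run_all(dry_run=False)` to launch them. Stata in batch mode may not report a do-file error through its exit code, so it is worth inspecting the .log file written for each step before moving on.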


Corrigenda:
Figure 1: the In-Sample predictions are based on the same 75% sample used to generate the Out-of-Sample predictions.
The standard errors in Tables 1 and 2 are incorrect. These replication files generate the correct ones.
The summary statistics in Table 5 are incorrect. These replication files generate the correct ones.