Data Cleaning Scripts for BPEA Paper “Measuring the Labor Market at the Onset of the COVID-19 Crisis”
Scripts are listed in the order in which they should be run: a script may depend on the output of scripts listed before it.
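The run order described above can be sketched as a small driver. This is only an illustration: `run_in_order` is a hypothetical helper that does not exist in the repository, and it assumes all scripts sit in one directory.

```r
# Hypothetical helper (not part of the repository): source scripts in the
# order given and stop at the first one that is missing.
run_in_order <- function(scripts, dir = ".") {
  done <- character(0)
  for (s in scripts) {
    path <- file.path(dir, s)
    if (!file.exists(path)) {
      stop("Missing script: ", path)
    }
    source(path)  # evaluated in the global environment
    done <- c(done, s)
  }
  invisible(done)
}

# Example call using three crosswalk scripts in their listed order
# (commented out because it needs the actual files):
# run_in_order(c("cw_geo_nber.R", "cw_geo_zip.R", "cw_naics.R"))
```

Stopping at the first missing file keeps later scripts from running against stale or absent intermediate outputs.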
Crosswalks
| Script | Description |
| --- | --- |
| cw_geo_nber.R | Reads in and organizes NBER crosswalk for county, MSA, and state codes. |
| cw_geo_zip.R | Reads in and organizes HUD ZIP-FIPS crosswalk. |
| cw_naics.R | Reads in and organizes NAICS codes. |
| cw_date_st_reg.R | Reads in and organizes dates of stay-at-home orders and reopen orders. |
| cw_date_school.R | Reads in and organizes dates of school closure. |
| cw_date_ui.R | Reads in and organizes dates of UI distribution. |
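A ZIP-FIPS crosswalk like the one read by cw_geo_zip.R can map a single ZIP code to more than one county. One common way to get a one-to-one mapping is to keep, for each ZIP code, the county with the largest share. The sketch below shows that approach on a toy table. The column names (`zip`, `county`, `tot_ratio`), the values, and the function `assign_primary_county` are assumptions for illustration, and this is not necessarily the rule the scripts apply.

```r
# Toy stand-in for a ZIP-to-county crosswalk; names and values are made up.
hud <- data.frame(
  zip = c("00001", "00001", "00002"),
  county = c("01001", "01003", "01005"),
  tot_ratio = c(0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the county with the largest share for each ZIP.
assign_primary_county <- function(cw) {
  # Sort so that, within each ZIP, the largest share comes first.
  cw <- cw[order(cw$zip, -cw$tot_ratio), ]
  # Keep only the first (largest-share) row per ZIP.
  out <- cw[!duplicated(cw$zip), c("zip", "county")]
  rownames(out) <- NULL
  out
}

zip_county <- assign_primary_county(hud)
```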
Homebase Data
| Script | Description |
| --- | --- |
| data_0_raw_clean.R | Functions that conduct the most basic cleaning of Homebase data. |
| data_1_cw_geo_raw.R | Outputs raw crosswalks for MSA and other geographical variables. |
| data_1_cw_geo.R | Manually fixes MSA and state for some observations and merges with the NBER geo crosswalk. These geographical variables have been deprecated. |
| data_1_cw_geo_improved.R | Improves Homebase geographical variables (county FIPS, MSA, and state codes) based on ZIP codes and the HUD crosswalk. |
| data_1_cw_owner.R | Produces a crosswalk from Homebase establishments to owners. |
| data_1_raw.R | Appends raw data together and conducts basic cleaning (no subsetting). Memory and time intensive. |
| data_1_raw_update.R | Appends raw data with daily updates (no subsetting). |
| data_1_sel_firm_year.R | Selects firms in the base period for 2018-2020. |
| data_2_firm_ind_geo.R | Aggregates data to the firm-ind-geo level. |
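The aggregation step described for data_2_firm_ind_geo.R can be pictured with base R's `aggregate()`. This is a minimal sketch on made-up data. The column names (`firm_id`, `naics2`, `state`, `date`, `hours`) and the choice to keep a date dimension and sum hours are assumptions, not the actual Homebase schema.

```r
# Toy shift-level records; all names and values are illustrative only.
shifts <- data.frame(
  firm_id = c(1, 1, 1, 2),
  naics2 = c("72", "72", "72", "44"),
  state = c("CA", "CA", "CA", "NY"),
  date = c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-01"),
  hours = c(8, 4, 6, 5),
  stringsAsFactors = FALSE
)

# Sum hours within each firm x industry x geography x date cell.
firm_ind_geo <- aggregate(
  hours ~ firm_id + naics2 + state + date,
  data = shifts,
  FUN = sum
)
```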
Homebase Worker Survey
| Script | Description |
| --- | --- |
| ws_1_quest.R | Reads in responses to each question. |
| ws_1_raw.R | Reads in and cleans raw survey data (including factorizing responses when possible). |
| ws_1_userid_var_sel.R | Selects the subset of respondents who are in the base period and associated with baseline firms. |
| ws_2_hours_match.R | Produces hours data for respondents who are in the base period and associated with baseline firms. |
| ws_0_f_qsubset.R | Conditions for each question. Used to produce tables. |
| wso_1_tab_allQ.Rmd | Table for each question. |
| wso_1_crosstab_Qsel.Rmd | Crosstabs for selected questions. |
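As an illustration of what factorizing responses and building a crosstab can look like in base R, the sketch below uses two invented questions. The question names, answer options, and data are assumptions and do not come from the actual survey.

```r
# Toy survey responses; question names and answers are made up.
responses <- data.frame(
  q_employed = c("Yes", "No", "Yes", "Yes"),
  q_hours_change = c("Decreased", "Decreased", "Same", "Increased"),
  stringsAsFactors = FALSE
)

# Convert free-text answers to factors with an explicit level order.
responses$q_employed <- factor(responses$q_employed, levels = c("Yes", "No"))
responses$q_hours_change <- factor(
  responses$q_hours_change,
  levels = c("Decreased", "Same", "Increased"),
  ordered = TRUE
)

# Crosstab of the two questions (rows: employed, columns: hours change).
tab <- table(responses$q_employed, responses$q_hours_change)
```

Setting the levels explicitly keeps answer options in a meaningful order in tables, and options that nobody chose still appear with a count of zero.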
SafeGraph Data
| Script | Description |
| --- | --- |
| sg_1_core_poi.R | Reads in Core POI files. |
| sg_1_core_poi_merge.R | Merges all versions of Core POI together based on locid, zip, naics. |
| sg_1_visit_raw.R | Reads in raw weekly patterns files from 2019-12-30 to 2020-05-11. Memory and time intensive. |
| sg_1_visit_raw_update.R | Reads in raw weekly patterns files after 2020-05-11 and appends them to data for earlier dates. Memory and time intensive. |
| sg_1_visit_sum_stats.R | Reads in and combines normalization statistics and metadata. |
| sg_1_visit_sel_loc.R | Finds number of visits to locations in the base period. |
| sg_2_visit_agg.R | Aggregates to location-date level and merges with Core POI. |
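One possible reading of merging Core POI versions on locid, zip, and naics is to stack the versions and keep one row per key, preferring the newest version. The sketch below shows that interpretation on toy data. The `name` column, the values, and the newest-first rule are assumptions, and the actual script may combine versions differently.

```r
# Two toy Core POI versions; values are made up.
poi_v1 <- data.frame(
  locid = c("a", "b"),
  zip = c("00001", "00002"),
  naics = c("722511", "445110"),
  name = c("Cafe A", "Grocer B"),
  stringsAsFactors = FALSE
)
poi_v2 <- data.frame(
  locid = c("b", "c"),
  zip = c("00002", "00003"),
  naics = c("445110", "713940"),
  name = c("Grocer B (renamed)", "Gym C"),
  stringsAsFactors = FALSE
)

key <- c("locid", "zip", "naics")

# Stack newest first so that duplicated() drops the older copy of each key.
stacked <- rbind(poi_v2, poi_v1)
core_poi <- stacked[!duplicated(stacked[, key]), ]
rownames(core_poi) <- NULL
```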
Other Data
| Script | Description |
| --- | --- |
| cbp_1_raw.R | Reads in and cleans County Business Patterns (CBP) data at the state level. |
| ppp_unzip.R | Unzips PPP data. |
| ppp_1_code.R | Finds the levels of selected variables in the PPP data. |
| ppp_2_stc.R | Categorizes states based on PPP amount (and also median UI replacement rates). |
| kr_1_raw.R | Reads in the raw Kronos data. |
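Categorizing states by PPP amount, as described for ppp_2_stc.R, could be done with quantile-based bins. The sketch below splits invented states into terciles. The state codes, the `ppp_amount` values, the use of terciles, and the group labels are all assumptions, not the cutoffs used in the paper.

```r
# Toy state-level PPP amounts; state codes and values are made up.
ppp_state <- data.frame(
  state = c("AA", "BB", "CC", "DD", "EE", "FF"),
  ppp_amount = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Tercile cut points computed from the data itself.
breaks <- quantile(ppp_state$ppp_amount, probs = c(0, 1 / 3, 2 / 3, 1))

# Assign each state to a low / mid / high group.
ppp_state$ppp_group <- cut(
  ppp_state$ppp_amount,
  breaks = breaks,
  labels = c("low", "mid", "high"),
  include.lowest = TRUE
)
```

`include.lowest = TRUE` is needed so that the state with the minimum amount falls into the first bin instead of becoming `NA`.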