Data Cleaning Scripts for BPEA Paper “Measuring the Labor Market at the Onset of the COVID-19 Crisis”

Scripts are listed in an order that reflects potential dependencies: a script may rely on the output of scripts listed above it.

Crosswalks

| Script | Description |
| --- | --- |
| cw_geo_nber.R | Reads in and organizes the NBER crosswalk for county, MSA, and state codes. |
| cw_geo_zip.R | Reads in and organizes the HUD ZIP-FIPS crosswalk. |
| cw_naics.R | Reads in and organizes NAICS codes. |
| cw_date_st_reg.R | Reads in and organizes the dates of stay-at-home orders and reopening orders. |
| cw_date_school.R | Reads in and organizes school closure dates. |
| cw_date_ui.R | Reads in and organizes the dates of unemployment insurance (UI) distribution. |
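Because a single ZIP code can span several counties, a ZIP-to-county crosswalk has to resolve each ZIP to one county before county, MSA, and state codes can be attached. The sketch below shows one common rule, assigning each ZIP to the county holding its largest address share. This is an assumption, not necessarily the rule used in cw_geo_zip.R. It is written in Python as a language-agnostic illustration; the row layout, the choice of ratio column, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One crosswalk row: (zip_code, county_fips, ratio). The ratio is the share of
# the ZIP's addresses located in that county; which HUD ratio column to use
# (e.g. residential vs. total) is an assumption left to the caller.
Row = Tuple[str, str, float]


def zip_to_county(rows: List[Row]) -> Dict[str, str]:
    """Assign each ZIP code to the county holding its largest address share."""
    best: Dict[str, Tuple[float, str]] = {}
    for zip_code, fips, ratio in rows:
        current = best.get(zip_code)
        # Higher ratio wins; on a tie, the lower FIPS code wins so the result
        # does not depend on row order.
        if (
            current is None
            or ratio > current[0]
            or (ratio == current[0] and fips < current[1])
        ):
            best[zip_code] = (ratio, fips)
    return {zip_code: fips for zip_code, (_, fips) in best.items()}


def state_fips(county_fips: str) -> str:
    """A 5-digit county FIPS code begins with the 2-digit state FIPS code."""
    return county_fips[:2]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        ("00001", "36061", 0.9),
        ("00001", "36047", 0.1),
        ("00002", "01003", 0.5),
        ("00002", "01001", 0.5),
    ]
    mapping = zip_to_county(rows)
    print(mapping)  # {'00001': '36061', '00002': '01001'}
    print(state_fips(mapping["00001"]))  # 36
```

The deterministic tie-break matters when a ZIP is split evenly, since otherwise the assigned county would depend on the row order of the input file.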

Homebase Data

| Script | Description |
| --- | --- |
| data_0_raw_clean.R | Defines functions that perform the most basic cleaning of Homebase data. |
| data_1_cw_geo_raw.R | Outputs raw crosswalks for MSA and other geographic variables. |
| data_1_cw_geo.R | Manually fixes MSA and state for some observations and merges them with the NBER geographic crosswalk. These geographic variables are deprecated. |
| data_1_cw_geo_improved.R | Improves the Homebase geographic variables (county FIPS, MSA, and state codes) using ZIP codes and the HUD crosswalk. |
| data_1_cw_owner.R | Produces a crosswalk from Homebase establishments to owners. |
| data_1_raw.R | Appends the raw data files and performs basic cleaning (no subsetting). Memory- and time-intensive. |
| data_1_raw_update.R | Appends daily updates to the raw data (no subsetting). |
| data_1_sel_firm_year.R | Selects firms present in the base period, for 2018-2020. |
| data_2_firm_ind_geo.R | Aggregates data to the firm-industry-geography level. |
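The last two Homebase steps, selecting base-period firms and aggregating to the firm-industry-geography level, can be sketched as a filter followed by a grouped sum. This is a minimal Python illustration under stated assumptions, not the repository's R code. The record schema, the base-period window, and the selection rule (any positive hours in the window) are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Set, Tuple


class Shift(NamedTuple):
    """One worker-day record (hypothetical schema)."""
    firm_id: str
    industry: str
    county_fips: str
    day: date
    hours: float


# Hypothetical base-period window; substitute the paper's actual definition.
BASE_START = date(2020, 1, 19)
BASE_END = date(2020, 2, 1)

Key = Tuple[str, str, str, date]  # (firm, industry, county, day)


def base_period_firms(shifts: List[Shift]) -> Set[str]:
    """Firms that report positive hours at least once in the base period."""
    return {
        s.firm_id
        for s in shifts
        if BASE_START <= s.day <= BASE_END and s.hours > 0
    }


def aggregate_firm_ind_geo(shifts: List[Shift], keep: Set[str]) -> Dict[Key, float]:
    """Sum hours by (firm, industry, county, day), keeping only selected firms."""
    totals: Dict[Key, float] = defaultdict(float)
    for s in shifts:
        if s.firm_id in keep:
            totals[(s.firm_id, s.industry, s.county_fips, s.day)] += s.hours
    return dict(totals)
```

Selecting firms before aggregating keeps the panel balanced on a fixed baseline set, so later declines in hours are not confounded by firms entering the sample.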

Homebase Worker Survey

| Script | Description |
| --- | --- |
| ws_1_quest.R | Reads in responses to each question. |
| ws_1_raw.R | Reads in and cleans raw survey data (including converting responses to factors where possible). |
| ws_1_userid_var_sel.R | Selects the subset of respondents who are in the base period and associated with baseline firms. |
| ws_2_hours_match.R | Produces hours data for respondents who are in the base period and associated with baseline firms. |
| ws_0_f_qsubset.R | Defines the conditions for each question; used to produce tables. |
| wso_1_tab_allQ.Rmd | Produces a table for each question. |
| wso_1_crosstab_Qsel.Rmd | Produces crosstabs for selected questions. |
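Restricting the survey to base-period respondents at baseline firms and then crosstabulating two questions amounts to a filter followed by a count over answer pairs. The Python sketch below illustrates that idea only. The field names (`user_id`, `firm_id`, question keys) and both functions are hypothetical and do not come from the repository.

```python
from collections import Counter
from typing import Dict, List, Set

Response = Dict[str, str]  # hypothetical: field name -> answer


def select_respondents(
    responses: List[Response], base_users: Set[str], base_firms: Set[str]
) -> List[Response]:
    """Keep respondents seen in the base period and linked to a baseline firm."""
    return [
        r for r in responses
        if r["user_id"] in base_users and r["firm_id"] in base_firms
    ]


def crosstab(responses: List[Response], q_row: str, q_col: str) -> Counter:
    """Count respondents by (answer to q_row, answer to q_col)."""
    return Counter((r[q_row], r[q_col]) for r in responses)
```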

SafeGraph Data

| Script | Description |
| --- | --- |
| sg_1_core_poi.R | Reads in the Core POI files. |
| sg_1_core_poi_merge.R | Merges all versions of Core POI based on locid, zip, and naics. |
| sg_1_visit_raw.R | Reads in raw weekly patterns files from 2019-12-30 to 2020-05-11. Memory- and time-intensive. |
| sg_1_visit_raw_update.R | Reads in raw weekly patterns files after 2020-05-11 and appends them to the data for earlier dates. Memory- and time-intensive. |
| sg_1_visit_sum_stats.R | Reads in and combines normalization statistics and metadata. |
| sg_1_visit_sel_loc.R | Finds the number of visits to each location in the base period. |
| sg_2_visit_agg.R | Aggregates to the location-date level and merges with Core POI. |
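As we understand the SafeGraph weekly patterns format, each row covers one location-week and stores daily visit counts as a JSON-style array (`visits_by_day`) beginning on the week's start date. Reaching the location-date level therefore means expanding that array into one row per day, after which base-period visits are a filtered sum. The Python sketch below assumes that layout; the function names and the tuple layout are hypothetical.

```python
import json
from datetime import date, timedelta
from typing import Dict, List, Tuple

DailyRow = Tuple[str, date, int]  # (location id, day, visits)


def expand_week(loc_id: str, week_start: date, visits_by_day: str) -> List[DailyRow]:
    """Turn one weekly row's JSON array of daily visits into daily rows."""
    counts = json.loads(visits_by_day)
    return [
        (loc_id, week_start + timedelta(days=i), int(c))
        for i, c in enumerate(counts)
    ]


def base_period_visits(rows: List[DailyRow], start: date, end: date) -> Dict[str, int]:
    """Total visits per location between start and end, inclusive."""
    totals: Dict[str, int] = {}
    for loc_id, day, visits in rows:
        if start <= day <= end:
            totals[loc_id] = totals.get(loc_id, 0) + visits
    return totals


if __name__ == "__main__":
    rows = expand_week("loc1", date(2019, 12, 30), "[1, 2, 3, 4, 5, 6, 7]")
    print(rows[-1])  # ('loc1', datetime.date(2020, 1, 5), 7)
    print(base_period_visits(rows, date(2020, 1, 1), date(2020, 1, 3)))  # {'loc1': 12}
```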

Other Data

| Script | Description |
| --- | --- |
| cbp_1_raw.R | Reads in and cleans County Business Patterns (CBP) data at the state level. |
| ppp_unzip.R | Unzips the Paycheck Protection Program (PPP) data. |
| ppp_1_code.R | Finds the levels of selected variables in the PPP data. |
| ppp_2_stc.R | Categorizes states by PPP amount (and by median UI replacement rate). |
| kr_1_raw.R | Reads in the raw Kronos data. |
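Categorizing states by a single statistic, such as PPP amount or median UI replacement rate, can be as simple as splitting at the cross-state median. The Python sketch below shows that rule. The median split is an assumption rather than the rule confirmed for ppp_2_stc.R, and the function name and input values are hypothetical.

```python
from statistics import median
from typing import Dict


def categorize_states(values: Dict[str, float]) -> Dict[str, str]:
    """Label each state 'high' or 'low' relative to the cross-state median.

    States exactly at the median are labeled 'low' (an arbitrary choice).
    """
    cutoff = median(values.values())
    return {state: ("high" if v > cutoff else "low") for state, v in values.items()}
```

The same function can be applied separately to PPP amounts and to UI replacement rates, giving two independent state groupings.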