/*******************************************************************************

The following .do file identifies, among Homebase firms that reopened after 
having shut down, (1) the share of their baseline hours they have restored and
(2) the proportion of those restored hours worked by previous employees versus
new hires.

Note that, as elsewhere, we define a firm as the same company operating in the 
same industry and state/MSA.

We identify these in multiple steps. First, we construct our baseline sample. 
We restrict to observations with reasonable daily hours (0 to 20) in valid US
locations. We also restrict to firms whose workers logged at least 80 total 
hours in the baseline period (the weeks of Jan 25 and Feb 1).  

Second, we identify the firms that ever shut down. We total all hours worked 
each week at each firm, and we define a firm as having shut down if it ever 
records zero total hours in a week. We record the first and last weeks in which
it reports zero hours, and save the firm identifiers with those statistics.
For the stacked bar graph, we further restrict to firms that shut down during 
the depth of the crisis (weeks 11 to 14, i.e., Mar 14 to Apr 4).

Next, we return to the full list of firms and identify those that report any
hours worked in a week *after* their last shutdown week. We define these firms
as having reopened.

Now we have our analytical sample: firms that shut down but then reopened. 
Among these firms, we can identify how much of their original "employment"
(total hours worked by their workers and size of their workforce) they have 
collectively restored in each week. 

Since we also want to measure how that "restoration" is split between original
employees and new hires, we tag each firm's workers as old or new hires. To do
so, we return to the firm/person/week dataset. We merge in the "reopened" 
sample, keeping only workers ever associated with one of those reopened firms,
and then identify whether each worker's first week came after the firm shut 
down. If so, they represent a new hire. If not, they are an original employee.

Finally, we sum hours worked at the firm level, and then across firms, 
separately for new and old hires. We compute the share of baseline hours that 
is still closed, lost to reduced hours, worked by re-hires, or worked by new 
hires, and plot those shares.

Created @ Matt Unrath -- 5/6/20

Updated on 5/7 to add stacked bar graph.

*Updated on 10-27 to fix definition of firm shutdown during crisis

*******************************************************************************/

cap log close rehires_log

local date: display %tdCCYY-NN-DD =daily("`c(current_date)'", "DMY")
log using "/accounts/projects/jrothst/homebase/data/bpea_replication_archive/logs/02_rehires_`date'_oct25.log", ///
	replace name(rehires_log)


set more off
set varabbrev off

loc dir 	"/accounts/projects/jrothst/homebase/data/bpea_replication_archive/"
loc code	"`dir'/code/"
loc results	"`dir'/results/"
loc sourcedata	"`dir'/data_raw/"
loc workingdata	"`dir'/data_clean/"


*******************************************************************************/

*******************************************************************************
* Open cleaned data file
*******************************************************************************

use `workingdata'/homebase_raw_2020_update, clear
 compress

*******************************************************************************
* Clean company and location vars
*******************************************************************************
 ren sthud	state
 ren stfipshud	stfips
 ren msachud	msac
 ren ind	industry
 ren company_id	firm
 ren user_id	person
 egen establishment = group(firm location_id)
 
 drop if state==.
 drop if msac==.

*gsort firm establishment person day
*unique firm
*unique establishment
*unique person

*isid firm establishment person day

 keep firm industry msac state stfips week establishment person day hours_worked numdaysinweek


********************************************************************************
* Prep
********************************************************************************

*Identify the last week with all seven days of data.
 sum week if numdaysinweek==7
 loc maxweek `r(max)'

 
*Keep obs with reasonable daily hours (0-20) and valid US state FIPS codes
 keep if inrange(hours_worked,0,20)
 drop if inlist(stfips,98,99)

*Redefine firm so that each firm is specific to an industry, MSA, and state.
 egen new_firm = group(firm industry msac state), missing 	
								
 drop firm   
 ren new_firm firm

 *isid firm week establishment person day
 keep if inrange(week,4,`maxweek')
 
 
*Relabel weeks
lab def week	1 "Jan 4"	///
		2 "Jan 11"	///
		3 "Jan 18"	///
		4 "Jan 25"	///
		5 "Feb 1"	///
		6 "Feb 8"	///
		7 "Feb 15"	///
		8 "Feb 22"	///
		9 "Feb 29"	///
		10 "Mar 7"	///
		11 "Mar 14"	///
		12 "Mar 21"	///
		13 "Mar 28"	///
		14 "Apr 4"	///
		15 "Apr 11"	///
		16 "Apr 18"	///
		17 "Apr 25"	///
		18 "May 2"	///
		19 "May 9"	///
		20 "May 16"	///
		21 "May 23"	///
		22 "May 30"	///
		23 "Jun 6"	///
		24 "Jun 13"	///
		25 "Jun 20"	///
		26 "Jun 27"	///
		27 "Jul 4"	///
		28 "Jul 11", replace
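
/*Illustrative sketch (not part of the pipeline; run in a separate session): the
labels above follow a 7-day grid, week w -> 4 Jan 2020 + 7*(w-1) days, so the 
crisis window used below (weeks 11-14) runs from Mar 14 to Apr 4, and week 28 
maps to Jul 11. This assumes the grid stays regular, as the listed dates do.

	forvalues w = 11/14 {
		display "week `w': " %tdMon_DD td(4jan2020) + 7*(`w'-1)
	}
*/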

 local weeklab: label week `maxweek'
		
 
tempfile fulldata
save `fulldata'


********************************************************************************
* Find firms operating in base period
********************************************************************************


*Construct firm/person/week dataset
 collapse (sum) hours_worked, by(firm week person state msac industry)
 isid firm week person
 gen anywork=(hours_worked>0)
tempfile firmpersonweek
save `firmpersonweek'

*Construct firm/week dataset
 collapse (sum) hours_worked anywork, by(firm week state msac industry)
 isid firm week
 rename anywork numworkers
 compress
tempfile firmweek
save `firmweek'
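
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session because it clears memory): person-day records are collapsed 
to firm/person/week and then to firm/week. Summing the person-level indicator 
anywork counts the workers with positive hours that week, so worker 102, who 
logged zero hours, is not counted in numworkers.

	clear
	input firm week person hours_worked
	1 4 101 8
	1 4 101 6
	1 4 102 0
	1 4 103 5
	end
	collapse (sum) hours_worked, by(firm week person)
	gen anywork = (hours_worked>0)
	collapse (sum) hours_worked anywork, by(firm week)
	rename anywork numworkers
	assert hours_worked==19 & numworkers==2
*/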
 
*Identify base period hours
 gen baseperiod=(inlist(week,4,5))
 keep if baseperiod==1
 collapse (sum) hours_worked numworkers, by(firm)
 
 *keep only firms with at least 80 total hours in base period
 keep if hours_worked >= 80
 
 *Convert two-week totals into weekly averages
 replace hours_worked	=hours_worked/2
 replace numworkers	=numworkers/2
 rename hours_worked 	base_hours
 rename numworkers 	base_workers
 
 replace base_hours=floor(base_hours)
 
 isid firm
 unique firm
 loc baselinecount=`r(unique)'
 
tempfile baseperiod
save `baseperiod'
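
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session): a firm is kept only if it logged at least 80 hours across 
weeks 4 and 5 combined, and the two-week totals are then halved into weekly 
averages. Firm 1 (50 + 40 = 90 hours) is kept with base_hours = 45; firm 2 
(30 + 40 = 70 hours) is dropped.

	clear
	input firm week hours_worked numworkers
	1 4 50 3
	1 5 40 2
	2 4 30 2
	2 5 40 2
	end
	collapse (sum) hours_worked numworkers, by(firm)
	keep if hours_worked >= 80
	gen base_hours   = floor(hours_worked/2)
	gen base_workers = numworkers/2
	assert _N==1 & base_hours==45 & base_workers==2.5
*/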
 
 
 
********************************************************************************
* Create dataset of relevant firms
********************************************************************************

*Open firm/week dataset
  use `firmweek'
 
*Keep only firms operating in the base period
 merge m:1 firm using `baseperiod', assert(1 3) nogen keep(3)
 
 
*Firms that shut down might not show up in HB data in those weeks. We create 
*observations for them so we can observe their zero hours worked.
 fillin firm week

*Backfill firm-level values for the observations created by fillin
 gsort firm _fillin
 foreach var in state msac base_hours base_workers {
  by firm: replace `var'=`var'[1] if _fillin==1
 }
 replace hours_worked=0 if hours_worked==.
 replace numworkers=0 if numworkers==.
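
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session): a firm with no Homebase records in a week has no row for 
that week. fillin adds the missing firm/week rows (flagged _fillin==1), and 
sorting on _fillin lets the first, original row supply the firm-level values.
Firm 2 has no row for week 5, so it gets one with hours_worked set to 0.

	clear
	input firm week hours_worked base_hours
	1 4 10 12
	1 5 14 12
	2 4 20 20
	end
	fillin firm week
	gsort firm _fillin
	by firm: replace base_hours = base_hours[1] if _fillin==1
	replace hours_worked = 0 if hours_worked==.
	assert _N==4
	assert hours_worked==0 & base_hours==20 if firm==2 & week==5
*/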

*Replace firm/week dataset
tempfile firmweek
save `firmweek'


*Do firms close and open and close again?
 isid firm week
 gsort firm week
 gen anywork=(hours_worked>0)
 by firm: gen openfirstweek=anywork[1]
 keep if openfirstweek==1

 
*Confirm that firms are open in first week
 assert anywork==1 if week==4
 
 tsset firm week
 tsspell anywork
 by firm: egen totspell=max(_spell)
 
*Count firms that pop in and out multiple times
 forvalues spell=3(2)11 {
  unique firm if totspell>=`spell'
  loc spell`spell'=`r(unique)'
 }
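
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session): every firm kept here is open in week 4, so open and closed 
spells alternate starting with an open spell. A firm with totspell>=2k+1 has 
therefore shut down and reopened at least k times. The running sum below 
starts a new spell whenever anywork changes, which should match what 
tsspell anywork does; it is written in base Stata only to show the logic.

	clear
	input firm week anywork
	1 4 1
	1 5 0
	1 6 1
	1 7 0
	1 8 1
	end
	sort firm week
	by firm: gen spell = sum(anywork != anywork[_n-1])
	by firm: egen totspell = max(spell)
	assert totspell==5
*/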
 

 
*Count firms open whole time
 by firm: gen alwaysopen=(anywork[1]==1) & (anywork[_N]==1) & (totspell==1)
 tab alwaysopen

*Find the last week each firm is open
 gsort firm -anywork -week
 by firm: gen maxopenweek=week[1]
 
 gsort firm week
 
*Count firms that were open, then closed, and remain shut down
 by firm: gen remainshutdown=(anywork[1]==1) & (anywork[_N]==0) & (totspell==2)
 tab remainshutdown
 
  
*Identify firms that ever shut down
 by firm: egen minhours=min(hours_worked)
 by firm: gen evershutdown=(minhours==0)

 
*Identify firms that shut down during the depth of the crisis (weeks 11-14, Mar 14 to Apr 4)
 gen shutdown=anywork==0 & inrange(week,11,14)
 gen shutdownweek=week if anywork==0 & inrange(week,11,14)
 by firm: gegen max_crisis_shutdownweek=max(shutdownweek)
 by firm: gegen crisis_shutdown=max(shutdown)

 
*Count firms that shut down, reopened, and remain open
 by firm: gen reopened = (anywork[1]==1) & (anywork[_N]==1) & (totspell>=3)
 tab reopened
 
 
*Count firms that shut down during crisis, then reopened
/*Conditions limit to firms that 
	(1) were open, closed, and then reopened, 
	(2) shut down during crisis, and 
	(3) were open for a week after shutting down amidst crisis. */
 by firm: gen crisis_reopened = (anywork[1]==1) & (totspell>=3) & crisis_shutdown==1 & maxopenweek>max_crisis_shutdownweek
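
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session; uses base-Stata egen in place of gegen): firm 1 closes in 
week 12, inside the crisis window, and is open again in week 15, so 
crisis_reopened==1. Firm 2 closes in week 12 and never reopens, so it is 0. 
The toy firms have only the weeks shown, and the sketch omits the 
totspell>=3 condition (firm 1 has three spells, firm 2 only two).

	clear
	input firm week anywork
	1 10 1
	1 12 0
	1 15 1
	2 10 1
	2 12 0
	2 15 0
	end
	sort firm week
	gen shutdownweek = week if anywork==0 & inrange(week,11,14)
	by firm: egen max_crisis_shutdownweek = max(shutdownweek)
	gsort firm -anywork -week
	by firm: gen maxopenweek = week[1]
	sort firm week
	by firm: gen crisis_reopened = anywork[1]==1 & ///
		maxopenweek > max_crisis_shutdownweek
	assert crisis_reopened==(firm==1)
*/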
 
 
*Count firms that shut down during crisis, then reopened and remain open
 by firm: gen crisis_reopened_stayedopen = (anywork[1]==1) & (anywork[_N]==1) & (totspell>=3) & crisis_shutdown==1
 
 
/*Identify whether "ever shutdown" firms come back. For each firm, we record its
first and last weeks with zero hours and its latest week with non-zero hours.*/
 gsort firm anywork week
 by firm: gen minshutdownweek=week[1]
 by firm: gen firmmaxweek=week[_N]
 
 gsort firm -anywork week
 by firm: gen maxshutdownweek=week[_N]

 assert maxshutdownweek>=minshutdownweek
 
 gsort firm week
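
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session): sorting by anywork ascending puts a firm's closed weeks 
(anywork==0) first, so week[1] is its first zero-hour week; sorting by anywork 
descending puts closed weeks last, so week[_N] is its last zero-hour week. For
the firm below, closed in weeks 5 and 7, this gives 5 and 7.

	clear
	input firm week anywork
	1 4 1
	1 5 0
	1 6 1
	1 7 0
	1 8 1
	end
	gsort firm anywork week
	by firm: gen minshutdownweek = week[1]
	gsort firm -anywork week
	by firm: gen maxshutdownweek = week[_N]
	assert minshutdownweek==5 & maxshutdownweek==7
*/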


*Cross-tabulate the definitions created above and check their consistency
 foreach var in reopened remainshutdown evershutdown {
  tab alwaysopen `var'
  assert `var'==0 if alwaysopen==1
 }
 
 tab reopened	remainshutdown
 assert remainshutdown==0 	if reopened==1
 assert reopened==0 		if remainshutdown==1
 
 tab reopened	evershutdown
 assert evershutdown==1 if reopened==1
 
 tab evershutdown remainshutdown
 assert evershutdown==1 if remainshutdown==1
 assert remainshutdown==0 if evershutdown==0
 
 
*Count types of firms
*We report these numbers in the paper.
 unique firm if alwaysopen==1
 loc alwaysopencount=`r(unique)'
 
 unique firm if remainshutdown==1
 loc remainshutdowncount=`r(unique)'
 
 unique firm if evershutdown==1
 loc evershutdowncount=`r(unique)'
 
 unique firm if reopened==1
 loc reopenedcount=`r(unique)'
  
 unique firm if crisis_shutdown==1
 loc crisisshutdowncount=`r(unique)'
 
 unique firm if crisis_reopened==1
 loc crisisreopenedcount=`r(unique)'
 
 unique firm if crisis_reopened_stayedopen==1
 loc crisisremainreopencount=`r(unique)'
 
 
tempfile firmevershutdown
save `firmevershutdown'



/*******************************************************************************
*Stacked bar graph

Among the firms that shut down by April 4, we want to identify where those 
firms' baseline hours are each week after April 4. Are those hours restored, 
and if so, were they restored to original employees or new hires? Or are those
hours missing because reopened firms have cut hours? Or are those hours missing
because firms have not yet reopened?

So below, we'll identify two stats for each firm and each week: (1) Is the 
firm still closed? (2) If they're reopened, how many hours are worked by new vs
re-hires? 

Among reopened firms, we can identify the size of the hours reductions by
subtracting the hours worked by new and re-hires from the baseline hours.

*******************************************************************************/


*Open the firm/week dataset with the shutdown flags created above
use `firmevershutdown', clear

*Restrict to those in base period, and merge in base_hours info
 merge m:1 firm using `baseperiod',	keepus(base_hours base_workers)	///
					keep(3) nogen	

*Keep firms that shut down during the crisis, i.e., by Apr 4 (week 14)
 keep if crisis_shutdown==1
 keep if minshutdownweek<=14

 keep 	firm week state msac	///
	base_hours		///
	hours_worked		///
	anywork			///
	minshutdownweek		///
	maxshutdownweek 	///
	firmmaxweek

tempfile firmevershutdown
save `firmevershutdown'	


/*Save firm/weeks when the firms are shut down. 

Rename those firms' base hours as "shutdown_share" so we know how much of the 
collective base hours we should attribute to shutdown firms.*/
 keep if anywork==0
 
 ren base_hours shutdown_share
 keep firm week shutdown_share
 
tempfile shutdown_share
save `shutdown_share'


/*Save firm/weeks when firms are open.

Reopen the list of firm/weeks for firms that shut down by Apr 4, and keep the 
firm/weeks when those firms are open. We'll merge this to the firm/person/week
dataset below, so we can limit to workers associated with firms in these weeks.*/

use `firmevershutdown'

 keep if anywork==1

tempfile everopen
save `everopen'

 
*Reopen firm/person/week dataset, and identify each person's first week
use `firmpersonweek'
 
 sort person firm week
 by person: egen minweek=min(week) 
 
 
/*Merge in the open firm/weeks saved above. This keeps workers who worked at one
of these firms, in the weeks the firm was open (before or after its shutdown).*/
 merge m:1 firm week using `everopen', keep(3) nogen

 
*If a worker's first week in HB records comes after the firm's first shutdown week, they're a new hire.
 gen newhire=(minweek>minshutdownweek)
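
/*Illustrative sketch (hypothetical toy data, not part of the pipeline; run in a
separate session): minweek is a worker's first week anywhere in the 
firm/person/week data. With the firm's first shutdown week at 12, worker 101 
(first seen in week 4) is a re-hire, while worker 102 (first seen in week 16) 
is a new hire.

	clear
	input person firm week minshutdownweek
	101 1 4 12
	101 1 16 12
	102 1 16 12
	end
	by person, sort: egen minweek = min(week)
	gen newhire = (minweek > minshutdownweek)
	assert newhire==(person==102)
*/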
  
 
*Sum total hours worked at each firm and week by new and old hires 
 collapse (sum) hours_worked anywork, by(firm week state msac newhire)
 isid firm week newhire
 rename anywork numworkers
 compress
 
tempfile firmpersonweek
save `firmpersonweek'


*Save copy with just new hires. 
*Rename total hours worked as those worked by new hires.
 keep if newhire==1
 isid 	firm week
 ren 	hours_worked	hours_worked_newhire
 keep 	firm week 	hours_worked_newhire
 
tempfile firmpersonweek_newhire
save `firmpersonweek_newhire'


*Save copy with just rehires
*Rename total hours worked as those worked by re-hires.
use `firmpersonweek'

 keep 	if newhire==0
 isid 	firm week
 ren 	hours_worked 	hours_worked_rehire
 keep 	firm week 	hours_worked_rehire
 
tempfile firmpersonweek_rehire
save `firmpersonweek_rehire'



/*Open file of firm/week hours for those firms that shut down by Apr 4, but only
the weeks when they're open.*/
use `everopen'
 keep firm week base_hours
 isid firm week
 
*Merge in hours split between new and re-hires.
 merge 1:1 firm week using `firmpersonweek_newhire', gen(merge_newhire)
 merge 1:1 firm week using `firmpersonweek_rehire', gen(merge_rehire)
 
*Set hours to 0 for firm/weeks with no new hires or no re-hires.
 foreach type in re new {
  replace hours_worked_`type'hire=0 if hours_worked_`type'hire==.
 }
 
*Hours cuts equal base_hours minus observed hours worked
 gen hours_cuts = base_hours - (hours_worked_rehire + hours_worked_newhire)
 
*Now we can merge in firm/week for shutdown firms. 
*Since the current dataset should only include firms where there are observed
*work hours, these new observations should not match to any existing ones.
 merge 1:1 firm week using `shutdown_share', gen(merge_shutdown) assert(1 2)
 
/*Replace base_hours from the master file with "shutdown_share", which are just
base_hours for those shutdown firms.*/
 replace base_hours=shutdown_share if merge_shutdown==2 & base_hours==.

*Fill in zeros for missing values
 foreach var in 	hours_worked_newhire 	///
			hours_worked_rehire 	///
			hours_cuts 		///
			shutdown_share {
  replace `var'=0 if `var'==.
 }
 
 isid firm week 
 
*Confirm that the components sum to base_hours, so the shares always sum to 1 (100 percent).
 egen sim_hours=rowtotal(	hours_worked_newhire 	///
				hours_worked_rehire 	///
				hours_cuts 		///
				shutdown_share)
 gen diff = sim_hours-base_hours
 sum diff, detail
 assert inrange(diff,-1,1)
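
/*Illustrative sketch (hypothetical toy numbers, not part of the pipeline; run in
a separate session; the anywork flag here is a toy stand-in for the merge 
results above): for a firm with base_hours=100 that is open with 30 re-hire 
and 20 new-hire hours, hours_cuts=100-50=50 and the four components sum to 
100. In a closed week, all 100 base hours sit in shutdown_share. hours_cuts 
can be negative when a reopened firm works more than its base hours; the 
identity still holds.

	clear
	input week anywork base_hours hours_worked_rehire hours_worked_newhire
	15 0 100  0  0
	16 1 100 30 20
	end
	gen shutdown_share = cond(anywork==0, base_hours, 0)
	gen hours_cuts = cond(anywork==1,				///
		base_hours - (hours_worked_rehire + hours_worked_newhire), 0)
	egen sim_hours = rowtotal(hours_worked_newhire hours_worked_rehire ///
		hours_cuts shutdown_share)
	assert sim_hours==base_hours
*/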
 
/*Now we can identify where these firms' collective base hours are: still shutdown,
reduced, or reemployed by new or re-hires. We collapse by week, summing up all 
types of firms' hours by week. */
 collapse (sum) base_hours hours_worked_newhire hours_worked_rehire hours_cuts shutdown_share, by(week)
 
*Compute each component's share of total base hours
 foreach var in hours_worked_newhire hours_worked_rehire hours_cuts shutdown_share {
  gen `var'_share = `var'/base_hours
 }
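
/*Illustrative sketch (hypothetical toy numbers, not part of the pipeline; run in
a separate session): because the collapse sums hours across firms before 
dividing, each share is a ratio of totals, so larger firms carry more weight 
than in an average of firm-level ratios. A closed firm with 300 base hours and
a reopened firm with 100 base hours (fully restored by re-hires) give 
shutdown_share_share=.75 and hours_worked_rehire_share=.25 for that week, 
whereas averaging the two firms' own ratios would give .5 and .5.

	clear
	input firm week base_hours shutdown_share hours_worked_rehire
	1 15 300 300   0
	2 15 100   0 100
	end
	collapse (sum) base_hours shutdown_share hours_worked_rehire, by(week)
	gen shutdown_share_share      = shutdown_share/base_hours
	gen hours_worked_rehire_share = hours_worked_rehire/base_hours
	assert shutdown_share_share==.75 & hours_worked_rehire_share==.25
*/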

 
 keep week shutdown_share_share hours_cuts_share hours_worked_rehire_share hours_worked_newhire_share
save `results'/data/rehires_stackedbar_oct25, replace
export delimited using "`results'/data/rehires_stackedbar_oct25.csv", replace
 
 
*Plot ratios
 graph bar 	shutdown_share_share 						///
		hours_cuts_share						///
		hours_worked_rehire_share					///
		hours_worked_newhire_share					///
	if inrange(week,15,`maxweek'), 						///
	over(week, label(labs(vsmall))) stack					///
	bar(1, color("59 126 161"))						///
	bar(2, color("0 50 98"))						///
	bar(3, color("196 130 14"))						///
	bar(4, color("253 181 21"))						///
	ylab(0 "0" .2 "20" .4 "40" .6 "60" .8 "80" 1 "100", 			///
				labs(small) grid gstyle(minor) ) 		///
	yti(	"Percent attributable to each hours change", size(medsmall))	///
	subti(" ", size(tiny)) 							///
	legend(	lab(4 "New hires") lab(3 "Re-hires")				///
		lab(2 "Reopened at reduced hours") 				///
		lab(1 "Still closed")						///
		size(vsmall) r(1) c(3) symysize(*.8) bmargin(small))		///
	scheme(s1color)	intensity(*.7)						///
	yscale(titlegap(*10))							///
	/*note("Data updated through `weeklab'", size(vsmall))*/
gr save `results'/figures/rehires_stackedbar, replace
gr export "`results'/figures/rehires_stackedbar_oct25.png", replace
 
 

********************************************************************************
*Counts
********************************************************************************

di "`baselinecount' firms in our baseline sample"
di "`evershutdowncount' firms ever shut down"
di "`remainshutdowncount' firms remain shut down"
di "`reopenedcount' firms have reopened after having shut down"

di "`crisisshutdowncount' firms shut down during crisis"
di "`crisisreopenedcount' firms reopened after shutting down during crisis"
di "`crisisremainreopencount' firms reopened and remained open after having shut down during crisis"

forvalues spell=3(2)11 {
loc times=(`spell'-1)/2
di "`spell`spell'' firms shut down and reopened at least `times' time(s)"
}


********************************************************************************

log close rehires_log
********************************************************************************
