A call for a new generation of COVID-19 models

New Jersey Governor Phil Murphy speaks about a chart showing discharges versus new COVID-19 hospitalizations during his Saturday, April 18, 2020, press conference at the War Memorial in Trenton, NJ, on the state's response to the coronavirus. (Photo: Governor Murphy Press Conference)

The epidemiological models of COVID-19’s initial outbreak and spread have been useful. The Imperial College model, which predicted a terrifying 2.2 million deaths in the United States, agitated drowsy policymakers into action. The University of Washington’s Institute of Health Metrics and Evaluation (IHME) model has provided a sense of the scale and timeline for peak hospitalization. Other models have estimated the effects of quarantine and of travel restrictions, or sought to find the pandemic’s turning point. Despite some notable flaws, the epidemiological models have cumulatively had a beneficial effect on the national conversation. Their ability to incorporate some epidemiological knowledge and the limited available data led to better—and harder to dismiss or deny—predictions of the near future than mere guesswork would have allowed.

Now, the situation is changing, and the models will need to change, too. Stay-at-home orders and other social distancing measures seem to be bringing the infection rate under control, and we may be just past the peak of the death rate, but the pandemic is far from over. Thousands of policymakers across the country, mostly at the state and local level, will need to decide where and when to re-open schools, ease business and social distancing restrictions, allow sports to resume, and make a myriad of other choices. Federal and state leaders also need to allocate COVID-19 tests amid supply shortages, prioritize the deployment of contact tracing staff, and eventually, distribute vaccines to those who need them most.

A new generation of COVID-19 models

Existing models have been valuable, but they were not designed to support these types of critical decisions. A new generation of models that estimate the risk of COVID-19 spread for precise geographies—at the county or even more localized level—would be much more informative for these questions. Rather than produce long-term predictions of deaths or hospital utilization, these models could estimate near-term relative risk to inform local policymaking. Going forward, governors and mayors need local, current, and actionable numbers.

Broadly speaking, better models would substantially aid in the “adaptive response” approach to re-opening the economy. In this strategy, policymakers cyclically loosen and re-tighten restrictions, attempting to work back towards a healthy economy without moving so fast as to allow infections to take off again. In an ideal process, restrictions would be eased at a pace that balances a swift return to normalcy with reducing total COVID-19 infections. Of course, this is impossible in practice, and thus some continued adjustments—the flipping of various controls off and on again—will be necessary. More precise models can help improve this process, providing another lens into when it will be safe to relax restrictions, thus making it easier to do so without a disruptive back-and-forth. A more-or-less continuous easing of restrictions is especially valuable, since it is unlikely that second or third rounds of interventions (such as social distancing) would achieve the same high rates of compliance as the first round.

The proliferation of COVID-19 data

These models can incorporate cases, test-positive rates, hospitalization information, deaths, excess deaths, and other known COVID-19 data. While all these data sources are incomplete, an expanding body of research on COVID-19 is making the data more interpretable. This research will become progressively more valuable with more data on the spread of COVID-19 in the U.S. rather than data from other countries or past pandemics.

Further, a broad range of non-COVID-19 data can also inform risk estimates: Population density, age distributions, poverty and uninsured rates, the number of essential frontline workers, and co-morbidity factors can also be included. Community mobility reports from Google and Unacast’s social distancing scorecard can identify how eased restrictions are changing behavior. Small area estimates also allow the models to account for the risk of spread from other nearby geographies. Geospatial statistics cannot account for infectious spread between two large neighboring states, but they would add value for adjacent zip codes. Lastly, many more data sources are in the works, like open patient data registries, the National Institutes of Health’s (NIH) study of asymptomatic persons, self-reported symptoms data from Facebook, and (potentially) new randomized surveys. In fact, there are so many diverse and relevant data streams that models can add value simply by consolidating daily information into just a few top-line numbers that are comparable across the nation.
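To make the "consolidation into top-line numbers" idea concrete, here is a minimal sketch of one way a model could combine several county-level indicators into a single relative risk score. The indicator names, example values, and weights are purely hypothetical assumptions for illustration, not any agency's actual methodology.

```python
# Illustrative sketch only: collapsing several county-level data streams
# into one comparable relative risk score. All field names, values, and
# weights below are hypothetical assumptions.

def normalize(values):
    """Rescale a list of numbers to the 0-1 range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_risk(counties, weights):
    """Weighted sum of normalized indicators, computed per county."""
    indicators = list(weights)
    # Normalize each indicator across all counties so units are comparable.
    normed = {k: normalize([c[k] for c in counties]) for k in indicators}
    scores = []
    for i, county in enumerate(counties):
        score = sum(weights[k] * normed[k][i] for k in indicators)
        scores.append((county["name"], round(score, 3)))
    # Highest relative risk first.
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical inputs: weekly cases per 100k, test-positive percentage,
# and change in mobility relative to baseline (more negative = less movement).
counties = [
    {"name": "County A", "case_rate": 42.0, "test_positive_pct": 9.5, "mobility_change": -0.20},
    {"name": "County B", "case_rate": 15.0, "test_positive_pct": 4.0, "mobility_change": -0.45},
    {"name": "County C", "case_rate": 30.0, "test_positive_pct": 7.0, "mobility_change": -0.10},
]
weights = {"case_rate": 0.5, "test_positive_pct": 0.3, "mobility_change": 0.2}
ranking = composite_risk(counties, weights)
```

A real model would of course replace this linear weighting with estimates grounded in epidemiological research, but even a simple composite like this shows how many streams can be reduced to one number a policymaker can compare across jurisdictions.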

FiveThirtyEight has effectively explained that making these models is tremendously difficult due to incomplete data, especially since the U.S. is not testing enough or in statistically valuable ways. These challenges are real, but decision-makers are currently using this same highly flawed data to make inferences and policy choices. Despite the many known problems, elected officials and public health services have no choice. Frequently, they are evaluating the data without the time and expertise to make reasoned statistical interpretations based on epidemiological research, leaving significant opportunity for modeling to help.

Steps to enable better COVID-19 models

There are steps to be taken to enable these models and make them as informative as possible. First, the Centers for Disease Control and Prevention (CDC) needs to dramatically expand the amount of data it releases. While Johns Hopkins University’s team should be lauded for their work, a well-funded federal effort is warranted. The CDC should make data publicly available at least at the county level, with demographic information including age and race (especially important due to the staggering inequities in COVID-19 deaths). They should also consider collecting and releasing to researchers a far more geographically precise dataset, with anonymized point-locations of testing results. For this, the CDC should consider either a researcher-only data repository, where this and other sensitive COVID-19-relevant data can be stored securely, or a differential privacy approach. If the CDC needs technical support to do this, it should engage with the Census Bureau, the U.S. Digital Service, or external actors.
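The differential privacy option mentioned above can be illustrated with the standard Laplace mechanism: calibrated random noise is added to each released count so that no individual's presence in the data can be inferred. This is a minimal sketch, assuming a simple count-release setting; the epsilon value and counts are illustrative, not a recommendation.

```python
# Minimal sketch of the Laplace mechanism for differentially private
# release of aggregate counts (e.g., positive tests per county).
# Epsilon and the example count are illustrative assumptions.
import math
import random

def laplace_noise(scale):
    """Sample a Laplace(0, scale) variate via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with noise calibrated to sensitivity / epsilon.

    Smaller epsilon means stronger privacy but noisier releases;
    sensitivity=1 because adding or removing one person changes a
    count by at most 1.
    """
    return true_count + laplace_noise(sensitivity / epsilon)
```

The noise averages out over many releases, so researchers still recover accurate aggregate trends while individual test results stay protected, which is the trade-off a CDC research repository would be balancing.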

While university researchers are likely to build these models on their own (Columbia University already has a county-level model oriented towards hospitalizations), having several competing models would best inform policymakers. Being able to ensemble different models—combining their estimates into one—can often improve on any individual effort. Both the CDC and NIH should direct some of their emergency funding from the CARES Act towards this goal. If they cannot, Congress should appropriate additional funding specifically for modeling. Any such funding should also require the open-sourcing of model code and results, not just hastily prepared methodological papers, to enable meaningful scientific debate. Given the tremendous expense of the economic and health repercussions of COVID-19, especially if policy mistakes were to lead to a resurgent outbreak, the cost of these models is negligible. This is especially true since they can enhance the benefit of improved data collection.
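The ensembling idea above can be sketched in a few lines: each model contributes a point estimate, and weights (for example, reflecting each model's historical accuracy) combine them into one number. The model names, estimates, and weights here are hypothetical.

```python
# Hedged sketch of a simple model ensemble: a weighted average of
# competing models' point estimates. Names and numbers are hypothetical.

def ensemble(estimates, weights=None):
    """Combine per-model estimates into a single weighted average.

    estimates: mapping of model name -> point estimate
    weights:   mapping of model name -> weight (defaults to equal weights)
    """
    if weights is None:
        weights = {m: 1.0 for m in estimates}
    total = sum(weights[m] for m in estimates)
    return sum(est * weights[m] for m, est in estimates.items()) / total

# Three hypothetical models forecasting, say, new hospitalizations,
# with the first weighted double for its (assumed) better track record.
estimates = {"model_a": 120.0, "model_b": 150.0, "model_c": 135.0}
combined = ensemble(estimates, {"model_a": 2.0, "model_b": 1.0, "model_c": 1.0})
# combined = (240 + 150 + 135) / 4 = 131.25
```

Real forecasting ensembles use far more sophisticated weighting and combine full predictive distributions rather than point estimates, but the core benefit is the same: the combined estimate tends to be more robust than any single model's.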

Models can play an important role in the path to recovery

These models do not need to work alone. Near-term and small-area estimates of COVID-19 risk can complement the other metrics (such as number of days with consecutive case decline) used to reopen the economy and scale back social distancing. Far more testing is still needed, and direct measurements like the trajectories of cases, test-positive rates, and hospitalizations should certainly not be discounted. However, a considered modeling approach might help a local policymaker identify more subtle causes for concern, such as rising social mobility in adjacent jurisdictions with higher case numbers.

The only alternative to local inferences made by models is local inferences made exclusively by humans—these decisions simply must be made. Without models, many officials will miss warning signs or misinterpret the complex interactions of many streams of data. So, while these models will certainly be imperfect, they can act as a sanity check to help form and scrutinize subjective judgments.

It is important to keep in mind that the recovery from this pandemic may last for many months or even years, and policymakers in every jurisdiction and across every level of government will need to make important choices about how to manage COVID-19. This means that models built now will have many opportunities to inform better policy choices, most notably easing restrictions and allocating limited medical resources. This new generation of models for short-term and highly-local COVID-19 risk scores can be a much-needed compass to help navigate the path to recovery.