Commentary

What tests to use, when, why—and why not? Pitfalls of mass testing for COVID-19

Editor’s Note:

This analysis is part of the USC-Brookings Schaeffer Initiative for Health Policy, which is a partnership between Economic Studies at Brookings and the University of Southern California Schaeffer Center for Health Policy & Economics. The Initiative aims to inform the national health care debate with rigorous, evidence-based analysis leading to practical recommendations using the collaborative strengths of USC and Brookings.

The recent outbreak in the White House highlights the limits of testing as a containment strategy for COVID-19. Concerns about hotspots flaring in schools of all types, sports teams, and workplaces lend special urgency to the question of how best to limit the spread of COVID-19, and specifically how to test for and track the SARS-CoV-2 virus in the general population. An ongoing public health debate centers on whether we should use sub-optimal tests on a massive scale, testing frequently to overcome their analytical shortcomings.

The basic argument was encapsulated in the September 11 Health Affairs post by Paltiel and Walensky and has two parts. First, that widespread screening will dramatically expand testing capacity and ease ongoing strain on critical supply chains. Second, that cases missed by sub-optimal tests are (probably) not infectious. In this post, we address why these contentions ignore the serious consequences of false-positive results, underestimate the importance of false-negative results, misapprehend the nature of supply chain failures in clinical laboratories, and ignore how over-reliance on biomedical tests results in risky public health behaviors. Unfortunately, the proponents of high-frequency, lower-sensitivity testing rarely consider the consequences of false-positive results, whether narrowly on the operation of clinical laboratories or more broadly on clinical practice and public health. We explore the inevitable results of high-frequency, lower-sensitivity testing and explain why implementing such an approach would be bad public policy.

Frequent Testing Emboldens Unsafe Behavior

The authors, two of whom are Directors of Clinical Laboratories and the third an experienced health policy analyst, strongly agree that clinical testing has a key role. The whole point of frequent testing regimens is to mitigate COVID-19 outbreaks. Yet new case clusters in the White House, the Senate, and college dormitories, which continue to fuel the US outbreak, underscore that excellent access to screening tests is insufficient to prevent significant outbreaks. Root causes in these scenarios appear to extend beyond shortcomings in the tests (where false-negative test results led to missed case detection that more sensitive diagnostic PCR testing would have found). So what allowed the disease to spread?

These outbreaks demonstrate the concept and consequences of the “preventive misconception” – that individuals undergoing a preventive health intervention (in this case, screening) will engage in risky behavior because they assume they are not infectious – and that making this cognitive error is not rare. In the case of the White House case cluster, masks were eschewed and physical distance was not maintained. Similar behaviors were reported among college students. Thus, overconfidence in the ability of a testing regimen to stop chains of transmission paradoxically emboldens behaviors that increase transmission.

Key Factor Limiting Even the Best Diagnostic Tests: Pre-Test Probability that Patients have the Disease

Beyond the impact of testing on behavior, it is important to distinguish diagnostic testing of persons with a reasonable index of suspicion for COVID-19 from screening testing of low-prevalence populations. The most relevant difference is not necessarily in the ability to detect positive cases (sensitivity), negative cases (specificity), or any other analytical parameter of the assay. Rather, the key point is the effect of pre-test probability – the prevalence of COVID-19 in the target population – on the proportion of erroneously positive test results. As we demonstrate graphically (Figure 1), the lower the prevalence, the higher the proportion of positive results that are false; the grey box represents target prevalence in outbreak suppression efforts.

Fig 1

For a population with a given disease prevalence, the sensitivity and specificity of an assay crucially affect how many of its results are erroneous. This is summarized by the positive predictive value (PPV), the proportion of positive results that are true positives, and the negative predictive value (NPV), the proportion of negative results that are true negatives. We model how PPV (Figure 1) and NPV (Figure 2) change with different sensitivities and specificities over a range of COVID-19 prevalence from 0.1% to 10%. The sensitivities selected for our model (≥ 95%) are representative of (or better than) most gold-standard PCR assays for SARS-CoV-2 and are possibly overly optimistic. Rapid tests have much lower sensitivity, represented in our model as 80% sensitivity. Sensitivity has little impact on the proportion of positive results that are false (Figure 1). Specificities of rapid assays are similar to the lowest in our model (98.5%), if not worse. The take-home point is that in low-prevalence populations, even using assays with outstanding analytical performance, half or more of all positive results will be erroneous (Figure 1). By comparison, false-negative results are relatively rare – especially in the low-prevalence setting – even with insensitive (rapid) tests (Figure 2).
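
The relationship between prevalence and predictive values can be sketched in a few lines of Python. This is only an illustration of the standard PPV and NPV definitions, not the code behind Figures 1 and 2. The function name `predictive_values` is hypothetical, and the sensitivity/specificity pairings below are assumptions assembled from values mentioned in the text (95% and 80% sensitivity, 98.5% and 99.5% specificity).

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for an assay applied at a given pre-test probability.

    All arguments are fractions between 0 and 1.
    """
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)   # share of positives that are real
    npv = true_neg / (true_neg + false_neg)   # share of negatives that are real
    return ppv, npv


# Illustrative (sensitivity, specificity) pairs; the pairings are assumptions.
ASSAYS = [(0.95, 0.985), (0.80, 0.985), (0.995, 0.995)]

# Prevalence values spanning the 0.1% to 10% range discussed in the text.
for prevalence in (0.001, 0.01, 0.05, 0.10):
    for sensitivity, specificity in ASSAYS:
        ppv, npv = predictive_values(prevalence, sensitivity, specificity)
        print(f"prev={prevalence:.1%} sens={sensitivity:.1%} "
              f"spec={specificity:.1%} PPV={ppv:.1%} NPV={npv:.3%}")
```

At 0.1% prevalence every pairing above yields a PPV well under 50%, while NPV stays above 99.9% even at 80% sensitivity, which matches the pattern the figures describe.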

Fig 2

An important real-world example comes from the <1% prevalence of SARS-CoV-2 among asymptomatic patients without known COVID-19 exposures admitted to our large, academic hospital, despite Seattle having been an early US epicenter of the outbreak. If we used an assay with sensitivity and specificity both of 99.5% to detect SARS-CoV-2 infection in these patients waiting for a hospital bed in the Emergency Room (assuming prevalence of 1%), we would expect ~1/3 of the positive results to be false! By comparison, if we used the exact same assay for our patients with respiratory symptoms (cumulative positivity rate of ~5%), we expect less than 10% of positive results to be false (Figure 1). Statisticians will recognize this difference as Bayes’ Theorem in action. In Laboratory Medicine we call this Pre-Test Probability.
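
The two figures quoted above follow directly from Bayes' Theorem. As a worked sketch, write p for the pre-test probability, Se for sensitivity, and Sp for specificity (this notation is ours for illustration and does not appear in the figures):

```latex
% Probability that a positive result is false
\[
P(\text{uninfected} \mid \text{positive})
  = \frac{(1 - p)(1 - Sp)}{p \cdot Se + (1 - p)(1 - Sp)}
\]

% Asymptomatic admissions: p = 0.01, Se = Sp = 0.995
\[
\frac{0.99 \times 0.005}{0.01 \times 0.995 + 0.99 \times 0.005}
  = \frac{0.00495}{0.01490} \approx 0.33
\]

% Patients with respiratory symptoms: p = 0.05, Se = Sp = 0.995
\[
\frac{0.95 \times 0.005}{0.05 \times 0.995 + 0.95 \times 0.005}
  = \frac{0.00475}{0.05450} \approx 0.087
\]
```

The assay is identical in both cases; only the pre-test probability changes, and with it the share of false positives drops from about one third to under 10%.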

Adverse Consequences of False Positives

False-positive SARS-CoV-2 results harm individuals, strain limited laboratory and public health resources, and risk long-range harm by undermining confidence in clinical and public health efforts. We have seen false-positive SARS-CoV-2 test results delay life-saving surgeries. We also know first-hand how confirmatory testing and investigation of unexpectedly positive results strain the laboratory, consuming scarce reagents, adding to the workload of overtaxed lab staff and health care providers, and delaying turnaround time for test results. Deploying assays en masse that would yield so many falsely positive results raises an important question: do all of the positives need confirmation by gold-standard PCR assays? The potential need for confirmatory testing risks markedly increasing the strain on already stressed supply chains upon which clinical laboratories depend. Similarly, a high proportion of false-positive results will substantially complicate (if not overwhelm) contact tracing efforts. Another unexplored question is how a high false-positive rate would interact with policies around reopening schools or other “normal” socioeconomic activity.
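
To give a rough sense of scale, the sketch below counts expected true and false positives in a single screening round. The cohort size of 100,000 is purely hypothetical. The 0.1% prevalence, 80% sensitivity, and 98.5% specificity are taken from the model values discussed earlier. The idea that every positive would be sent for PCR confirmation is an assumption, since whether that is needed is exactly the open question raised above. The function name `screening_counts` is ours.

```python
def screening_counts(n_tested, prevalence, sensitivity, specificity):
    """Expected outcomes from screening n_tested people once.

    Returns a dict with expected true positives, false positives, and the
    number of confirmatory tests needed if every positive is re-tested
    (an assumption, not a stated policy).
    """
    infected = n_tested * prevalence
    uninfected = n_tested - infected
    true_pos = infected * sensitivity
    false_pos = uninfected * (1.0 - specificity)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "confirmatory_tests": true_pos + false_pos,
    }


# Hypothetical cohort of 100,000 screened with a rapid-test-like assay.
counts = screening_counts(100_000, prevalence=0.001,
                          sensitivity=0.80, specificity=0.985)
for name, value in counts.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, roughly 80 true positives would be accompanied by roughly 1,500 false positives, so nearly all confirmatory testing and contact tracing effort would be spent on people who are not infected.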

False-positive results may have another, more insidious, longer term consequence: erosion of trust in diagnostic testing. Imagine the public reaction to national headlines describing “tens of thousands of false positive results.” Given that the United States has struggled with widespread adoption of masks, disinformation, and conspiracy theories, we question the ability of doctors to satisfy public concerns by explaining conditional probability and shudder to imagine the sociopolitical consequences of widespread “phony” test results.

Causes and Consequences of Missed Case Detection

The case for high-frequency testing relies crucially on two assumptions: that false negatives will be detected on repeat testing 2-3 days later, and that “false negatives” represent non-infectious people. Unfortunately, each of these assumptions is fatally flawed.

What happens if a college student is exposed on a Sunday, tests negative on a Friday, attends parties Friday and Saturday nights, and then develops symptoms on the next Sunday when they also test positive? Very similar outbreaks have already been documented. This scenario is consistent with what we know about SARS-CoV-2 viral kinetics and poses a prime opportunity for rapid spread since the virus has been transmissible for at least 1-2 days by the time symptoms set in. Even short testing windows may fail to mitigate transmission due to risky behaviors during the infectious, pre-symptomatic period. Considering that recent behavioral models failed to account for preventive misconception among college students, this scenario goes from plausible to likely.

What Do We Know about Infectiousness of Very Low Concentrations of SARS-CoV-2?

Proponents of high-frequency, lower-sensitivity mass testing suggest that any false-negative test results represent patients with very low concentrations of SARS-CoV-2, and that these infected individuals are unlikely to be infectious and may have even recovered from their disease. These conclusions are not supported by the available scientific evidence about who is infectious. There is still limited literature linking viral infectivity to the “CT” (cycle threshold), a semi-quantitative value from PCR tests that reflects the number of amplification cycles needed to detect viral RNA and that is stored in laboratory instruments but not reported. The information we do have comes from viral culture and not from studies of transmission.

Overinterpreting the biomedical literature on the relationship between low concentrations of SARS-CoV-2 and infectiousness is dangerous and not supported by current evidence. Important early reports suggested SARS-CoV-2 could be cultured only from samples in which PCR detected a large amount of virus. However, subsequent studies have cultured virus from samples with 100- to 1,000-fold less (2-3 logs) viral RNA, a finding corroborated by a large study released 28 September 2020. Despite these studies, we do not know how well the ability to culture virus serves as a proxy for infectivity, nor do we know the limit below which infectious virus is no longer present. As it currently stands, the scientific literature supports the proposition that cases missed due to reduced test sensitivity may well be infectious.

Impact of PCR Tests Detecting Patients who Have Recovered from COVID-19

Proponents of high-frequency, mass testing often point to what might appear to be a vexing problem: positive test results in patients who have recovered from COVID-19. Arguing about these “re-positive” patients is a straw man argument: these convalescents are not the target of mass testing regimens. Although genes from the virus can be detected long after patients have recovered, we have not seen these patients transmit virus nor have we cultured virus in such scenarios. Our clinical bottom-line is quite simple: a test result should never replace a thoughtful diagnosis informed by the patient’s clinical status, their history, and other test results.

Continuing Supply Chain Challenges

Based on our experiences as Clinical Laboratory Directors, we anticipate that low-cost test alternatives like lateral flow assays and paper-based test strips will be subject to supply chain limitations similar to those we continue to experience with PCR assays. There is little evidence to support the notion that these alternatives will not have supply chain disruptions; to the contrary, preliminary findings from a survey of laboratory directors and infectious disease doctors conducted by the Infectious Diseases Society of America, along with lay reporting, demonstrate shortages extend far beyond COVID-19 testing supplies and threaten clinical laboratories’ ability to perform many different routine diagnostic tests. Moreover, this survey suggests clinicians are less aware of shortages than Laboratory Directors. Regardless, the need to confirm false positives will tax existing laboratory and contact tracing resources.

Conclusion

Testing for SARS-CoV-2 is important, particularly for diagnosing active infections, testing high-risk exposures, and targeted surveillance. However, mass testing, regardless of test quality, is not necessary to achieve public health goals and could actually do harm. To effectively reduce the spread of COVID-19 we need widespread adoption of simple, cheap, collective public health policies: mask wearing, hand washing, and physical distancing (especially indoors). High-frequency testing of asymptomatic populations may engender a false sense of security that leads to laxness in practicing these key behaviors, and may paradoxically burden clinical laboratories and contact-tracing efforts.

Recent case clusters demonstrate that rigorous testing is not enough to disrupt transmission chains, even among groups that know how to prevent the pandemic’s spread. It is not yet clear to what extent preventive misconception and risk-taking, reduced assay sensitivity, or inherent limitations in a frequent testing algorithm enabled such outbreaks to occur (although behavioral choices clearly played critical roles). It is clear, however, that test results should always be interpreted in context. High-frequency, mass-scale testing can substitute for neither good behavior nor good clinical judgment.

The authors did not receive financial support from any firm or person for this article or from any firm or person with a financial or political interest in this article. None of them is currently an officer, director, or board member of any organization with an interest in this article.
