Today’s finance-research complex closely links funding to “impact”, which includes evidence of policymakers and politicians citing the research in question. What research politicians choose to highlight therefore influences what is eventually funded. Yet, recent speeches and op-eds suggest that even key personnel in technocratic institutions that fund the generation of knowledge can end up citing material that deserves no place in a serious discussion. This implies either a lack of judgment or a genuine preference for headline-grabbing numbers over careful, nuanced results. In today’s charged world, both are cause for concern.
Study 1: A McKinsey report that claims that gender equality will add $12 trillion to the global economy was recently cited by a key person in charge of development aid in the U.K.
Study 2: A World Bank report that claims that gender equality will add $160.2 trillion to global wealth has been heavily cited by the institution’s top management.
These numbers differ in part because one tries to capture income and the other wealth, but the deeper problem is that both studies extrapolate from the current world to an imagined one. In 1963, Martin Feldstein pointed out that cost-of-X studies (cost of illness, cost of crime, cost of drugs) were fundamentally flawed because they did not evaluate a viable policy counterfactual: they assumed that the offending activity could be removed at zero cost. This is in sharp contrast to specific evaluations, which focus on policies attempting to decrease crime or drug use and carefully compute the benefits and costs of those programs. Not surprisingly, the numbers emerging from the second type of study are much smaller than imagined extrapolations plucked out of thin air. It is worth quoting from Feldstein’s review of Weisbrod’s “Economics of Public Health: Measuring the Economic Impact of Diseases”:
Lead Economist, Development Research Group - World Bank
“The method Dr. Weisbrod advances for calculating the money losses to the nation caused by diseases (direct costs of care and indirect losses due to absenteeism and premature death), which would be saved by complete elimination or prevention of a disease, is subject to substantial objections on economic and statistical grounds.
But more significantly, this attempt fails because it is irrelevant to the actual problems of public policy. Instead of a method for comparing alternative possible programmes, we are given an algorithm for calculating the benefits of unattainable goals; it may be interesting to know how much more society loses due to cancer than due to tuberculosis, but in the absence of any single “programme” by which we can eliminate either “disease”, these estimates are of little use. The proposed method omits the fundamental feature of a benefit-cost analysis: a model or production function that links the actions of the decision maker with the benefits that will accrue.”
An almost ludicrous example illustrates the point: what is the “cost” of sleep? Take the hourly wage of everyone in the world, assume that they sleep 8 hours a day, and “estimate” what would happen if they instead slept 7 hours a day, spending the additional hour at work. Assuming an 8-hour workday, that additional hour adds 12.5 percent, or roughly $13 trillion, to global GDP (in PPP terms) of $107 trillion. This is the equivalent of what those two studies do. An alternative is to evaluate a specific policy that leads to one hour less sleep (loud alarms, perhaps?) and measure what happens to actual GDP. My guess is that the impacts would be lower. For gender equality, the equivalent studies would be like those reported by Hsieh et al., as well as the World Bank’s own World Development Report on gender (of which I was a team member).
But the result, “One hour less sleep will increase global GDP by $13 trillion”, will grab more headlines.
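The back-of-envelope extrapolation takes only a few lines to write out. The GDP and workday figures below are the illustrative values from the text; the linear hours-to-output scaling is precisely the naive assumption being criticized:

```python
# Deliberately naive "cost of sleep" extrapolation, using the
# illustrative figures from the text (not fresh estimates).
GLOBAL_GDP_PPP = 107e12  # global GDP in PPP terms, USD
WORKDAY_HOURS = 8

# Naive assumption: output scales linearly with hours worked, so one
# extra hour per 8-hour workday adds 1/8 = 12.5 percent of GDP.
extra_share = 1 / WORKDAY_HOURS
naive_gain = GLOBAL_GDP_PPP * extra_share

print(f"Share: {extra_share:.1%}; naive gain: ${naive_gain / 1e12:.1f} trillion")
# Prints: Share: 12.5%; naive gain: $13.4 trillion
```

Nothing in this calculation links a real policy to the imagined extra hour of work, which is exactly Feldstein’s objection.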
So, what’s the problem?
When politicians cite such reports as credible evidence, it devalues a research process that can take years, from collecting data to the constant revisions and refinements that respond to comments from colleagues, reviewers, and editors. The process is long, but each step forces researchers to understand the underlying factors at play and to nuance their conclusions, ultimately lending greater credibility to the work. But if the powers that be signal that all of this is not necessary—that reports not subject to the scrutiny of our peers are fine as long as they produce big numbers and nice visuals—that is eventually the direction in which some “research” will turn. Many will buck the trend, but the danger is that less and less funding will be directed to them. This will create an even larger gulf between academic researchers and “large-number” report producers, as the modest successes of the former eventually clash with the trillion-dollar claims of the latter. In the long run, all of it will be painted with the same brush of “fake news”, and the “policy-based evidence” crowd will have won another victory.
Irresponsible citations follow primarily from the difficulty of keeping up with rapidly evolving fields and insufficient translational work. In that spirit, here are some suggestions based on current debates in the field.
1. Wait until peer review and publication: Please do not cite research that has not yet been peer reviewed and published. Publication is no guarantee of quality, but speechwriters can—and should—check whether the journal is predatory or, alternatively, stick to reputable journals that have been around for a while. If all this is hard to do, why not fund an independent site like www.econofact.org, task it with synthesizing research for public engagements, and cite only from material discussed on the site? This would be a partial solution to the problem of adjudicating research quality and would allow the public to understand the source of the cited material. This recommendation may be unpopular with colleagues, since working papers are oft-cited, but I think it is fair, especially because working papers are often heavily revised during the review process.
2. Beware of small samples: The smaller the sample, the more likely it is that a statistically significant result will be qualitatively large even when the “true” effect is zero. Suppose the true effect is zero and you run an experiment with 20 people 1,000 times. Because small samples produce large standard errors, the only statistically significant results will be those with large effect sizes. If journals are more likely to publish positive findings (publication bias), then most big effects from small samples are suspect. A good rule of thumb is not to cite small-N research until it has been replicated multiple times. I am not arguing that such results should not be published—they should—but the abundance of caution that is necessary is often lost when they are cited as the next big thing.
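The thought experiment above is easy to verify by simulation. The sketch below is illustrative (not from the original): it runs 1,000 two-group experiments of 20 people per arm with a true effect of exactly zero and collects the effect sizes of the runs that clear the significance bar anyway:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

def significant_effects(n=20, trials=1000, z_crit=1.96):
    """Run `trials` two-group experiments with n subjects per arm and a TRUE
    effect of zero; return the |effect sizes| of the 'significant' ones."""
    effects = []
    for _ in range(trials):
        treat = [random.gauss(0.0, 1.0) for _ in range(n)]
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        diff = statistics.mean(treat) - statistics.mean(control)
        # Standard error of the difference in two means (equal group sizes)
        se = ((statistics.variance(treat) + statistics.variance(control)) / n) ** 0.5
        if abs(diff / se) > z_crit:  # crude z-test at the 5 percent level
            effects.append(abs(diff))
    return effects

effects = significant_effects()
print(f"'Significant' results: {len(effects)} of 1000")
print(f"Smallest 'significant' effect: {min(effects):.2f} standard deviations")
```

Roughly 5 percent of the experiments come out “significant” despite the zero true effect, and every one of those false positives is, by construction, a large effect (on the order of half a standard deviation or more). With publication bias, those are exactly the results that get written up.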
3. Cite the failures: Conversely, most large-N research will not produce big numbers; a lot of research will discover that things we thought should work really do not. Politicians seldom cite research documenting the failures of programs, and therefore the millions of dollars in bad funding that were saved. There are many, many examples of this, and it would do wonders for the morale of researchers who find zero results (as most are wont to do) to know that such findings are critical and important for policy. Perhaps it could also help mitigate the publication bias against zero results. (An example: one (wrong) paper showing that institutional deliveries improved health outcomes in India had garnered 569 citations on Google Scholar as of July 3, compared with fewer than 150 citations among four careful papers showing zero impact of institutional deliveries on health outcomes in India, Malawi, and Rwanda.)
4. Beware of the silver bullet: Nothing excites us more than a silver bullet. Unfortunately, silver bullets seldom exist, and it takes a while after the first intervention before the shine of that elusive metal wears off. So, yes, there are programs with a lot of promise, but that initial promise should prompt ecstatic calls for further research rather than outright endorsement.
These suggestions are likely controversial. But an atmosphere of giddy certainty surrounding large numbers and big effects is dangerous, especially when later studies that debunk the original miracles typically receive less play—and are therefore less likely to be undertaken in the first place. When I have brought up these speeches with colleagues, the answer is usually an eye-roll, a head shake, and the inevitable conclusion: “They are politicians.” But if there is one group that understands the power of words and the importance of signaling intentions and preferences through writing and public speaking, it is precisely the politicians. And we do them a disservice by not pointing out why their citations are problematic. We should work with them to incorporate the latest developments in research into their public engagements. Politicians need to cite responsibly; researchers need to hold them to account when they don’t.