Scaling up social policies

The Airbus A380 project to build the world’s largest passenger airliner was launched in 1990. By the A380’s first commercial flight in October 2007, the project budget had reached €30 billion, and Airbus had conducted 47,500 test-flight cycles—2.5 times the number of flights an A380 would make in 25 years of operation. Flight testing and evaluation represent about a third of the total production costs in the aircraft industry.

Social policies rarely undergo such robust, large-scale testing, despite their massive scale and impact. In contrast to the aircraft industry, the budget for research and development is tiny. Yet the adverse impact of an untested social policy could be more damaging than an A380 crash.

Consider Universal Basic Income (UBI), a hot topic in high-level global forums. Leaders across the political spectrum and top economic experts from the United States, India, Germany, Brazil, Kenya, and Namibia see UBI as a transformational social policy for reducing poverty, improving health and education outcomes, and building more equitable societies. In August 2020, Germany started the pilot project Grundeinkommen (Basic Income Pilot Project), which provided 122 participants with €1,200 a month for three years. The project, initiated by a Berlin-based NGO, cost €5.2 million. Only 13 countries have conducted experiments on the effectiveness and impact of UBI. These experiments have included 125 people in Stockton, California; about 1,000 in Namibia; 2,000 in Finland; and 4,000 in Canada. The largest UBI experiment, in Kenya, which grew out of smaller trials by the charity GiveDirectly, involved just 21,000 people and a budget of about $30 million. There have been no large-scale pilots that would allow evaluation of the longer-term consequences of these programs.

The small scale of these evaluations is typical not only of UBI pilots but of many social policies that may carry large unintended consequences when implemented at scale. Funding for a new social program may come from cutting existing programs; such reallocations could cause significant losses in well-being at a national and global scale. Even minor alterations in the design of a social program can carry substantial monetary costs. If Germany adopted UBI nationwide and paid every adult German citizen €1,200 a month, the program would cost almost €1 trillion annually. Increasing the monthly payment by just €100 would cost Germany roughly another €100 billion a year.
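These figures can be checked with simple arithmetic. A minimal sketch, assuming roughly 70 million adult German citizens (the adult count is an assumption, not a figure from the article):

```python
# Back-of-the-envelope cost of a nationwide UBI in Germany.
# Assumption (not from the article): ~70 million adult citizens.
ADULTS = 70_000_000
MONTHLY_PAYMENT = 1_200  # euros per adult per month

# Annual cost of the base program.
annual_cost = ADULTS * MONTHLY_PAYMENT * 12
print(f"Annual cost: ~€{annual_cost / 1e12:.2f} trillion")

# Marginal annual cost of raising the monthly payment by €100.
marginal_cost = ADULTS * 100 * 12
print(f"Cost of +€100/month: ~€{marginal_cost / 1e9:.0f} billion")
```

Under this assumption the base program comes to about €1 trillion a year, and each €100 monthly increase adds on the order of €84–100 billion annually, depending on how many adults are counted, which is the rounding behind the article's figures.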

The processes unleashed by introducing a new social policy have impacts so multifaceted, interdependent, and deep-rooted that their complexity far exceeds that of designing an airplane. The Lucas critique, the observation that it is naive to predict the effects of a change in economic policy entirely from relationships observed in historical data, implies that modeling social policies can be even more challenging than modeling engineering projects. A recent paper in the American Economic Review by Daruich and Fernandez (2024) highlights the pitfalls of evaluating large policy changes using small-scale, short-run experimental settings. The authors argue that such an approach fails to capture the longer-run general equilibrium consequences. In particular, the universality and generous payments implied by UBI may have significant longer-run intergenerational impacts that randomized controlled trials or natural experiments cannot capture.

Why, then, do so few governments engage in large-scale testing and modeling of prototypes before scaling up policies? Why are multibillion-dollar programs rolled out with little evidence that they will work?

First, public officials have little incentive to pilot policies. An evaluation that shows null or negative results consumes resources but yields no electoral mileage. It might make the politician look indecisive, even ignorant, compared with more assertive competitors who already “have the right answers.” Moreover, unlike an airplane, a social program will not “crash.” It might fail, but that failure can be attributed to exogenous factors, relieving its backers of responsibility. Evaluating policy options also takes time, a very scarce political resource.

In addition, scientific evidence is just one input into the political process. Given the forces of inertia, momentum, expediency, ideology, and finance, it is unclear how much weight it carries in political decisions. Recent debates about the effects of the minimum wage and of migration show that even a large body of consistent scientific evidence may be insufficient to convince voters to support a policy. Politicians’ ability to select evaluations that confirm their agenda can bias results and create a “scientific” cover for politically motivated policy choices: evidence-based policy morphs into policy-based evidence. This problem can be especially acute in developing countries with less government transparency and fewer automatic checks and balances. There is also the risk of “researcher capture”: researchers whose funding depends on maintaining good relationships with donors may have weak incentives for objectivity, and they may avoid politically sensitive issues lest they jeopardize future partnerships.

The societal gains from evidence-based policies could be enormous. What can be done to incentivize politicians and public officials to use more empiricism in their decisionmaking?

There are some encouraging examples. In China and Korea, central governments initiate many policy pilots to foster competition among local governments and incentivize innovation among public officials. Singapore has moved away from agency-centric policy evaluation to a collaborative governance model in which government, the private sector, and civil society jointly contribute to higher public value at lower cost. These “collaborative” pilots help create a transparent environment in which failures are easily managed and even welcomed as opportunities to learn new ways of addressing complex policy challenges. The choice between rigorous randomized controlled trials and policy pilots can be seen as a trade-off between carefully measuring the impact of a well-defined but narrow intervention and understanding the interaction of a broader set of policies and behaviors.

World Bank President Ajay Banga advocates a focus on development projects that are “scalable, replicable.” To promote this agenda, multilateral development institutions (MDIs) can better support large-scale policy experimentation, especially in low-income settings. MDIs could promote public subsidization of social policy innovations that use rigorous ex-ante evidence to identify high-impact policies. They can also help diversify risks through knowledge transfers and contribute to creating evidence as a global public good.