Randomized control trials for development? Three problems

Development economists have extensively used randomized control trials (RCTs) as the “gold standard” of evidence for informing development policy. The reason is that, by randomly assigning people to a treatment group or a control group, you can sift away other factors and thereby identify the causal link between treatment and outcomes. Here are three concerns about the use of this and other methods where the identification strategy, rather than the importance and relevance of the policy question, is the basis of evidence for guiding development policies.

First, there is a systematic bias toward analysis of private goods as opposed to public goods. Private goods are excludable since a seller needs to be paid. They are also the easiest things to evaluate with RCTs because you can tell exactly who did and didn’t get the treatment. Universally important things like genuine public goods (think about clean air or national defense), by contrast, are very hard to study because you can’t tell who’s getting them and who’s not.

RCTs and similar techniques have recently been used to evaluate policies, including training programs that benefit the trainee or their employer (after a long history using other research methods that had concluded that these programs rarely work, RCTs have “discovered” the same result); bed nets and chlorine tablets (with some legitimate debate over the existence of, but not the size of, external effects); books; food; private-firm production processes; bicycles; job offers (is getting a job you want good for you?); and many other purely private goods. Most of these would not or should not normally be taken seriously as public policy: it is unclear what market failure they address, what the externalities are, and what in fact is “public” about them at all.

There seems to be an unholy alliance between self-serving politicians and us researchers. They want to give out private goods since the recipient knows to whom to be grateful. We want to give them out since we can more easily separate treatment from control. Neither is helping the economy work better for everyone.

The second problem with RCT-related studies is that the objectives ascribed to people’s actions are defined very narrowly and their specific constraints are often ignored. A researcher can measure average productivity of fertilizer across plots. While farmers might conceivably be interested in that number (though they will probably want to experiment with it themselves on their own plots of land to get their own marginal productivity), it is possible they are interested in a whole bunch of things, particularly the distribution of returns if they are risk-averse. Maybe they are interested in maximizing expected (or some other variant of) utility, including the consequences of bad years and not just the average. When we decide they are using “too little” fertilizer, we may want to know all the constraints they face and not just those we can conveniently control. And if household production is not strictly separable from consumption, every possible competing use of labor in the household is relevant to determining an individual effect rather than an average.
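To make the gap between "average return" and "what a risk-averse farmer cares about" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the CRRA utility function, the two-state (good year, bad year) world, and the income numbers are hypothetical and do not come from any study. It compares expected income with the certainty equivalent, the sure income a farmer would regard as just as good as the gamble.

```python
import math


def crra_utility(income, gamma):
    """Constant-relative-risk-aversion utility; gamma = 0 is risk neutral."""
    if gamma == 1.0:
        return math.log(income)
    return income ** (1.0 - gamma) / (1.0 - gamma)


def expected_income(outcomes):
    """outcomes is a list of (probability, income) pairs."""
    return sum(p * y for p, y in outcomes)


def certainty_equivalent(outcomes, gamma):
    """Sure income that yields the same expected utility as the gamble."""
    eu = sum(p * crra_utility(y, gamma) for p, y in outcomes)
    if gamma == 1.0:
        return math.exp(eu)
    return ((1.0 - gamma) * eu) ** (1.0 / (1.0 - gamma))


# Hypothetical numbers, chosen only for illustration.
# Without fertilizer: a sure income of 100.
NO_FERTILIZER = [(1.0, 100.0)]
# With fertilizer: pay 30 up front; it adds 80 in a good year, nothing in a bad one.
WITH_FERTILIZER = [(0.5, 150.0), (0.5, 70.0)]

for gamma in (0.0, 1.0, 3.0):
    ce = certainty_equivalent(WITH_FERTILIZER, gamma)
    choice = "uses fertilizer" if ce > 100.0 else "skips fertilizer"
    print(f"gamma={gamma}: mean={expected_income(WITH_FERTILIZER):.1f}, "
          f"certainty equivalent={ce:.1f} -> {choice}")
```

In this toy setup fertilizer raises average income, yet a sufficiently risk-averse farmer (here, gamma of 3) values the gamble at less than the sure income and rationally declines it. The "too little fertilizer" verdict depends entirely on which objective we ascribe to the farmer.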

Similarly, we decide (sometimes using RCTs) that people save too little. Perhaps they even said so in a survey. However, did that survey ask whether they would also like to spend more on their children’s education, or dietary diversity, or better doctors, or any other specific need? No. But we researchers “know better.” So when we find we can nudge them into saving more, we think we are helping when, perhaps, we are doing them a disservice because of our tunnel vision.

We used to be modest: we assumed people were solving a problem we didn’t know about, and we studied their behavior to understand that problem. The maintained hypothesis was that utility maximization (not fertilizer use, not more savings, but actual well-being) was a good place to start since we did not know all their problems. Now we assume we know how to farm better than the farmers do, and that we can balance their multiple objectives in life (which we either don’t ask about or consider to be errors that are randomized away) better than they can. They raise too many cows (of course, we are also upset when they choose to sell those extra cows when the household head gets sick); they don’t use enough fertilizer (which we discover in other contexts might be due to the lack of insurance we didn’t know about); they don’t visit the doctor as much as they should (though they know exactly how useful those doctors are); they don’t buy insurance (when they’ve been burned by other such schemes in the past). What we have now is the towering arrogance of believing we would do much better living their lives than they do.

Third, the jump from positive analysis to normative policy is stunning. As a method for understanding causal relations (we lower the price of fertilizer, farmers buy more), using an RCT is not disturbing (if you were uncertain whether demand curves slope down or up). Taking the next step of advocating a subsidy is.

If we tried to measure the actually relevant concept, the difference between social and private value, then these studies could be made relevant to policy. We could then estimate the net value of a subsidy or the proper amount of subsidy given all the other demands on public resources. Without that difference, interpreting an elasticity is ambiguous and often counterintuitive. Researchers find big jumps in demand for various things near a price of zero and use it as an argument for subsidies (since people are irrationally sensitive or something). PRICE OF ZERO! Who cares? Most functional forms for demand functions are sensitive around zero. If there is no wedge between social and private costs, the right neighborhood to look at is near marginal cost, not zero. Furthermore, if there is a big jump at zero then there is an enormous welfare loss when things are free, as with letting water taps run. Perhaps the sensitivity near zero is a good thing: people may actually have to think about whether the good or service is worth anything at all.
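The welfare arithmetic here can be sketched in a few lines. This is a textbook-style illustration under loud assumptions: linear demand, constant marginal cost, and a constant per-unit externality standing in for the wedge between social and private value. The function names and parameter values are hypothetical, not estimates from any study.

```python
def quantity_demanded(price, a, b):
    """Linear demand q = (a - p) / b, floored at zero."""
    return max(0.0, (a - price) / b)


def social_surplus(price, a, b, mc, externality=0.0):
    """Net social surplus when the good is sold at `price`.

    Sums (private willingness to pay + externality - marginal cost)
    over every unit consumed at that price.
    """
    q = quantity_demanded(price, a, b)
    return (a + externality - mc) * q - 0.5 * b * q ** 2


def loss_from_free(a, b, mc, externality=0.0):
    """Surplus lost by giving the good away instead of pricing it at
    marginal cost minus the externality (the surplus-maximizing price)."""
    best_price = mc - externality
    return (social_surplus(best_price, a, b, mc, externality)
            - social_surplus(0.0, a, b, mc, externality))


# Hypothetical parameters: demand intercept 10, marginal cost 4.
print(loss_from_free(a=10.0, b=1.0, mc=4.0))                   # no wedge, steeper demand
print(loss_from_free(a=10.0, b=0.5, mc=4.0))                   # no wedge, bigger response to price
print(loss_from_free(a=10.0, b=1.0, mc=4.0, externality=4.0))  # wedge equal to marginal cost
```

With no wedge, the loss from a zero price in this setup grows as demand becomes more responsive to price, which is the sense in which a big jump in demand at zero signals waste rather than a case for subsidy. Only when the externality is as large as marginal cost does giving the good away become the surplus-maximizing choice, and that wedge is exactly the number the elasticity alone does not reveal.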

Rich countries do many paternalistic things, and we are usually grateful that they do. However, we forget that poor countries are poor and that, along with big market failures, there are serious resource and capacity constraints on the governments. So we still have to make serious choices on what to spend our time and effort on to make sure big problems are fixed before tiny ones.

Other concerns I have about the use of such evaluation measures: (1) inferring government program effectiveness from interventions run by nongovernmental organizations under close supervision by elite university staff; (2) cheap talk as evidence of a real preference (“you said you wanted to work more even though you never do! No backsies! Those are your words!”); and (3) the possibility that we know a lot about the Hawthorne effect in various circumstances and nothing about the real world. Perhaps for a different post.