Census Sampling is Dangerous

Edward Glaeser
Fred and Eleanor Glimp Professor of Economics, Harvard University

February 15, 2001

America is filled with brand loyalty. Some dial only Nokia phones. Others fly only United. Me, I swear by Census products. As an empirical economist, I use data from the Census Bureau almost every day of the week, even without a frequent user program.

While I am devout in my admiration for the work of the Census Bureau, every one of its regular customers knows the huge problems inherent in large databases. Entering the world of massive data collection means entering a world of half-truths and mismeasurement. Homeowners overstate their housing values. Single cohabitants list themselves as married. Many people get counted twice, and some people the Census misses altogether.

Of course, undercounting matters far more than the mismeasurement of other data. Because the Census determines how congressional seats are distributed, its errors are a national issue. If some groups are disproportionately undercounted, the areas where these groups live will get less representation than they deserve. For example, while the 1990 Census reported that 12.05% of adults were African-American, the true figure was more like 12.41%.

To address undercounting, the Census Bureau has an adjustment mechanism called the Accuracy and Coverage Evaluation. The idea is quite simple: after the first census, the Bureau randomly selects particular locations (technically, 11,800 block clusters with 314,000 housing units) and revisits them intensively. Using this second survey, the Census estimates the number of people the first count missed in various population subgroups (e.g., white males between 19 and 24 living in rural areas). It then uses these undercount estimates to adjust its population figures. Thus, if 5% of the African-American population is estimated to have been missed, the count for this group is raised to make up that shortfall.
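To make the arithmetic of that last step concrete, here is a minimal sketch of a simple ratio adjustment within subgroups. It is a simplification: the actual Accuracy and Coverage Evaluation relied on dual-system estimation within many subgroups, which is considerably more involved. The function name `adjust_for_undercount` and all the figures below are hypothetical, chosen only for illustration. One point the sketch makes visible is that a 5% undercount means the raw count covers only 95% of the group, so the count must be raised by about 5.3%, not 5%.

```python
def adjust_for_undercount(counted: int, undercount_rate: float) -> float:
    """Scale a subgroup's raw count up to an estimated true population.

    Hypothetical helper. If a fraction `undercount_rate` of the true
    population was missed, the raw count covers only
    (1 - undercount_rate) of it, so dividing by that coverage
    recovers the estimated total.
    """
    if not 0 <= undercount_rate < 1:
        raise ValueError("undercount_rate must be in [0, 1)")
    return counted / (1 - undercount_rate)


# Hypothetical subgroup counts and estimated undercount rates
# (illustrative numbers only, not actual Census figures).
subgroups = {
    "group A": (950_000, 0.05),
    "group B": (2_000_000, 0.01),
}

# Adjust each subgroup separately, then print the results.
adjusted = {name: adjust_for_undercount(c, r) for name, (c, r) in subgroups.items()}
for name, total in adjusted.items():
    print(f"{name}: {total:,.0f}")
```

Because each subgroup is scaled by its own estimated rate, how the subgroups are defined directly shapes the adjusted totals.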

Critics of these techniques argue that they introduce errors of their own. This is true, but under ideal circumstances the procedure will surely still improve accuracy.

Unfortunately, statistical adjustment also gives much greater discretion to the Census Bureau. The correction procedure is based on population subgroups, and choosing them is very subjective. Do we treat all young urban black males as a subgroup or do we separate them by region? How many ethnic groups do we want to treat as distinct?

This leads to a general point: as you allow for more statistical sophistication, you put more discretion in the hands of the statistician. If you ask a researcher for the average level of earnings in a sample of employees, you can be pretty sure what you are getting. If you tell him to use his judgment and apply statistical techniques to correct for any data problems that may exist, he can tell you almost anything he wants. Statistically adjusting the Census isn't inherently biased, but it does leave more room for Census officials to make decisions that will end up determining how U.S. representatives are apportioned.

This is ultimately the huge problem with allowing statisticians to “solve” the undercounting problem. First, as a social scientist, I am pretty sure that I don’t want social scientists to have that much power. We have our own biases and are surely not representative of the country as a whole.

Second, and more importantly, even if the experts at the Census Bureau really do represent the country, giving the Census this much discretionary power tempts every politically interested group to interfere with the Census process. Special interest groups will lobby. Position papers on how to "improve" sampling techniques will multiply. Other groups may sue; after all, every adjustment decision can be cast as a violation of someone's rights. The president himself may succumb and attempt to manipulate the Bureau through executive orders. If the Census is given political power, it will surely become more politicized, and statistical "improvements" to the Census will eventually end up favoring the powerful.

The Census has been a bulwark of democracy for 210 years, but its position will be endangered if it is given this kind of political authority. Undercounting is unfortunate, but it is a problem that is diminishing over time. In 1940, 5.4% of the population was not counted in the Census. In 1990, 1.8% of the population was missed. Yesterday, the Census announced that no more than 1.4% of the population was missed in the 2000 Census. Let’s not politicize the Census in our zeal to fix this problem.