Measuring racism and discrimination in economic data

Although researchers in economics are increasingly cognizant that race and ethnicity are key determinants of economic outcomes, credibly assessing potential causes and identifying solutions is often complicated by the lack of high-quality data. The typical economist’s work focuses primarily on proposing relationships and testing for causal mechanisms across a broad set of economic phenomena. The study of race and its consequences in market interactions has long been hampered by the relative lack of longitudinal data on relevant markers of discrimination, racism, and related long-term outcomes.

In recognition of these limitations, the American Economic Association Committee on Economic Statistics, in collaboration with the National Economic Association, convened an expert panel composed of academics, statistical agency officials, and representatives of non-profit organizations at a special joint session of the Allied Social Science Associations 2021 meeting in January. Their principal goal was to assess the state of federal statistical data and its ability to document key racial disparities. The participants also discussed the challenges of accessing and collecting administrative data, possibly linked across agencies. These types of data may help facilitate new research and identify where the most pressing problems persist and how they can be addressed. The panel also discussed a plan of action to expand resources for this type of data collection and sharing.

One important theme that emerged from the discussion was that existing data sets have limited usefulness for identifying the breadth and scope of outcomes and instances of racism. Three main points emerged. First, many of the standard data sets, such as the National Longitudinal Survey of Youth (NLSY), the Survey of Consumer Finances (SCF), and the Panel Study of Income Dynamics (PSID), do not provide adequate numbers of observations for groups such as Asians, American Indians and, in some cases, Hispanics. There is therefore a need to expand existing data sets with oversamples of groups that are not typically included in existing longitudinal studies; without this type of data, there will continue to be a lack of convincing empirical research for certain groups and populations.
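To illustrate why small subsamples limit what researchers can conclude, the following rough sketch computes the approximate 95 percent margin of error for an estimated proportion at two subgroup sample sizes. It assumes simple random sampling, and the sample sizes and the 20 percent rate are hypothetical; they are not drawn from the NLSY, SCF, or PSID. Real surveys use complex sample designs, which would typically widen these intervals further.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p estimated from n
    observations, assuming simple random sampling (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical subgroup sample sizes, chosen only for illustration.
hypothetical_sizes = {"larger group": 5000, "smaller group": 150}
assumed_rate = 0.20  # hypothetical outcome rate, e.g. a denial rate

for label, n in hypothetical_sizes.items():
    moe = margin_of_error(assumed_rate, n)
    print(f"{label}: n={n}, margin of error = +/- {moe:.3f}")
```

With these made-up numbers, the interval for the smaller group is several times wider than the interval for the larger group, which is the statistical motivation for oversampling.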

A second point is that important aspects of discrimination and racism often go unmeasured in standard data sets. For example, the panel discussed the inability “to track interactions, transactions, and transformations in which discrimination may be realized.” In particular, this refers to all of the “behind closed door” discussions in which hiring, promotion, or loan decisions are made. The panel asserted the need for additional data collection that allows researchers to identify differences in application rates and denial rates for various economic activities by race and ethnic group. While undoubtedly difficult to capture, these hidden activities are important in identifying the true rates of discrimination in society. Further work is needed to push for novel data collection methods that provide either direct or indirect measures of these crucial non-market factors and behaviors.
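As a minimal sketch of the kind of tabulation such data would permit, the example below computes application rates and denial rates by group. The records, field names, and group labels are entirely hypothetical. Note that an application rate requires observing people who did not apply, which is exactly the information most existing data sets lack.

```python
from collections import defaultdict

# Hypothetical person-level records; not drawn from any real data set.
records = [
    {"group": "A", "applied": True, "denied": False},
    {"group": "A", "applied": True, "denied": True},
    {"group": "A", "applied": True, "denied": False},
    {"group": "A", "applied": False, "denied": False},
    {"group": "B", "applied": True, "denied": True},
    {"group": "B", "applied": True, "denied": False},
    {"group": "B", "applied": False, "denied": False},
    {"group": "B", "applied": False, "denied": False},
]


def rates_by_group(rows):
    """Return application and denial rates for each group.

    application_rate = applicants / all people observed in the group
    denial_rate      = denials / applicants (None if nobody applied)
    """
    totals = defaultdict(lambda: {"people": 0, "applied": 0, "denied": 0})
    for row in rows:
        t = totals[row["group"]]
        t["people"] += 1
        if row["applied"]:
            t["applied"] += 1
            if row["denied"]:
                t["denied"] += 1

    result = {}
    for group, t in totals.items():
        result[group] = {
            "application_rate": t["applied"] / t["people"],
            "denial_rate": t["denied"] / t["applied"] if t["applied"] else None,
        }
    return result


for group, rates in sorted(rates_by_group(records).items()):
    print(group, rates)
```

A raw gap in these rates would not by itself establish discrimination; it is a descriptive starting point that researchers would then need to examine alongside other factors.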

The third point that emerged from this discussion concerns the emerging opportunities for linking administrative data sets at the federal and perhaps state government levels. These large data sets may contain enough observations that meaningful disaggregation and analysis by race and ethnic group can be conducted without the need to collect additional data. Emerging research and analyses show the potential of this data source for measuring racism and discrimination. In fact, this activity aligns with a recent presidential Executive Order (EO) that aims to advance racial equity through the federal government. The EO states, “it is therefore the policy of my administration that the Federal Government should pursue a comprehensive approach to advancing equity for all, including people of color and others who have been historically underserved, marginalized, and adversely affected by persistent poverty and inequality.” In particular, the EO provides for an Equitable Data Working Group. Composed of federal officials, the working group is tasked with “identifying inadequacies in existing Federal data collection programs, policies, and infrastructure across agencies, and strategies for addressing any deficiencies identified.” The EO goes on to note that “…many Federal datasets are not disaggregated by race, ethnicity, gender, disability, income, veteran status, or other key demographic variables. This lack of data has cascading effects and impedes efforts to measure and advance equity. A first step to promoting equity in Government action is to gather the data necessary to inform that effort.”

In our own work, we have also felt the difficulty of conducting research on discrimination and racism due to the lack of data. In fact, to conduct much of our work, we have had to link data across public, private, and confidential-use data sets. Creating these linked data sets was the only way to conduct research in these areas, even in a limited sense. In Akee’s work on the impact of federal land allotment and home ownership in the early 1900s for Native Americans, Akee (and several research assistants) had to use historical census data to link individuals over time. In other work, we have linked data on an individual’s race or ethnicity to annual income measures in order to create measures of income inequality and income mobility over time by race group. Likewise, Casey’s work (with coauthors) on identifying racial and ethnic price differentials in housing markets and neighborhood sorting behavior combined longitudinal housing transactions data with U.S. Census and Home Mortgage Disclosure Act (HMDA) data. While in each case these data represented an important advance, there were important limitations or omissions that ultimately precluded strong conclusions about the source of these disparities. Notably, these types of longitudinal data are difficult to access, to link, and to secure permission to use. Expanding opportunities for data access at all levels of research will likely lead to more creative and definitive research in the future.
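The basic idea of linking race or ethnicity to income records can be sketched in a few lines. The example below is illustrative only and is not the procedure used in the studies cited here: the identifiers, group labels, and incomes are made up, and it assumes a clean shared identifier, which real linkage projects rarely have.

```python
# Hypothetical demographic file: made-up person_id -> group label.
demographics = {101: "Group A", 102: "Group B", 103: "Group A", 104: "Group B"}

# Hypothetical income file: (person_id, annual income).
# person_id 105 has no demographic record and cannot be linked.
incomes = [(101, 40000), (102, 30000), (103, 60000), (104, 50000), (105, 20000)]


def income_shares_by_group(demo, income_rows):
    """Link income rows to group labels by person_id and return each
    group's share of total linked income, plus the count of unlinked rows."""
    totals = {}
    unmatched = 0
    for person_id, income in income_rows:
        group = demo.get(person_id)
        if group is None:
            unmatched += 1  # record exists in one file but not the other
            continue
        totals[group] = totals.get(group, 0) + income

    linked_total = sum(totals.values())
    if linked_total == 0:
        return {}, unmatched
    shares = {group: total / linked_total for group, total in totals.items()}
    return shares, unmatched


shares, unmatched = income_shares_by_group(demographics, incomes)
print(shares)
print("unlinked income records:", unmatched)
```

Tracking the unlinked records matters because, if match rates differ across groups, the resulting measures of inequality can be biased.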


Akee, Randall. “Land titles and dispossession: Allotment on American Indian reservations.” Journal of Economics, Race, and Policy (2019): 1-21.

Akee, Randall, Maggie R. Jones, and Sonya R. Porter. “Race matters: Income shares, income inequality, and income mobility for all US races.” Demography 56, no. 3 (2019): 999-1021.

Bayer, Patrick, Marcus Casey, Fernando Ferreira, and Robert McMillan. “Racial and ethnic price differentials in the housing market.” Journal of Urban Economics 102 (2017): 91-105.

Biden Executive Order on Advancing Racial Equity and Support for Underserved Communities Through the Federal Government, Jan 20, 2021.

“American Economic Association Committee on Economic Statistics and National Economic Association Joint Session on Measuring the Economic Effects of Systemic Racism and Discrimination: A Summary.” February 2021.