The Education Data Portal: Making federal data accessible to study ‘small’ student populations

A school bus sits parked along a street in the Queens borough of New York, January 16, 2013. For the first time in 34 years, New York City bus drivers went on strike, stranding up to 152,000 students in the nation's largest public school system on a sleet-soaked Wednesday morning. REUTERS/Shannon Stapleton

American students are a diverse group. According to the U.S. Department of Education, in public schools in 2016-17 there were 24.4 million white students, 7.8 million Black students, 13.3 million Hispanic students, 2.6 million Asian students, 184,000 Native Hawaiian or Other Pacific Islander (NHPI) students, 511,000 American Indian or Alaska Native (AIAN) students, and 1.8 million students who reported identifying with two or more races.

Education policy research tends to focus on the largest groups of students, in part because reliable data are often not available on smaller groups of students. But groups that make up small fractions of total enrollment, such as AIAN (1%) and NHPI (0.4%), still include hundreds of thousands of Americans. And groups that are small on average can make up a sizable share in particular schools and communities.
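As a quick check on the shares quoted above, the snippet below recomputes each group's percentage of total enrollment from the rounded 2016-17 counts in the preceding paragraph. Because the inputs are rounded, the resulting percentages are approximate; with these figures AIAN comes out near 1.0% and NHPI near 0.36%, which rounds to the 0.4% cited.

```python
# Public school enrollment by race/ethnicity, 2016-17, using the rounded
# U.S. Department of Education figures quoted in the text.
enrollment = {
    "White": 24_400_000,
    "Black": 7_800_000,
    "Hispanic": 13_300_000,
    "Asian": 2_600_000,
    "NHPI": 184_000,
    "AIAN": 511_000,
    "Two or more races": 1_800_000,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's share of total enrollment, as a percentage."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


if __name__ == "__main__":
    print(f"Total: {sum(enrollment.values()):,}")
    for group, pct in shares(enrollment).items():
        print(f"{group:>18}: {pct:5.2f}%")
```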

National datasets collected by the federal government are a vital resource for understanding the educational contexts and experiences of all Americans. But, for too long, these data were hard to access. Historical data needed to understand trends over time were buried in antiquated file formats. And even more recent data often required navigating enormous data files and linking records across different sources.

The Education Data Portal, created by my colleagues at the Urban Institute, changes that. It is a one-stop shop for all major national datasets on schools, districts, and colleges, with data in the same format regardless of who collected the data or whether they were collected decades ago or yesterday. The Data Portal makes the data publicly accessible to a range of audiences through a point-and-click online interface called the Education Data Explorer, an application programming interface (API), and packages for Stata and R statistical software.
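For readers who want to call the API directly rather than through the Stata or R packages, here is a minimal Python sketch using only the standard library. It assumes, and you should verify against the portal's own documentation, that the base URL is `https://educationdata.urban.org/api/v1`, that school-level enrollment by race lives at a path like `schools/ccd/enrollment/{year}/grade-99/race/`, and that responses are paginated JSON with records under `results` and a `next` link to the following page. The `fetch` parameter is a hypothetical hook so the pagination logic can be exercised without a network connection.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import urlopen

# ASSUMPTION: base URL and endpoint path are illustrative; check the
# Education Data Portal documentation for exact paths and year coverage.
BASE_URL = "https://educationdata.urban.org/api/v1"


def enrollment_url(year: int) -> str:
    """Build an (assumed) URL for school-level enrollment by race in one year."""
    return f"{BASE_URL}/schools/ccd/enrollment/{year}/grade-99/race/"


def fetch_json(url: str) -> dict:
    """Download one page over HTTP and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record across all pages.

    ASSUMPTION: each page stores its records under "results" and the URL of
    the next page under "next" (empty or null on the last page).
    """
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("results", [])
        next_url = page.get("next")
```

Because pages are pulled lazily, a call such as `iter_records(enrollment_url(2016))` can be streamed into a file or aggregated on the fly instead of holding every school in memory at once.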

To illustrate how the Data Portal can be put to use, I quickly gathered data on enrollment by race and ethnicity for all schools in the U.S., more than 96,000 of them. The figure below shows that the enrollment breakdowns of schools attended by AIAN and NHPI students look very different from schools overall. The average AIAN student attends a school that is 35% AIAN, even though this group makes up only 1% of students nationwide. Schools attended by AIAN students are also quite diverse, on average, with student bodies that are 35% white, 7% Black, and 17% Hispanic.

NHPI students also attend very diverse schools, which are 21% NHPI, 27% white, 23% Hispanic, 14% Asian, and 8% Black, on average. Students at these schools are more likely to identify with two or more racial groups: 6.3%, compared with the national average of 3.6%.
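The article does not spell out how "the school attended by the average AIAN student" was computed. One standard approach, sketched below as an assumption, is an enrollment-weighted exposure index: each school's racial composition is weighted by that school's share of all students in the focal group. The two toy schools are made up; in them, 80 of the 100 AIAN students attend a school that is 80% AIAN, so the average AIAN student's school comes out about 66% AIAN and 34% white.

```python
from collections import defaultdict


def exposure(schools: list[dict[str, int]], focal: str) -> dict[str, float]:
    """Average racial composition of the school attended by a student in
    the `focal` group. Each school's composition (group count / school
    total) is weighted by the school's share of all `focal` students."""
    focal_total = sum(s.get(focal, 0) for s in schools)
    if focal_total == 0:
        raise ValueError(f"no {focal} students in the data")
    result: dict[str, float] = defaultdict(float)
    for school in schools:
        weight = school.get(focal, 0) / focal_total
        if weight == 0:
            continue  # schools with no focal students contribute nothing
        enrolled = sum(school.values())
        for group, n in school.items():
            result[group] += weight * n / enrolled
    return dict(result)


if __name__ == "__main__":
    # Hypothetical counts for two schools, keyed by group label.
    toy = [{"AIAN": 80, "White": 20}, {"AIAN": 20, "White": 180}]
    print({g: round(v, 2) for g, v in exposure(toy, "AIAN").items()})
```

The same function, called with `focal="NHPI"`, would produce the NHPI breakdown in the second paragraph above.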

These data tools can be used to dig down to individual schools. For example, the public school in the U.S. with the largest enrollment of AIAN students is Purnell Swett High in North Carolina, where 85% of the 1,707 students are AIAN.

Another new tool from the Urban Institute uses the Education Data Portal’s API to quickly retrieve more than 25 years of enrollment data by race and ethnicity for every public school in the U.S. The figure below shows that student demographics at Purnell Swett have been relatively stable since 1989, although the number of Black students has slowly declined over this period and total enrollment has grown over the last 10 years.
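A trend like this one starts from long-format rows, one per school, year, and racial group, which then need to be pivoted into one composition per year. The sketch below assumes hypothetical field names `year`, `race`, and `enrollment` with text labels; the real API may code race numerically and may include an all-races total row, which would have to be dropped before summing to avoid double counting. The counts in the usage are invented.

```python
from collections import defaultdict


def pivot_by_year(records: list[dict]) -> dict[int, dict[str, int]]:
    """Turn long-format rows (one per year and race group) into
    {year: {race: enrollment}}, summing any duplicate rows."""
    table: dict[int, dict[str, int]] = defaultdict(dict)
    for row in records:
        counts = table[row["year"]]
        counts[row["race"]] = counts.get(row["race"], 0) + row["enrollment"]
    return dict(table)


def shares_by_year(table: dict[int, dict[str, int]], group: str) -> dict[int, float]:
    """Fraction of total enrollment in `group` for each year, in year order."""
    out = {}
    for year in sorted(table):
        total = sum(table[year].values())
        out[year] = table[year].get(group, 0) / total if total else 0.0
    return out


if __name__ == "__main__":
    rows = [  # hypothetical counts for a single school
        {"year": 1989, "race": "AIAN", "enrollment": 850},
        {"year": 1989, "race": "Black", "enrollment": 100},
        {"year": 1989, "race": "White", "enrollment": 50},
        {"year": 2016, "race": "AIAN", "enrollment": 1450},
        {"year": 2016, "race": "Black", "enrollment": 50},
    ]
    table = pivot_by_year(rows)
    for year, share in shares_by_year(table, "Black").items():
        print(year, sum(table[year].values()), f"{share:.1%}")
```

Tracking both the group's count and the school total, as the printout does, separates a shrinking group from a growing school.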


Simple data on student enrollment by race and ethnicity are just the beginning of what is possible with the datasets in the Education Data Portal. Census data indicate that Robeson County, the school district where Purnell Swett High is located, has the ninth-highest poverty rate in North Carolina (out of 115 districts). Purnell Swett employs one teacher for every 18 students, the second-highest student-teacher ratio of the seven high schools in Robeson County. It also suspends more students than three of the other four high schools that reported discipline data: 27% of its students received at least one out-of-school suspension.

Moving from educational contexts to student outcomes, Purnell Swett students post above-average math scores for the county (24% scored proficient, higher than at four of the county's seven high schools) and below-average reading scores (25% scored proficient, the second-lowest in the county).
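Statements such as "the second-highest student-teacher ratio of the seven high schools" or "the second-lowest in the county" amount to ranking one school against its county peers. A minimal sketch follows; the school names and values are invented, and ties are assumed to share the better rank, a convention the article does not specify.

```python
def rank_among(values: dict[str, float], name: str, highest_first: bool = True) -> int:
    """1-based rank of `name` among all entries; tied entries share the
    better rank. With highest_first=False, rank 1 is the lowest value."""
    target = values[name]
    if highest_first:
        ahead = sum(1 for v in values.values() if v > target)
    else:
        ahead = sum(1 for v in values.values() if v < target)
    return ahead + 1


def count_below(values: dict[str, float], name: str) -> int:
    """Number of other entries with a strictly lower value than `name`."""
    target = values[name]
    return sum(1 for k, v in values.items() if k != name and v < target)


if __name__ == "__main__":
    # Hypothetical students-per-teacher ratios for seven high schools.
    ratios = {"A": 18, "B": 19, "C": 16, "D": 15, "E": 17, "F": 14, "G": 12}
    print("rank (highest first):", rank_among(ratios, "A"))
    print("schools with a lower ratio:", count_below(ratios, "A"))
```

Swapping in proficiency rates with `highest_first=False` gives "second-lowest"-style comparisons for reading scores.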

Data can be broken down by student characteristics such as race and ethnicity. For example, 26% of AIAN students scored proficient on both math and reading tests, slightly above the school-wide average. And 9% of AIAN students took at least one Advanced Placement course, more than the rate among Black and white students (5% and 8%, respectively).

These data do not tell us what to do to improve outcomes for AIAN students, but they provide important context, including information on how resources and outcomes vary across schools. And the nationwide coverage of these data enables analysts to understand the educational experiences of “small” groups of students such as AIANs and NHPIs in ways that are not possible with many other datasets, such as nationally representative surveys, which often sample too few students from these groups to produce reliable estimates.

These national datasets do have important limitations. They do not tell us everything we want to know about students and schools. And they group students into broad categories such as “Asian” or “Hispanic” that don’t allow analysts to understand the circumstances of more precise subgroups such as Hmong immigrants.

Nonetheless, federal data collections provide a useful national resource on education policy even as they merit continued improvement. Hopefully, efforts that make them more accessible will lead to a richer understanding of American education, and especially of groups of students who are too often ignored in national policy discussions.