Student data privacy and education research must be balanced

Editor’s Note: This post originally appeared on U.S. News and World Report’s Knowledge Bank.


Last week, the U.S. House Committee on Education and the Workforce held a hearing on data privacy protections for students. At issue is whether and how Congress will update the decades-old Family Educational Rights and Privacy Act, commonly known as FERPA, for use in the modern age where big data is king. FERPA is a federal law that protects the privacy of student education records and governs the state and local education agencies that collect and maintain data on their students.

I feel compelled to write on this topic to bring attention to the gravity of these debates. Education research relies heavily on student data, so any changes to the law could have very serious implications for both the research industry and those who rely on it to make education policy decisions. At the same time, I must also acknowledge that I am not an unbiased commentator on this issue, as just about every research study I have conducted has utilized student data made available to me as a researcher from state or district databases. With that caveat stated, though, let’s discuss the issues.

To begin, what are the underlying public concerns driving the demand for greater privacy? According to the testimony of Rachael Stickland, co-founder and co-chair of the Parent Coalition for Student Privacy, presented during the committee hearing, these concerns include liberal sharing of data with various organizations that have no interest in helping to improve the education system, collection of too much unnecessary data, and the vulnerability of any sensitive data to security breaches.

Reading between the lines, I also see manifestations of broader public anxieties currently active in the public debate: worries about overtesting and parents’ rights to opt out (whether from the test or the data system); concerns that the Common Core State Standards could lead to a common core of student data records across states; and government intrusion into local affairs where states and the feds trump autonomies historically respected at the school and district level. Also, the specter of commercial interests trying to find yet another way to creep into the classroom (aside from charter schools, standardized tests and billionaire philanthropies, of course) puts many parents on edge.

Though I do not wish to minimize the very real concerns of parents, the threat to research quality is also real. Last month, the Association for Education Finance and Policy held its annual research conference. And this year’s theme: “The perils of research irrelevance: Balancing data use against privacy concerns.” This was not hyperbole, as 42 percent of the 544 research studies presented during the three-day conference reportedly used education data from administrative agencies.

During the conference, a general session featured a panel moderated by the association’s President-Elect Dan Goldhaber (American Institutes for Research and University of Washington). The panel included Stickland along with Aimee Guidera (Data Quality Campaign) and Shayne Spalten (Schusterman Family Foundation), all of whom engaged in a lively debate about the balance between privacy and research concerns. In this session, Guidera noted, “data is not valuable until it is converted into information,” emphasizing the role of researchers in facilitating that conversion process. She further encouraged researchers to stay engaged in this critically important issue.

So allow me to offer some counterarguments from the researcher perspective, as well as demonstrate some of the assurances already built into the system. First, I feel it’s important to note that personally identifiable data is generally not necessary for education policy research. In fact, states and districts take great care not to unnecessarily disclose identifying information like student names, birthdates and Social Security numbers when they provide external researchers with data access.

Rather, randomized research identifiers linking data elements (representing students, classrooms, teachers, etc.) are provided in lieu of real identifiers. On this point, Goldhaber remarked during the conference panel discussion, “if a parent were to ask me for their student’s data, I’d have no way to locate it.” Deductive identification, where identification is possible indirectly using clues from the data, could still pose a small risk, but research norms call for suppressing results based on small cell sizes in published findings. As a consequence of these combined practices, using anonymized administrative data is commonly categorized as posing “minimal risk” to the human subjects involved in the study.
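
For readers who want a concrete picture, the two safeguards described above (random research identifiers and small-cell suppression) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any agency’s actual procedure: the function names, the sample counts and the threshold of 10 are all hypothetical, and real agencies set their own rules.

```python
import secrets

# Hypothetical threshold; each agency sets its own minimum cell size.
MIN_CELL_SIZE = 10


def assign_research_ids(student_ids):
    """Map each real identifier to a random token.

    The token is drawn at random rather than derived from the real ID,
    so it cannot be reversed without the mapping, which in this sketch
    is assumed to stay with the agency.
    """
    mapping = {}
    used = set()
    for sid in student_ids:
        if sid in mapping:
            continue  # same student keeps the same research ID
        token = secrets.token_hex(8)
        while token in used:  # guard against an (unlikely) collision
            token = secrets.token_hex(8)
        used.add(token)
        mapping[sid] = token
    return mapping


def suppress_small_cells(counts, min_size=MIN_CELL_SIZE):
    """Replace any group count below min_size with None before publication."""
    return {group: (n if n >= min_size else None) for group, n in counts.items()}


# Illustrative, made-up counts of students per reported group.
published = suppress_small_cells({"School A": 42, "School B": 3})
print(published)  # {'School A': 42, 'School B': None}
```

In this sketch the researcher would receive only the random tokens, and any published table would blank out groups small enough that an individual student might be deduced from them.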

On a related note, in my experience states and districts do not share data with outside organizations liberally; in fact, accessing data is one of the most formidable hurdles in my job. Agencies have rigorous procedures for vetting external researchers’ proposed projects before granting access. These procedures not only require review from the researchers’ institutional review board to ensure human subjects are not being unduly harmed but also allow the state or district to be selective about which research questions are addressed, ensuring they are of sufficient interest to the agency before approval. Agencies also execute data use agreements that limit what researchers can do with the data; these agreements generally stipulate requirements for data security, publication of findings and data destruction.

There’s also an efficiency argument to be made here, which is one of the key points Jane Hannaway of Georgetown University made in her testimony to the House committee. That is, research that uses existing data already collected for other purposes gives those data a secondary use, and because no new data collection is needed, the findings are less costly and more timely than they’d otherwise be.

All sides in the debate acknowledge the need to update student privacy laws to reflect current realities of digital data and cybersecurity threats. Maintaining parents’ trust requires that we protect their students’ privacy, and fostering an environment of research inquiry is a critical element in the feedback loop on our public institutions. I would argue that the best data practices already balance these two competing interests quite well. Specifically, the U.S. Department of Education, through its Privacy Technical Assistance Center, offers guidance on data security, including best practices for releasing data to researchers. Research safeguards and reviews at the institutional level already provide one layer of protection; state and district reviews and data use agreements add another.

Changes to the law that make these best practices standard across education agencies would help both to assuage parent concerns and to keep the research feedback loop open. I am optimistic that standardized, sensible safeguards can maintain student privacy without making research irrelevant.