Last week, Tom Kane wrote compellingly about the potential importance of textbooks for student achievement. If you haven’t read his piece yet, do. He surveyed a large sample of teachers, asked what textbooks they used, and related those choices to student achievement. He found substantial variation in textbook effects, larger than the improvement a typical teacher sees in the first three years.
Vox also discussed Kane’s study in their podcast, The Weeds, this week, highlighting the low cost and potentially large return on investment of a textbook-focused research agenda.
This is not the first study to find substantial textbook effects. Rachana Bhatt and Cory Koedel have written two papers using district-level textbook adoption data from Indiana and Florida, and there have been experimental studies as well.
While these studies are important, future work is hampered by the fact that states generally don’t keep track of what books are used in schools and districts. Thus, if you want to find out what books are used, you either have to find a data source no one has tapped yet or ask schools or districts individually. So that’s what I’m doing.
Over the past year and a half, I’ve been funded by the National Science Foundation and an anonymous foundation (with Cory Koedel in the latter case) to gather textbook adoption data in the five largest U.S. states and analyze it for evidence of textbook effects (among other questions). We’re just coming to the end of our data collection phase. Given Kane’s recent call for more work in this area, I thought it would be useful to present briefly what we’ve learned about the process of trying to pull these data together.
What got me interested in collecting textbook data was the discovery (by a PhD student of mine) of California’s School Accountability Report Cards (SARCs). These SARCs are required by state law, and among other things must contain information supporting the adequacy of curriculum materials. Sadly, the state does not keep the data in a usable format, so the only way to collect them is brute force. Over the past year, my PhD students have supervised a horde of master’s students who are downloading the PDF SARCs, copying the textbook information into Excel, and coding the book titles. This process has produced very good data (identifiable textbook data in about 80 percent of California’s K-8 schools), but we’ve hit a number of roadblocks, including:
- The sheer volume of data is enormous: California has about 7,400 K-8 schools. Because so many districts are non-uniform adopters (each school can make its own choices), we must collect school-level data. I’ve paid for well over 1,000 student hours at this point.
- About 10 percent of schools either defy the state’s requirement to make SARCs available or provide no textbook information.
- There is no uniform reporting requirement, so about 10 percent of schools provide textbook information that is unintelligible. For instance, many schools list things like “Houghton Mifflin” as their math books, but Houghton Mifflin currently produces about a half-dozen elementary math series. This lack of clarity calls into question why the data are even reported at all.
- The number of unique books is just plain daunting. In K-8 math, we’ve found over 260 unique books used (after reconciling different versions of the same title). Certainly most schools seem to buy from the state-approved list, but many buy one-off materials that no one else in the state uses.
But I’m lucky California even has these data, because in Florida, Illinois, and New York, the data don’t exist at all. In those places, if I want the data, I have to ask. So I asked. I had students gather email addresses for the curriculum leaders in every school district in those states (itself a task) and invited those leaders to fill out a survey. I got about a three percent response rate.
Then a colleague on Twitter suggested I might consider sending districts freedom-of-information (FOI) requests. This idea turned out to be a gem. The FOI requests yielded response rates of over 50 percent in Illinois and Florida, and somewhat lower in New York (I’m still tabulating the exact rates). But here, too, there were challenges:
- The responses came in every format imaginable, including hand-written lists and compiled emails from teachers. Even just scanning the information in was time-consuming.
- A non-trivial number of districts denied the request, saying they had no existing documents to provide. Chicago was one prominent denial, which I wrote about elsewhere.
- Some other districts charged me for access to the information. If I’d not had the funds, I wouldn’t have been able to get it (this is their legal right, and it was relatively few districts and relatively little money).
All of these states could learn a lesson from Texas. After we discovered that the state keeps track of textbook purchases by each district, I used an FOI request to the state and obtained these data going back to 2011. The data, while not spotless, are exquisite. They can be cleaned in a few days, not a few years, and they include every book in every subject. Still, we found in these data 313 unique spellings of Houghton Mifflin Harcourt, among other oddities and challenges. Unfortunately for researchers like me, Texas isn’t a Common Core state, so it’s a bit harder to take findings from their data and apply them outside the state.
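To give a flavor of what cleaning hundreds of publisher spellings can involve, here is a minimal Python sketch. It is an illustration only, not the procedure we used: the alias table, the function names, and the sample strings are all hypothetical. The idea is to reduce each raw string to a normalized key (lowercase, punctuation and extra spaces stripped), look that key up in a hand-built alias table, and set aside anything that doesn’t match for a human to review.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical alias table: normalized key -> canonical publisher name.
# A real table would be built by hand from the spellings found in the data.
CANONICAL = {
    "houghton mifflin harcourt": "Houghton Mifflin Harcourt",
    "hmh": "Houghton Mifflin Harcourt",
}


def normalize_key(raw: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def canonical_publisher(raw: str) -> Optional[str]:
    """Return the canonical name for a raw spelling, or None if unknown."""
    return CANONICAL.get(normalize_key(raw))


def group_spellings(raw_names):
    """Group raw spellings by canonical name; collect unmatched ones for review."""
    groups = defaultdict(set)
    unmatched = set()
    for name in raw_names:
        canon = canonical_publisher(name)
        if canon is None:
            unmatched.add(name)
        else:
            groups[canon].add(name)
    return dict(groups), unmatched


# Made-up sample entries, not records from the Texas data.
sample = [
    "Houghton Mifflin Harcourt",
    "HOUGHTON-MIFFLIN  HARCOURT",
    "hmh ",
    "Houghton Mifflin",  # left unmatched on purpose: needs a human decision
]
groups, unmatched = group_spellings(sample)
```

Exact-key matching like this only catches differences in case, punctuation, and spacing. Misspellings and abbreviations would still need either more alias entries or some form of fuzzy matching, followed by a manual check.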
We will soon end up with some very useful data, with which we can answer many interesting questions. Of course we’ll look at the efficacy question as Kane says, but we also plan to look at things like the equitable distribution of materials, charter/traditional public differences, who’s adopting Common Core-aligned books and who’s not, etc. And there are many opportunities to link to other data, such as Kane’s teacher survey data and RAND’s American Teacher Panel. But to do any of these things, the data need to be usable. And the process we’ve had to go through to get usable data is pretty stunning. Even when we’ve gotten the data, they’ve been incredibly messy (suggesting states that care about this issue need to put systems in place that make for cleaner data; I suggest using books’ ISBNs as the backbone of any system). It shouldn’t be that hard to collect usable data. Texas shows it can be done. In this era of common standards, all we need is a few more states to do it so we can learn what’s working and what’s not for kids.
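One reason ISBNs make a good backbone is that they can be checked mechanically at the point of data entry. In an ISBN-13, the digits are weighted alternately by 1 and 3, and the weighted sum must be divisible by 10. The short Python sketch below shows that check; the function name is my own, and the example number is a commonly cited valid ISBN-13 rather than a textbook from our data.

```python
import re


def is_valid_isbn13(raw: str) -> bool:
    """Check an ISBN-13 using the standard alternating 1/3 weighted checksum."""
    digits = re.sub(r"[\s-]", "", raw)  # drop hyphens and whitespace
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(
        int(d) * (1 if i % 2 == 0 else 3)  # weights 1, 3, 1, 3, ...
        for i, d in enumerate(digits)
    )
    return total % 10 == 0


ok = is_valid_isbn13("978-0-306-40615-7")   # checksum holds
bad = is_valid_isbn13("978-0-306-40615-8")  # last digit altered
```

A check like this catches most typing errors, but it cannot tell whether a valid ISBN is the book a school actually uses, so a state system would still need a lookup against a list of known titles.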
 This material is based upon work supported by the National Science Foundation under Grant No. 1445654. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
 This requirement is the result of a 2004 court decision in the Williams v. State of California case, which found that agencies had failed to provide public school students with equal access to instructional materials, safe and decent school facilities, and qualified teachers.
 The data may exist in Florida, but I have not found them. I know these existed in the past because Bhatt and Koedel used them in a previous analysis.