The Year of the Asterisk? California's Testing Proposal Subverts Test-Based Accountability

As states plan for new Common-Core-aligned student assessments in the spring of 2015, policymakers are struggling to transition their testing and accountability programs. Last week, California legislators took an unprecedented step: they voted to discontinue the state’s old test and conduct a statewide field test of the Smarter Balanced Assessment in the spring of 2014. More controversially, the plan would exclude many eligible students from testing and withhold scores from school districts, teachers and parents. This aspect of the plan has drawn warnings from the U.S. Department of Education, putting the state’s $1.5 billion in annual federal Title I funding at risk.

It is understandable that Secretary Duncan would be eager to discourage such efforts. Many states are in a position similar to California’s: they plan to participate in a field test with either the Smarter Balanced or the Partnership for Assessment of Readiness for College and Careers (PARCC) consortium in the spring. If Secretary Duncan were to grant California a waiver for its current plan, other states would be likely to follow suit. Contrary to the perceptions of lawmakers and administrators in California, this is not simply a grudge match between the California Department of Education and Secretary Duncan (who have quarreled before). The dispute has broader implications for test-based accountability nationwide.

On September 17, the U.S. Department of Education issued new guidance to states, signaling that they could expect a waiver to substitute either of the field tests for their existing state tests, as long as eligible students in the state took both an English and a math assessment. In a significant concession, states would not be required to report scores for students taking the field test. For accountability purposes, field test schools could use their 2012-13 scores in the relevant grades. Implicit in the guidance is the assumption that states will administer the field test to a limited number of schools or classrooms.

At least until California announced its plans recently, most of the Smarter Balanced consortium members were planning to do just that: administer the field test to a 10 percent sample of students and keep the existing tests for another year for the vast majority of students.

But many in California (and perhaps in other states) will be dissatisfied with the new guidance. They will argue that it makes more sense to set sail for the new standards now, so that teachers and schools can begin preparing for the higher expectations of the Common Core. At the same time, they will argue that it would be unfair to hold teachers, schools and students accountable for their performance during this interim period. If states choose to follow this course, they should redesign their field tests to function as statewide assessments and provide scores back to students and schools.

The California Plan

The leaders in Sacramento have taken what was designed to be a limited field test of the new assessment and have attempted to stretch it to cover the length and breadth of California. It doesn’t fit, because the field test was never designed to be a statewide assessment.

Many others have pointed to the proposal’s flaws: school districts that are not prepared for the electronic version may not have a paper-and-pencil option, and many students will take either the math or the English language arts (ELA) exam, but not both.

There are other issues, though, that have drawn surprisingly little attention. For instance, it’s not clear how much teachers, students and parents will learn during this interim year if their results are withheld from them. In basketball, a “practice shot” is beneficial only when you see whether or not the ball goes in. Under the proposal, the state is asking students and teachers to take practice shots in the dark.

Perhaps more importantly, for the purposes of the field test, test designers need responses from only 10 percent of California students in each subject area. If every sampled student took both tests, that would be 10 percent of students; if each sampled student took either ELA or math, but not both, it would be 20 percent. It’s not clear that the test developers would even score the remaining tests! So there could be millions of students in California who spend 3.5 hours taking a subject test that will never be scored. (Given that the scores would be withheld anyway, I suppose the only thing more irrational would be to go to the great expense of scoring all the tests and then withholding the scores.)
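The arithmetic behind those two percentages can be sketched in a few lines. This is only an illustration: the function name and its parameters are invented for this example, and the only figures taken from the text are the 10 percent per-subject requirement and the two subjects.

```python
def share_of_students_needed(per_subject_share, n_subjects, takes_all_subjects):
    """Fraction of all students who must sit at least one field test.

    Hypothetical helper for illustration; not part of any test design tool.
    """
    if takes_all_subjects:
        # The same students supply the responses in every subject.
        return per_subject_share
    # Each student supplies responses in only one subject, so the
    # per-subject samples do not overlap and their shares add up.
    return per_subject_share * n_subjects


both = share_of_students_needed(0.10, 2, takes_all_subjects=True)
one_each = share_of_students_needed(0.10, 2, takes_all_subjects=False)

print(f"both subjects: {both:.0%} of students needed")
print(f"one subject each: {one_each:.0%} of students needed")
```

Either way, the large majority of students tested under a statewide administration would be supplying responses the item analysis does not require.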

Redesigning the Field Test to Function Statewide

Those who argue for an earlier transition to a Common-Core-aligned assessment have a point. They also have a point that a moratorium on high stakes might be appropriate during a transition year. However, it would be a waste not to use the transition year to tell students, teachers and parents where they stand relative to the new standards. That means giving all students the opportunity to participate in the test, scoring all their tests and providing them with a score.

Here’s one option which would be available now: (i) Administer the new assessments to all eligible students; (ii) Score the assessments for a randomly chosen 10 percent of students; (iii) Estimate the item parameters and weed out the items which did not perform as expected; (iv) Go back and score the remaining tests for the remaining 90 percent of students; (v) Provide scaled scores back to school districts, parents and teachers.
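The five steps can be walked through on simulated data. What follows is a rough sketch, not the consortium's procedure: the data are invented, item quality is judged with a simple item-rest correlation rather than the item response theory calibration a real program would use, and the "scaled score" is just percent correct on the retained items. Every constant and function name is hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_STUDENTS, N_ITEMS = 5000, 12
BAD_ITEMS = {3, 7}          # hypothetical items that do not measure ability
MIN_DISCRIMINATION = 0.15   # hypothetical cut-off for keeping an item


def simulate_answer(ability, item):
    """Return 1 (correct) or 0; bad items are answered at random."""
    p = 0.5 if item in BAD_ITEMS else 1 / (1 + math.exp(-1.5 * ability))
    return 1 if random.random() < p else 0


def correlation(xs, ys):
    """Pearson correlation, or 0.0 if either list has no variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def discrimination(rows, item):
    """Correlation between one item and the score on all other items."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return correlation(item_scores, rest_scores)


# (i) Administer the test to everyone (simulated here).
abilities = [random.gauss(0, 1) for _ in range(N_STUDENTS)]
responses = [[simulate_answer(a, j) for j in range(N_ITEMS)] for a in abilities]

# (ii) Score a randomly chosen 10 percent first.
sample_ids = random.sample(range(N_STUDENTS), N_STUDENTS // 10)
sample = [responses[i] for i in sample_ids]

# (iii) Evaluate each item on the sample and weed out the weak ones.
kept = [j for j in range(N_ITEMS)
        if discrimination(sample, j) >= MIN_DISCRIMINATION]

# (iv) Score every student, including the other 90 percent, on kept items.
raw_scores = [sum(row[j] for j in kept) for row in responses]

# (v) Report a score on a common scale (percent correct, as a placeholder).
scaled_scores = [round(100 * raw / len(kept)) for raw in raw_scores]

print("items kept:", kept)
print("first five scaled scores:", scaled_scores[:5])
```

The point of the ordering is that the item decisions in step (iii) rest only on the 10 percent sample, while every student still receives a score in steps (iv) and (v).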

Apparently, one state partner in the Smarter Balanced consortium is considering this option (although its name has been withheld pending a final decision).

However, the leaders of the Smarter Balanced consortium have been discouraging states from doing so. Why? There are two reasons. First, the field test has been designed to test individual items, not to provide reliable scaled scores for individual students. One of the key considerations in designing a field test is the number of common items shared across the multiple forms. The larger the common-item block, the more reliable the scaled score for individual students will be. However, when the number of common items is large, the sample size available for the non-common items shrinks. When there are many items to test and a limited number of students available, test designers must limit the size of the common-item block.

In 2015, the Smarter Balanced test will have an adaptive design, and adaptive tests require many items (because each student will receive items tailored to his or her achievement level). As a result, to test a larger number of items, designers were forced to use a relatively small 10-item common block on the field test.
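The trade-off between the common-item block and the sample available for each new item can be made concrete with a small calculation. This is a simplified sketch under stated assumptions: students are spread evenly across forms, each non-common item appears on exactly one form, and the student count, item pool and form length below are made-up numbers chosen only to show the direction of the effect. The function name is hypothetical.

```python
import math


def responses_per_new_item(n_students, items_to_try, form_length, common_items):
    """Return (number of forms, responses collected per non-common item).

    Assumes even assignment of students to forms and that each
    non-common item sits on exactly one form (illustrative only).
    """
    open_slots = form_length - common_items
    if open_slots <= 0:
        raise ValueError("common block must be shorter than the form")
    # More common items leave fewer open slots per form, so more forms
    # are needed to try out the same pool of new items.
    n_forms = math.ceil(items_to_try / open_slots)
    return n_forms, n_students // n_forms


# Hypothetical design: 60,000 students, 1,200 new items, 40-item forms.
for common in (10, 20, 30):
    forms, per_item = responses_per_new_item(60_000, 1_200, 40, common)
    print(f"{common} common items: {forms} forms, {per_item} responses per new item")
```

With these invented numbers, tripling the common block from 10 to 30 items cuts the responses per new item to a third, which is the pressure that pushes designers toward a small common block when the item pool is large.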

The small number of common items makes the test developers nervous about the resulting student-level scores. It is still possible to generate a scaled score for each student. And the scaled score will provide information about a student’s likely score on the future Smarter Balanced assessments. But it will not be as reliable as the final assessment will be.

But, of course, the current design of the field test is based on the expectation that only 10 percent of the student population will be taking the test. If the field test were 10 times larger, it could support a different design, with a larger set of common items, more forms and fewer non-common items on each form. A typical field test is designed simply to test item parameters. But, by aiming to give students and teachers a head start on the Common Core, the state is asking for something the typical field test design cannot deliver. A redesigned statewide trial of the Smarter Balanced assessment could both serve the typical function of a field test and generate more reliable student-level scores.

The second reason the Smarter Balanced consortium has been discouraging this approach is political. The Smarter Balanced leadership is primarily interested in estimating item parameters, not in generating student-level scores. The informational value for California students, teachers and parents is not their primary concern. In addition, if they describe a plan for generating student-level scores as workable, they risk alienating the organized interests in California who relish the prospect of a year without test scores (even if it’s not a year without testing!). Therefore, it has become politically expedient for them to claim that any scores would be “meaningless,” even if they know better.

Therefore, if California or another state were eager to accelerate the transition to the Common Core, it should not try to stretch a limited field test to serve statewide. It should redesign the field test, weed out the poorly functioning items and produce student-level scaled scores that achieve a minimal level of reliability. The question of how those tests are used (and whether any waiver allows a moratorium on the use of those tests for high-stakes purposes) is a separate issue. Absent such a breakthrough, 2014 may become the year of the asterisk in California and, potentially, other states as well.

Thomas J. Kane, Walter H. Gale Professor of Education and Economics, Harvard University
