
Evidence on New York City and Boston exam schools


New York City is wrestling with what to do with its exam schools. Students at Stuyvesant, Bronx Science, and Brooklyn Tech (the oldest exam schools) perform brilliantly and attend the best colleges. Their students score at the 99th percentile of the state SAT distribution (with Stuyvesant at the 99.9th percentile) and they account for the majority of New York City students attending Harvard, Princeton and Yale. These are by any measure elite schools and are revered as jewels of the city school system. 

But of the 900 freshmen who enrolled at Stuyvesant this past fall, just 10 were Black. By state law, admission to these schools is via a specialized, voluntary admissions test. Mayor Bill de Blasio and others complain that this admissions system perpetuates unequal access to an excellent education.

A lot of ink has been spilled over the exam schools, in popular news outlets as well as in academic journals. In this piece, I address a narrow but relevant question: the causal impact of these schools on the students who attend them.  Do the exam schools produce academically outstanding graduates, or do they simply admit stellar students and enjoy credit for their successes? I also briefly discuss alternative methods the city could use to dole out scarce seats at these over-subscribed schools.

Understanding the effectiveness of any school is a challenge because parents choose their children’s schools. In many cases, the school a child attends is tied to her address, so a parent effectively chooses a school when she picks a residence. In places like New York and Boston, which have district-wide choice, families can choose from dozens of public schools, including charters, magnets and exam schools. And there are private schools for those who can afford them or who have vouchers to subsidize the cost.

Because parents have choices, some schools are filled with students (say, the children of well-educated, highly-motivated parents) who would perform well in almost any setting. This pattern could mislead us into thinking such schools provide an exemplary education, when the truth is they simply attract strong students.

This is selection bias, the greatest challenge in evaluating the effectiveness of schools. Stuyvesant High School is filled with smart students who might succeed anywhere. When those students do well, is it because of the school or the students or both?

In the case of exam schools, we have selection bias on steroids. Students who enter Stuyvesant have middle-school test scores a full two standard deviations above the city mean – that is, they score higher than 95% of the students in the city’s public schools. How can we possibly disentangle the effect of the exam schools in the face of such massive differences in baseline achievement?
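As a quick sanity check on that percentile claim (the normal-approximation arithmetic here is mine, not the researchers'): if city test scores are roughly bell-shaped, then

\[ \Pr(Z \le 2) = \Phi(2) \approx 0.977, \]

so a student two standard deviations above the city mean outscores roughly 97 to 98 percent of students, which makes "higher than 95%" a conservative way to put it.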

To overcome this challenge, researchers have made use of the admissions test itself. By state law, entrance to the exam schools in New York is determined by a student’s score on the Specialized High School Admissions Test (SHSAT). A student who scores high enough can win admission to Stuyvesant. A slightly lower score will get her into Bronx Science, and so on. Researchers have exploited these cutoffs to estimate the causal impact of exam schools on students’ academic achievement and college attendance.

The research method is a “regression-discontinuity” design. The key to this approach is that it’s essentially random whether a student ends up just above or just below the cutoff. By comparing students on either side of the cutoff, we can capture the causal impact of the school on student outcomes.

Of course, it’s not at all random that some students have very high scores and others very low scores, and of course more of those with high scores will get into the exam schools. That’s exactly what we see in New York. What regression-discontinuity analysis relies on is the large, discontinuous jump in exam-school attendance right at the cutoff scores. A score a smidgen above the cutoff guarantees admission, while a score a smidgen below yields rejection. These smidgens could be the result of random variation in the test or in how a student is feeling on testing day.
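To make that logic concrete, here is a minimal sketch in Python on simulated data. The score distribution, the cutoff of 500, the bandwidth, and the outcome model are all invented for illustration; this is not the SHSAT data or the estimation code from the studies discussed below. The point is only to show why a naive admitted-versus-rejected comparison is contaminated by selection, while the jump at the cutoff recovers the true effect (set to zero here).

```python
# Hypothetical regression-discontinuity sketch on simulated data.
# Cutoff, score distribution, and sample size are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

n = 20_000
cutoff = 500                                  # hypothetical admissions cutoff
score = rng.normal(450, 60, size=n)           # simulated admissions-test scores
admitted = score >= cutoff                    # admission jumps discontinuously at the cutoff

# Simulate a later outcome (say, college attendance) that rises smoothly with
# baseline ability but does NOT depend on admission itself: a true effect of zero.
ability = (score - 450) / 60
attends_college = (rng.random(n) < 1 / (1 + np.exp(-ability))).astype(float)

# Naive comparison of all admitted vs. all rejected students: badly biased,
# because admitted students were stronger to begin with.
naive_gap = attends_college[admitted].mean() - attends_college[~admitted].mean()

# RD estimate: fit a line to the outcome on each side of the cutoff within a
# narrow bandwidth, then measure the jump between the two lines at the cutoff.
bandwidth = 30
left = (~admitted) & (score >= cutoff - bandwidth)
right = admitted & (score <= cutoff + bandwidth)

fit_left = np.polyfit(score[left], attends_college[left], 1)
fit_right = np.polyfit(score[right], attends_college[right], 1)
rd_gap = np.polyval(fit_right, cutoff) - np.polyval(fit_left, cutoff)

print(f"Naive admitted-vs-rejected gap: {naive_gap:.3f}")  # large, driven by selection
print(f"RD jump at the cutoff:          {rd_gap:.3f}")     # near zero, up to sampling noise
```

The published papers use more elaborate versions of this design (scoring above a cutoff does not compel a student to enroll, for instance), but the underlying comparison is the same.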

Two sets of economists applied the regression-discontinuity methodology to the study of New York City’s exam schools. Atila Abdulkadiroğlu (Duke University), Joshua Angrist and Parag Pathak (both of Massachusetts Institute of Technology) published “The Elite Illusion: Achievement Effects at Boston and New York Exam Schools” in Econometrica, while Will Dobbie (Princeton) and Roland Fryer (Harvard) published “The Impact of Attending a School with High-Achieving Peers: Evidence from the New York City Exam Schools” in American Economic Journal: Applied Economics.

What do the researchers conclude? They find a precisely zero effect of the exam schools on college attendance, college selectivity, and college graduation. They put the data through the grinder, and that’s the unexciting result. Findings for Boston’s exam schools are the same, with a bonus finding of zero effect on test scores, including the SAT and PSAT. The authors note that it is still possible that the schools affect outcomes later in life, such as employment or wealth. But, if so, any such effect does not operate through attendance at an elite college.


These null results take a lot of the air out of the overwrought discussions about the exam schools as gateways to economic opportunity. At least for the students just on the margin of admission to exam schools, the schools have no measurable effect on academic achievement or postsecondary outcomes. These students may well be happier, more engaged, or safer at these schools. But it is surprising we don’t see effects where so many expected them.

While a strength of the regression-discontinuity design is that we obtain causal effects for students who are just on the margin of admission, a weakness is that we can’t estimate effects for students who were certain of admission (the very top students) or those who don’t bother to apply under the current admissions regime.

The city, or at least the mayor, would like to diversify the exam schools. How can schools for gifted students be made more diverse? Fortunately, we have a lot of excellent research on this question.

The current admissions approach almost certainly shuts out many gifted, disadvantaged students. When we rely on parents, teachers, or students to make the decision to apply to a program for gifted students (by, for example, voluntarily signing up for a test), evidence indicates it is disadvantaged students who disproportionately get shut out.

But getting rid of the test is not the answer. Well-educated, high-income parents work the system to get their kids into these programs. The less transparent the approach (e.g., portfolios or teacher recommendations instead of a standardized test), the greater the advantage these savvy, connected parents have in winning the game.

An important step is to make the test universal, rather than one that students choose to take. In the dozen states where college admissions tests are universal (free, required, and given during school hours), many more students take the test and go on to college. The democratizing effect is strongest among low-income and nonwhite students. The same dynamic holds among young children: when testing for giftedness is universal, poor, Black and Hispanic children are far more likely to end up in gifted classes. A school district in Florida showed huge increases in the diversity of its gifted programs when it shifted to using a universal test, rather than recommendations from parents and teachers, to identify gifted students.

Rather than force students to take yet another test, New York could use its existing 7th- and 8th-grade tests to determine admission to the exam schools. These tests are, in principle, aligned to what is taught in the schools and so are an appropriate metric by which to judge student achievement. When so many are complaining about over-testing, why add yet another test for students to cram for and sit?

The city could go further toward diversifying the student body by admitting the top scorers at each middle school to the exam schools. Texas uses this approach to determine admission to the University of Texas flagships: the top slice (originally 10%, now lower) of students in each high school is automatically admitted to these selective colleges. This ensures that Texas’s elite colleges at least partially reflect the economic, ethnic and racial diversity of the state’s (highly segregated) school system. 
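As a rough sketch of how such a rule operates, the snippet below admits a fixed share of top scorers from each middle school. The column names and the shares are invented for illustration; nothing here reflects an actual New York or Texas implementation.

```python
# Hypothetical "top X% per school" admissions rule, sketched with pandas.
import pandas as pd

def admit_top_share(students: pd.DataFrame, share: float = 0.03) -> pd.DataFrame:
    """Return the rows admitted under a top-`share`-per-school rule.

    Expects (hypothetical) columns 'middle_school' and 'state_test_score'.
    """
    ranked = students.assign(
        pct_rank=students.groupby("middle_school")["state_test_score"]
                         .rank(pct=True, ascending=False)
    )
    return ranked[ranked["pct_rank"] <= share]

# Toy example (invented data): each school sends its own top scorers,
# even if another school's runners-up have higher absolute scores.
toy = pd.DataFrame({
    "middle_school": ["A", "A", "A", "B", "B", "B"],
    "state_test_score": [600, 550, 500, 480, 470, 460],
})
print(admit_top_share(toy, share=0.34))  # admits A's 600 and B's 480, not A's 550
```

The design choice is that each school's own score distribution, not the citywide distribution, determines who clears the bar, which is what ties the admitted class to the diversity of the sending schools.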

This “top 10%” approach could lead some parents to scramble to enroll their children at lower-performing schools, where their kids are more likely to score at the top. This effect was indeed observed in Texas. This wouldn’t necessarily be a bad outcome, since it helps to integrate the system racially and economically.

Some might object that the standardized tests given to all students are insufficiently challenging to pick out the academic elite suited for the exam schools. This is called a “ceiling effect,” in which a test cannot distinguish between high achievers and super-high achievers. It’s a plausible theory, but the data from New York don’t support it. According to the teams who conducted the analyses discussed earlier, students at Brooklyn Tech score about 1.5 standard deviations above the rest of the city, which is within the normal, measurable variation of the city’s standardized tests. Even at Stuyvesant, students are within two standard deviations of the city on middle-school tests.

If the schools and city are intent upon keeping the specialized admissions test, they could administer it on a school day to all students who score above a given threshold on the universal middle-school tests.

New York City has a lot to grapple with in deciding the fate of its exam schools. Taking into account the scientific evidence on their performance would be a terrific way forward.


The author did not receive any financial support from any firm or person for this article or from any firm or person with a financial or political interest in this article. She is currently not an officer, director, or board member of any organization with an interest in this article.
