In recent years, it has become increasingly clear that one of the best ways to build a productive and prosperous society is to start early – that is, before children enter kindergarten – in building children’s foundation for learning, health, and positive behavior. From the U.S. Chamber of Commerce to the National Academy of Sciences, those planning our country’s workforce insist we will need more people, with more diverse skills, to meet the challenges of the future. In response, educators have focused on supporting learning earlier, recognizing that early learning establishes the foundation upon which all future skill development is constructed. Identifying and replicating the most important features of successful pre-K programs in order to optimize this potential is now a national imperative.
A wealth of evidence supports continued efforts to improve and scale up pre-kindergarten (pre-K) programs. This evidence is summarized in a companion report to this evaluation roadmap: “Puzzling It Out: The Current State of Scientific Knowledge on Pre-Kindergarten Effects”.
Designing programs in a way that ensures meaningful short- and long-term effects requires evaluation of programs over time. This goal was the focus of a series of meetings and discussions among a high-level group of practitioners and researchers with responsibility for and experience with designing, implementing, and evaluating pre-K programs across the country. This report reflects the best thinking of this practitioner-research engagement effort.
As you prepare to evaluate a pre-K program, we invite you to draw upon this practice- and research-informed expertise to design early education settings that better support early learning and development. Your careful attention to evaluation will help early education systems from across the country identify the factors that distinguish effective programs from less effective ones and take constructive action to better meet our country’s educational and workforce goals.
We view this work as the equivalent of building a national highway. We must survey and compare local conditions, adapt designs to suit, map and share progress, and identify and resolve impediments so our country can get where it needs to go. This document is a guide – or roadmap – for those who are building this educational highway system; we hope it will ensure that we optimize our resources and learn from innovations along the way.
There is much good work to build upon. State-funded pre-K programs have been the focus of nearly two decades of evaluation research. This research has produced a large body of evidence on the immediate impacts of pre-K programs on children’s school achievement and pointed to some good bets about the inputs that produce these impacts.
But there is more you can do to improve existing programs and ensure that the next generation of programs builds upon this evidence. A central finding from the initial phase of pre-K evaluations is that state and local conditions vary widely, which makes it difficult to draw firm conclusions about the effectiveness of pre-K programs across locations. As the “Puzzling It Out” authors concluded, “We lack the kind of specific, reliable, consistent evidence we need to move from early models to refinements and redesigns”. We don’t have the evaluation evidence we need to apply lessons learned from first- to second-generation pre-K programs or from one district or state pre-K program to another. In short, we don’t have the information we need to inform the continuous improvement efforts called for in “Puzzling It Out” that are so essential to fulfilling the promise of pre-K for our nation’s children. It is this challenge that we take on as we attempt to build the next phase of evaluation science on firm ground so that states and school districts can continue to expand and improve their pre-K systems for the benefit of our society.
This roadmap offers direction to states and school districts at varying stages of designing, developing, implementing, and overseeing pre-K programs. It is organized around seven key questions, briefly summarized in this introduction and discussed in more detail in the full report. These questions are best addressed as an integrated series of considerations when designing and launching an evaluation so that it produces the most useful information for you and your colleagues across the country. We summarize these key questions below.
The departure point for any evaluation is clarity in the question(s) you want the evaluation to answer. The questions you want to answer will shape the specific information you seek and other decisions you make. Consider these four broad questions: (a) Are we doing what we planned to do (implementation studies)? (b) Are we doing it well (quality monitoring)? (c) Are we doing it well enough to achieve desired impacts (impact evaluations)? (d) What elements of program design account for the impacts (design research)? There is a logical sequence to these questions: if a program has not yet been implemented fully and with fidelity, there is little value in assessing its quality. And if a program has not yet reached an acceptable level of quality, there is little value in assessing its impacts. Once impacts are documented, replicating or strengthening them requires identifying the active ingredients or “effectiveness factors” that produced them. To wit: transportation officials don’t road-test a highway before it has been graded and paved, and work is constantly underway to improve the materials and methods for building better highways. Similarly, don’t test the impacts of a pre-K program before it has been fully implemented and, when evaluating impacts, be sure to include assessments of program design features that might explain the impacts. The data you collect while assessing program implementation, quality, and impacts will help you interpret and improve the program’s capacity to contribute to children’s learning in both the short and longer term.
Specificity is key to all that comes after. One of the biggest challenges we face in securing comparable data from pre-K evaluations conducted across districts and states is the fact that there is no single approach to providing pre-K education. Different states have adopted different models and implemented different systems, and districts within states often adapt models and strategies to meet local needs. Many target pre-K systems to children at risk of poor school performance (usually those in poverty), while others offer pre-K to all 4-year-olds and even 3-year-olds, regardless of their socioeconomic status. Programs also differ by length and location. Some provide full-day programs, others provide half-day programs, and still others provide both. Virtually all states provide pre-K in school-based classrooms, but most also provide programs in Head Start and/or community-based child care settings, often with differing teacher qualifications and reimbursement rates. Funding for pre-K programs is often braided into federal and state child care subsidies as well as funding for other programs, such as those affiliated with Head Start, the Individuals with Disabilities Education Act, and the Every Student Succeeds Act.
Importantly, given the wide variation in pre-K programs across and within states, the first step in designing an evaluation must be to map the landscape of pre-K education in your area. Be sure to answer the following questions: How is it funded? Where is it provided? Which children and families participate in pre-K, for how much time during the school day and year, and with what attendance rates?
Moreover, because of this variation in how the provision of pre-K education is approached in different locales, as well as in program design features such as teacher qualifications and support, and reliance on specific curricula or instructional strategies, approaching pre-K as a monolithic program to be evaluated by a single set of broad questions (e.g., Is it well implemented? Did it work?) will not yield particularly actionable data. The more informative task is to understand the conditions under which pre-K is well implemented, provides quality services, and produces impacts. Thus, understanding the key elements of variation in your pre-K program, as well as “best bet” candidates for design features that may explain your findings, is foundational to designing useful evaluations. Research-practice partnerships can be especially valuable in this context.
Different, though overlapping, research strategies are needed for different evaluation questions, namely those addressing (a) implementation, (b) quality monitoring, (c) impacts, and (d) program design. For questions about implementation and quality, the core design challenges relate to representation and validity. The representation challenge is to obtain data from a sufficiently representative and large sample of settings (and classrooms within settings), while the validity challenge is to ensure the use of assessment tools that capture variation in the key constructs of interest. For questions about impacts and the design elements that produce them, the core challenges relate to causality and counterfactual evidence (i.e., effects that would have arisen anyway in the absence of the pre-K program or model under review). The causality challenge is to provide the most compelling evidence that impacts can be ascribed to the pre-K program or model under study rather than to other factors. The counterfactual challenge is to be as precise as possible in identifying a non-pre-K (or “different” pre-K) comparison group from which sufficient information can be gathered about children’s non-pre-K or other-pre-K experiences. Select a design that best meets these challenges and, at the same time, be sure to collect data that are not subject to bias. Importantly, different participant enrollment strategies (e.g., by lottery, with an age or income cut-off) yield different possibilities for enhancing design strength. Efforts to identify the program elements that account for impacts entail a thorough understanding of key features along which programs vary, as well as current knowledge of elements that are surfacing in other pre-K evaluations as strong candidates for effectiveness factors.
Also note: Longitudinal evaluations that address long-term impacts entail additional design considerations (e.g., selecting measures that are appropriate for a span of grades, and anticipating and managing sample attrition) that should be considered before launching a study of longer-term impacts.
This question is about sampling strategy. The first step is to consider whom your program serves and whether you want to document its effects on specific subgroups. If so, the next step is to determine whether to identify subgroups by participant characteristics (e.g., home language, special needs status, race and gender, degree of economic or other hardship) or program features (e.g., part-time or full-time schedule; school-based classroom or other setting; number of years children spend in the program). You may want to know, for example, if all children in the program have equal access to well-implemented programs in high-quality settings or if access varies across participants. Subgroup studies require samples of sufficient size and representation as well as measurement tools that are suitable for all participants. Another critical task is identifying the right comparison group. Ideally, the evaluation will compare “apples to apples.” That is, it will compare children who do participate in the pre-K program with similar children who do not – or children who attend pre-K programs that do one thing or have certain features to those who attend programs that do another thing or have different features (e.g., school- vs. community-based programs; programs using one instructional model or curriculum versus another) – so that program participation or model is the main difference between the two groups. Random assignment designs are the best way to ensure apples-to-apples comparisons, but there are other commonly used and well-respected approaches to use when random assignment is not possible. Even with alternative approaches, collecting pre-test information, prior to enrollment, about children who do and do not participate in pre-K or who participate in different pre-K models will strengthen your capacity to produce reliable conclusions.
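To make the idea of an apples-to-apples comparison concrete, the following is a minimal sketch, not a prescribed method, of a lottery-style random assignment followed by a check of whether the two groups look similar on a pre-test. Everything in it is an illustrative assumption: the function names `lottery_assign` and `standardized_mean_difference`, the simulated pre-test scores, and the number of seats are hypothetical and do not come from any actual pre-K evaluation.

```python
import random
import statistics


def lottery_assign(applicant_ids, n_seats, seed=0):
    """Randomly offer n_seats among applicants (hypothetical helper).

    Returns two sorted lists: children offered a seat and children not offered one.
    """
    rng = random.Random(seed)
    shuffled = list(applicant_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_seats]), sorted(shuffled[n_seats:])


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return (mean_a - mean_b) / pooled_sd if pooled_sd > 0 else 0.0


# Simulated (made-up) pre-test scores for 200 hypothetical applicants.
score_rng = random.Random(42)
pretest = {child_id: score_rng.gauss(100, 15) for child_id in range(200)}

# Offer 100 seats by lottery, then compare the groups on the pre-test.
offered, not_offered = lottery_assign(pretest.keys(), n_seats=100, seed=7)
smd = standardized_mean_difference(
    [pretest[i] for i in offered],
    [pretest[i] for i in not_offered],
)
print(f"Offered: {len(offered)}, not offered: {len(not_offered)}")
print(f"Pre-test standardized mean difference: {smd:.3f}")
```

A standardized difference near zero suggests the lottery produced comparable groups on that pre-test measure. When random assignment is not possible, the same kind of pre-test comparison can help you judge how similar your comparison group is to program participants.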
Choosing measures for an evaluation study can be time-consuming and expensive. A good starting place is to familiarize yourself with data that has already been collected (e.g., administrative data, school records, testing data, enrollment and financial forms, etc.) and assess its completeness and quality. Then, determine what is missing, keeping your key questions in mind. If your questions are about ensuring access to high-quality pre-K classrooms for all children, you will collect different data than if your questions are about designing classrooms to promote inclusive peer interactions or increasing the odds of third-grade reading proficiency. Draft tightly focused questions to avoid the temptation to collect a little data on a lot of things; instead, do the reverse: collect a lot of data on a few things. It is helpful to think about four buckets of data to collect: (a) child and family characteristics that may affect children’s responses to pre-K programs, (b) characteristics of teachers and other adults in the program who support implementation and program quality, (c) pre-K program design features and dosage, and (d) children’s outcomes tied to pre-K goals and theories of change. Questions that address pre-K implementation and quality monitoring will necessarily focus on pre-K program characteristics and dosage, but information on child and family characteristics will be helpful in interpreting the findings. Questions that address pre-K impacts require coordinating measurement from all four buckets. Impact questions that extend beyond outcomes at the end of the pre-K year entail additional data-related considerations, such as pre-K-to-elementary system data linkage and how best to measure your constructs of interest at different ages.
There are many more data sources, and more strategies for obtaining data, than you might imagine. Existing administrative and program data (e.g., state, district, and school records, Quality Rating and Improvement System data, data from other systems such as child welfare services or income support offices) are a good place to start, although it is critical to assess the completeness and quality of these data. Prior and ongoing pre-K evaluation efforts in other locales offer fertile ground for data collection approaches to consider (see also the benefits of research-practice partnerships). There are cost, time, training, and intrusion trade-offs to consider when deciding whether to ask parents, teachers, coaches, or principals to provide data; to conduct direct assessments of children; to independently observe classrooms; and so on. In the end, having decided what to measure, you next need to decide how and from whom to collect those data so as to ensure maximum data quality.
Planning, launching, and seeing an evaluation effort to completion (and then considering its implications for policy, practice, and next-stage research) are all essential parts of effective pre-K programming. The feasibility and quality of an evaluation depend on setting up the necessary infrastructure (e.g., implementing ethical research practices and procedures, ensuring adequate staff and clarifying roles, storing and archiving data, setting up advisory committees and review processes, producing reports, etc.). Forging partnerships with local universities and colleges can be helpful in this regard. Policy-practice-research partnerships can also create an evaluation team with a broader collective skill set and lend important external credibility to the findings your efforts produce. Pre-K programs and their evaluations affect many stakeholders, including families, teachers and support staff, principals, superintendents, and other education policymakers. Informing these stakeholder groups about your evaluation at the beginning of the process, and keeping them in the loop as the evaluation proceeds and begins to produce evidence, is not only best practice for strong community relations but will greatly enhance the chances that your evidence will be used for program improvement efforts. And that is, after all, the goal of providing a strong roadmap for your evaluation effort.
We have designed this roadmap for optimizing pre-K programs across the country so that children have a better chance of succeeding in school and beyond. This depends on building a stronger pre-K infrastructure that is based on sound evaluation science. We aim to provide sufficient detail and advice to ensure that future pre-K evaluations will get our country where it needs to go. We view this work as the equivalent of building a national highway. As social scientists who have engaged over many years with local and state policymakers and practitioners to conduct research about state and district pre-K programs, we have struggled with many of the questions addressed here. By sharing a roadmap and the knowledge gained from our experiences, we hope to contribute to construction of a strong, reliable “highway” infrastructure of pre-K programs that better meets our country’s educational and workforce goals.
The authors did not receive financial support from any firm or person for this article or from any firm or person with a financial or political interest in this article. Deborah Phillips is a member of the Research Advisory Board of the Committee for Economic Development. Beyond that affiliation, none of the authors is currently an officer, director, or board member of any organization with a financial or political interest in this article.