Education assessment in the 21st century: Moving beyond traditional methods

Editor's note:

This blog is part of a four-part series on shifting educational measurement to match 21st century skills, covering traditional assessments, new technologies, new skillsets, and pathways to the future. These topics were discussed at the Center for Universal Education’s Annual Research and Policy Symposium on April 5, 2017. You can watch video from the event or listen to audio here.

The United Nations’ Sustainable Development Goals (SDGs) describe the target of achieving inclusive and quality education for all by 2030. As we work to accomplish this goal, we must also face the bigger challenge of identifying not only where children can access education, but also how they benefit from that access, which is an imprecise target. From the perspective of educational measurement, to what extent are we ready and able to assess progress in terms of quality of education?

Traditional educational measurement

When we think about tests in schools, we often picture students shuffling papers at their desks. They fill in short answers to questions, respond to multiple-choice style options, or write brief essays. The majority of their cognitive effort is focused on searching their memory to find appropriate responses to the test items, or applying formulae to familiar problems. This style of educational assessment targets the types of skills that were seen as important throughout the 20th century—the skills of storing relevant information and retrieving it upon demand, often as these processes related to literacy and numeracy.

However, from a measurement perspective, the issues are more complex. Meaningful measurement requires defining what one intends to measure, as well as a consistent system to define the magnitude of what is being measured. This is straightforward for physical measurements, such as weight in pounds and height in inches, but not for cognitive measurements. Although we have been assessing numeracy and literacy skills for over a hundred years, measuring these skills is not as simple as it seems.

Measuring human attributes

Numeracy and literacy are “made-up” concepts. These concepts (known as “constructs” in academic literature) are not tangible objects that can easily be measured by their weight or height. They have no inherent measurement properties independent of human definition. This presents educators with a dilemma. We need to assess student learning outcomes in order to know what students are ready to learn next. Historically we have relied upon numbers to communicate learning outcomes; however, numbers that are easily applied to properties that exist independently of humans, such as mass and length, do not translate so easily to human characteristics.

When we think about learning or skills, we assume underlying competencies are responsible for particular behaviors. But we cannot see these competencies; we can only see their outcomes. So if we are to measure those competencies, we must examine the outcomes in order to estimate their amount, degree, or quality. This is the challenge: with a huge variety of ways in which competencies might manifest, how do we define a scale to measure outcomes in a way that has consistent meaning? An inch is always an inch, but what is viewed as a correct answer to a question may vary. So what we look for in measurement of these educational constructs are proxies—something that stands for what we are really interested in.

Using proxy measurements

We use proxy measures for many things, physical as well as conceptual. For example, in forensic science, when a skeleton is incomplete, height can be estimated from the length of an arm or leg bone. These proxies work well, as opposed to, say, teeth, because they correlate closely with height and so give reasonably accurate estimates. The quality of our measurements is therefore very much dependent on the quality of the proxies we choose.
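To make the idea of proxy quality concrete, here is a minimal Python sketch. It rests on loud assumptions: every number below is made up for illustration and is not real forensic data, and the names `pearson` and `fit_line` are hypothetical helpers, not part of any established method. The sketch compares how closely two candidate proxies track height, then uses the stronger one to estimate a height.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# ASSUMPTION: invented illustrative values, not real measurements.
height_cm = [155.0, 162.0, 168.0, 174.0, 181.0, 188.0]
femur_cm = [41.0, 43.5, 45.0, 47.0, 49.5, 51.0]   # leg bone length
tooth_mm = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4]     # tooth size

# How closely does each candidate proxy track the thing we care about?
r_femur = pearson(femur_cm, height_cm)
r_tooth = pearson(tooth_mm, height_cm)

# Use the stronger proxy to estimate height for a new 46 cm femur.
slope, intercept = fit_line(femur_cm, height_cm)
estimate = slope * 46.0 + intercept

print(f"femur vs height correlation: {r_femur:.2f}")
print(f"tooth vs height correlation: {r_tooth:.2f}")
print(f"estimated height for a 46 cm femur: {estimate:.1f} cm")
```

In this toy data the leg bone tracks height far more closely than the tooth does, so an estimate built on it deserves more trust. The same question, how well the proxy tracks the thing we actually care about, applies when the proxy is a test response and the target is a competency.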

Student responses on educational tests are proxies for their competencies and learning, and different types of proxies will be better or worse at revealing the quality of competencies. Here is the crunch: What sorts of proxies are most useful for each skill or competency, and how do we collect these?

The future of educational assessment

Over the last few decades, pen and paper tests have been the main method used to assess educational outcomes. For literacy and numeracy, this makes reasonable sense, since the learning outcome can be demonstrated in much the same way as the applied skill itself is typically demonstrated. However, for other skills of increasing interest in the education world, such as problem solving, critical thinking, collaboration, and creativity, this is less the case.

The challenge is how to proceed from the status quo, where system-level assessment using traditional tests is still seen as using good-enough proxies of academic skill, and where testing processes are implemented using traditional methods that everyone finds convenient, systematic, and cost-effective. In addition, increasing interest in education systems’ implementation of 21st century skills raises new hurdles. If we are interested in supporting students’ acquisition of these skills, we need assessment methods that make the skills themselves explicit—in other words, we need to look for new proxies.