This blog is part of a four-part series on shifting educational measurement to match 21st century skills, covering traditional assessments, new technologies, new skillsets, and pathways to the future. These topics were discussed at the Center for Universal Education’s Annual Research and Policy Symposium on April 5, 2017. You can watch video from the event or listen to audio here.
Measuring skills as they are demonstrated in the real world
The apprenticeship model, which dates back to the days of the guilds, was effective because the learning and the assessment that followed were identical to how the skills were demonstrated in the real world. Most types of performance assessment reflect this model. The disadvantage is that the learning and assessment are too closely linked: the approach works for very specific skill sets, but as we move toward transversal skills, performance assessment cannot accommodate the variety or range of possible applications.
At the other end of the assessment spectrum, many large-scale standardized tests (whether of conventional domains or 21st century skills) do not assess targeted skills in the way they would appear in the real world. How many people have ever needed to do time-constrained trigonometric transformations in their careers?
The choice between abstract and concrete (or “authentic”) assessment can be controversial, but we believe the two are not mutually exclusive. In the context of assessing complex skill sets, we believe it is more feasible to generalize performance from concrete scenarios than it is to transfer performance on abstract scenarios to concrete applications. For example, a good engineer with very specialized problem-solving skills is more likely to be a good problem solver in other, non-engineering tasks than a good gamer with abstract 3D spatial skills is at piloting a plane.
The issue is therefore how to adopt the concepts of rich assessment in the manner of performance assessment, while also addressing measurement concerns.
Better proxies captured in more realistic environments
So-called “stealth assessments” build on the learning approach exemplified by SimScientist and by EcoMUVE. The term refers to embedded assessments that are woven into the fabric of the learning environment. The learning environment can be designed to function similarly to the working environment, and the more closely these two environments line up, the greater the relevance of the assessment process, just as in the apprenticeship model. Consider platforms and tools based on the very popular Portal game: they can be both engaging and relevant if we are measuring complex skills like critical and creative thinking in traditional school subjects, such as classical mechanics and 3-D geometry concepts. But the transferability of these skills to real scenarios is not clear. Beyond individual assessment, however, such online environments provide us with some important opportunities to understand how complex skills are used, and the consequences of different combinations of these skills.
There are two main areas where we can leverage technology. One area is in making use of technology to capture and analyze process data through data mining in ways that were not feasible before the advent of the computer age. Process data includes keystrokes, mouse movements, and any other capturable time-stamped user activity in a digital environment. The process data can be analyzed either discretely (looking at specific markers that can be linked with cognitive processes) or holistically (looking at sets of connected markers, such as sequences of actions, that can be linked to more complex cognitive processes). A specific example of the latter is “pathways” visualization, where complete sequences of actions taken when performing a task are recorded and mapped. The paths are then explored for meaningful markers of complex behaviors. This visualization, a form of data reduction, enables deeper examination of very large-scale data. These approaches to processing and analyzing data are further strengthened by the growing capability to embed assessment systems within wider networks of connections. Leveraging technology to maximize network connections leads to the next area of promising potential.
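To make the pathways idea concrete, here is a minimal sketch in Python of the reduction step described above: time-stamped events are grouped into per-student action sequences, and the full paths are then reduced to counts of short sub-sequences that an analyst could examine as candidate markers of cognitive processes. All event names and data here are hypothetical, invented for illustration; a real stealth-assessment platform would log far richer events.

```python
from collections import Counter

# Hypothetical time-stamped event log from a digital task environment.
# Each record: (student_id, timestamp_seconds, action). Names are illustrative.
events = [
    ("s1", 0.0, "open_tool"), ("s1", 1.2, "adjust_slider"),
    ("s1", 2.5, "run_sim"), ("s1", 4.0, "submit"),
    ("s2", 0.0, "open_tool"), ("s2", 0.8, "run_sim"),
    ("s2", 1.9, "adjust_slider"), ("s2", 3.1, "run_sim"),
    ("s2", 5.0, "submit"),
]

def extract_pathways(log):
    """Group events by student and order by timestamp, yielding one
    complete action sequence ("pathway") per student."""
    paths = {}
    for student, ts, action in sorted(log, key=lambda e: (e[0], e[1])):
        paths.setdefault(student, []).append(action)
    return paths

paths = extract_pathways(events)

# Data reduction: collapse full pathways into counts of adjacent action
# pairs. A pair such as ("run_sim", "adjust_slider") might be read as a
# marker of iterative hypothesis testing -- the interpretive link to a
# cognitive process is the analyst's, not the code's.
bigrams = Counter(
    (a, b)
    for seq in paths.values()
    for a, b in zip(seq, seq[1:])
)
```

In practice this reduction is what makes very large logs explorable: instead of inspecting every raw pathway, one can rank the recurring sub-sequences and investigate only those that plausibly map to the behaviors of interest.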
Technology also enables distributed assessments through large-scale networks. Thanks to the internet, connectedness is no longer constrained by physical distance. There are still, and perhaps always will be, task outputs that cannot be scored by machines. Humans remain the best judges of complex and creative outputs, but human scoring is constrained by time and the availability of expert markers. The concept of distributed scoring relies on peer evaluation, and it has the potential to address the challenges of reliability while also making large-scale scoring feasible. This is where technology can be leveraged to make peer evaluation stronger: by facilitating a more targeted distribution of scoring tasks, and by enabling automated structuring of the scoring process to assist peer reviewers.
The range of approaches that use some form of distributed evaluation is diverse: from traditional systems where the evaluation task is distributed among experts (e.g., the system of journal peer review) to technology-leveraged approaches such as an implementation of a scaffolded peer scoring system that we developed.
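The two technology-leveraged pieces mentioned above, targeted distribution and structured aggregation, can be sketched very simply. The following Python fragment is an illustrative toy, not the scaffolded system referenced in the text: each submission is routed to several peers (never its own author), and peer scores are combined with a median, which is robust to a single aberrant rater.

```python
import random
import statistics

def assign_reviewers(submissions, reviewers, k=3, seed=0):
    """Distribute each submission to k peer reviewers, excluding the
    submission's own author. Returns {author: [reviewer, ...]}.
    A fixed seed keeps the assignment reproducible for illustration."""
    rng = random.Random(seed)
    assignments = {}
    for author in submissions:
        pool = [r for r in reviewers if r != author]
        assignments[author] = rng.sample(pool, k)
    return assignments

def aggregate(scores):
    """Combine peer scores into one mark. The median is a simple
    reliability safeguard: one wildly high or low rating among an
    odd number of raters cannot move the result."""
    return statistics.median(scores)

# Toy usage with four students reviewing one another's work.
students = ["ana", "ben", "chi", "dev"]
assignments = assign_reviewers(students, students, k=2)
final_mark = aggregate([3, 4, 10])  # the outlying 10 does not dominate
```

A real system would add the scaffolding the text alludes to, such as rubric prompts and calibration tasks for reviewers, but the core logic of distributing work and aggregating judgments is this small.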
Both of these areas, process data mining and distributed assessment, offer rich possibilities for the future of educational measurement. And by developing better measurement approaches and tools, we open up the potential to improve student outcomes on skill sets that will be required for the future.