The learning disaster created by the pandemic is now showing up in assessment data (see, for example, stories here and here and here). Educators face an enormous challenge in recovering lost ground with students, who are at risk of being labeled the ‘COVID generation’ for decades to come. Whether assessment should be on the docket for redesign as schools emerge from the pandemic is questionable. Poor ventilation in schools, the lack of teacher training in how to deliver remote lessons, and the lack of a digital infrastructure, which hampered learning for many low-income students, seem like issues that should be teed up first. Changing how assessment is done will make the pandemic challenge even greater, which educators may find unappealing.
But if we do want to consider ways to improve assessment, then we should recognize that the current assessment system, which relies heavily on large-scale standardized tests, operates within many constraints. It’s an efficient solution to the difficult problem of how to fairly evaluate what students have learned across differing classrooms, schools, and districts. The tests have a scientifically sound structure, and they can be purchased at a reasonable cost. One might find flaws in the current assessment system, but then one needs to put forward an alternative solution that also fits the constraints, just as buyers looking to purchase a new car should compare alternatives that fit within their budget.
The changes to the current assessment system that William Penuel contemplates focus on having students do more activities and projects, and on having these activities and projects proposed and assessed through a culturally neutral lens. These are reasonable “what if” considerations, but let’s consider the constraints.
Does a proposed assessment approach support comparisons among teachers, say, or among schools? It’s easy to see how comparisons would break down: let teachers or schools choose which kinds of activities to assess, or let students propose them, and comparisons go out the window.
Can a proposed assessment approach be shown to be valid and reliable? An unreliable assessment is like a bathroom scale that gives you an entirely different weight every day; an invalid one is like a scale that consistently reports the wrong weight. Large-scale standardized tests have high levels of validity and reliability. Questions are designed by teachers and content experts based on standards of what students should learn. The questions are scrutinized for bias (which may have its roots in culture, language, or student backgrounds), and ones deemed unbiased are pilot-tested on thousands of students. Results from pilot tests are analyzed extensively. Questions that perform poorly are tossed (including ones for which students of different races or ethnicities show different results). Questions that perform well are incorporated into published tests. An alternative system needs to demonstrate that it has these same desirable properties.
And can the alternative be done at reasonable cost? Current tests use bubbles because bubbles can be read by machines. Do we want teachers and educators to spend millions of labor hours instead on assessing activities and projects? (There are about 23 million students in grade levels for which federal law requires an annual test.) Maybe we do, but until that approach is costed out, it remains an idea without a budget.
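To get a feel for the scale, consider a rough and purely hypothetical costing (the figures here are assumptions for illustration, not estimates): if scoring one student’s set of projects took a teacher just one hour, those 23 million students would mean 23 million hours of scoring each year, and at, say, $40 per teacher-hour, the scoring alone would approach $1 billion annually, before any spending on training raters or double-scoring student work to keep their judgments consistent.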
As we emerge from the pandemic and think about ways to move education into the future, we may want to change how we assess students. But we need to make a case that changing assessment is a high priority, and we should not forget how we got to the current system of assessment. It’s not perfect but it works.
This article is an invited response to “Possible futures for equitable educational assessment” by William R. Penuel, part of Kappan’s Reimagining American Education: Possible Futures series, sponsored by the Spencer Foundation.
ABOUT THE AUTHOR

MARK DYNARSKI is founder and president of Pemberton Research and an advisor to the Education Reform Initiative at the George W. Bush Institute.
