New visions for educational assessment inevitably provoke questions that are grounded in beliefs about the benefits of our current accountability-driven system. Mark Dynarski and Sonja Santelises both raise important concerns about what might be lost if we changed that system. For the past two decades, accountability assessments supported comparisons among students, schools, and districts. These tests promised to shine a light on racial inequities and thereby ensure equal access to ambitious learning goals for all students. We will come back to the vision outlined in the original article, but first we address Dynarski and Santelises’s concerns.

We agree with Santelises that our public schools ought to help all students meet broad aims for education. But one problem with today’s approach to accountability is that testing has significantly narrowed the curriculum (Au, 2007). Because only reading and mathematics are tested, students get less time for social studies, science, and the arts. And because many schools affected by accountability pressures are schools with high concentrations of students of color, the narrowing of curriculum can exacerbate inequities in opportunities to learn (Diamond & Spillane, 2004).

Despite good intentions, the current test-based accountability system alone cannot achieve civil rights goals. Since the passage of No Child Left Behind in 2001, scores on the National Assessment of Educational Progress (NAEP) have improved only modestly, and achievement gaps have not narrowed appreciably. Further, reviews of test-based accountability policies show that they have had minimal effects on achievement (National Research Council, 2011). To the extent that NAEP scores have risen at all, the gains are more readily attributable to improvements in instructional capacity and teacher salary levels (Lee & Reeves, 2012).

Dynarski lauds the efficiency of today’s tests as a reason to advocate for their continued use. But he does not consider the downsides of curriculum narrowing. Nor does he consider that while state testing is already enormously expensive, many school districts spend millions of additional dollars on interim tests that mimic state tests. While these interim tests can predict later student performance on state tests, they do nothing to show teachers what to do if their students are struggling. In fact, rigorous studies have failed to show that initiatives to promote the formative use of test score data from interim tests can improve achievement (e.g., Supovitz & Sirinides, 2018). More worrisome still, researchers have documented many instances in which this kind of data use has reinforced deficit-based views of students, effectively blaming them and their communities for disappointing test scores (Bertrand & Marsh, 2021).

We have argued that a key reason test-score data cannot serve as a useful basis for improving learning outcomes is that most state test designs are not based on a coherent, defensible theory of learning and how to support it (Penuel & Shepard, 2016). High-quality academic standards in literacy emphasize writing for authentic audiences; in mathematics, students are expected to become proficient in constructing viable arguments and critiquing the reasoning of others; and in science, they should be able to communicate clearly and persuasively the ideas and methods they generate. These kinds of sophisticated thinking and reasoning practices simply cannot be captured by multiple-choice test questions asking students to pick the right answers. Today’s tests do not, in fact, provide a valid means of gauging whether students have met commonly held goals for learning. (Dynarski argues that such tests are reliable, but they are reliable only in that they can reliably distinguish between proficiency levels. He also claims they are unbiased, but the kinds of analyses performed on differential item functioning provide insufficient evidence for lack of bias.)

While we agree with Santelises that it is important to assess progress toward standards, we offer a different vision for how to do so. This requires being very clear about what locally designed or selected curriculum-embedded assessments can do, distinct from what state tests can do to provide an external check. A system of assessment that is closely tied to curriculum can improve equitable outcomes by providing teachers with clear guidance about what to do with the assessment information they gather. (The culminating task from our inquiryHub biology curriculum presented in the original article is an example of such an assessment.) There is evidence that such an approach can work, too. In a quasi-experimental study of a professional learning program designed to help teachers make formative use of curriculum-embedded assessments to adjust their teaching (Penuel et al., 2017), we found that students in these teachers’ classrooms performed better on an independent test of science achievement than students in comparison classrooms. Further, their teachers successfully changed their instructional practices in response to the materials and related professional development.

In short, an assessment system that is embedded in a locally designed or selected curriculum has distinct advantages over today’s state tests. When analyzed at the classroom or school level, the resulting assessment data can provide clear and useful insights into individual students’ thinking (both the problematic ideas they might hold and resources to build upon), pointing to specific goals for further teaching and learning (Campbell, Schwarz, & Windschitl, 2016). By analyzing student work at the district or consortium level, we can show where the curriculum (or specific lessons and materials) is failing to help students meet particular standards, or perhaps shortchanging whole groups of students. And when combined with research on how teachers enact the curriculum (and how students experience it), the data can identify the specific kinds of professional supports teachers need in order to provide more effective and equitable instruction (e.g., Krumm et al., 2020).

Building such a system requires political will and a vision to walk through the portal that the pandemic has offered us. It will require redirecting the billions of dollars now spent on testing, shifting it toward the development and ongoing refinement of high-quality instructional materials. It will require more investment in teacher learning. And it will require us to give state tests a much more limited role (testing smaller, random samples of students, as a means of checking local claims about student proficiency, much in the spirit of what Santelises envisions). We don’t imagine the move to such a system will be easy, but it will go a long way to building the kind of equitable schools and classrooms we aspire to create.


References

Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36 (5), 258-267.

Bertrand, M. & Marsh, J.A. (2021). How data-driven reform can drive deficit thinking. Phi Delta Kappan, 102 (8), 35-39.

Campbell, T., Schwarz, C.V., & Windschitl, M. (2016). What we call misconceptions may be necessary stepping-stones toward making sense of the world. Science and Children, 53 (7), 69-74.

Diamond, J.B. & Spillane, J.P. (2004). High stakes accountability in urban elementary schools: Challenging or reproducing inequality? Teachers College Record, 106 (6), 1145-1176.

Krumm, A.E., Penuel, W.R., Pazera, C., & Landel, C. (2020). Measuring equitable science instruction at scale. In M. Gresalfi, I.S. Horn, N. Enyedy, H.J. So, V. Hand, K. Jackson, . . . & T.M. Philip (Eds.), Proceedings of the International Conference of the Learning Sciences (Vol. 4, pp. 2461-2468). International Society of the Learning Sciences.

Lee, J. & Reeves, T. (2012). Revisiting the impact of NCLB high-stakes school accountability, capacity, and resources: State NAEP 1990–2009 reading and math achievement gaps and trends. Educational Evaluation and Policy Analysis, 34 (2), 209-231.

National Research Council. (2011). Incentives and test-based accountability. National Academies Press.

Penuel, W.R., DeBarger, A.H., Boscardin, C.K., Moorthy, S., Beauvineau, Y., Kennedy, C., & Allison, K. (2017). Investigating science curriculum adaptation as a strategy to improve teaching and learning. Science Education, 101 (1), 66-98.

Penuel, W. R., & Shepard, L. A. (2016). Assessment and teaching. In D. H. Gitomer & C. A. Bell (Eds.), Handbook of Research on Teaching (pp. 787-851). AERA.

Supovitz, J.A. & Sirinides, P. (2018). The Linking study: An experiment to strengthen teachers’ engagement with data on teaching and learning. American Journal of Education, 124, 161-188.


This article is an invited response to “Possible futures for equitable educational assessment” by William R. Penuel, part of Kappan’s Reimagining American Education: Possible Futures series, sponsored by the Spencer Foundation.

ABOUT THE AUTHORS


Lorrie A. Shepard

LORRIE A. SHEPARD is distinguished professor of education at the University of Colorado Boulder, School of Education.


William R. Penuel

WILLIAM R. PENUEL is a distinguished professor of learning sciences and human development at the University of Colorado Boulder.