A Look Back
Inside the black box: Raising standards through classroom assessment
By Dylan Wiliam & Paul Black
October 1998, pp. 44, 146-148
Classroom assessment offers more than just the opportunity to gauge student knowledge, Dylan Wiliam and Paul Black write in this classic Kappan article. “(T)eaching and learning must be interactive,” the duo note in their 1998 paper, one of Kappan’s most-read articles of all time. The goal of educators must be to craft and execute meaningful formative assessments, in which “the evidence is actually used to adapt the teaching to meet student needs.” The practice, they argue, is “the heart of effective teaching” and has the potential to raise standards in classrooms across the globe. “Thus, it seems clear that very significant learning gains lie within our grasp,” the authors posit.
Getting there requires increased attention to the self-esteem of learners, the power of student self-assessments, and the evolution of effective teaching. Instruction and formative assessment “are indivisible,” the authors argue. As such, “feedback on tests, seatwork, and homework should give each pupil guidance on how to improve, and each pupil must be given help and an opportunity to work on the improvement.” Adopting formative assessment can be multilayered and time-intense and requires teachers “to take risks in the belief that such an investment of time will yield rewards in the future.” Partial or hurried attempts to implement formative assessment, Wiliam and Black caution, “are pointless and can even be harmful.” A successful shift requires examining even the most basic instructional strategies, including how teachers ask questions and react to student responses. Professional development (with video examples from real teachers) can help educators adapt the practices to their classrooms. “What teachers need is a variety of living examples of implementation, as practiced by teachers with whom they can identify and from whom they can derive the confidence that they can do better.”
Conversation Piece
This issue of Kappan focuses on how best to assess student learning. Use these questions to reflect on the topic with your colleagues:
- What types of assessments have you found most useful in determining what students know and are able to do? What is least useful?
- What experiences have you had with alternative assessment models, such as standards-based grading or portfolio assessments? What do you see as the benefits and drawbacks of such methods?
- If you could make one change to how assessment is done in your school or classroom, what would it be?
- How might technology help you assess student learning? What are the limitations of technology for student assessment?
PDK members have access to discussion guides related to specific articles in each issue of Kappan. Log in to the member portal and access the discussion guides.
“Our assessment can’t be so exclusively focused on standards that we stop short of the most important skills in life. Students are counting on us to prepare them as expert learners.” — Lee Ann Jung, Assessing Students Not Standards (Corwin, 2024)
Research Connections
The emotional impact of feedback
Feedback can provide a powerful impetus for student improvement, but a 2024 review of interventions identified gaps that can make the process less effective. One key concern is a seeming lack of attention to helping students understand and deal with their emotions related to the feedback process. In the existing literature, “how students managed their affect during and after the intervention was not typically discussed,” researchers note. “Emotions are either seen as something to be overcome and managed, or something that may interfere with logical reasoning.” For students to truly benefit from feedback, preparation for the emotional impact of criticism needs to be woven into students’ understanding of the feedback process.
Source: Little, T., Dawson, P., Boud, D., & Tai, J. (2024). Can students’ feedback literacy be improved? A scoping review of interventions. Assessment & Evaluation in Higher Education, 49 (1), 39-52.
Redesigning state assessments
For better or worse, the content and format of state assessments impact classroom learning. A new report from the Learning Policy Institute offers six principles for states looking to redesign their testing programs. Assessments should be authentic, curriculum-anchored, and educative. They must also be developmental and asset-oriented, reflective of and responsive to learners, and useful for informing decisions that impact instruction. “By centering features of assessments that support better student learning experiences, teacher practice, and systemic supports and decision-making, we can create assessment systems that have a net positive impact on instruction,” the report notes.
Source: Badrinarayan, A. (2024, October). Design principles for instructionally relevant assessment systems. Learning Policy Institute.
“The traditional 0-100, A-F grading system does not communicate learning. It communicates behavior, privilege, and positionality. Worse than that, the A-F scale promotes giving up and cheating to get the grade, moving students away from the desire to grow and learn.” — Jonathon Medeiros, language arts teacher, Kauaʻi High School, Hawaii (Education Week, Nov. 6, 2023)
Equitable grading practices
Allowing retakes, emphasizing formative feedback, and prioritizing fairness and flexibility have the potential to raise achievement while decreasing student stress, according to a 2025 study by researchers at the University of Arkansas. A survey of 256 ninth-grade students suggested that supportive and transparent grading practices are needed to better serve today’s students. Such equitable grading practices, “by prioritizing fairness, flexibility, and mastery over performance on high-stakes assessments, can provide a sanctuary for students,” the researchers write. “For students grappling with the lingering effects of the pandemic on their academic and emotional well-being, such an approach to grading can be particularly beneficial, offering them a sense of control and agency in their learning journey.”
Source: Morris, S.R., McKenzie, S.C., Wai, J., & Maranto, R.A. (2025, February). An investigation of ninth grade students’ perceptions of equitable grading practices. Journal of Research Initiatives, 8 (5), article 9.
Assessment for Learning
Assessment for Learning (AfL) — ongoing assessment practices that take place during a lesson or unit — is needed to combat learning barriers in the contemporary classroom, a new report posits. More than a method of testing students, AfL becomes the primary method of teaching as educators incorporate practices like student self-assessment, peer assessment, goal setting, and observations of student learning to inform instruction. As schools seek to move beyond summative assessment, address academic integrity challenges presented by AI, and support students with COVID-era learning gaps, AfL offers educators a means to personalize instruction. “When used consistently, AfL practices have shown promise in raising student achievement and learning outcomes around the world,” note the authors, who studied schools in the U.S., Australia, Canada, Ireland, Israel, New Zealand, and Norway.
Source: Volante, L., DeLuca, C., Barnes, N., Birenbaum, M., Kimber, M., Koch, M., Looney, A., Poskitt, J., Smith, K., & Wyatt-Smith, C. (2025, January). International trends in the implementation of assessment for learning revisited: Implications for policy and practice in a post-COVID world. Policy Futures in Education, 23 (1), 224-242.
Evaluating what works
Students, teachers, and schools all play a role in creating the learning environment, which can, in turn, lead to differences in student achievement. In a recent paper, a group of researchers highlights the challenges of evaluating the success of an educational intervention within a multilayered system. Using data from students in grades three and eight from Kentucky, Maryland, Michigan, and North Carolina, researchers found greater variance in student test scores between teachers in the same schools than between schools themselves. This finding suggests “that evaluators must work with partner schools, districts, and states to obtain information on how student achievement varies by teachers within schools,” the researchers write.
Source: Mulolli, D., Hedberg, E.C., Bogia, M., Spybrook, J., Berglund, T., Unlu, F., & Opper, I.M. (2025, January-December). Improving the design of evaluations that include students, teachers, and schools: An empirical investigation of key design parameters. AERA Open, 11.
Gauging college-readiness post-COVID
For decades, a student’s high school grades were a leading indicator of their readiness for college. New research from ACT suggests that the value of grades has shifted since 2020. “More grade inflation took place after the pandemic than in the decade preceding it,” according to a new report from the testing company. As such, a student’s high school GPA is no longer as predictive of their success in first-year college courses, the report notes. ACT scores, meanwhile, remain a relatively stable predictor of “early success” in college. “For students, understanding how high school grade point average and ACT Composite score predict first-year grade point average can help them prepare effectively and seek appropriate support if needed, ensuring a smoother transition to college,” the report states.
Source: Sanchez, E.I. (2024, September). Changes in predictive validity of high school grade point average and ACT composite score after the COVID-19 pandemic. ACT.
This article appears in the Summer 2025 issue of Phi Delta Kappan, 106 (7-8), 5-7.

