Coming to agreement about the purpose of grading and establishing clearer and more accurate reporting structures can pave the way for more learning-focused grading systems.

Throughout the world today, school leaders are struggling to implement grading reforms. They recognize that many current grading policies and practices are outdated and inadequate. They also know these policies and practices don’t align well with recent changes in school curricula, instructional strategies, and procedures for assessing student learning. Yet despite their commitment and good intentions, these dedicated school leaders are facing unanticipated opposition.

Grading reform means challenging some of education’s longest held and most firmly entrenched traditions (Guskey & Brookhart, 2019). These challenges prompt concern among all stakeholders and serious opposition from some. In many cases, the most adamant opposition comes from parents and families, especially for reforms involving standards-based or competency-based grading (Franklin, Buckmiller, & Kruse, 2016; Young, 2023).

Sources of frustration

Ironically, few parents and families oppose the basic principles of standards-based or competency-based grading. Most support the idea of reporting students’ achievement in terms of specific learning goals. They also understand the rationale behind giving students multiple opportunities to demonstrate what they have learned. The frustration of parents and families, as well as many students, comes from the failure of reform efforts to address what they consider a primary obstacle to fairness and equity in grading: inconsistency in grading practices among teachers in the same school (Guskey & Link, 2019). Each time students change classes, the rules for grading change. What counts as part of the grade, what doesn’t count, and how different aspects of students’ performance are weighed in determining grades — all can be different (Guskey, 2024).

This inconsistency leads many students to see grading as a game they must learn to play to succeed in school — and some students play the game quite well. They become strategists in the grading game, constantly tallying points and calculating the minimum scores they must attain to get the grade they want. But for other students, the grading game remains a mysterious puzzle they must decipher in every class, and many struggle in that effort. So, when a parent asks at the dinner table, “What grade are you going to get in this class?” the student responds in all honesty, “I don’t know.”

Before standards-based or competency-based grading reforms can be implemented, this inconsistency in grading must be addressed. This doesn’t mean infringing on teachers’ professional freedom. It simply requires reaching consensus about the purpose of grading and then implementing grading policies and practices that evidence shows serve the best interests of students and their learning.

Gaining greater consistency in grading among teachers involves three crucial steps that lay the groundwork for standards-based and competency-based grading reforms (Guskey, 2021):

Reach consensus on a clear and concise purpose statement for grading.
Use grading scales with four to seven categories of student performance.
Report academic and non-academic aspects of students’ performance separately.

Develop a clear and concise purpose statement

Teachers generally don’t agree on why they give grades in the first place (Russell & Airasian, 2011). When neither teachers nor school leaders agree on what grades mean or what they are for, grading procedures tend to vary from teacher to teacher, class to class, and school to school.

Establishing consensus

Successful grading reforms always begin with focused discussions on the purpose of grades and report cards (Brookhart, 2011). These discussions must address three questions:

What information will grades communicate?
Who is the primary audience for that information?
What is the intended goal of grading?

Reaching consensus on answers to these questions provides the foundation for determining the appropriateness of all grading policies and practices. It also establishes criteria for deciding the optimal form and structure of the report card.

Research by Jessica Gogerty (2016) showed that when the purpose of grading is clearly articulated, teachers become more deliberate in their approach to student learning. They prioritize curriculum standards and adjust their instructional procedures so that content, format, and difficulty of classroom assessments are more closely aligned. Teachers also express less tolerance of colleagues who fail to align their teaching and learning practices to the grading purpose. They see this failure as “negligence” that causes unnecessary confusion for students and families (p. 154). When the grading purpose is clear, teachers are expected to uphold that purpose.

Example purpose statements

To guarantee a shared understanding among all stakeholders, this purpose statement should be prominently featured on the report card and included in the introduction of all grading policy documents. This helps clarify the report card’s intent, the information it includes, and how to interpret that information.

Numerous examples of purpose statements for grading and report cards can be found online and in Developing Standards-Based Report Cards (Guskey & Bailey, 2010). Although these examples vary widely, the best succinctly address the three questions described earlier. An elementary-level example would be:

The purpose of this report card is to describe students’ learning progress to parents and families, based on our school’s learning goals for each grade level. It is intended to inform parents and families about learning successes and to guide improvements when needed.

This statement specifies the aim of the report card, for whom it is intended, and how the included information should be used. It is brief yet clear. Another example, for the middle school or high school level, is:

The purpose of this report card is to communicate with parents, families, and students about the achievement of specific learning goals. It identifies students’ current levels of performance regarding those goals, areas of strength, and areas where additional time and effort are needed.

This statement identifies parents, families, and students as important audiences for the report card. It further specifies that the information describes students’ “current levels of performance,” not where they started or an average of scores over time. It also indicates how the information should be used to guide improvement.

A third example comes from the American School of Paris, an international school where the administrators and faculty have been especially thoughtful in their approach to grading and reporting reform:

The primary purpose of grading is to effectively communicate student achievement toward specific standards, at this point in time. A grade should reflect what a student knows and is able to do. Students will receive separate feedback and evaluation on their learning habits, which will not be included in the academic achievement grades.

Two parts of this purpose statement deserve attention. First, the phrase “at this point in time” makes clear that teachers do not determine students’ grades by averaging scores from the entire grading period. Instead, they assign grades based on the most current evidence they have on what students now know and can do. In other words, grades reflect where students are in their learning right now, not where they were weeks or months before.

Second, the statement “students will receive separate feedback and evaluation on their learning habits” emphasizes that achievement grades represent students’ performance on specific academic learning goals. Other aspects of students’ behavior related to learning habits, such as homework completion, class participation, and punctuality in turning in assignments, are reported separately.
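The difference between averaging scores and reporting “at this point in time” can be shown with simple arithmetic. The sketch below is a hypothetical illustration, not the American School of Paris’s actual procedure: the scores, the 4-point scale, the function names, and the choice to summarize the last two pieces of evidence are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical scores for one student on one learning goal, in chronological
# order, on an assumed 4-point scale (1 = beginning, 4 = advanced).
scores = [1, 2, 2, 3, 4, 4]

def averaged_grade(scores):
    """Average every score collected during the grading period."""
    return mean(scores)

def most_recent_grade(scores, window=2):
    """Summarize only the latest evidence (assumed here: the last `window` scores)."""
    return mean(scores[-window:])

print(f"Average of all scores: {averaged_grade(scores):.2f}")   # 2.67
print(f"Most recent evidence:  {most_recent_grade(scores):.2f}")  # 4.00
```

In this made-up case, averaging pulls the grade down toward the student’s early attempts, while the most recent evidence shows the student now performing at the top of the scale. In practice, deciding which evidence counts as “most current” remains a matter of teacher judgment rather than a fixed formula.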

Use grading scales with four to seven performance categories

Consistency in grading implies that teachers with comparable knowledge and experience, when presented with the same body of evidence on a student’s performance, agree on the grade. Researchers call this “inter-rater reliability” (Gwet, 2021; Hallgren, 2012). The number of levels or categories of performance in the grading scale plays a significant role in achieving that agreement. Scales that include large numbers of categories increase the potential influence of subjectivity and drastically reduce agreement among teachers.
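One way to make agreement among teachers concrete is to compute it. The sketch below uses simple percent agreement and Cohen’s kappa, a commonly used statistic that corrects for agreement expected by chance. The two lists of grades are invented for illustration, and the function names are assumptions; nothing here comes from the studies cited above.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of students to whom both raters assigned the same grade."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.

    Undefined (division by zero) if chance agreement is exactly 1,
    for example when both raters give every student the same grade.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical grades given by two teachers to the same eight pieces of work.
teacher_1 = ["A", "B", "B", "C", "A", "D", "B", "C"]
teacher_2 = ["A", "B", "C", "C", "A", "D", "B", "B"]

print(f"Percent agreement: {percent_agreement(teacher_1, teacher_2):.2f}")  # 0.75
print(f"Cohen's kappa:     {cohens_kappa(teacher_1, teacher_2):.2f}")       # 0.65
```

Kappa is lower than raw agreement because some matches would occur even if the two teachers graded at random with the same overall distribution of grades.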

Direct and indirect measures

The challenge of gaining acceptable levels of inter-rater reliability in grading is further complicated by the fact that it requires teachers to summarize quantitative evidence gathered primarily through indirect approaches to measurement. Direct measurement involves explicitly measuring and quantifying the characteristic of a person that we want to report. For instance, to measure a student’s height, we would ask the student to stand with their back against a wall, place a level instrument like a ruler or book on the top of their head, mark the wall, and then measure the distance from the floor to that mark. The recorded number represents the direct measurement of the student’s height.

Most of the measures teachers use to determine grades are indirect measures. Indirect measurement involves measuring something else and converting it into a measurement of the characteristic in question (American Psychological Association, 2018). For example, we cannot directly measure students’ achievement or proficiency by placing a measuring device on them. Instead, we ask students to answer questions or perform certain tasks. We then make judgments or inferences about students’ level of achievement based on their responses or performance. Because these judgments involve personal interpretation, indirect measures are more susceptible to bias and interpretation errors than direct measures. This makes it extremely difficult to accurately discern and report subtle differences in students’ performance.

Failure to recognize the difference between direct and indirect measures often leads to false assumptions about the numbers assigned to students. This is called the illusion of data validity, and it leads to the false belief that the information we collect from and about students is always honest, complete, and accurate (Jansen et al., 2022). This is rarely true when it comes to indirect measures of student achievement collected for grading.

The problem of the percentage scale

In this context, the percentage grade scale with 101 discrete levels of student performance, two-thirds of which typically designate failure, presents a noteworthy challenge. Some educators believe the large number of levels in the percentage grade scale makes it more precise than scales with fewer levels, such as the five-level letter-grade scale (A, B, C, D, and F) used in most colleges and universities. But the reality is far more complex. In the absence of a truly accurate measuring device, adding more levels to the measurement scale offers only the illusion of precision. In fact, the large number of levels in the percentage scale, coupled with the fine discrimination required to determine differences among those levels using indirect measures, can lead to greater subjectivity, increased error, and diminished reliability. Researchers have recognized these problems for well over a century (Starch & Elliott, 1912, 1913).

In defense of the percentage grade scale, some educators argue that the percentage of questions on an assessment that students answer correctly represents a direct measure of achievement. They reason that correctly answering 80% of the questions on an assessment means the student has learned 80% of the material or mastered 80% of the learning goals. While the percentage of questions answered correctly might seem like a direct measure, the interpretation of this percentage involves numerous complexities. The format, difficulty, and alignment of the questions to instruction, as well as other factors, can significantly impact the accuracy and reliability of percentage-based scores. This complexity underscores the challenges in achieving true precision in grading, even when using seemingly straightforward measures like percentage correct. The perceived precision of percentage grading methods is far more illusory than real, due to the inherent subjectivity and complexity of the indirect measures involved (Guskey, 2013).

Fewer levels, greater accuracy

Significant research shows that optimal discrimination, validity, and reliability are obtained using grading scales with four to seven levels or categories (Lozano, Garcia-Cuento, & Muniz, 2008; Preston & Colman, 2000). Teachers with comparable knowledge and experience are far more likely to agree when distinguishing an A level from a B level of performance than when distinguishing a 90 from an 89 using the percentage scale. The use of clear and well-defined scoring criteria, along with a limited number of grading categories, helps ensure a shared understanding among teachers and promotes more consistent grading practices. This understanding is particularly important for implementing grading reforms that prioritize fairness, transparency, and equity.
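The mechanism behind this claim can be sketched with a toy example. Suppose two teachers score the same eight pieces of work on the percentage scale and differ by only a point or two each time. The scores and the letter-grade cutoffs (90, 80, 70, 60) below are assumptions invented for illustration; they are not data from the studies cited, and the result demonstrates the arithmetic, not an empirical finding.

```python
def to_letter(pct):
    """Map a percentage to a letter grade using assumed cutoffs."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if pct >= cutoff:
            return letter
    return "F"

def exact_agreement(rater_a, rater_b):
    """Share of items on which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Hypothetical percentage scores from two teachers for the same eight papers.
teacher_1 = [92, 85, 78, 89, 71, 95, 64, 83]
teacher_2 = [94, 83, 76, 91, 73, 93, 66, 81]

letters_1 = [to_letter(p) for p in teacher_1]
letters_2 = [to_letter(p) for p in teacher_2]

print(f"Agreement on 101-level scale: {exact_agreement(teacher_1, teacher_2):.3f}")  # 0.000
print(f"Agreement on 5-level scale:   {exact_agreement(letters_1, letters_2):.3f}")  # 0.875
```

The teachers never match exactly on the percentage scale, yet they agree on seven of eight letter grades. The one disagreement (89 versus 91) falls on a category boundary, a reminder that fewer categories reduce disagreement but cannot eliminate it without clear scoring criteria.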

Report multiple grades

Every marking period, teachers gather multiple forms of evidence on students’ performance that reflect three different types of grading criteria: product, progress, and process (Guskey, 1994, 1996).

Product criteria show how well students have achieved specific academic learning goals, standards, or competencies, typically demonstrated through major assessments, classroom quizzes, compositions, projects, reports, and other culminating activities.
Progress criteria, sometimes called “growth” or “development” criteria, show how much students have gained or improved in their learning. Students could make outstanding progress, but still not be achieving at grade level, and highly skilled students might achieve the product criteria without making notable improvement.
Process criteria describe student behaviors that facilitate, broaden, or extend learning. These may include activities that enable learning, such as formative assessments, homework, and class participation. They also may reflect nonacademic social-emotional learning skills, such as collaboration, goal setting, perseverance, habits of mind, or citizenship. In some cases, they relate to students’ compliance with procedures, like turning in assignments on time.

A hodgepodge grade

At the end of each marking period, teachers assign weights to these different sources of evidence to tally a final score recorded on the report card (Sun & Cheng, 2013). Researchers call this a “hodgepodge” grade (Brookhart, 1991) because it mixes achievement and other factors related to behavior, attitude, effort, and improvement. It makes the report card grade a confusing amalgamation that is impossible to interpret clearly and accurately (Guskey, 2020). An A, for example, might mean that the student knew all the concepts before instruction began (product); that she did not achieve the learning goals but made significant improvement (progress); or that she put forth extraordinary effort (process).
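The interpretive problem can be seen in a small calculation. In the sketch below, three hypothetical students with very different profiles end up with the same composite score. The weights (50% product, 25% progress, 25% process), the 0-100 scores, and all names are assumptions chosen for illustration, not a weighting scheme reported in the research cited.

```python
# Assumed weights a teacher might apply to the three kinds of evidence.
WEIGHTS = {"product": 0.50, "progress": 0.25, "process": 0.25}

def hodgepodge_grade(evidence):
    """Combine product, progress, and process scores into one weighted number."""
    return sum(WEIGHTS[criterion] * score for criterion, score in evidence.items())

# Hypothetical students, each scored 0-100 on every criterion.
students = {
    "knew it already":  {"product": 100, "progress": 60,  "process": 100},
    "big improvement":  {"product": 80,  "progress": 100, "process": 100},
    "steady and eager": {"product": 85,  "progress": 90,  "process": 100},
}

for name, evidence in students.items():
    print(f"{name:17s} composite = {hodgepodge_grade(evidence):.1f}")
# All three lines show a composite of 90.0.
```

A reader who sees only the composite cannot tell which of these three students it describes, which is the core objection to a single blended grade.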

Recognizing these problems, some grading reform advocates recommend that teachers use only product criteria in determining students’ grades. They point out that the more progress and process criteria come into play, the more subjective, biased, and inequitable grades become (Feldman, 2023). How can a teacher know, for example, how difficult a task was for students or how hard they worked to complete it? Many teachers point out, however, that if process elements like homework and punctuality in turning in assignments don’t count, students will lose all motivation to do homework or complete assignments on time, and evidence from schools implementing these practices confirms their apprehensions (Randazzo, 2023; We Are Teachers, 2023).

Multiple grades for multiple criteria

A far more effective solution is not to eliminate progress or process criteria from grading but to report these criteria separately. Teachers simply extract evidence on the important nonacademic aspects of students’ performance and report those in their own section of the report card and the transcript.

Although reporting multiple grades is relatively new in most U.S. schools, the practice has a long-established history in other countries. In Ontario, Canada, for example, teachers have reported multiple grades for students from 1st grade through high school for decades. Every marking period, in addition to academic grades, teachers record grades for responsibility, independent work, initiative, organization, collaboration, and self-regulation. A major component of students’ responsibility grade is “Completes and submits class work, homework, and assignments according to agreed-upon timelines.” Students’ grades for responsibility and other process elements are reported on a four-level scale with the categories Excellent, Good, Satisfactory, and Needs Improvement (Ontario Ministry of Education, 2023).
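For schools that keep grades in a student information system, separate reporting can be thought of as a record with distinct fields rather than one blended number. The sketch below is a hypothetical data structure, not Ontario’s actual report card template: the six skills and four levels are taken from the description above, while the class names, field names, and sample entries are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    """The four-level scale described above for process elements."""
    EXCELLENT = "Excellent"
    GOOD = "Good"
    SATISFACTORY = "Satisfactory"
    NEEDS_IMPROVEMENT = "Needs Improvement"

SKILLS = (
    "responsibility", "independent work", "initiative",
    "organization", "collaboration", "self-regulation",
)

@dataclass
class ReportCard:
    """Hypothetical record that keeps achievement and process grades apart."""
    student: str
    achievement: dict = field(default_factory=dict)      # subject -> product grade
    learning_skills: dict = field(default_factory=dict)  # skill -> Level

    def set_skill(self, skill, level):
        """Record a process grade; it never feeds into an achievement grade."""
        if skill not in SKILLS:
            raise ValueError(f"Unknown skill: {skill}")
        self.learning_skills[skill] = Level(level)

card = ReportCard(student="Sample Student")
card.achievement["Mathematics"] = "B"
card.set_skill("responsibility", "Needs Improvement")
print(card.achievement, card.learning_skills["responsibility"].value)
```

Because the two kinds of information live in separate fields, a late assignment can lower the responsibility grade without changing what the mathematics grade says about achievement.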

Benefits for students, parents, teachers, and more

Teachers using multiple grades say that knowing these aspects of performance will be reported on both the report card and transcript compels students to act more responsibly. Parents benefit because the report card provides a more detailed, comprehensive picture of their child’s performance. In addition, because product grades are no longer tainted by evidence based on behavior or compliance, those grades more closely align with external measures of achievement and content mastery, such as standardized test scores — a quality college and university admissions officers favor (Buckmiller & Peters, 2018). In essence, removing process elements from the achievement (product) grade makes grades more accurate, honest, and equitable indicators of student learning.

Most important, reporting multiple grades doesn’t require extra work for teachers. In fact, it’s less work. Teachers already gather evidence on product, progress, and process criteria. For example, most keep records of students’ scores on various measures of achievement, as well as homework completion, class participation, collaboration in projects, and so on. By simply reporting separate grades for these different aspects of learning, teachers avoid the dilemmas involved in determining how much to weigh each element when calculating a single grade.

A more accurate picture

Establishing greater consistency in grading policies and practices doesn’t require all teachers to grade in the same way. Just as assessment strategies must be adapted to fit the learning goals in different subjects, grading procedures must be similarly adapted to accurately communicate students’ achievement of those learning goals.

Schools where educators reach consensus on a purpose statement, adopt a grading scale with four to seven categories of student performance, and report academic achievement and nonacademic learning goals separately have the necessary foundation for more meaningful and effective grading reform. With these three crucial steps accomplished, most teachers find it easy to transition to standards-based or competency-based grading. They recognize how they can break down an overall achievement grade to report on the different standards that it summarizes. Many see this transition as a natural progression in their efforts to provide meaningful summaries of students’ performance. Without adding to teachers’ workload, these steps address the greatest concerns of parents and families; facilitate better communication between school and home; and ensure greater honesty, accuracy, and equity in grading.

References

American Psychological Association. (2018). Indirect measurement. In APA dictionary of psychology. https://dictionary.apa.org/indirect-measurement

Brookhart, S.M. (1991). Grading practices and validity. Educational Measurement: Issues and Practice, 10 (1), 35-36.

Brookhart, S.M. (2011). Starting the conversation about grading. Educational Leadership, 69 (3), 10-14.

Buckmiller, T.M. & Peters, R.E. (2018). Getting a fair shot? School Administrator, 75 (2), 22-25.

Feldman, J. (2023). Grading for equity: What it is, why it matters, and how it can transform schools and classrooms (2nd ed.). Corwin.

Franklin, A., Buckmiller, T., & Kruse, J. (2016). Vocal and vehement: Understanding parents’ aversion to standards-based grading. International Journal of Social Science Studies, 4 (11), 19-29.

Gogerty, J.I. (2016). The influence of district support during implementation of high school standards-based grading practices [Unpublished doctoral dissertation]. Drake University, Des Moines, Iowa.

Guskey, T.R. (1994). Making the grade: What benefits students. Educational Leadership, 52 (2), 14-20.

Guskey, T.R. (1996). Reporting on student learning: Lessons from the past — Prescriptions for the future. In T.R. Guskey (Ed.), Communicating student learning (pp. 13-24). ASCD.

Guskey, T.R. (2013). The case against percentage grades. Educational Leadership, 71 (1), 68-72.

Guskey, T.R. (2020). Breaking up the grade. Educational Leadership, 78 (1), 41-46.

Guskey, T.R. (2021). Learning from failures: Lessons from unsuccessful grading reform initiatives. NASSP Bulletin, 105 (3), 192-199.

Guskey, T.R. (2024). Engaging parents and families in grading reforms. Corwin.

Guskey, T.R. & Bailey, J.M. (2010). Developing standards-based report cards. Corwin.

Guskey, T.R. & Brookhart, S.M. (2019). What we know about grading: What works, what doesn’t, and what’s next. ASCD.

Guskey, T.R. & Link, L.J. (2019, April). Understanding different stakeholders’ views on homework and grading [Paper presentation]. Annual Meeting of the American Educational Research Association, Toronto, ON, Canada.

Gwet, K.L. (2021). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (5th ed.). AgreeStat Analytics.

Hallgren, K.A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8 (1), 23-34.

Jansen, B.J., Salminen, J., Jung, S., & Almerekhi, H. (2022). The illusion of data validity: Why numbers about people are likely wrong. Data and Information Management, 6 (4), 1-14.

Lozano, L.M., Garcia-Cuento, E., & Muniz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4 (2), 73-79.

Ontario Ministry of Education. (2023). Elementary and secondary report card templates. www.ontario.ca/page/elementary-and-secondary-report-card-templates

Preston, C.C. & Colman, A.M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104 (1), 1-15.

Randazzo, S. (2023, April 26). Schools are ditching homework, deadlines in favor of “Equitable grading.” Wall Street Journal.

Russell, M.K. & Airasian, P.W. (2011). Grading. In Classroom assessment: Concepts and applications (7th ed.). McGraw-Hill.

Starch, D. & Elliott, E.C. (1912). Reliability of the grading of high school work in English. School Review, 20, 442-457.

Starch, D. & Elliott, E.C. (1913). Reliability of the grading of high school work in mathematics. School Review, 21, 254-259.

Sun, Y. & Cheng, L. (2013). Teachers’ grading practices: Meaning and values assigned. Assessment in Education: Principles, Policy & Practice, 21, 326-343.

We Are Teachers Staff. (2023, September 7). “No zeros” is sold as an equity shortcut. It’s not. We Are Teachers. www.weareteachers.com/equitable-grading/.

Young, J. (2023, November 30). Some schools are changing how they grade students. Here’s why some parents are upset. USA Today.

This article appears in the May 2024 issue of Kappan, Vol. 105, No. 8, p. 52-57.

ABOUT THE AUTHOR

Thomas R. Guskey

Thomas R. Guskey is professor emeritus at the College of Education, University of Kentucky, Lexington. He is the author of Engaging Parents and Families in Grading Reforms (Corwin, 2024) and Get Set, Go! Creating Successful Grading and Reporting Systems (Solution Tree, 2020).
