Measuring the quality of the teacher workforce

Looking beyond test scores to a broader set of outcomes can give leaders a richer — and more complex — picture of how teachers impact student success.

At a Glance

Two ways of measuring teacher effectiveness, by impact on test scores and by impact on non-test outcomes, reveal three key findings about how teachers impact student success:

Teachers vary widely in their ability to improve student outcomes, but excelling at raising test scores doesn’t necessarily translate to success with non-test measures, and vice versa.
Students taught by highly effective teachers fare better after graduation, but the nature of that success depends on the teacher’s particular strengths, whether in boosting test scores or other outcomes.
A teacher’s long-term impact is shaped not only by their individual strengths but also by how those strengths align with their students’ academic abilities.

What defines a good teacher? In the simplest terms, a good teacher is one who helps students succeed. But it’s not so simple to determine how much influence a teacher has over student performance.

Test-based measures are one way of attempting to determine teacher effectiveness. These often take the form of value-added measures (or VAMs), which use statistics to separate a teacher’s contribution to their students’ test achievement from other factors that influence test scores. Although first introduced in the 1970s (Hanushek, 1971), VAMs gained widespread use in education policy and research beginning in the early 2000s.
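To make the idea of separating a teacher’s contribution from other factors concrete, here is a minimal sketch in Python. It is not the model used in the research cited here: it assumes prior-year score is the only other factor, the record layout and the function name simple_value_added are hypothetical, and the scores are made up. Real VAMs adjust for many more student and classroom characteristics and typically account for the imprecision of estimates based on small classes.

```python
from collections import defaultdict
from statistics import mean


def simple_value_added(records):
    """Return a crude value-added estimate per teacher.

    records: list of (teacher_id, prior_score, current_score) tuples.
    This layout is an assumption made for the illustration.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    mean_prior, mean_current = mean(priors), mean(currents)

    # Ordinary least squares fit of: current = intercept + slope * prior
    sxx = sum((p - mean_prior) ** 2 for p in priors)
    sxy = sum((p - mean_prior) * (c - mean_current)
              for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = mean_current - slope * mean_prior

    # A residual is how far a student's actual score lands above or below
    # the score predicted from prior achievement alone.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))

    # The average residual of a teacher's students is the estimate.
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Made-up scores for two hypothetical teachers, A and B.
records = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 44), ("B", 50, 55), ("B", 60, 64),
]
vam = simple_value_added(records)
print(vam)
```

In this toy data, both teachers start with students at the same prior achievement levels, so the difference in their estimates reflects only how far their students land above or below the predicted score.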

Many states added test-score VAMs to their teacher evaluation systems in the 2010s during the federal government’s Race to the Top initiative. While some places have rolled back these policies, 27 states still mandate or permit the use of student test scores to evaluate teachers (National Council on Teacher Quality, 2022). There are good reasons for this.

Although VAMs based on test scores are controversial (Pivovarova, Amrein-Beardsley, & Geiger, 2016), they do tell us something meaningful about teacher effectiveness. When teachers are better at improving test scores (i.e., have higher VAMs), for example, research shows their students are more likely to attend a good-quality college and earn more money as adults (Chetty, Friedman, & Rockoff, 2014).

Why multiple measures matter

At the same time, test-based VAMs clearly don’t tell us everything. After all, teachers do lots of things we care about that affect student learning but don’t show up on tests. Teachers can instill good habits in their students, for example, or cultivate a classroom environment that makes students feel safe and welcome.

About a decade ago, researchers started applying value-added frameworks (again, typically used for test scores) to examine non-test outcomes like student attendance, grades, behavior, and grade progression (e.g., Jackson, 2018; Kane, McCaffrey, Miller, & Staiger, 2013). These non-test measures are arguably too imprecise to support high-stakes judgments (especially without years of data), and they don’t capture the deeper aspects of teaching, such as fostering students’ sense of belonging or critical-thinking skills. But they’re routinely included in administrative data and offer readily available indicators of teacher influence beyond standardized tests.

With this, some states and districts have expanded how they understand teacher quality. Massachusetts, for example, now requires evaluators to include evidence of teacher impact on student learning in performance ratings as one of multiple measures (Backes et al., 2024). Indeed, in its latest teacher evaluation rubric, Massachusetts expands the meaning of “impact on student learning” to include academic and non-academic outcomes, such as student engagement and sense of belonging.

When we take a closer look at teachers’ impact on test scores and on non-test outcomes, using data from Massachusetts, we learn three key things about teachers and student success.

Success in one measure doesn’t mean success with all measures

First, teachers vary widely in their ability to improve student outcomes, but excelling at raising test scores doesn’t necessarily translate to success with non-test measures, and vice versa. The scatter plot in Figure 1 uses a sample of around 8,000 teachers from Massachusetts, with each dot representing one teacher. The distribution of the dots shows teachers’ standardized scores for both test and non-test measures from 2019, with test-based VAM on the horizontal axis and non-test VAM on the vertical axis.

FIGURE 1. Teachers vary on test and non-test measures of effectiveness

The horizontal spread of dots shows that teachers vary widely in their effectiveness at increasing test scores, and the vertical spread shows how much they vary in their ability to support students’ attendance, grades, and behavior. Some teachers are relatively better or worse at increasing standardized test scores, and some are relatively better or worse on non-test measures.

Figure 1 also shows us the two measures are not strongly correlated. If they were, the dots would form an upward- or downward-sloping pattern. Instead, they are spread out in a cloud in the middle of the figure. Teachers in the upper left of the figure are good at improving non-test outcomes but not test scores. Teachers in the lower right are good at improving test scores but not non-test outcomes.
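One way to check whether two measures move together is to compute their correlation and see where each teacher falls relative to the average. The sketch below is illustrative only: the five pairs of scores are made up, and the helper names standardize, pearson, and quadrant are hypothetical. A correlation near zero corresponds to the cloud-like pattern described above, while a value near 1 or -1 would correspond to a clear upward or downward slope.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation: the average product of paired z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(zx, zy)])


def quadrant(test_z, nontest_z):
    """Describe where a teacher sits on a plot like Figure 1."""
    if test_z < 0 and nontest_z >= 0:
        return "upper left: stronger on non-test outcomes"
    if test_z >= 0 and nontest_z < 0:
        return "lower right: stronger on test scores"
    if test_z >= 0 and nontest_z >= 0:
        return "upper right: above average on both"
    return "lower left: below average on both"


# Made-up VAM estimates for five hypothetical teachers.
test_vam = [0.9, -0.4, 0.1, -1.2, 0.6]
nontest_vam = [-0.3, 0.8, 0.2, -0.5, -0.2]

r = pearson(test_vam, nontest_vam)
print(round(r, 2))
for t, n in zip(standardize(test_vam), standardize(nontest_vam)):
    print(quadrant(t, n))
```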

How students fare depends on their teachers’ strengths

Second, students taught by highly effective teachers fare better after high school graduation, but the nature of that success depends on the teacher’s particular strengths — whether in boosting test scores or other outcomes. Figure 2 shows the relationship between both types of VAM and four important outcomes: high school graduation, attending college, attending a four-year college, and attending a selective college.

Graph showing test and non-test VAM impact on postsecondary outcomes with horizontal lines. Test VAM data points are consistently on the left, and non-test VAM points on the right, demonstrating varied impact levels of these measures on educational outcomes.

FIGURE 2. Teachers’ test and non-test measures and postsecondary outcomes

As the figure suggests, teachers who are good at improving test scores increase the chances a student attends a selective college. Teachers who are good at improving non-test outcomes enhance the likelihood a student enrolls in college and attends a four-year college.

What drives these contrasting effects remains unclear. Teachers who are effective at raising test scores may help students gain admission to more selective colleges by strengthening their academic skills. Meanwhile, those who improve non-test outcomes may support college enrollment more broadly by fostering the behaviors and mindsets that encourage students to pursue higher education. Understanding these nuanced relationships warrants deeper study.

Alignment between teacher and student affects outcomes

Third, a teacher’s long-term impact is shaped not only by their individual strengths but also by how those strengths align with their students’ academic abilities. Figure 3 shows how the impacts of the two VAMs vary across the distribution of student achievement in high school (represented in deciles on the x-axis). The bars in the charts reflect the standardized change in each outcome associated with a one-standard-deviation change in VAM. As with Figure 2, test VAM is shown in blue and non-test VAM is in gold.

Four bar charts showing teacher impacts on academic outcomes across deciles of achievement for students who graduated high school, students who attended college, four-year college, and selective college with Non-test VAM in gold and Test VAM in blue.

FIGURE 3. Varied teacher impacts across deciles of student achievement

The gold bars show that students with lower achievement levels have better graduation and college attendance outcomes when they have a teacher with high non-test VAM. By contrast, high-achieving students have better selective college attendance outcomes when they have a teacher with a high test VAM. These figures suggest teachers with different skill sets may play distinct but valuable roles in educating students across the achievement distribution. We need to know more about how these distinctions play out in practice and what their full implications are for educational opportunity and equity.
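The quantity behind each bar in Figure 3, the standardized change in an outcome associated with a one-standard-deviation change in VAM, can be illustrated with a simplified calculation. The sketch below is an assumption-laden toy, not the authors’ specification: it uses two made-up achievement groups instead of deciles, a single made-up outcome, no control variables, and hypothetical function names.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Convert raw values to z-scores across the full sample."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def effect_by_group(rows):
    """rows: (achievement_group, teacher_vam, outcome) tuples.

    Returns, for each group, the change in the standardized outcome
    associated with a one-standard-deviation change in teacher VAM.
    """
    vam_z = zscores([r[1] for r in rows])
    out_z = zscores([r[2] for r in rows])
    grouped = defaultdict(lambda: ([], []))
    for (group, _, _), v, o in zip(rows, vam_z, out_z):
        grouped[group][0].append(v)
        grouped[group][1].append(o)
    return {g: slope(vs, outs) for g, (vs, outs) in grouped.items()}


# Made-up data: in the "low" group the outcome rises with teacher VAM,
# while in the "high" group it does not change at all.
rows = [
    ("low", -1.0, 0.2), ("low", 0.0, 0.5), ("low", 1.0, 0.8),
    ("high", -1.0, 0.8), ("high", 0.0, 0.8), ("high", 1.0, 0.8),
]
effects = effect_by_group(rows)
print(effects)
```

Repeating the calculation separately for test VAM and non-test VAM, and for each outcome, would give one bar per achievement group in a chart laid out like Figure 3.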

What we need to know about teacher effectiveness

Given the critical role teachers play in student success, system leaders need to focus on the quality of their workforce and how teachers are distributed across students and schools. Leaders can use test-based and non-test VAMs to better understand their workforce by asking questions like:

How do test and non-test measures of teacher quality vary across our district and schools?
How do the measures vary based on teacher experience? What are the implications for how we prepare, support, and develop teachers?
What practices do teachers who are effective on both measures use in the classroom? What can the system and other teachers learn from the most effective teachers?
How do test and non-test measures vary by different pathways into the profession (e.g., traditional versus alternative certification; different teacher preparation programs)?
Are we keeping the most effective teachers on both measures? What is the gap in retention between the most and least effective teachers on both measures?

Answers to these questions might help leaders target resources to the places where they can do the most good. Instead of responding to testing critics by abandoning tests as measures of school and teacher performance (or turning to measures with questionable validity), leaders should seek different ways of measuring teacher quality that capture a richer picture of how teachers contribute to student success.

References

Backes, B., Cowan, J., Goldhaber, D., & Theobald, R. (2024). How to measure a teacher: The influence of test and nontest value-added on long-run student outcomes. Journal of Human Resources, 60 (3).

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. American Economic Review, 104 (9), 2633-2679.

Hanushek, E. A. (1971). Teacher characteristics and gains in student achievement: Estimation using micro-data. American Economic Review, 61 (2), 280-288.

Jackson, C. K. (2018). What do test scores miss? The importance of teacher effects on non-test score outcomes. Journal of Political Economy, 126 (5), 2072-2107.

Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Bill & Melinda Gates Foundation.

National Council on Teacher Quality. (2022). State teacher policy database [Data set].

Pivovarova, M., Amrein-Beardsley, A., & Geiger, T. (2016). Value-added models: What the experts say. Phi Delta Kappan, 98 (2), 35-40.

This article appears in the Fall 2025 issue of Kappan, Vol. 107, No. 1-2, pp. 60–62.

ABOUT THE AUTHORS

Ben Backes

Ben Backes is the principal economist at the Center for Analysis of Longitudinal Data in Education Research, American Institutes for Research.

James Cowan

James Cowan is principal econometrician at the Center for Analysis of Longitudinal Data in Education Research, American Institutes for Research.

Michael DeArmond

Michael DeArmond is director of policy at the Center for Analysis of Longitudinal Data in Education Research, American Institutes for Research.

Dan Goldhaber

Dan Goldhaber is the director and vice president of the National Center for Analysis of Longitudinal Data in Education Research at the American Institutes for Research and the director of the Center for Education Data and Research and a professor in the School of Social Work at the University of Washington.

Roddy Theobald

Roddy Theobald is the deputy director and managing researcher at the Center for Analysis of Longitudinal Data in Education Research, American Institutes for Research.