
Evaluating coaches on the basis of student performance can cause coaches to shift their focus away from the teachers they are supposed to support.  

Teacher evaluation has been a hot topic in education research over the past decade (Berliner, 2018; Close, Amrein-Beardsley, & Collins, 2019; Darling-Hammond et al., 2012). However, few studies have examined how best to evaluate the instructional coaches who support teachers. While practitioner-focused books (Killion & Harrison, 2006) and reports (Kowal & Steiner, 2007) have begun to address this important topic, additional empirical research is needed.  

In our recent coaching study, we partnered with two instructional coaches (Meg and Claire, both pseudonyms, like all names in this article) and five teachers to understand the kinds of discursive learning opportunities that teachers engage in during one-on-one coaching (Saclarides & Lubienski, 2018b, 2018c). As we conducted our study, it became apparent that several administrative policies and expectations exerted a powerful influence on the coaches (Saclarides & Lubienski, 2018a; Saclarides & Lubienski, in press). Here, we focus on one such policy — the Evaluation Tool (ET) — which was a new evaluation system for both teachers and coaches in our partner district.  

Study context and the Evaluation Tool 

Our study took place in a midsize urban district — which we will call Midtown — serving a socioeconomically and racially diverse student population (35% White/non-Latinx, 36% Black, 12% Latinx, 9% Asian, and 8% other). One full-time instructional coach was assigned to each school, and their primary responsibility was to provide individual as well as group professional development that would help teachers improve their instruction.  

During the 2016-17 school year, the district implemented the new ET process. As part of this process, coaches and teachers were required to work with a group of students for a specified time frame (four or nine weeks), pick an area of instructional focus based on student data, and then gather pre- and post-student assessment data to determine the extent to which students met their growth goals. Teachers and coaches had to enter all ET-related information, such as individual student growth goals and pre-/post-assessment data, into an online portal that principals and district-level administrators could access. Depending on the amount of growth the students made, the coaches and teachers were assigned a certain level of proficiency (Excellent, Proficient, Needs Improvement, and Unsatisfactory). Given that the process required classroom data, it was most efficient for the coaches to complete it while engaging in a coaching cycle with a teacher. Furthermore, the coach and teacher could choose to complete their evaluation together, with identical data for student growth used for both. 

We launched our study of coaches Meg and Claire while the Evaluation Tool was being implemented. Overall, we observed 15 planning or reflection meetings and 23 modeled or cotaught lessons that the coaches conducted with five elementary teachers, and we conducted 27 interviews with administrators, coaches, and teachers. By chance, two of the coaching cycles we observed involved the coaches completing their evaluation. In particular, Coach Meg worked with Teacher Michelle, and Coach Claire worked with Teacher Cathy to gather student assessment data and complete their ET requirements together. The ET appeared to have a strong hand in shaping coach-teacher interactions in these particular coaching cycles. What, specifically, did we observe? 

Tensions 

The purpose of the Evaluation Tool was to encourage teachers and coaches to use data to inform instructional decisions and to focus their attention on student growth. These are certainly important goals. However, we noticed two key ways in which the ET detracted from the coaches’ efforts to foster teacher improvement.  

Coaches engaged in abnormally long coaching cycles. Although Midtown’s coaching cycles were typically two to four weeks long, the ET process was designed to take four to nine weeks. The longer nine-week period would give coaches and teachers more time to help students meet growth goals, thereby enhancing their evaluation scores.  

While Coach Meg completed a six-week ET with Teacher Michelle, Coach Claire and Teacher Cathy agreed to complete a nine-week ET together. However, Claire said that she would rarely decide to engage in a nine-week coaching cycle as part of her everyday practice, particularly given that Cathy was already an exceptional educator — which is why Claire chose to complete the ET with Cathy in the first place. Claire ultimately felt frustrated that the ET prompted her to spend more time in a coaching cycle with Cathy than she needed to, limiting the time she could spend with other teachers who might benefit more from her support. 

Coaches did too much for teachers. Because the student data produced from ET-driven coaching cycles would be directly tied to the coaches’ own evaluations, Meg and Claire did much more for their teachers when engaged in the ET than they typically did. For example, Meg modeled whole-group instruction in Michelle’s classroom to complete her ET. This, in itself, was not unusual. But, contrary to what is considered best practice for modeling (Clarke, Triggs, & Nielsen, 2014; West & Staub, 2003), Meg never released the classroom to Michelle to let her teach on her own. Instead, Meg modeled for the entire duration of the ET before giving the post-test, enabling her to teach lessons precisely as she felt they should be taught. Meg herself was conscious of this choice: “I want to actually model for the first couple of weeks to almost even three because this is my Evaluation Tool, so I am responsible for the scores.” 

Similarly, when completing her ET with Cathy, Claire acknowledged that she took on more of the work than she typically would have, by, for example, independently creating the small groups and schedule rather than cocreating them with the teacher. Claire said that she needed to take on this work because her evaluation was linked to the coaching cycle: “I do feel like the Evaluation Tool really drove that for me a lot more with Cathy because I wouldn’t have normally done that, but I felt like I had to because I needed to do my part . . . and . . . it was tied to my evaluation.” 

Given that Meg and Claire did so much for, rather than with, Michelle and Cathy during the coaching cycles, both teachers expressed concerns at the end of the coaching cycle about their ability to continue to implement what their coaches initiated. For example, Cathy shared that it would be difficult for her to continue with the guided mathematics groups Claire modeled without having Claire in the room to assist.  

Perverse incentives 

The Evaluation Tool seems appropriate for evaluating teachers, as their primary role is teaching students. Coaches, on the other hand, are supposed to help teachers improve their practice. Imposing the teacher evaluation system on coaches created perverse incentives for the coaches we studied. Specifically, it prompted the coaches to spend more time than was appropriate in their ET-related coaching cycles and to do much of the work for teachers instead of with them. Because the ET focuses on student, instead of teacher, growth, the coaches were motivated to sidestep the teachers and focus on generating strong student data, rather than ensuring that teachers were consistently engaged in substantive learning opportunities. The evaluation system made it possible for the coaches to receive a very positive evaluation without having made much, if any, impact on the teachers.  

Evaluating coaches’ multifaceted work 

Coaches support teaching and learning in various ways across an entire school or even district. On a given day, a coach may begin by modeling a number-sense activity in a kindergarten classroom, then facilitate a grade-level team meeting for 5th-grade teachers, then co-teach with a 3rd-grade teacher during guided reading, and end the day by leading a schoolwide professional development session on student sense-making in mathematics.  

Instead of evaluating coaches on the basis of student performance, the coaching evaluation system should attend to these varied components of coaches’ work. That is, coaches should receive feedback on how well they provide learning opportunities for teachers in one-on-one settings (such as modeling and co-teaching), as well as in small- and large-group professional development settings across various grade levels and content areas.  

How can this be done? There are several options: 

Administrator observations of coaches. Administrators could observe coaches as they provide professional development for individuals and groups of teachers. But what should administrators look for? In their 2018 book Systems for Instructional Improvement: Creating Coherence from the Classroom to the District Office, Paul Cobb and his colleagues outline five principles that should be followed for professional development to support teacher learning. Below, we outline each principle and then provide questions for administrators to consider as they observe coaches: 

  • Principle 1: Professional development should be coherent and connected. Does the coach connect the focus of group professional development with the focus of one-on-one professional development? For example, if the coach provides group professional development on how to effectively launch a mathematical task, does the coach also seek opportunities to support individual teachers in their classrooms as they launch mathematical tasks with their own students? 
  • Principle 2: Professional development should be sustained over time and allow teachers to engage with the same group of colleagues. Does the coach provide long-term, rather than intermittent, support for a group of teachers? For example, returning to the task launch example from above, to what extent does the coach provide sustained (e.g., yearlong) support for a group of teachers on effectively launching mathematical tasks?
  • Principle 3: Professional development should be close to teachers’ practice and promote the use of high-leverage teaching practices that are thought to increase student achievement. Does the coach focus on real instructional issues teachers face in their classrooms (rather than imposing something disconnected from teachers’ practice)? Does the coach support teachers’ use of the district-provided curriculum instead of other materials not sanctioned by the district? Does the coach help teachers integrate instructional practices that have been proven to positively affect students? 
  • Principle 4: Professional development should not only provide images of high-quality practice, but should also enable teachers to try out these new practices while receiving feedback in a low-stakes setting. Does the coach ensure that teachers have opportunities both to observe good instruction and to implement practices demonstrated by the coach or others? For example, does the coach show and ask teachers to analyze video recordings of an effective task launch and then provide opportunities for teachers to rehearse the launch while the coach provides feedback? 
  • Principle 5: Individuals who facilitate professional development for teachers must be instructional experts themselves and understand teacher development. During individual and group professional development, does the coach demonstrate strong content knowledge and deep knowledge of pedagogy? Does the coach actively engage teachers in learning? Does the coach facilitate discussions likely to deepen teachers’ content knowledge and pedagogical skills? 

As part of the evaluation process, the administrator should meet with the coach to debrief observations and set goals for the coach’s future work. Furthermore, just as teachers are typically observed several times each year, coaches should be observed at least as often, with observations taking place in both one-on-one and group settings.  

Teacher surveys about coaches. Administrators could ask teachers to complete brief, confidential surveys about their interactions with coaches. For example, coaches could provide the names of teachers they have recently worked with in individual and group settings, and then those teachers could be surveyed using a combination of Likert-scale items and open-ended responses. Likert-scale items could ask teachers to rate the extent to which they agree with several statements, such as:  

  • Working with my coach has helped me improve my teaching.
  • Working with my coach has helped me better understand how to use the district-provided curriculum.
  • Working with my coach has helped deepen my understanding of the content I am expected to teach at my grade level. 

The open-ended questions could ask teachers to describe the ways in which the coach worked with them and to identify areas of strength and improvement for their coach. After reading through the survey data, the administrator could then debrief with the coach, summarizing the main survey findings to protect teacher confidentiality.  

Student data as evidence of teacher improvement. A central idea behind coaching is that coaches should help teachers improve their practice, which should ultimately affect student achievement. Hence, coach evaluations could be partially based on the improvement of teachers’ instruction, as measured by student growth. Admittedly, doing this would require both access to and careful analysis of meaningful student achievement data. 

To provide a concrete example, suppose that a coach provided yearlong, whole-group professional development for K-2 teachers on how to implement number talks, which are activities designed to enhance students’ number sense (e.g., “How many pairs of numbers can you find that add up to 15?”). Furthermore, suppose that this coach also worked individually with teachers to support their integration of number talks into their mathematics instruction. Thus, to evaluate the effectiveness of the coach at enhancing teachers’ number-sense instruction, student achievement data specific to number sense could be examined.  

Specifically, the number-sense gains of participating teachers’ students could be compared with: (1) those same students’ gains in other topics during the current year, (2) those students’ gains in number sense in prior years, (3) the number-sense gains of students taught by the participating teachers in previous years, and/or (4) the number-sense gains of students whose teachers did not work with the coaches. Although no single analysis should be considered definitive, a pattern of student improvement could emerge to reveal the influence of coaches on teacher and student learning. 

Coach self-study. One last method that could supplement any of the methods described above is a coach self-study of their work with teachers. As part of a self-study, a coach might consider asking teachers to complete a brief questionnaire before and after a coaching cycle in which teachers state what they hoped to learn and evaluate whether and how the coaching cycle helped them reach these learning goals. The coach could also consider asking teachers to keep an informal reflective journal to document their learning during a coaching cycle. Finally, the coach could conduct formal or informal observations of the teacher’s instruction before and after the coaching cycle to identify any changes in instruction.  

Measuring what matters 

Effectively evaluating the work of instructional coaches is complex. Although the ultimate goal of coaching is to improve student learning, solely relying on student performance data may lead coaches to focus their work on students, rather than teachers. To avoid this, we recommend that district leaders take a more holistic view of coaches’ work by observing coaches in action, asking both teachers and coaches to reflect on their work together, and carefully using student data to examine evidence of teacher and student improvement. The Evaluation Tool we studied rewarded coaches for sidestepping teachers, but the approaches we’ve suggested above are intended to help coaches keep their focus where it should be — on fostering teachers’ instructional improvement.                                                                

References  

Berliner, D.C. (2018). Between Scylla and Charybdis: Reflections on and problems associated with the evaluation of teachers in an era of metrification. Education Policy Analysis Archives, 26 (54). 

Clarke, A., Triggs, V., & Nielsen, W. (2014). Cooperating teacher participation in teacher education: A review of the literature. Review of Educational Research, 84 (2), 163-202.  

Close, K., Amrein-Beardsley, A., & Collins, C. (2019). Mapping America’s teacher evaluation plans under ESSA. Phi Delta Kappan, 101 (2), 22-26. 

Cobb, P., Jackson, K., Henrick, E., & Smith, T.M. (2018). Systems for instructional improvement: Creating coherence from the classroom to the district office. Cambridge, MA: Harvard Education Press. 

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93 (6), 8-15. 

Killion, J. & Harrison, C. (2006). Taking the lead: New roles for teachers and school-based coaches. Oxford, OH: National Staff Development Council. 

Kowal, J. & Steiner, L. (2007). Instructional coaching (Issue Brief). Washington, DC: Center for Comprehensive School Reform and Improvement. 

Saclarides, E.S. & Lubienski, S.T. (2018a). Exploring the content and depth of coach-teacher talk during modeling and co-teaching. Paper presented at the American Educational Research Association, New York, NY. 

Saclarides, E.S. & Lubienski, S.T. (2018b). Tensions in teacher choice and professional development. Phi Delta Kappan, 100 (3), 55-58.  

Saclarides, E.S. & Lubienski, S.T. (2018c). Where’s the math? A study of coach-teacher talk during modeling and co-teaching. In T.E. Hodges, G.J. Roy, & A.M. Tyminski (Eds.), Proceedings of the 40th Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education (pp. 334-341). Greenville, SC: University of South Carolina & Clemson University. 

Saclarides, E.S. & Lubienski, S.T. (in press). The influence of administrative policies and expectations on coach-teacher interactions. The Elementary School Journal.  

West, L. & Staub, F.C. (2003). Content-focused coaching: Transforming mathematics lessons. Portsmouth, NH: Heinemann. 

ABOUT THE AUTHORS


EVTHOKIA STEPHANIE SACLARIDES  is an assistant professor in the Department of Curriculum and Instruction at the University of Cincinnati, OH.


SARAH THEULE LUBIENSKI is a professor and associate dean of graduate studies at Indiana University in Bloomington. 
