A meta-analysis attempts to discover what kinds of teacher learning are most likely to improve students’ performance in STEM disciplines. 

 

National survey data suggest that teachers of STEM (science, technology, engineering and mathematics) devote significant amounts of their professional learning time to studying state standards, analyzing instructional materials, deepening their understanding of content and student thinking about content, learning about assessment, and studying student data (Banilower et al., 2018). But do such activities actually lead to improved student academic outcomes?  

To find out, we performed a meta-analysis, meaning a comprehensive search for and rigorous statistical analysis of existing research in this area. Overall, we found that the most effective programs focus on topics — including curriculum materials, academic content, and how students learn — that build knowledge teachers can use directly during instruction. We argue that such learning opportunities support teachers in making more informed in-the-moment instructional decisions.

A study of studies 

When we set out to conduct a meta-analysis of the research on STEM teacher development programs, we had two goals in mind: 1) to calculate how much of an effect these programs have, on average, on student outcomes, and 2) to figure out which aspects of those programs are associated with learning gains. Until recently, the second kind of analysis wasn't possible, because there simply weren't enough rigorous studies to review. However, following calls for stronger research into the effects of educational interventions (e.g., Shavelson & Towne, 2001), federal research portfolios began, in the early 2000s, to prioritize rigorous quantitative studies that can reveal causal relationships between program characteristics and improved student outcomes. This effort significantly expanded the knowledge base in this field and allowed us to conduct the review we describe below.

As is typical in a review of this kind, we conducted extensive database searches, combed through older research syntheses, and contacted the principal investigators of studies with unpublished findings. We included mainly studies that relied on random assignment, whereby some teachers or schools were randomly chosen, before data collection began, to implement the given teacher development program, while others were randomly chosen to serve as a control group for purposes of comparison (for details, see Lynch et al., 2019). However, we also included nonexperimental designs that had a comparison group chosen before data collection began and only small differences on pretest measures of student achievement.

Altogether, we located 89 research studies of programs that included professional development for STEM teachers; of these, 71 also included new curriculum materials for teachers to use in classrooms, suggesting that program developers often paired professional development with new classroom materials. Because many see curriculum materials as an important source of teacher professional learning in and of themselves (Ball & Cohen, 1996), we also included six additional research studies that focused on curriculum materials but that did not contain professional development, bringing the total number of studies to 95. 

We then read through these studies and created a data set that contained information on the size of each program's impact on student achievement, the type(s) of assessment used to measure students' outcomes (standardized tests or researcher-designed assessments), and the type of program studied (professional development, curriculum materials, or both). For professional development (PD) programs, we also recorded the length of the PD, its focus, the activities that teachers engaged in, and the program format. We then set out to determine the extent to which these programs affected student achievement and to understand whether specific characteristics of those programs correlated with stronger or weaker student gains (Lynch et al., 2019).
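For readers curious about the mechanics, the following sketch shows one common way to combine effect sizes from many studies into an average effect: DerSimonian-Laird random-effects pooling. The function and the toy numbers below are illustrative assumptions of ours; the published analysis (Lynch et al., 2019) used more elaborate models that also account for program characteristics.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """Minimal DerSimonian-Laird random-effects pooling.

    `effects` are study-level standardized mean differences;
    `variances` are their sampling variances.
    Returns the pooled effect and its standard error.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                              # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)          # fixed-effect mean
    q = np.sum(w * (effects - fixed) ** 2)           # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)    # between-study variance
    w_star = 1.0 / (variances + tau2)                # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se

# Hypothetical effect sizes and sampling variances from three studies.
print(random_effects_pool([0.15, 0.30, 0.22], [0.010, 0.020, 0.015]))
```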

Professional learning programs improve student outcomes  

Figure 1 shows average program effects, both across all 95 programs (overall) and broken down by the type of assessment used in the evaluation. Our analysis showed that by the end of the program, the average student whose teacher participated scored in the 58th percentile, while the average student in the control or comparison group scored in the 50th percentile. This effect is much larger for researcher-designed assessments than for standardized assessments. For all assessment types, however, our analysis shows that the difference between students whose teachers participated in a program and students whose teachers did not is likely not zero, though in the case of standardized assessments, it is not large.
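To make the percentile comparison concrete, here is a minimal sketch (assuming approximately normally distributed outcomes) of how a standardized effect size maps onto the percentile standing of the average treated student. The roughly 0.2 standard deviation figure is a back-calculation from the 58th percentile reported above, not a published estimate.

```python
from scipy.stats import norm

def effect_size_to_percentile(d: float) -> float:
    """Percentile rank of the average treated student, assuming the
    comparison group's scores are approximately normally distributed."""
    return norm.cdf(d) * 100

# Back out the effect size implied by moving from the 50th to the 58th
# percentile (illustrative only).
implied_d = norm.ppf(0.58)
print(round(implied_d, 2))                           # about 0.20 SD
print(round(effect_size_to_percentile(implied_d)))   # about 58
```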

Were some program characteristics particularly effective in boosting student outcomes? As shown in Figure 2, we found that programs featuring both professional development and new curriculum materials had a greater effect than those featuring either curriculum materials or professional development alone.  

In Figure 3, we dig a bit deeper into the 89 professional development studies, looking further for characteristics associated with stronger student outcomes. We found that programs that featured two goals in particular — 1) helping teachers learn how to use curriculum materials, and 2) improving teachers’ content knowledge, pedagogical content knowledge, and knowledge of student learning — saw better student outcomes than programs that did not emphasize these goals.  

In other words, the more effective programs combine curriculum and professional development (as we saw in Figure 2) and also provide targeted support for teachers to improve their content knowledge and knowledge of student learning. A reading of these studies suggested that such programs engaged teachers in solving mathematics problems, taking part in scientific investigation, watching facilitators model instruction, and studying student work.  

Programs that included learning about integrating technology and content-specific formative assessment yielded positive gains for participating students, but our statistical models could not confirm that those gains differed, on average, from gains in programs that did not include these features. This pattern likely occurred because of the small number of technology and content-specific formative assessment programs included in our analysis; small numbers mean more uncertainty about average program effects.
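To illustrate why a small number of studies leaves more uncertainty, the toy calculation below (our own, with hypothetical numbers) shows how the approximate 95% confidence interval around an average effect narrows as the number of studies grows.

```python
import numpy as np

def ci_halfwidth(per_study_variance: float, k: int) -> float:
    """Half-width of an approximate 95% confidence interval around a mean
    effect estimated from k studies with the given per-study variance."""
    return 1.96 * np.sqrt(per_study_variance / k)

# Hypothetical per-study variance of 0.04 (a standard deviation of 0.2
# around the mean effect); intervals are much wider when k is small.
for k in (3, 10, 30):
    print(k, round(ci_halfwidth(0.04, k), 2))
```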

Figure 4 shows the extent to which different PD formats were associated with differences in student outcomes. In particular, three formats — same-school collaboration, implementation meetings, and summer workshops — yielded stronger student gains than PD programs that did not employ these formats. Same-school collaboration occurred when teachers participated in the professional development session alongside other teachers in their school. Implementation meetings allowed teachers to convene briefly with other PD participants to discuss how to surmount obstacles they encountered when putting the program into practice. Summer workshops, often thought to be less effective than school-year learning, may have provided participants with concentrated opportunities for deep and sustained learning. Programs with an online component yielded smaller impacts on student learning than programs delivered entirely face-to-face. And although the effects of coaching appear strong in Figure 4, programs with coaching varied widely in their impacts on student outcomes, which meant that our statistical models could not confirm that those gains differed, on average, from gains in programs that did not include coaching. However, few programs focused on extended 1:1 coaching; instead, coaching appeared more often as an add-on to traditional professional development.

Finally, the specific activities teachers engaged in during PD (solving problems, developing curriculum materials, and reviewing student work, whether generic samples or their own students' work) did not, in and of themselves, have any effect on student outcomes in our formal analysis. Nor was the duration of the PD related to student outcomes.

What can we say about STEM instructional improvement programs?  

We found that the programs significantly associated with above-average student gains included:  

  • PD focused on new curriculum materials. 
  • Programs aimed at improving teachers’ knowledge of content, pedagogy, and/or how students learn.
  • Programs that included meetings to troubleshoot and discuss classroom implementation of the program, same-school participation and collaboration, and/or summer workshops that allowed for concentrated learning time.

Programs with only some or few of these characteristics may still have positive effects; however, when programs included all of these characteristics, student gains were well above the average program effect.

We believe that this meta-analysis highlights the importance of professional knowledge for teaching. This professional knowledge encompasses knowing how content, student thinking, and curriculum come together, and then making good instructional decisions based on the particulars of the situation (Ball, Thames, & Phelps, 2008; Lampert, 2001). Programs that outperform others in our analysis tend to focus on strengthening this form of knowledge in particular, rather than promoting general pedagogical knowledge or knowledge of more peripheral topics.

What can’t we say?  

Meta-analyses have the advantage of examining programs implemented across a wide variety of contexts, providing some robustness to findings. However, this meta-analysis is limited in what it can say about professional learning systems “on the ground” in U.S. schools and districts.  

First, each of the programs we examined was implemented in a specific context; whether it would succeed in another context is an open question. We found a slight trend toward smaller effects in high-poverty settings, suggesting that interventions may work better, on average, in districts serving more advantaged students. However, we found no further interactions by student race, ethnicity, district type (urban, suburban or rural), or size of the treatment group. That said, other aspects of district and school context appear to affect how well programs perform. We know from studies of policy implementation, for instance, that leadership and peer support can matter quite a bit (e.g., Matsumura, Garnier, & Resnick, 2010; Wanless et al., 2013), and the presence of competing instructional guidance and initiatives (e.g., instructional pacing guides, conflicting advice on what and how to teach) tends to dampen teacher change (Hill, Corey, & Jacob, 2018). The studies we reviewed contained no information about these factors.  

Second, the programs described here tended to be small and intensive, with teachers participating voluntarily, often with support from university academics or researchers. By contrast, local professional development can involve myriad offerings, with teachers spreading their time across several different settings and topics in sessions led by other teachers or school or district leaders. In some systems, teachers have at least partial choice over their professional development, while in other systems they have very little.   

All of this is to say that we don’t yet know whether the features that worked in the schools in our analysis will work in typical U.S. schools.  

A wider context 

STEM teachers in U.S. schools typically engage in a wide variety of professional learning activities, often in a single year. This leads to an important question for districts: How can they make more time for the kinds of learning opportunities that posted better gains in our analysis? Teachers already report feeling overwhelmed by the sheer volume of reform and ever-increasing instructional responsibilities (American Federation of Teachers, 2017), so districts will likely need to scale back or eliminate other activities to make room for more effective forms of professional learning.

So what should they eliminate? We would nominate data team meetings, where teachers study student data in hopes of individualizing and improving instruction. These programs use data from interim or benchmark tests (as opposed to formative assessment programs, in which teachers create and study their own assessments). One review of data-study programs produced only two positive results and one negative result out of a total of 19 analyses relating program participation to student test score outcomes (Hill, in press). In addition, qualitative research suggests that having teachers study data does not, by itself, lead to improved instruction (Barmore, 2018; Goertz, Oláh, & Riggan, 2009), and our own observations suggest that data-team discussions often ascribe poor student performance to factors other than instruction itself.

Yet, recent national surveys suggest that schools have made large investments in having teachers study student assessment data (e.g., Banilower et al., 2018). Redirecting these meetings toward building expertise in curriculum materials and content seems natural; we caution, however, that districts will have to do so carefully, using routines and structures that focus attention squarely and deeply on instruction.  

That these STEM instructional improvement programs boost student outcomes should be a reason for optimism among policy makers and leaders. Our findings may, for instance, help shape how states and districts choose to spend Title II dollars, funds aimed at improving teacher quality. They also suggest how leaders may narrow the scope of teacher professional learning in ways likely to increase the effectiveness of those efforts.  

References 

American Federation of Teachers. (2017). 2017 educator quality of life survey. Washington, DC: Author. 

Ball, D.L. & Cohen, D.K. (1996). Reform by the book: What is — or might be — the role of curriculum materials in teacher learning and instructional reform? Educational Researcher, 25 (9), 6-14. 

Ball, D.L., Thames, M.H., & Phelps, G. (2008). Content knowledge for teaching: What makes it special? Journal of Teacher Education, 59 (5), 389-407. 

Banilower, E.R., Smith, P.S., Malzahn, K.A., Plumley, C.L., Gordon, E.M., & Hayes, M.L. (2018). Report of the 2018 NSSME+. Chapel Hill, NC: Horizon Research, Inc. 

Barmore, J.M. (2018). Journey from data into instruction: How teacher teams engage in data-driven inquiry. Cambridge, MA: Harvard University.  

Goertz, M.E., Oláh, L.N., & Riggan, M. (2009). From testing to teaching: The use of interim assessments in classroom instruction. Philadelphia, PA: University of Pennsylvania.  

Hill, H.C. (in press). Stop doing that…and try this: Studying student assessment data. Education Week. 

Hill, H.C., Corey, D.L., & Jacob, R.T. (2018). Dividing by zero: Exploring null results in a mathematics professional development program. Teachers College Record, 120 (6), n6. 

Lampert, M. (2001). Teaching problems and the problems of teaching. New Haven, CT: Yale University Press. 

Lynch, K., Hill, H.C., Gonzalez, K.E., & Pollard, C. (2019). Strengthening the research base that informs STEM instructional improvement efforts: A meta-analysis. Educational Evaluation and Policy Analysis, 41 (3), 260-293. 

Matsumura, L.C., Garnier, H.E., & Resnick, L.B. (2010). Implementing literacy coaching: The role of school social resources. Educational Evaluation and Policy Analysis, 32 (2), 249-272.  

Shavelson, R.J. & Towne, L. (Eds.). (2001). Scientific research in education. Washington, DC: The National Academies Press. 

Wanless, S.B., Patton, C.L., Rimm-Kaufman, S.E., & Deutsch, N.L. (2013). Setting-level influences on implementation of the Responsive Classroom approach. Prevention Science, 14 (1), 40-51. 

ABOUT THE AUTHORS

CYNTHIA POLLARD is a Ph.D. candidate in education at the Harvard Graduate School of Education.

HEATHER C. HILL is the Jerome T. Murphy Professor in Education at the Harvard Graduate School of Education, Cambridge, MA.

KATHRYN E. GONZALEZ is a Ph.D. candidate in education at the Harvard Graduate School of Education.

KATHLEEN LYNCH is a postdoctoral research associate at the Annenberg Institute at Brown University, Providence, RI.