
Twenty years ago, with the passage of the No Child Left Behind Act (NCLB), a “good school” came to be defined mainly by its performance on accountability metrics, especially the scores its students received on standardized reading and math tests. Ever since then, critics have decried this narrow focus on academic achievement (in just a couple of subjects, no less), arguing that a school shouldn’t be deemed successful unless it also helps young people become healthy and well-rounded individuals, active citizens, and responsible members of the community. Moreover, a fast-growing body of research has shown that focusing on social-emotional learning, civic education, and other goals doesn’t take anything away from academic learning. To the contrary, when educators create a school climate in which students feel safe, supported, and connected to their classmates and teachers, those students become more likely to succeed academically (Melnick, Cook-Harvey, & Darling-Hammond, 2017; National Commission on Social, Emotional, and Academic Development, 2019).  

Informed by such criticism, the Every Student Succeeds Act (ESSA), which replaced NCLB in 2015, requires states to use a somewhat more expansive definition of school quality (Buckley et al., 2017; Temkin & Harper, 2017). Under ESSA, states must hold schools accountable for students’ proficiency in reading and math, their English language proficiency, the amount of progress they make in those areas, and (in the case of high schools) their graduation rates. But states must also hold schools accountable for their performance on at least one School Quality/Student Success Indicator (SQSS) — a “fifth indicator” of the state’s choosing that shows how schools have performed in another “nonacademic” area. For example, the SQSS could focus on the extent to which schools improve student attendance, engage students in learning, provide a safe and supportive learning environment, or pursue some other goal, as long as the indicator is consistent and measurable, allowing the state to compare schools’ performance and track their progress over time.  

Herein lies the problem: Currently, there exist no standardized measures with which to assess school quality in these broader ways. This has created a real conundrum for states, which have struggled to identify nonacademic measures that are reliable and valid (meaning that they actually measure what they claim to measure).  

When states submitted their initial ESSA plans to the U.S. Department of Education for approval in 2017, six of them included a school climate measure as part of their SQSS — that is, they planned to use a tool, such as a survey, that would show how supportive, safe, and engaging the school environment is for students (Temkin & Harper, 2017). An additional six states indicated they were still exploring such measures but did not commit to using them. And others discussed school climate surveys in other sections of their plans. We saw that as a promising sign of state and local policy makers’ growing recognition of just how important it is to ensure that our public schools don’t just focus on a narrow set of academic goals but enable children to flourish in all sorts of ways.   

By 2019, eight states (Idaho, Illinois, Iowa, Maryland, Montana, New Mexico, North Dakota, and South Carolina) planned to include a measure of school climate in their SQSS (Jordan & Hamilton, 2020; Temkin & Harper, 2017), and five other states (California, Delaware, Georgia, Massachusetts, and Nevada) said they planned to measure school climate, but not for accountability purposes (Jordan & Hamilton, 2020). Most states instead chose indicators, such as a school’s rate of chronic absenteeism among students, that are associated with the quality of the school’s climate but do not measure the climate itself (Temkin & Harper, 2017). 

In short, it appears that while many states see the value of including measures of school climate in their accountability systems, many of them have been put off by the technical challenges involved in doing so. However, our research suggests that these challenges can be overcome. If state officials rethink the way they design and use surveys and other tools, they’ll find that there is no reason to put aside their initial plans to measure school climate. They can and should hold schools accountable to the goal of providing every child with a healthy environment in which to learn and grow.  

School climate surveys: Common problems  

There are some well-designed surveys of school climate available. (For a compendium of existing surveys, see the National Center on Safe Supportive Learning Environments, n.d.) However, while these surveys can help inform improvement efforts within a school, researchers caution against trying to use them to compare schools or for accountability purposes (Buckley et al., 2017; Duckworth & Yeager, 2015; Jordan, 2020; Temkin, Ryberg, & Her, 2020). Surveys may provide valid measures of individual students’ perceptions of their school’s climate, but combining those individual perceptions into a single aggregated score won’t necessarily result in a valid measure for the whole school (Cornell & Huang, 2019; Ryberg et al., 2020; Schweig, 2014).  

An aggregate score on a school climate survey can also mask critical differences between and within schools (Voight et al., 2015). Imagine, for instance, that all the students in one school rate their school climate right in the middle of the scale, but in a second school, half the students give the highest possible rating and half give the lowest. While both schools get exactly the same aggregate score, the survey results tell entirely different stories: The first school provides a mediocre learning environment for everybody, but the second school appears to provide a wonderful environment for half its students, leaving the other half to feel unsafe, unsupported, and unengaged.  
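
To make the arithmetic concrete, here is a minimal Python sketch, using invented ratings on a 0-10 scale rather than data from any real survey, showing how two very different distributions can produce an identical aggregate score:

```python
from statistics import mean

# Invented ratings on a 0-10 scale for two hypothetical schools.
school_a = [5] * 20                # every student rates the climate in the middle
school_b = [10] * 10 + [0] * 10    # half give the top rating, half the bottom

# Identical aggregate scores, entirely different stories.
print(mean(school_a) == mean(school_b))  # True
```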

To make sense of students’ differing experiences of school climate, state officials can disaggregate survey results by student subgroup. In the second school above, this could reveal, for instance, that the members of one student population (native English speakers, say, who make up 50% of the school’s enrollment) gave the school climate a high rating, while members of another subgroup (the 50% of students who are English learners) gave it a low rating — this would suggest an urgent need to provide a better environment for the latter. However, disaggregating the data won’t necessarily reveal any demographic patterns that can help explain why students have such polarized perceptions, particularly if data do not include key demographic information. For example, research consistently shows that LGBTQ+ students often experience a more negative school climate than their heterosexual, cisgender peers, yet few school climate survey tools include measures of sexual orientation and gender identity. If officials consider only the available demographic indicators and find no significant differences, they may wrongly conclude that the students are all experiencing the school climate similarly.  
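
As a sketch of what disaggregation looks like in practice (the subgroup labels and ratings below are hypothetical), note that an analyst can only group responses by fields the survey actually collected:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (subgroup, climate rating on a 0-10 scale).
responses = [
    ("native_english", 9), ("native_english", 10), ("native_english", 9),
    ("english_learner", 1), ("english_learner", 2), ("english_learner", 1),
]

by_group = defaultdict(list)
for group, rating in responses:
    by_group[group].append(rating)

for group, ratings in by_group.items():
    print(group, round(mean(ratings), 1))  # native_english 9.3, english_learner 1.3

# The schoolwide mean (about 5.3) hides the gap entirely, and if the survey
# never asked about, say, sexual orientation or gender identity, no amount
# of disaggregation can surface disparities along those lines.
```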

Another limitation of the available climate surveys is that they typically rely on students’ subjective, self-reported impressions, which raises concerns about honesty and accuracy. For instance, school officials may pressure students to give the school a positive rating, effectively “gaming” the survey results. When accountability is a factor, even the best measures can be corrupted, and this is even more true when there is no objective way to validate the data (Buckley et al., 2017; Schanzenbach et al., 2016).  

Also, the validity of school climate measures depends on who completes the surveys. Ideally, every student does so. But in reality, it’s rare to achieve anything close to a 100% response rate, and the students who choose not to complete the survey may have fundamentally different perceptions of their school’s climate than those who do — e.g., perhaps the least-engaged students will skip the survey, resulting in a higher average score (Buckley et al., 2017). Statistical techniques, such as data weighting, can help correct for these biases, but those techniques aren’t perfect, and they can’t be used at all if entire groups of students have skipped the survey (Brick, 2013).  
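
To give a flavor of how such weighting works, here is a minimal post-stratification sketch (the enrollment shares and ratings are invented): each subgroup's average is weighted by its share of enrollment rather than its share of respondents, and the adjustment fails outright for a subgroup with no respondents at all:

```python
from statistics import mean

# Invented figures: each subgroup's share of enrollment, plus the 0-10
# ratings actually received from that subgroup's respondents.
enrollment_share = {"engaged": 0.5, "less_engaged": 0.5}
ratings = {"engaged": [8, 9, 8, 9], "less_engaged": [3, 4]}

# The naive mean over-weights the engaged students, who answered more often.
naive = mean([r for group in ratings.values() for r in group])

# Post-stratified mean: weight each subgroup's average by its enrollment share.
adjusted = sum(share * mean(ratings[g]) for g, share in enrollment_share.items())

print(round(naive, 1), round(adjusted, 1))  # 6.8 6.0
# Had "less_engaged" produced zero respondents, mean([]) would raise
# StatisticsError: no weighting scheme can recover a group that skipped
# the survey entirely.
```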

Finally, the definition of school climate (or school quality more broadly) varies considerably among researchers and, as a result, among measurement tools (Thapa et al., 2013). While many survey tools share some commonalities, such as a focus on student-to-student and student-to-teacher relationships, they also differ from one another in all sorts of other ways, such as whether they give more attention to negative aspects of school climate or positive ones, or whether and how much attention they give to topics such as substance use, equity, diversity, bullying, the availability of school counselors, and so on. As long as a state uses a valid and reliable tool and does so consistently across all schools, it can meet ESSA’s requirement to include a nonacademic indicator in its accountability system. Still, given the wide variety of survey tools available, each state might be tempted to choose the one that makes the largest number of its schools look “good,” rather than the one that provides the most useful data, pointing to specific areas in which each school ought to improve.   

Two ways to improve upon surveys 

The limitations of the existing school climate surveys should come as no surprise, given how new most of these tools are. We can expect to see improvements over time, as survey designers make it a priority to create tools that allow for valid comparisons among schools. Further, we can expect that researchers will gradually come to a consensus as to which elements of school climate are most important to measure. However, while school climate surveys will likely improve, there will still be some serious challenges to confront, including how to ensure that all kinds of students are equally likely to fill out their school’s survey, how to interpret the varying responses that might be concealed within a school’s average score, and how to prevent schools from gaming the survey results.  

So, if states aim to hold their schools accountable for providing a safe, engaging, and supportive learning environment — as they should — then what can they do? How can they measure school climate in ways that are valid, reliable, and comparable? We’ve taken a close look at two practices that might help: 1) analyzing the variance of survey scores within each school, and 2) combining the use of climate surveys with independent, structured observations. These approaches do not fully address our concerns, but they can give states much more credible and useful information about school climate than they can get from survey results alone. 

Looking at the variance  

When reporting survey results, statisticians distinguish between the mean score (i.e., the average of all the individual scores) and the variance of those scores (i.e., how far the individual scores spread out on either side of the average; its square root is the familiar standard deviation). If you want to understand how two sets of results (such as the survey scores from two schools) compare to each other, statistically speaking, you need both kinds of information (Tukey, 1949).

For instance, let’s say that at one school, students give the climate an average score of 8 on a 10-point scale. If the individual student scores vary only a little (for instance, all of their scores fall in the 7 to 9 range, clustering together around the average), this means the students strongly agree with each other that the school climate is very good (Kim & Choi, 2008). But let’s say that the scores show a high amount of variance (e.g., a lot of students give the school a rating of 9 or 10, but some give it a 5 or 6, and a few give it a 3 or 4). The average score is still an 8, but it’s clear that the students sharply disagree with each other about the school climate.  
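
A quick computation, again with invented scores, shows how the standard deviation separates these two cases even though the means match:

```python
from statistics import mean, stdev

# Two hypothetical schools, both averaging 8 on a 10-point scale.
clustered = [7, 8, 8, 8, 9, 7, 9, 8, 8, 8]       # scores hug the average
polarized = [10, 10, 10, 10, 9, 9, 9, 5, 5, 3]   # most love it, some do not

print(mean(clustered), round(stdev(clustered), 1))  # 8 0.7
print(mean(polarized), round(stdev(polarized), 1))  # 8 2.6
```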

When survey results show a high level of variance, this indicates that students may perceive their school experience quite differently, which should put officials on notice that the data deserve a closer look. Before rushing to commend the school for providing students with an excellent learning environment, they ought to dig a little deeper; a significant number of students may have a legitimate complaint about their experience at the school. Variance provides important context, and ignoring it can conceal useful information.

Conducting structured observations 

In structured observations, people who have no personal stake in the school’s reputation systematically observe and assess specific kinds of interactions at the school, rating them on a scoring rubric. Currently, most structured observations occur at the classroom level, rather than in other parts of the school, and they occur most often in early childhood settings, especially in the 41 states that require preK programs to use the Quality Rating and Improvement Systems (QRIS) and in the overlapping set of 38 states that require the use of a standardized observation tool such as the Environment Rating Scale (ERS) or the Classroom Assessment Scoring System (CLASS) for preK programs (National Center on Early Childhood Quality Assurance, 2017).  

However, there’s no reason structured observations can’t be used at all grade levels, where they might provide an effective and objective means of measuring school climate. Unlike student surveys, this approach comes with no risk that the results will be skewed because school leaders have tried to influence students’ ratings or because certain students have opted out. Further, structured observations can make it easier to measure school climate consistently from one school to another, since the same observers, using the same rubrics, can visit many classrooms at many schools.

The benefits of a blended approach  

If states were to use a combination of structured observations and student surveys, supplemented by the analysis of variance scores, would that allow them to measure school climate in a way that is sufficiently valid, reliable, and comparable to be used as part of an accountability system? To gauge the potential of this blended approach to measuring school climate, we analyzed survey results from 18 middle and high schools that had participated in an evaluation of a school climate intervention in Washington, D.C. During the 2017-18 school year, 7th- to 10th-grade students at the schools had been asked to respond to the ED School Climate Surveys (EDSCLS) designed by the U.S. Department of Education. We ran the results through the EDSCLS’ automatic scoring algorithm and calculated the average scores students gave their schools on topics such as how safe they feel on campus, how engaged they are in their classes, and how supportive their teachers are. Using this method, all 18 schools had what the EDSCLS defined as a “favorable” school climate (i.e., a score right in the middle between “least favorable” and “most favorable”).

For accountability purposes, these results wouldn’t be very helpful. The near-identical scores don’t permit officials to compare schools’ performance and identify those that stand out for the quality of their climate. Nor do such results provide each school with information about its specific strengths and weaknesses or what it needs to do to improve.

Around the same time that the climate surveys were conducted, a group of trained observers had visited a randomly selected set of classrooms at the 18 schools, using an observation tool (the Classroom Assessment Scoring System-Secondary, or CLASS-S) that had them rate the classroom climate on various features related to emotional support, instructional support, and classroom organization. In effect, this observational data provided a second perspective on students’ experience, which we could compare with the survey results. Further, the observations gave us another way to make sense of the variance of students’ ratings within each school — if students ranged widely in their perceptions of the school climate, the data from individual classrooms might help explain why, revealing how the learning environment differs for students who take different classes. 

Interestingly, we found that the ratings from the structured observations aligned closely with the scores from the survey: In most cases, both methods produced a similar overall rating of the climate at a school, though in some cases they diverged. Further, the combination of the two methods often gave us a more detailed picture than either had provided on its own; each method contributed unique information, filling in gaps that the other had missed. Additionally, schools that had similar average scores across observations and surveys did not necessarily have similar variances.
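
As an illustration of the kind of cross-check this involves (the school-level numbers below are fabricated, and the snippet is not the EDSCLS or CLASS-S scoring itself), one can correlate the two sets of school-level scores and then compare spreads:

```python
from statistics import correlation  # statistics.correlation requires Python 3.10+

# Fabricated school-level scores: the survey mean and the observation rating
# for each school, plus the standard deviation of individual student survey
# responses within each school.
survey_means = [6.8, 7.2, 5.9, 7.5, 6.1, 7.0]
obs_ratings  = [5.5, 6.0, 4.8, 6.4, 5.9, 5.7]
survey_sds   = [0.6, 2.4, 1.1, 0.7, 2.8, 1.0]

# Do the two methods broadly agree on which schools look healthier?
print(round(correlation(survey_means, obs_ratings), 2))  # 0.75 for these numbers

# The second and last schools have nearly identical survey means (7.2 and 7.0)
# but very different within-school spreads: agreement on averages does not
# imply agreement on variance.
print(survey_sds[1], survey_sds[5])  # 2.4 1.0
```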

The upshot is that the combined data allowed us to more precisely identify each school’s strengths, as well as specific areas in need of improvement. For instance, say that the data from structured observations reveal that the teachers in one school are effective at managing behavior in their classrooms, but data from the school’s climate survey show that students hold low perceptions of safety overall. This would seem to suggest that while the classroom climate is fine, there’s an urgent need to attend to other parts of the school (such as the playground, hallways, bathrooms, and cafeteria).  

In short, our analysis suggests that if states were to combine the use of student surveys with variance scores and data from structured observations, they would significantly improve their capacity to measure school climate. State officials would get a more detailed picture of where educators have succeeded in creating a healthy learning environment and where they ought to improve. Further, because this approach balances students’ subjective perceptions with ratings by trained observers, officials can be more confident in the data. 

We can’t claim that this would solve all of the challenges involved in holding schools accountable for the learning environment; indeed, it could create some new challenges. For one, conducting structured observations can be expensive, and it can be difficult to establish and maintain the reliability of data collectors throughout the process. Also, while we found that observational ratings and survey scores were significantly correlated with each other (as well as providing complementary information about school climate), we also found some discrepancies between the two methods, likely because they rely on different raters and different units of analysis. We don’t expect such discrepancies to negate the value of combining survey scores and observational data, but their presence does raise questions that need to be explored (such as whether the discrepancies stem from the differing ways in which these tools define school climate, or from differences in the ways students and external observers think about what matters most in the school environment).

Still, while some technical challenges remain, it is critically important to continue designing valid and reliable ways to measure school climate, both to send educators a clear message that climate (and not just performance on standardized tests) matters to student success and to show educators precisely where they need to improve in this area. As the adage goes, “what gets measured gets done.” States should not be deterred by the first few hurdles they’ve encountered as they try to incorporate school climate survey data into their accountability systems. Instead, they should look beyond aggregated survey data alone, blending that data with variance scores and data from structured observations of classroom interactions.   

References 

Brick, J.M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29 (3), 329-353.  

Buckley, K., Gopalakrishnan, A., Kramer, E., & Whisman, A. (2017). Innovative approaches and measurement considerations for the selection of the school quality and student success indicator under ESSA. Council of Chief State School Officers.  

Cornell, D. & Huang, F. (2019). Collecting and analyzing local school safety and climate data. In M.J. Mayer & S.R. Jimerson (Eds.), School safety and violence prevention: Science, practice, and policy (pp. 151-175). American Psychological Association.

Duckworth, A.L. & Yeager, D.S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44 (4), 237-251.

Jordan, P.W. (2020). Can school climate surveys measure school quality? FutureEd.

Jordan, P.W. & Hamilton, L.S. (2020). Walking a fine line: School climate surveys in state ESSA plans. FutureEd.

Kim, J. & Choi, K. (2008). Closing the gap: Modeling within-school variance heterogeneity in school effect studies. Asia Pacific Education Review, 9 (2), 206-220.  

Melnick, H., Cook-Harvey, C., & Darling-Hammond, L. (2017). Encouraging social and emotional learning in the context of new accountability. Learning Policy Institute.  

National Center on Early Childhood Quality Assurance. (2017). QRIS Compendium 2016 — Use of observational tools in QRIS. Author. https://childcareta.acf.hhs.gov/sites/default/files/public/qris_observational_tools_2016.pdf 

National Center on Safe Supportive Learning Environments. (n.d.). School climate survey compendium. https://safesupportivelearning.ed.gov/topic-research/school-climate-measurement/school-climate-survey-compendium 

National Commission on Social, Emotional, and Academic Development. (2019). From a nation at risk, to a nation at hope: Recommendations from the National Commission on Social, Emotional, and Academic Development. Aspen Institute.  

Ryberg, R., Her, S., Temkin, D., Madill, M., Kelley, C., Thompson, J., & Gabriel, A. (2020). Measuring school climate: Validating the Education Department School Climate Survey in a sample of urban middle and high school students. AERA Open, 6 (3).   

Schanzenbach, D.W., Bauer, L., & Mumford, M. (2016). Lessons for broadening school accountability under the Every Student Succeeds Act. The Hamilton Project.  

Schweig, J. (2014). Cross-level measurement invariance in school and classroom environment surveys: Implications for policy and practice. Educational Evaluation and Policy Analysis, 36 (3), 259-280.   

Temkin, D. & Harper, K. (2017, September 20). Some states are missing the point of ESSA’s fifth indicator. Child Trends.  

Temkin, D., Ryberg, R., & Her, S. (2020). States and districts should exercise caution before using school climate survey data to compare schools. Child Trends.   

Thapa, A., Cohen, J., Guffey, S., & Higgins-D’Alessandro, A. (2013). A review of school climate research. Review of Educational Research, 83 (3), 357-385.  

Tukey, J.W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5 (2), 99-114.

Voight, A., Hanson, T., O’Malley, M., & Adekanye, L. (2015). The racial school climate gap: Within-school disparities in students’ experiences of safety, support, and connectedness. American Journal of Community Psychology, 56 (3-4), 252-267. 

ABOUT THE AUTHORS

DEBORAH TEMKIN is vice president at Child Trends, Bethesda, MD.

JOY A. THOMPSON is a research scientist at Child Trends, Bethesda, MD.

ALEXANDER GABRIEL is a senior research analyst at Child Trends, Bethesda, MD.

EMILY FULKS is a policy analyst at Child Trends, Bethesda, MD.

SARAH SUN is a senior research assistant at Child Trends, Bethesda, MD.

YOSMARY RODRIGUEZ is a research analyst at Child Trends, Bethesda, MD.
