Learning how 15-year-olds perform on an international exam may provide good information for national policy makers, but local schools may need other choices for their decision making.
Measuring and comparing educational outcomes at the international and national levels has expanded rapidly over the past 40 years, with recent trends extending to the local level. As a case in point, the Organization for Economic Cooperation and Development (OECD) and its U.S. administrator, McGraw-Hill Education CTB, have recently concluded the first cycle of the OECD Test for Schools (OECD-TFS), which is designed to compare 15-year-olds on their proficiency in math, science, and reading.
As part of recruitment efforts by this highly respected international organization, Andreas Schleicher, OECD’s Paris-based director of education, is personally visiting U.S. schools to promote the new assessment (Borsuk, 2014). Although the expansion of the Program for International Student Assessment (PISA) into individual schools offers many possibilities, ever-shrinking budgets and test fatigue in the U.S. raise concerns among educational stakeholders. As such, educators should carefully weigh the advantages of participating in yet another test against the cost of such participation, especially in terms of the important instruction time lost. Educators at all levels must understand the limitations of using a PISA-type test for improving instruction. What can and can’t PISA — as the foundation for the OECD’s school-based test — tell us about U.S. schools? What other aspects of the PISA design (and by extension, the OECD-TFS) should educators consider before participating in the school-based version of this international assessment?
OECD and PISA
OECD’s mission is to “promote policies that will improve the economic and social well-being of people around the world” (OECD, n.d.). Since the mid-1990s, OECD has emerged as a leader in education policy, with increased agency and influence on the world stage. As the largest OECD education study, PISA began in 2000 and was originally developed to measure math, science, and reading literacy in economically developed countries on a three-year cycle. Established by an international consortium of content, assessment, and survey experts, the PISA design is unique in that, in each three-year cycle, one content domain is considered a major domain — such as reading in 2009, math in 2012, and science in 2015 — and the other two content areas are considered minor domains. Two-thirds of testing time is dedicated to the major domain, with the remaining one-third split over the two minor domains.
In addition to the achievement portion of PISA, students respond to a background questionnaire seeking information about them, their family, their school, and their home. Student test results are presumably comparable across countries and serve as a useful gauge for monitoring proficiency in the measured content areas. OECD and individual countries use the PISA results to inform education policy, usually at the national or education system level, and researchers use results to understand correlates of achievement.
Two unique aspects of PISA
PISA evaluates what a population of 15-year-olds knows and what they can do with what they know.
A key feature of PISA is that it is not designed to measure how well students master a specific curriculum. Rather, PISA assesses the degree to which young people at the end of compulsory education have the knowledge and skills necessary for participation in adult life and society. That is, PISA evaluates what a population of 15-year-olds knows and what they can do with what they know. Under this design, participating 15-year-olds in the U.S. were drawn from five different grades in 2012, with most students coming from grades 9 to 11. Given this method of selection, there is no linkage between participating students and their teachers. Such a disconnect likely limits the usefulness of PISA results for understanding the relationship between teaching and learning and for making meaningful changes in pedagogy, classroom climate, or other areas. Under the national Race to the Top initiative, a core reform area involves “building data systems that . . . inform teachers and principals about how they can improve instruction” (U.S. Department of Education, 2009, p. 2). As a result, states that receive Race to the Top funds are required to measure student achievement and to directly connect these results to the teacher. With a national goal of understanding the contribution of the teacher to student achievement at the state and local levels, educators must weigh the advantages and disadvantages of participating in a test of achievement that does not link teachers to their students.
In a similar vein, educators should seriously consider the implication of using a test that won’t be linked to the Common Core State Standards (or any state standards). Given that the creation of large-scale assessments for the Common Core would charitably be described as complex and difficult, how effective will the OECD-TFS be in providing information about student progress relative to the Common Core? The OECD itself, in a recent publication examining the link between PISA and the Common Core, noted that, “Analyzing PISA tasks in CCSSM [Common Core State Standards in Mathematics] terms is a nontrivial exercise” (OECD, 2013, p. 81). Indeed!
Educators also should consider the fundamental compromise that arises when PISA measures dozens of educational systems simultaneously. In the last cycle of PISA in 2012, 67 highly varied countries and educational systems elected to participate. Of these 67 participants, 34 countries are members of the OECD. And although PISA is ultimately by and for this group of wealthy countries, the test is also designed to make valid comparisons among economically developing countries like Indonesia and Colombia. The cultural, linguistic, and geographic variety of participating countries and systems necessitates a certain degree of accommodation and compromise during development to ensure that the test is relevant to each participating system and that the results are reasonably comparable (Meredith, 1993; Oliveri & von Davier, 2011; van de Vijver & Leung, 1997).
The previous two points — that PISA is intentionally divorced from curricula and that the test is developed to serve many diverse populations — illustrate that although PISA is a high-quality, well-developed instrument whose design and administration make it a good assessment for many different settings, it probably is not a great one for any particular country. Both points are particularly important as we move from the national level down to the local school level, where the most recent PISA initiative aims to make comparisons between individual schools and countries.
PISA and policy
PISA can only provide a snapshot of what a single age-group of students knows about a limited set of topics every three years. It is not a comprehensive, longitudinal view of all important aspects of an educational system.
As a reminder and for clarification, consider some PISA basics: It is a test of 15-year-olds in math, science, and reading that occurs every three years and, on a rotating basis, emphasizes one content area while assessing the other two to a lesser extent. Although this method minimizes testing time for students, testing experts have raised concerns about measuring achievement trends in minor domains (Mazzeo & von Davier, 2009). This issue would only be exacerbated for individual schools, as the sample sizes (75 per school) would be a tiny fraction of the sample sizes of countries (around 5,000). Further, the assessment was designed specifically not to focus on what is taught in schools but rather on what the OECD — an economic, market-oriented organization, based in Paris, with 34 member states — feels is important for students to know to operate in a global economy and society. With this orientation and design, PISA can only provide a snapshot of what a single age-group of students knows about a limited set of topics every three years. It is not a comprehensive, longitudinal view of all important aspects of an educational system. Any interpretations of PISA results should bear this in mind.
Any assessment is necessarily the product of some sort of agenda, be it political, altruistic, or otherwise. PISA is no exception.
According to Kane (1992), to “validate a test-score interpretation is to support the plausibility of the corresponding interpretive arguments with appropriate evidence” (p. 527). In other words, for a test interpretation to be valid, we must ensure that the evidence we use is appropriate for the conversation we wish to have. In the case of OECD-TFS, PISA provides the foundation from which to interpret and use the results. As such, it is important to decide if PISA provides “appropriate evidence” to evaluate the quality of a school, and local stakeholders — including but not limited to educators — are in the best position to make this decision. In many respects, validity questions are intuitive and reasonably straightforward to answer. For example: Are 15-year-olds the ideal population for comparison in your district? Does the PISA content represent those areas of emphasis that are important to your school? Are PISA questions aligned with your curriculum? If so, at what grade(s) is this content covered? Answering these and other questions relevant to the local context under consideration is key to determining whether a PISA-type test helps an individual school make informed decisions.
In addition, remember that any assessment is necessarily the product of some sort of agenda, be it political, altruistic, or otherwise. PISA is no exception. As such, test consumers should be cognizant of who is administering the test and for what purpose, if only to understand what the test developers hope to learn and what information can reasonably be gleaned from the results. To be clear, we are not arguing that the OECD is promoting a nefarious, hidden agenda. Rather, we want to highlight that the fundamental guiding principle that underlies the OECD and its work is an economic one. As such, PISA has some clear benefits and drawbacks, the basics of which are important to understand when determining whether the benefits of participation at the local level outweigh the costs.
Alternatives to the OECD-TFS
We live in an ever-changing, interconnected, and globalized world, and overwhelming evidence suggests that these trends will only accelerate in coming years. Most schools have clear mandates to produce citizens who will be prepared to participate — and even thrive — in this global society. In recent years, a great deal of focus has been placed on workforce knowledge, and the OECD, an economic organization, has been at the ready to provide such information to national systems and, recently, to individual schools. Although PISA may provide necessary and useful information to national policy makers on the condition of their future workforce, it is not designed to directly measure the primary responsibility of schools and teachers: to teach a defined curriculum. In addition, an age-based rather than class-based sample influences the types of inferences that can and should be drawn from PISA results. As a monitoring device, PISA is simply one tool among many. Of course, the OECD and McGraw-Hill Education CTB (a for-profit publishing company) want schools to participate in these assessments. Testing is big business, with estimated annual costs of complying with federal accountability requirements at nearly $2 billion in the United States alone (Ujifusa, 2012). With shrinking state budgets and approximate costs paid to McGraw Hill for the 2013-14 school year of $8,000 to $11,500 per school (America Achieves, n.d.), educators might consider some lower-cost alternatives.
NAEP
The United States has one of the longest-running, most comprehensive, and most innovative national assessments in the world. The National Assessment of Educational Progress (NAEP) spans a vast number of subjects and, unlike PISA, focuses on a specific grade and a common curriculum. As NAEP is administered in each state, it provides state-to-state comparisons, rather than comparing a school (with a total population of several hundred to a few thousand) to a country (with a population of millions of 15-year-olds). NAEP also has been designed to assess U.S. students in our specific cultural and educational context. As mentioned previously, comparability of assessment results across participating countries is necessary; however, bias arising from cross-cultural comparisons is a serious concern in international assessments (Ercikan, 2002; Grisay & Monseur, 2007; Hambleton, 2002). In this regard, NAEP has a clear advantage, in that these cultural differences are reduced, even in a country as varied as the U.S. This, to us, is a natural area for development by the NAEP program. Clearly, however, a NAEP test for schools would be lacking in terms of an international comparison component. As such, we propose the possible use of another international assessment.
TIMSS
The Trends in International Mathematics and Science Study, which started in 1995, is an international assessment that precedes PISA. As the name indicates, the assessment covers mathematics and science and is administered every four years at 4th and 8th grade, with an additional version of the assessment available for 12th grade. Unlike PISA, TIMSS directly measures an internationally agreed-upon curriculum that students have had an opportunity to learn. Further, TIMSS is grade-based, providing a direct link between selected students and their teacher, offering the possibility of understanding how teachers contribute to student achievement. Unlike PISA, however, TIMSS lacks an explicit link to workforce knowledge and is less innovative than PISA (which also measures financial literacy and includes more diverse background measures and item types). Finally, the International Association for the Evaluation of Educational Achievement (IEA) — the organization responsible for TIMSS — has yet to develop a school-based assessment along the lines of the OECD-TFS. This leads us to another suggestion for a low-cost way that schools could create their own assessment that would provide them with internationally comparable information.
A modest proposal
One reasonable, accessible, and inexpensive alternative to the OECD-TFS may be creating an assessment with released items from NAEP or TIMSS. This would be a fairly simple endeavor, as each of these programs regularly releases operational test questions from past cycles. To that end, the National Center for Education Statistics (NCES) offers a collection of released TIMSS items for grades 4, 8, and 12 (see http://nces.ed.gov/TIMSS/educators.asp). Schools may use these items to create their own assessment and then benchmark answers against national and international performance. In the case of TIMSS, released items are available back to 1995 and include information on the main topic along with the measured content and cognitive domains. Locally tailored assessments like these could have pedagogical use beyond a simple numeric value intended to represent a school’s quality. This type of assessment also leaves design features in the hands of local educators who best know the needs of their students. Such a plan would require that local experts choose questions according to a specification appropriate to the school’s curriculum. Interpreting the results would only require basic calculations, such as percentages. A disadvantage of this approach is that schools would not receive an overall score that is comparable internationally. Instead, they would gain information at an individual question level. But given the margin of uncertainty around individual school scores on the OECD-TFS, the reported achievement band is often nearly meaningless (e.g., it is possible for a school to be well above or well below the OECD average level of achievement).
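To illustrate how simple the arithmetic behind this item-level benchmarking can be, consider the following sketch. The item labels and percent-correct values are hypothetical placeholders, not actual released-item statistics; a school would substitute its own results and the published averages that accompany each released item.

```python
# A minimal sketch of item-level benchmarking against published averages.
# All item IDs and percentages below are hypothetical placeholders,
# not actual TIMSS or NAEP statistics.

# Percent of a school's students answering each released item correctly
school_pct = {"item_01": 62.0, "item_02": 48.5, "item_03": 71.0}

# Published percent-correct benchmarks for the same items
# (e.g., a national or international average)
benchmark_pct = {"item_01": 58.0, "item_02": 55.0, "item_03": 70.0}

def compare_items(school, benchmark):
    """Return per-item differences (school minus benchmark) in percentage points."""
    return {item: round(school[item] - benchmark[item], 1)
            for item in school if item in benchmark}

differences = compare_items(school_pct, benchmark_pct)
for item, diff in sorted(differences.items()):
    status = "above" if diff > 0 else ("below" if diff < 0 else "at")
    print(f"{item}: {diff:+.1f} points ({status} benchmark)")
```

Because the comparison is made question by question, a report like this points directly at content areas where students lag or lead, which is arguably more instructionally useful than a single school-level score.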
Conclusion
Schools in the U.S. must contend with global forces that are changing the way we understand education. In a recent TED talk, Andreas Schleicher (2012), director of the education division of OECD, claimed that PISA is “really a story of how international comparisons have globalized the field of education that we usually treat as an affair of domestic policy” (para. 2). This suggests that OECD can lay claim to the tools necessary for schools to understand their own educational system in a globalized world. But globalization is also a local process affecting local people. Tests shouldn’t globalize education but rather give us a better understanding of how our students are able to operate in a globalized world. PISA is one tool that may be able to accomplish this, but we contend that, depending on what local schools want to know, there may be better (even free) tools.
PISA is a decent measure of education achievement for many countries and systems (67 at last count). However, the international assessment program necessarily can’t be targeted to the needs of a particular country. Even meeting the needs of the OECD, which is the specific mandate of PISA, means meeting the general needs of 34 heterogeneous countries whose common thread is that they are relatively “rich” in a global sense. What has emerged over the past 15 years is a series of policy prescriptions stemming from this “decent” but not ideal measure. And although we believe there is a lot to be gained by comparing ourselves to the rest of the world, at the local level, perhaps we should not be looking to an economic organization to provide education policy advice based on an assessment that was never intended to directly link to our local schools. A focus on 15-year-olds’ test performance may be good for policy makers at the national level, but schools primarily need accurate measures of what they have been specifically tasked to do by their communities.
References
America Achieves. (n.d.). The OECD test for schools: Frequently asked questions. www.americaachieves.org/docs/OECD/FAQ-OECD.pdf
Borsuk, A.J. (2014, April 19). Why 14 Wisconsin high schools take international standardized test. Journal Sentinel. www.jsonline.com/news/education/why-14-wisconsin-high-schools-take-international-standardized-test-b99251016z1-255870191.html
Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2 (3-4), 199-215. doi:10.1080/15305058.2002.9669493
Grisay, A. & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33 (1), 69-86. doi:10.1016/j.stueduc.2007.01.006
Hambleton, R. (2002). Adapting achievement tests into multiple languages for international assessments. In A.C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement. Washington, DC: National Academies Press.
Kane, M.T. (1992). An argument-based approach to validity. Psychological Bulletin, 112 (3), 527-535. doi:10.1037/0033-2909.112.3.527
Mazzeo, J. & von Davier, M. (2009). Review of the Programme for International Student Assessment (PISA) test design: Recommendations for fostering stability in assessment results. Presented at the NCES Conference on the Program for International Student Assessment: What Can We Learn from PISA? Washington, DC: U.S. Department of Education, Institute for Educational Sciences, National Center for Education Statistics.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58 (4), 525-543. doi:10.1007/BF02294825
OECD. (2013). Lessons from PISA 2012 for the United States. Paris: OECD Publishing. www.oecd-ilibrary.org/education/strong-performers-and-successful-reformers-in-education_2220363x
OECD. (n.d.). About the OECD — Organisation for Economic Cooperation and Development. www.oecd.org/about/
Oliveri, M.E. & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53 (3), 315-333.
Schleicher, A. (2012). Use data to build better schools. www.ted.com/talks/andreas_schleicher_use_data_to_build_better_schools/transcript?language=en
U.S. Department of Education. (2009). Race to the Top program executive summary. Washington, DC: Author. www2.ed.gov/programs/racetothetop/executive-summary.pdf
Ujifusa, A. (2012, November 29). Standardized testing costs states $1.7 billion a year, study says. Education Week. www.edweek.org/ew/articles/2012/11/29/13testcosts.h32.html
van de Vijver, F.J.R. & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage.
Citation: Rutkowski, D., Rutkowski, L., & Plucker, J.A. (2014). Should individual U.S. schools participate in PISA? Phi Delta Kappan, 96 (4), 68-73.
ABOUT THE AUTHORS

David Rutkowski
David Rutkowski is an assistant professor of education leadership and policy studies in the School of Education at Indiana University, Bloomington, Ind.

Leslie Rutkowski
Leslie Rutkowski is a professor of research methods in the School of Education at Indiana University, Bloomington.
