Research shows that how a test is delivered can affect student performance, so changes in test format shouldn’t be taken lightly.
At a Glance
- Computer-based testing is common, but it’s unclear whether the results are comparable to those of paper-based tests.
- The mode of testing may affect the results because not all students are comfortable with technology, and reading from a screen may affect comprehension.
- New York state began moving from paper- to computer-based testing in 2024.
- A review of test results from 2023 and 2024 found that scores declined when assessments began to be delivered on computers.
Computer-based testing has surged in recent years, becoming a primary mode for delivering assessments across the U.S. This trend has occurred at both the elementary and secondary education levels and in all main content areas — reading, math, science, and social studies (Bennett, 2003). Advocates of computer-based testing (CBT) believe it holds great potential for state assessments (Wang et al., 2008).
CBT offers several advantages over traditional paper-and-pencil, or paper-based, tests (PBT). These include immediate scoring and reporting of results, enhanced security, efficient administration, flexible scheduling, cost reduction, and improved accessibility through multimedia, audio, and large-print accommodations. Furthermore, CBT can be administered offline, online, or through networks, making it usable in most schools. The integration of technology into education has made computer-based assessment a logical next step.
These advantages, however, are tempered by several challenges. Schools’ insufficient capacity and infrastructure (e.g., technology devices and connectivity) to administer assessments to all students simultaneously is one of the primary drawbacks (National Center for Education Statistics, 2013; Randall et al., 2012; Thurlow et al., 2010). Additionally, a lack of school staff to keep equipment running, technical difficulties with assessment software, the need to train staff to administer tests with fidelity, and security threats present challenges (Davis, 2014; National Center for Education Statistics, 2013; Thurlow et al., 2010). Furthermore, assessment experts, researchers, practitioners, and users have expressed concern about the comparability of scores between the two test administration modes (Wang et al., 2008).
Standardized testing in New York
In New York, students in third through eighth grade take state assessments in English language arts and math. Required under the Every Student Succeeds Act, the New York State Grades 3-8 English Language Arts and Mathematics Tests are administered in the spring — near the end of the academic year — to all public and charter school students in these grades. Each assessment is focused on grade-level material and is aligned to the state standards. The test questions match what students learn in the classroom throughout the year, and the tests are designed to measure how well students are meeting the state’s learning standards and to identify learning needs.
Student performance on standardized assessments significantly affects not only students but also teachers, schools, and communities (Scrimgeour & Huang, 2022). These scores influence which students are identified for gifted and talented and special education programs, as well as decisions about grade promotion and retention, course placement, and student graduation. They also affect teacher evaluations, school accountability plans, and school funding. As schools transition to computer-based testing, it’s important to ensure that paper and digital formats give comparable results. Inconsistent scores across delivery modes can undermine data validity and mislead decision-makers.
New York state moved from paper-based to computer-based testing in the 2023-24 school year for students in fifth and eighth grades. In addition, many districts opted into computer-based testing from third to eighth grades. Some teachers and administrators questioned this switch and its effectiveness. We wanted to contribute to the current research by studying how elementary students perform when assessments are converted to computer-based tests. We reviewed existing research about the different testing modes, examined scores both before and after the transition, and spoke to teachers about the shift.
The evolving testing landscape
The transition from PBT to CBT in New York state has sparked numerous discussions about the implications for student learning and achievement. A critical question is whether the format of the assessment influences student performance.
Test mode effects
Mounting evidence shows that identical paper-based and computer-based tests will not yield the same results due to “test mode effects” (Clariana & Wallace, 2002; Wang et al., 2008). Computer-linked factors such as screen size, font size, and resolution of graphics may change the nature of a task so dramatically that CBT and PBT no longer measure the same concept (McKee & Levinson, 1990).
When discussing the transition to CBT, New York teachers expressed concern that the assessment amounted to a test of both knowledge and digital skills. Third-grade teachers were worried that students had not received sufficient instruction in using technology, despite having one-to-one devices. In a discussion of the digital skills students need, Ms. Johnson, one of the teachers we interviewed, told us:
Typing is a skill that improves with practice and repetition. Students obtain limited typing practice as this is an exercise completed within the classroom, and there is just not enough time to dedicate to proper typing skills. This can significantly limit the amount of writing they can produce in a typed essay.
Problems with screens
The growth of e-reader, computer, and tablet usage has brought about a shift toward screen-based reading (Clinton, 2019). While using screens for both educational and recreational reading offers convenience, the screen may not be the more desirable medium for comprehension.
Although screens are nearly ubiquitous, many students initially learned to read on paper, and their comprehension strategies are based on a paper-based environment. A third-grade teacher explained:
Many of our students are used to a traditional approach to reading, where they follow along with their finger and take notes on paper. Now, we’re trying to help them adapt to a more digital-friendly method. We’re teaching them to use line reader tools to chunk text and follow along with the words on the screen.
Other ongoing concerns relate to screen fatigue and slower reading speeds on screens without corresponding comprehension benefits (Daniel & Woody, 2013). One investigation examined children’s reading rate, comprehension, and recall. It found that “while children, if given enough time, may be able to comprehend equal amounts of information from paper and computer, when reading time is accounted for, children are comprehending less efficiently when reading from computer” (Kerr & Symons, 2006, pp. 13-14). Some researchers even suggest that screen reading may hinder performance and self-awareness compared to paper reading, a phenomenon known as “screen inferiority” (Ackerman & Lauterman, 2012).
Technology familiarity
Computer-based tests require some computer literacy, so students’ familiarity with technology and their experience interacting with the exam may impact their results (Backes & Cowan, 2019). In this way, CBT measures not only students’ academic abilities but also their technological abilities.
As Nathan Dadey and his colleagues (2018) argue, students need to be comfortable with the technology to access the assessment properly. Studies by Laurie Laughlin Davis and colleagues (2016) support this, showing that students who regularly use the testing device in class score better. To complete assessments on the computer, students must be comfortable typing, using a mouse or trackpad, and scrolling. Moving back and forth between questions and items is also a concern. Unfamiliarity with small screens, touchscreens, or a mouse can put students at a disadvantage. Ms. Rowan found that the transition to CBT added to the test-prep burden:
With Grade 3 being the first year of computer-based assessments, a lot of the preparation for being able to assess on the computers is placed on the classroom teacher. Learning the tools available on computer-based assessments, when to use and how to use — this adds another component to test prep that was not previously of concern.
Familiarity with devices allows students to focus on the test content rather than struggling with the technology itself.
Test anxiety and motivation
Test anxiety occurs when students lack confidence in their abilities and, as a result, underperform on examinations. As Moshe Zeidner (1998) points out, “examination anxiety” is a condition that encompasses a range of mental, physical, and behavioral reactions that indicate worry or unease about adverse outcomes or underperformance in a judgmental context. But does taking a test on a computer lessen that anxiety or make it worse? A study of both test mode (individual vs. group) and test medium (paper vs. computer) suggests that while the medium did not directly affect test scores, students reported feeling more anxious when taking computerized tests in a group setting than in other conditions (Brüggemann et al., 2024).
Along with test anxiety, test motivation has an impact on student performance. The Test-Taker Motivation Model (Pintrich, 1989) specifies that the effort a test-taker directs toward a test is a function of how well the individual expects to do on the test, how they perceive the test, and their affective reactions to the test.
There are reasons to expect that testing on a screen can improve motivation among elementary school students. Empirical studies have found that digital media can increase children’s motivation to read (Picton, 2014) as well as their test-taking motivation (Chua, 2012). Test-taker motivation is therefore worth investigating in testing-mode comparability studies because it can pose a threat to the validity of inferences made from assessment results (Shuttleworth, 2009).
Results in New York
In New York, paper-based state assessments were administered in spring 2023 and computer-based state assessments were administered in spring 2024. In theory, both tests were equally valid. We analyzed data from students in a select number of New York City schools who took the paper-based test one year, followed by the computer-based test the following year. While the tests differed to account for growth in student knowledge from one grade level to the next, the main difference was the test format.
The state assessment is constructed in such a way that students who perform at or above grade level should have similar scores from one year to the next. Students who are struggling academically should perform the same or better because of increased assistance and attention. Therefore, on the whole, students should score the same or higher from one year to the next.
We analyzed both mathematics and English language arts (ELA) assessments from third grade (2023 PBT) to fourth grade (2024 CBT); fourth grade (2023 PBT) to fifth grade (2024 CBT); and fifth grade (2023 PBT) to sixth grade (2024 CBT). Our review included just over 100 students per grade who had scores from each year.
Overall averages went down in every grade band once the format changed to computer-based testing (see Table 1). For grades 4-5 and 5-6, the declines were statistically significant. While this is a relatively small sample considering all students in the New York school system, it gives reason for concern about computer-based testing.
Table 1. Average state assessment scores, 2023 paper-based vs. 2024 computer-based

| | Mathematics Grade 3-4 | ELA Grade 3-4 | Mathematics Grade 4-5 | ELA Grade 4-5 | Mathematics Grade 5-6 | ELA Grade 5-6 |
|---|---|---|---|---|---|---|
| 2023 PBT average score | 450.15 | 439.55 | 454.47 | 447.17 | 452.58 | 445.59 |
| 2024 CBT average score | 448.23 | 437.98 | 440.18 | 437.80 | 439.44 | 437.51 |
| Sample size | n = 116 | n = 122 | n = 111 | n = 105 | n = 108 | n = 100 |
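We do not detail the statistical test behind these significance results here, but a paired-samples t-test on matched student scores is one standard way to evaluate such a year-over-year change. The Python sketch below is a minimal illustration under that assumption; the file name and column names (math_grade4_to_5.csv, pbt_2023, cbt_2024) are hypothetical placeholders, not our actual data files.

```python
# Minimal sketch (not the exact analysis reported above): a paired-samples
# t-test comparing each student's 2023 paper-based score with the same
# student's 2024 computer-based score. File and column names are hypothetical.
import pandas as pd
from scipy import stats

# Hypothetical file: one row per student, with both years' scale scores.
scores = pd.read_csv("math_grade4_to_5.csv")  # columns: pbt_2023, cbt_2024

# Keep only students who have a score from each year (matched pairs).
paired = scores.dropna(subset=["pbt_2023", "cbt_2024"])

mean_change = (paired["cbt_2024"] - paired["pbt_2023"]).mean()
t_stat, p_value = stats.ttest_rel(paired["pbt_2023"], paired["cbt_2024"])

print(f"n = {len(paired)}, mean change = {mean_change:.2f}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")  # p < .05 suggests a real decline
```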
Looking ahead
To improve outcomes, teaching staff have shared that they would begin implementing CBT resources and practice more consistently from the start of the year, paying specific attention to the tools students found useful when taking tests. These include how to highlight on the computer screen, take notes, eliminate answer choices, and read line by line.
Teachers have been using a digital curriculum to give students practice reading and answering questions on the computer, as well as taking computer-based unit assessments, but they continue to find that students tend to perform better on the same assessments when administered on paper. They have also found that students get distracted by some of the digital tools. For example, in the space for showing work on the math test, students would doodle or draw, and erase their work if they were unsure how to proceed.
Essentially, teachers have found that, along with teaching content, they have the added burden of teaching the tools students need to be successful on the test. This raises the question: With computer-based testing, are we making teachers teach even more to the test?
The transition to CBT offers potential benefits like adaptability and real-time data, but thorough research is needed to understand its impact on student learning and achievement. Additionally, factors beyond format, such as screen size, assessment methods, and design, warrant further investigation to gain a comprehensive understanding of the complex interplay between assessment format and student performance. Future research should continue to explore the diverse factors influencing student performance across assessment formats, ultimately striving for the most accurate and meaningful evaluation of student learning possible.
Note: The teacher names used in this article have been changed to maintain confidentiality.
References
Ackerman, R. & Lauterman, T. (2012). Taking reading comprehension exams on screen or on paper? A metacognitive analysis of learning texts under time pressure. Computers in Human Behavior, 28 (5), 1816-1828.
Backes, B. & Cowan, J. (2019). Is the pen mightier than the keyboard? The effect of online testing on measured student achievement. Economics of Education Review, 68, 89-103.
Bennett, R.E. (2003). Online assessment and the comparability of score meaning (RM-03-05). Educational Testing Service.
Bridgeman, B., Lennon, M.L., & Jackenthal, A. (2003). Effects of screen size, screen resolution, and display rate on computer-based test performance. Applied Measurement in Education, 16, 191-205.
Brüggemann, T., Ludewig, U., Lorenz, R., & McElvany, N. (2024). Effects of test mode and medium on elementary school students’ test experience. European Journal of Psychological Assessment, 40 (4), 282-289.
Chakraborty, A. (2023). Exploring the root causes of examination anxiety: Effective solutions and recommendations. International Journal of Science and Research, 12, 1096-1102. https://doi.org/10.21275/SR23220002911
Chua, Y.P. (2012). Effects of computer-based testing on test performance and testing motivation. Computers in Human Behavior, 28 (5), 1580-1586.
Clariana, R. & Smith, L. (1988). Learning style shifts in computer-assisted instruction. Annual meeting of the International Association for Computers in Education (IACE), New Orleans, LA.
Clariana, R. & Wallace, P. (2002). Paper-based versus computer-based assessment: Key factors associated with the test mode effect. British Journal of Educational Technology, 33 (5), 593-602.
Clinton, V. (2019). Reading from paper compared to screens: A systematic review and meta‐analysis. Journal of Research in Reading, 42 (2), 288-325.
Dadey, N., Lyons, S., & DePascale, C. (2018). The comparability of scores from different digital devices: A literature review and synthesis with recommendations for practice. Applied Measurement in Education, 31 (1), 30-50.
Daniel, D.B. & Woody, W.D. (2013). E-textbooks at what cost? Performance and use of electronic v. print texts. Computers & Education, 62, 18-23.
Davis, M.R. (2014). Online testing glitches causing distrust in technology. Education Week, 33 (30), 20-21.
Davis, L., Janiszewska, I., Schwartz, R., & Holland, L. (2016). NAPLAN device effects study. Pearson.
Kerr, M.A. & Symons, S.E. (2006). Computerized presentation of text: Effects on children’s reading of informational material. Reading and Writing, 19 (1), 1-19.
Mazzeo, J. & Harvey, A.L. (1988). The equivalence of scores from conventional and automated educational and psychological tests: A review of literature (College Board Report No. 88-8). Educational Testing Service.
McKee, L.M. & Levinson, E.M. (1990). A review of the computerized version of the Self-Directed Search. Career Development Quarterly, 38, 325-333.
National Center for Education Statistics. (2013). Testing integrity symposium: Issues and recommendations for best practice. U.S. Department of Education, Institute of Education Sciences.
Picton, I. (2014). The impact of ebooks on the reading motivation and reading skills of children and young people: A rapid literature review. National Literacy Trust.
Pintrich, P.R. (1989). The dynamic interplay of student motivation and cognition in the college classroom. In C. Ames & M. Maehr (Eds.), Advances in motivation and achievement (Vol. 6, pp. 117-160).
Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. Journal of Technology, Learning, and Assessment, 2 (6).
Randall, J., Sireci, S., Li, X., & Kaira, L. (2012). Evaluating the comparability of paper- and computer-based science tests across sex and SES subgroups. Educational Measurement: Issues and Practice, 31 (4), 2-12.
Scrimgeour, M.B. & Huang, H.H. (2022). A comparison of paper-based and computer-based formats for assessing student achievement. Mid-Western Educational Researcher, 34 (1), Article 5.
Shuttleworth, M. (2009). Repeated measures design. Experiment Resources.
Thurlow, M., Lazarus, S.S., Albus, D., & Hodgson, J. (2010). Computer-based testing: Practices and considerations (Synthesis Report 78). University of Minnesota, National Center on Educational Outcomes.
Wang, S., Jiao, H., Young, M.J., Brooks, T., & Olson, J. (2008). Comparability of computer-based and paper-and-pencil testing in K-12 reading assessments: A meta-analysis of testing mode effects. Educational and Psychological Measurement, 68 (1), 5-24.
Zeidner, M. (1998). Test anxiety: The state of the art. Plenum Press.
This article appears in the Summer 2025 issue of Kappan, Vol. 106, No. 7-8, pp. 17-21.
ABOUT THE AUTHORS

Kristen Panzarella
Kristen Panzarella is a primary school principal with an Ed.D. in leadership, innovation, and continuous improvement. Her commitment to effective strategies to support the educational system is evident in her participation in NASEP Mastermind and the PDK Emerging Leaders program.

Angela Walmsley
Angela Walmsley is president and owner of Interactive College Prep, LLC and a professor of education at Concordia University Wisconsin, Mequon, WI. She is a past chair of the PDK International Board and current chair of the PDK Foundation Board.
Visit their website at: www.ic-prep.com
