Large language models can help teachers quickly create questions aligned to standards. Here’s how.
At a Glance
- Creating assessment questions aligned to standards is a time-consuming task for teachers, but large language models such as ChatGPT can help.
- Teachers can follow a three-step process to create assessments:
  - Ask ChatGPT to break standards into measurable targets.
  - Determine how much time to spend on each target.
  - Ask ChatGPT to generate questions at different depths of knowledge for each target.
- At each step, teachers should review results from ChatGPT and ask for revisions as needed.
Meaningful assessments are crucial to the teaching-learning cycle because they help teachers see how well students are learning and where they might need extra support (Drost & Levine, 2023; Schmoker, 2018). Having a clear picture of each student’s progress enables teachers to adjust teaching methods to better suit student needs. That, in turn, contributes to a responsive educational environment that promotes continuous growth and success for all. Without assessments, it is often unclear what students have learned because student learning is unpredictable (Wiliam, Fisher, & Frey, 2024).
Despite this importance, for the last 30 years, research has found that designing effective classroom assessments is a weakness among teachers (DeLuca & Bellara, 2013; DeLuca, Chavez, Bellara, & Cao, 2013; Popham, 2020; Stiggins & Conklin, 1992). Because many teachers have not been trained in effective classroom assessment practices (Drost & Levine, 2023; Schmoker, 2023), we must continue to improve in this area so that we can boost students’ growth and achievement (Cunningham, 2019; Darling-Hammond et al., 2005; Knips, Lopez, Savoy, & Laparo, 2023; Popham, 2020; Schmoker, 2023).
What if there was a tool that could help you create effective assessments with meaningful questions so that you could draw more accurate conclusions about student progress and achievement? This tool could help you create diverse, engaging assessments with great questions while you focus on the more personal aspects of teaching. Enter ChatGPT (Drost & Shryock, 2025).
This powerful tool can be your new best friend in crafting questions, providing instant feedback, and even generating creative prompts to get your students thinking. Let’s dive into how ChatGPT, as a collaborative partner with a large information base, could make your teaching life easier.
As easy as 1-2-3
Designing effective classroom assessments with meaningful questions is easy when you follow a three-step process:
- Unpack standards.
- Determine the percentage of time to spend on content.
- Create questions and assign depth.
As you move through this process, you can keep track of this information using a table of specifications (TOS). It will show the unpacked standards in relation to learning objectives, depth of knowledge, and the content that has been covered in class. The TOS will help you map out what your assessment will ask, how it will be asked, and to what degree each topic or skill will be assessed. This will ensure that the assessment is comprehensive and balanced, covering all the important areas of the curriculum in the right proportions. Additionally, it will provide a clear framework for both teachers and students, clarifying what is expected and reducing the likelihood of bias or gaps in the assessment.
As we walk through the steps to collaboratively design a sixth-grade mathematics assessment with ChatGPT, we’ll place the resulting information in the example TOS in Table 1.
Step 1: Unpack the standards.
The first step is to break down broad educational standards into specific, measurable, and achievable components (Drost & Levine, 2015). Because many standards do not identify the specific knowledge, skills, and dispositions students need to master, you will need to break them down into discrete components called objectives, learning targets, or claims. Having well-defined learning targets will help you design specific questions in Step 2.
For example, take these two Common Core State Standards for sixth-grade math:
- 6.RP.1 “Understand the concept of a ratio and use ratio language to describe a ratio relationship between two quantities.”
- 6.RP.2 “Understand unit rates and use them to solve problems.”
We asked ChatGPT to break these down into measurable targets, a task that can often take a significant amount of time. We used the following prompt:
Suggest learning outcomes for the following standards: 6.RP.1 “Understand the concept of a ratio and use ratio language to describe a ratio relationship between two quantities.” And “6.RP.2 Understand unit rates and use them to solve problems.”
From this prompt, ChatGPT generated a series of 17 outcomes. After reviewing the choices ChatGPT generated, we selected the most suitable learning outcomes for our TOS, making sure to have a balance of procedural skill and fluency, application, and conceptual understanding — key math principles needed in every unit (Almarode et al., 2019).
Table 1. Example table of specifications (TOS)

| # | Learning Outcome (Claim) | Standard | % of Time | DOK 1 | DOK 2 | DOK 3 | DOK 4 |
|---|---|---|---|---|---|---|---|
| 1 | Describe ratios as relating parts to a whole. | 6.RP.1 | 5% | 1 | | | |
| 2 | Describe ratios as relating parts to a part. | 6.RP.1 | 5% | 2 | | | |
| 3 | Identify part-to-part ratios when the parts being compared do not comprise the whole (3 squares, 2 triangles, and 4 circles has a part-to-part ratio of 3:2, 2:4, etc., as one shape is discounted in the comparison). | 6.RP.1 | 10% | 3 | 4 | | |
| 4 | Represent ratios using tape diagrams, double number lines, and ratio tables. | 6.RP.1 | 10% | 4 | 5 | | |
| 5 | Solve real-world problems involving ratios. | 6.RP.1 | 15% | 6 | 7 | 8 | |
| 6 | Create equivalent ratios from a given ratio. | 6.RP.1 | 15% | 9 | 10 | | |
| 7 | Describe unit rate. | 6.RP.2 | 10% | 11 | 12 | | |
| 8 | Identify common or prevalent real-world unit rates and explain what those rates determine. | 6.RP.2 | 15% | 13 | | | |
| 9 | Use unit rates to solve real-world problems. | 6.RP.2 | 15% | 14 | 8 | | |
Step 2: Determine percentage of time to spend on content.
Now that we have unpacked our standards and created learning outcomes, we can think about the percentage of time we’re going to spend on instruction and learning for each learning outcome. Ideally, the time spent on each item indicates its relative importance during instruction. Mapping this out in advance ensures that assessments cover all important instructional priorities or skills adequately, prevents assessments from focusing too much on minor topics or skills instead of more important ones, and ultimately helps ensure you’ve created a balanced assessment. This important step is often overlooked, but it is necessary to ensure that assessments align with instructional priorities and accurately reflect the emphasis placed on different skills (Popham, 2020).
For the example of the sixth-grade math standards, we placed percentages into Table 1 based on our review of an open-access sixth-grade math series. You can customize these percentages however you want, but they should reflect the emphasis you placed on each outcome during your teaching and your district’s or state’s expectations for the standards.
Step 3: Create aligned questions and assign depth.
Now that we know our distribution of time, we can create aligned questions at various depths of knowledge (DOK). Norman Webb’s (1997) DOK framework can be used to categorize the complexity of educational tasks and assessments. It focuses on the cognitive demand required to successfully complete a task, rather than just its difficulty level. By planning for depth of knowledge in assessment tasks, you gain a more comprehensive understanding of student learning and enhance the quality and effectiveness of the assessment itself (Hess, 2023).
Understanding DOK
In Webb’s framework, items at levels 1 and 2 are routine tasks that have specific right answers that require students to go about solving the question the same way each time. Level 1 tasks involve just one step/skill, and level 2 tasks involve multiple steps/skills. Items at levels 3 and 4 are non-routine tasks that cannot be solved following a step-by-step procedure and that often occur in new or unfamiliar situations. The difference between levels 3 and 4 comes from whether the activity is teacher-directed (DOK 3) or student-directed (DOK 4).
Adapting from the published work of Hess (2018), we can compare the DOK levels to game shows. Jeopardy! focuses on the recall of facts and is a level 1 task, while Top Chef, which requires contestants to demonstrate multiple skills following different procedures, is level 2. A level 3 task, like the TV show Survivor, provides everything competitors need on the island. But on Shark Tank, contestants must bring everything to the situation themselves (level 4).

While Bloom’s taxonomy focuses on the behavior students need to produce (the verb), cognitive demand describes the way in which they need to interact with the content (the noun and adjectives). Even when the verb in two tasks is the same, the cognitive demand is different. For example, consider these two tasks:
- The ratio of students to teachers in a school is 4:1. Describe this ratio.
- In a science experiment, the ratio of salt to water in a solution is 2:7. Describe how you would create a solution with 90 grams of water while maintaining the same ratio.
For both tasks, students may write (the behavior) a statement to explain their thinking. However, what they have to include in their statements (the demand) is significantly greater in the second than the first.
Creating questions
Creating questions that meet various depths of knowledge can be challenging for anyone (Francis, 2016). Collaborating with generative AI makes it much more manageable. Begin by prompting ChatGPT to create a set of questions and tasks that will gather evidence of student thinking at DOK 1, 2, and 3 for each learning standard. Here is an example of a prompt using Learning Outcome 6 from our TOS:
Create a set of questions and tasks using DOK 1, 2, and 3 that will gather evidence of student thinking on the following learning outcome: Create equivalent ratios from a given ratio. Organize your response in a table and include a column that indicates if it is measuring procedural fluency, application, or conceptual understanding. This is from Ohio Learning Standard Math 6RP1.
Figure 2, an excerpt of what ChatGPT produced (OpenAI, 2024), is useful in helping us think about the specific kinds of questions we want students to engage with.
| DOK Level | Question/Task | Description | Evidence Type |
|---|---|---|---|
| DOK Level 1: Recall and Reproduction | Define a Ratio | What is a ratio? Give an example of a ratio. | Conceptual Understanding |
| | Identify Equivalent Ratios | Which of the following are equivalent ratios to 3:4? a) 6:8 b) 9:12 c) 2:3 | Procedural Fluency |
| | Simple Calculation | If the ratio of apples to oranges is 2:3, what is the equivalent ratio when there are four apples? | Procedural Fluency |
| DOK Level 2: Skills and Concepts | Generate Equivalent Ratios | Create two equivalent ratios for the given ratio 5:7. | Procedural Fluency |
| | Reasoning with Ratios | Explain why the ratio 8:12 is equivalent to the ratio 2:3. | Conceptual Understanding |
| | Ratio in Context | A recipe requires 4 cups of flour for every 5 cups of sugar. If you want to make a larger batch using 8 cups of flour, how many cups of sugar will you need? | Application |
| DOK Level 3: Strategic Thinking | Complex Problem Solving | Given the ratio 3:5, if you have 15 units of the first quantity, how many units of the second quantity would make the ratios equivalent? | Application |
| | Application to Real-World Situations | You are mixing paint and the ratio of red to blue is 4:9. If you have 36 parts of blue paint, how many parts of red paint do you need to maintain the same ratio? Show your work and explain your reasoning. | Application |
| | Comparing and Contrasting Ratios | Compare the ratios 7:10 and 14:20. Are they equivalent? Justify your answer with calculations and a clear explanation. | Conceptual Understanding |
It’s important to carefully review what ChatGPT produces. When we reviewed our example, we felt some questions weren’t fully aligned to the standards, wouldn’t apply to our students, or included miscategorized evidence types.
If you find flaws in the results, you can ask ChatGPT to revise. For example, ChatGPT used paint mixing as the example for an application to real-world situations. We didn’t think that would fit our students. So we asked it to revise the question with a focus on computer gaming and got this question instead:
You are designing a game level, and the ratio of health potions to mana potions is 4:9. If you have 36 mana potions, how many health potions do you need to maintain the same ratio? Show your work and explain your reasoning.
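Before handing an AI-generated question like this to students, it’s worth verifying the answer key yourself. As a quick sketch of that check (the arithmetic below is our own verification, not part of ChatGPT’s output):

```python
# Answer check for the revised potion question: the ratio of health
# potions to mana potions is 4:9, and the level has 36 mana potions.
HEALTH_PART, MANA_PART = 4, 9
mana_potions = 36

scale = mana_potions // MANA_PART      # 36 / 9 = 4 copies of the ratio
health_potions = HEALTH_PART * scale   # 4 * 4 = 16

print(health_potions)  # 16
```

A student maintaining the 4:9 ratio should arrive at the same 16 health potions.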
You can also ask generative AI to change the format of the question. For example, for the question on creating two equivalent ratios, we asked ChatGPT to make this a “select all of the equivalent ratios” question. It gave us this question:
Select all of the equivalent ratios for the given ratio 5:7.
A: 10:14
B: 15:21
C: 20:28
D: 25:30
E: 30:42
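Because AI-generated answer options occasionally contain errors, it pays to confirm the key independently before building the item into an assessment. One quick way to do that (our own verification, using Python’s standard fractions module) is to reduce each option to lowest terms, so equivalence to 5:7 becomes a simple equality test:

```python
from fractions import Fraction

# Fraction automatically reduces to lowest terms, so an option is
# equivalent to 5:7 exactly when its Fraction equals Fraction(5, 7).
target = Fraction(5, 7)
options = {"A": (10, 14), "B": (15, 21), "C": (20, 28),
           "D": (25, 30), "E": (30, 42)}

equivalent = [label for label, (a, b) in options.items()
              if Fraction(a, b) == target]
print(equivalent)  # ['A', 'B', 'C', 'E']
```

Option D reduces to 5:6, confirming it is the lone distractor in the set.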
After revising our questions to fully align with our expectations, we updated our TOS chart with the question numbers under the DOK columns to show our distribution. You’ll note that some objectives have multiple questions with different depths assigned to them. This is done to ensure the alignment to the original standard and to honor our time spent on the material.
Our final step, if this were to be a traditional summative assessment, would be to assign point values to various questions, remembering that higher depth of knowledge does not necessarily equal higher points and vice versa. If this were to be formative in nature, we could simply incorporate the questions into various activities or gamify them using Kahoot! or Quizizz.
Collaboration in the assessment process
Integrating ChatGPT as a collaborative partner in creating effective classroom assessments can revolutionize the way teachers design and implement their assessments. As teachers continue to strive for work-life balance, generative AI tools can reduce the time they spend on designing questions and ensure questions are more strongly aligned to the standards. Because the tool can assist in developing questions that span various depths of knowledge and cater to the individual population of students that teachers are working with, individualized assessments can also be created in a fraction of the time.
As teachers focus on meeting the diverse needs of their students, we encourage educators to consider using ChatGPT as a collaborative tool, as it can serve as a valuable resource, fostering creativity, precision, and adaptability in classroom assessments. Embracing this technology not only streamlines the workload for teachers but also enriches their assessment practices, ultimately promoting deeper learning and mastery of the content for students.
References
Almarode, J.T., Fisher, D., Assof, J., Hattie, J., & Frey, N. (2019). Teaching mathematics in the visible learning classroom. Corwin.
Cunningham, J. (2019). Missing the mark: Standardized testing as epistemological erasure in U.S. schooling. Power and Education, 11 (1), 111-120.
Darling-Hammond, L., Banks, J., Zumwalt, K., Gomez, L., Gamoran Sherin, M., Griesdorn, J., & Finn, L. (2005). Educational goals and purposes: Developing a curricular vision for teaching. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 169-200). Jossey-Bass.
DeLuca, C. & Bellara, A. (2013). The current state of assessment education: Aligning policy, standards, and teacher education curriculum. Journal of Teacher Education, 64 (4), 356-372.
DeLuca, C., Chavez, T., Bellara, A., & Cao, C. (2013). Pedagogies for preservice assessment education: Supporting teacher candidates’ assessment literacy development. The Teacher Educator, 48, 128-142.
Drost, B. & Levine, A. (2015). An analysis of strategies for teaching standards-based lesson plan alignment to preservice teachers. Journal of Education, 195 (2).
Drost, B. & Levine, A. (2023). An analysis of strategies for teaching and assessing standards-based assessment design to preservice teachers. Journal of Education, 203 (3).
Drost, B. & Shryock, C. (2025). Using generative AI for instructional design. In M. Stevkovska, M. Klemenchich, & N. Ulutas (Eds.), Reimagining intelligent computer-assisted language education. IGI Global.
Francis, E. (2016). Now that’s a good question: How to promote cognitive rigor through classroom questioning. ASCD.
Hess, K. (2018). A local assessment toolkit to promote deeper learning: Transforming research into practice. Corwin.
Hess, K. (2023). Rigor by design, not chance: Deeper thinking through actionable instruction and assessment. ASCD.
Knips, A., Lopez, S., Savoy, M., & Laparo, K. (2023). Equity in data: A framework for what counts in schools. ASCD.
OpenAI. (2024). ChatGPT (GPT-4). [Large language model].
Popham, W.J. (2020). Classroom assessment: What teachers need to know (9th ed.). Pearson.
Schmoker, M. (2018). Focus: Elevating the essentials to radically improve student learning (2nd ed.). ASCD.
Schmoker, M. (2023). Results now 2.0: Untapped opportunities for swift, dramatic gains in achievement. ASCD.
Stiggins, R.J. & Conklin, N.F. (1992). In teachers’ hands: Investigating the practices of classroom assessment. State University of New York Press.
Webb, N.L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education (Research monograph no. 6). Council of Chief State School Officers.
Wiliam, D., Fisher, D., & Frey, N. (2024). Student assessment. Corwin.
This article appears in the Summer 2025 issue of Kappan, Vol. 106, No. 7-8, pp. 22-27.
ABOUT THE AUTHORS

Bryan Drost
Bryan Drost is the executive director for educational services for the Rocky River City Schools in Ohio and a faculty member at several Ohio colleges.

Char Shryock
Char Shryock is currently an education leadership consultant after retiring as a superintendent and director of curriculum.