Behind the Scenes of ETS Research: Exploring Game-Based Assessments

Suchi Rudra

To the restless middle school student, game-based learning activities might seem light-hearted and fun. But ETS researchers know that games can be serious, powerful tools.

As games play an increasing role in the classroom, game-based assessments (GBA) may improve teaching and learning by providing teachers and students with immediate feedback on which skills students have mastered and which they still need to learn. Researchers in ETS’s Cognitive, Accessibility, and Technology Sciences Center are discovering that GBAs can offer meaningful results — as long as they also keep learners engaged.

Two members of this team, research scientists Malcolm Bauer and Tanner Jackson, are learning how to leverage the potential for games to engage students and inform teachers.

Piecing together the evidence

One product of this game-based assessment research is a futuristic adventure game called Mars Generation One: Argubot Academy, which was created in collaboration with GlassLab, initially a part of the Institute of Play. The game, which takes place in 2054 in the first colony on Mars, is based on an argumentation learning progression developed by researchers working on ETS’s Cognitively Based Assessment of, for, and as Learning (CBAL®) project, a K–12 formative assessment system research initiative.

Aligned with Common Core Standards, the iPad®-based game teaches and tests middle school students’ argumentation skills by having them build debating robots (“argubots”) that can battle other robots. Each argubot is built from a statement, known as a claim, and a supporting piece of evidence. While developing the game, Bauer, Jackson and their team worked to help designers at GlassLab identify which activities should be integrated and which pieces of information should be tracked within the game to help assess the level of the students’ argumentation skills, including:

  • When the student finds a new piece of evidence, does the student keep it or trash it?
  • Whenever the student tries to build a robot, which pieces of evidence does the student match with claims, and how strong are those pairings?
  • Which type of argument scheme did the student use?
  • Did the student use one solution strategy or employ a diversity of strategies?
  • How effective are the student’s attacks and defenses when the student’s robots are on the battlefield?

ETS researchers can then interpret data from the game play, matching it to the learning progression, and provide those insights as formative feedback to students as well as teachers at both the individual and classroom levels.
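
To make the tracked signals listed above more concrete, here is a minimal sketch of how such game events might be recorded and summarized. It is an illustration only, not the actual telemetry used in Mars Generation One: the `EvidenceEvent` record, its field names, the action labels, and the 0-to-1 pairing-strength scale are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical event record. The field names and the 0.0-1.0 strength
# scale are assumptions for illustration, not the game's real schema.
@dataclass
class EvidenceEvent:
    student_id: str
    action: str                               # "keep", "trash", or "pair"
    evidence_id: str
    claim_id: Optional[str] = None            # set only for "pair" actions
    pairing_strength: Optional[float] = None  # assumed 0.0 (weak) to 1.0 (strong)


def summarize(events: List[EvidenceEvent]) -> Dict[str, object]:
    """Count each action type and average the strength of claim-evidence pairings."""
    counts = Counter(event.action for event in events)
    strengths = [
        event.pairing_strength
        for event in events
        if event.action == "pair" and event.pairing_strength is not None
    ]
    mean_strength = sum(strengths) / len(strengths) if strengths else None
    return {
        "kept": counts["keep"],
        "trashed": counts["trash"],
        "pairings": counts["pair"],
        "mean_pairing_strength": mean_strength,
    }
```

A summary like this covers only the first two questions in the list. Mapping the results onto a learning progression, as the researchers describe, would need an additional scoring model that this sketch does not attempt.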

The students were given an argumentation assessment after playing the game to explore whether their skills in argumentation were reflected in their game performance.

Patterns of their game play correlated with their performance on the assessment, so if they performed well on the game, they tended to do better on the test.

Malcolm Bauer

Although Mars Generation One was designed to be used as an integral part of classroom materials in an argumentation unit, ETS researchers are also exploring GBAs that could be used as a quick check-in to assess skills, a way to promote discussion, or as a group activity.


We are trying to explore not only content and grade level, but also how those factors are paired with a particular game type.

Tanner Jackson

Making a solid argument, for example, is a complex task that requires students to use a lot of evidence. It generally requires a longer game for teachers to get enough data to fully understand a student’s skill level. A short game, comparable in length to just one or two multiple-choice items, may not be the right game type to measure performance on complex tasks but has other benefits.

Zooming in with micro games

Quick-to-play games, called micro games, can measure very particular skills and be played repeatedly, giving researchers the information they need to distinguish signs that students are learning the game from signs that they are learning the skill.

Among other projects, ETS and the Game Lab at American University in Washington, D.C., have created two micro games that also focused on argumentation skills. The resulting prototypes, Robot Sorter and Text Persuasion, have been played by a sample of over 300 middle school students across the country.

Straightforward and with typical game features such as points, graphics, and robot components, Robot Sorter asks students to identify whether a piece of evidence matches a claim. It’s a task that can be completed 10 to 30 times in the span of just a few minutes.

The other game, Text Persuasion, challenges students to use a phone texting simulator to convince friends to attend a party. The students must guide invitees through potential conflicts that could prevent their attendance. Those who successfully persuade invitees earn emojis such as smiley faces and dancing bananas. Failure to persuade results in a frowny face emoji.

When Jackson ran a study comparing the two very different micro games, Text Persuasion prevailed as more fun and authentic.

“You don’t have to have a lot of game features,” he says. “It’s really about matching the fun, interactive playfulness of these games with the assessment elements you’re targeting.”

What Jackson and others are investigating is whether scores and evidence combined from multiple micro games can offer an accurate representation of a complex set of skills and competencies, such as a student’s overall argumentation skills.

“Can we separate out these small skills and then combine them? It’s an open question,” Jackson says. “It could be that these skills are so highly interactive and complex that when you break them apart, they don’t actually fit together as a whole.” In that case, the combined scores from multiple micro games would not give teachers a clear understanding of a student’s argumentation skills.

Could assessments like these be used for summative purposes? That’s really not the intention at this point in time, Jackson says.

As with all digital assessments, the use of GBAs as higher-stakes summative assessments has raised several concerns, including privacy risks from schools recording student performance and results, and the difficulty of comparing student performance across a variety of electronic learning environments around the country.

Moreover, “we don’t know enough yet about user behavior to use GBAs as higher stake assessments,” he says. “If a student answers incorrectly, is it because the student didn’t understand the game or doesn’t understand the concept?”

Jackson is working with a fellow research scientist at ETS, Blair Lehman, to study another angle of student performance that could eventually help GBA researchers better understand user behavior: the emotional reaction of the student during game play.

After students play different versions of the micro games, they watch videos of their performance to note how they were feeling about the choices they had made and whether they felt frustrated or confused. The researchers analyze the students’ responses to discover and fix parts of the games that are unclear or confusing.

Designing differently for assessment

Many studies on educational game design agree on one critical consideration: matching the format of the game with the subject matter that is the game’s focus. Not all game formats will work with all subjects.

But Bauer points out that the researchers at ETS are adding another constraint to the game design process—an assessment portion.

While there are plenty of players in the learning game market, the main difference with the games ETS is developing is the organization’s ability to provide strong empirical or psychometric evidence for the educational claims of each game.

By collaborating intensively and continuously sharing feedback, researchers are discovering that sweet spot where game mechanics align with learning, provide the evidence needed to assess what is being learned and identify the strategies being used for learning.

“We have a design process that brings the components together and aligns all the pieces in a coherent way,” Bauer says.

For more information, read Three Things Game Designers Need to Know About Assessment.

Suchi Rudra is a writer specializing in education, sustainable design, travel and entrepreneurship. Her work has been featured in The New York Times, The Guardian, Slate, and BBC Travel, and she recently published an e-book, Travel More, Work Less.