Performance Assessments Vs. Multiple-Choice Answers
by Anthony Petrosky
Anthony R. Petrosky is the associate dean of the School of Education at the University of Pittsburgh and holds a joint appointment as a professor in the school and the English Department.
Performance assessments were originally created to assess people’s abilities to do complex activities that are important to their jobs. Surgeons collect electronic portfolios of their surgeries on lifelike mannequins or in computer simulations. Architects create plans for buildings in extensive portfolios of drawings. Designers make drawings to specifications for clothes and present them in portfolios to buyers. Flight control radar operators demonstrate their skills through computer simulations of challenging scenarios with multiple planes flying at various speeds.
Performance Assessments of Students’ Work
Performance assessments for students usually ask them to do the sort of academic work that defines their subject. In English, they write essays and often collect them into portfolios to demonstrate their work over time and with different kinds of writing tasks. In mathematics, students respond to word problems that represent big mathematical concepts, such as ratio and proportion, by solving the problems and explaining their solutions in writing. These mathematics performances might also be collected in portfolios to demonstrate work over time, since it often takes multiple performances to assess conceptual understanding.
The emphasis in performance assessments is usually on multiple demonstrations of work samples that can be evaluated or judged, often as they represent work over periods of time, against criteria for excellence and satisfactory performance. Such criteria, when applied to complex performances and portfolios of work samples, can be diagnostic for teachers in ways that are impossible with multiple-choice tests. Teachers can learn, for instance, that over time and on multiple essays, students struggle with explaining how textual evidence connects to the points they’re making in their essays.
Performance Versus Identification and Repetition
The strength of performance assessments comes from their ability to capture authentic examples of work rather than proxies for it. We know, for example, that students’ scores on multiple-choice tests of editing skills generally correlate with their writing abilities, but these tests are proxies for the students’ writing—not authentic assessments of the writing. Many assessment experts think that such multiple-choice tests of editing are not even good indications of a student’s editing abilities, because students are asked to edit sentences not of their own making.
The Result of Multiple-Choice Testing
The result of multiple-choice testing in the U.S. has been thoroughly documented—it has led to months of test preparation in which students complete exercises that look like the ones they’ll encounter on the tests. Instruction has bent and shaped itself to prepare students for these tests, so that the emphasis in class falls on identifying and repeating information rather than on creating summaries, explanations, arguments, or even poems in writing or in talk.
The nation turned away from its large-scale experiments with performance assessments in writing during the 1990s because of the expense involved in gathering portfolios and preparing teachers to rate them against criteria. The National Board for Professional Teaching Standards (NBPTS), to its credit, has maintained its focus on assessing teachers for board certification through sophisticated portfolios of practice, and the National Assessment of Educational Progress (NAEP), our nation’s best assessment effort to date, continues to solicit writing samples from students that are scored against relevant criteria.
Lately, and partly in response to the ways that instruction has been driven to look like multiple-choice testing, performance assessments are making a comeback. The two major national assessment consortia—the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced—have incorporated sophisticated exercises in their assessments that ask students to write evidence-based explanations and arguments based on their readings of single and multiple texts on the same topics.
The Big Takeaway
The lesson of our history thus far with multiple-choice and performance assessments is that we know evaluation drives instruction to look like the evaluation. Thus, it’s no wonder that students tested over and over on multiple-choice tests end up being taught in curricula that look like multiple-choice tests. They spend a lot of time identifying and repeating information instead of creating and applying it in their talking and writing.
If we want students to become sophisticated in their written and spoken explanations and arguments, in their abilities to engage in evidence-based critique and debate, then we need to assess these sophisticated skills with performance assessments. Multiple-choice testing, no matter how slickly conceived for computer applications, including adaptive testing in which students are given more or less sophisticated items based on their answers, will not get us there. It will keep us, our students, and our teachers in the same old identification and repetition ruts that NCLB institutionalized in testing and consequently in instruction.