17 Apr 2014

On Formula Scoring

Submitted by Karl Hagen
This is the second installment of my commentary on the changes to the SAT. Part 1 is here.

A few changes to the new SAT will get a lot of discussion but matter less to test takers than you might think, although they matter quite a bit to the people who make the test. One of these has received much press attention since the initial announcement: no more deduction for wrong answers.

I know what most high school students are thinking: no more deduction for incorrect answers. Yay! I'll have a higher score! But hold on before you break out the sparkling cider. (If you're directly concerned about your score on the SAT, you're too young for champagne.) It's true that your raw score will likely be higher. But so will everyone else's. And it's not the raw score that's reported to colleges; it's the scaled score, and the scale will be adjusted to account for the higher raw scores.
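To make the point concrete, here is a minimal Python sketch of a hypothetical rank-based scale. It is not the College Board's actual equating procedure, which is considerably more involved; it simply assumes that a scaled score reflects where a raw score falls among all test-takers' raw scores. The function `scaled_scores` and the sample raw scores are invented for illustration. Under that assumption, any change that raises raw scores while preserving their order leaves every scaled score where it was.

```python
from fractions import Fraction
from typing import List


def scaled_scores(raw: List[float]) -> List[int]:
    """HYPOTHETICAL rank-based scale (not the real SAT equating):
    place each raw score on 200-800 by its mid-percentile rank."""
    n = len(raw)
    result = []
    for r in raw:
        below = sum(1 for x in raw if x < r)      # test-takers who scored lower
        equal = sum(1 for x in raw if x == r)     # ties, counted as half below
        pct = Fraction(2 * below + equal, 2 * n)  # mid-percentile rank in (0, 1)
        score = 200 + 600 * pct                   # stretch onto 200-800
        result.append(round(score / 10) * 10)     # report in steps of 10
    return result


# Invented raw scores for six test-takers under the old rules.
old_raw = [12, 25, 31, 40, 47, 52]
# Suppose dropping the deduction raises everyone's raw score, keeping the order.
new_raw = [1.25 * r + 4 for r in old_raw]

print(scaled_scores(old_raw))  # [250, 350, 450, 550, 650, 750]
print(scaled_scores(new_raw))  # [250, 350, 450, 550, 650, 750]
```

Everyone's raw score went up, but nobody's scaled score moved, because nobody's position relative to the other test-takers changed.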

Dropping formula scoring, the technical name for the "penalty" on incorrect answers, will chafe a few people. Specialists have long argued over whether it's appropriate to calculate raw scores this way. To understand what's at stake here, you need to know both how formula scoring works and what motivated its introduction.

Formula scoring was an attempt to address a fundamental issue with multiple-choice questions: you can sometimes guess the right answer with no real understanding of the question at all. Imagine that we have a group of students. Some of them are risk-averse: they are reluctant to answer a question if they don't know the answer, so they leave tough questions blank. Others in the group are willing to take risks. When they don't know the answer, they guess and move on. With simple number-correct scoring, the argument goes, the risk-averse students are at a disadvantage, because the risk-takers will pick up some extra points through lucky guesses, raising their scores without merit. The correction for wrong answers in formula scoring is meant to create a disincentive for random guessing.
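The argument can be put in terms of expected scores. The Python sketch below compares a cautious student who omits with a risk-taker who guesses blindly, assuming both know the same number of answers and have no information at all on the rest. The function name `expected_score` and the numbers 40 and 10 are made up for illustration, and the sketch deals only in averages, not the luck of any single sitting.

```python
from fractions import Fraction


def expected_score(known: int, unknown: int, k: int,
                   guesses: bool, formula: bool) -> Fraction:
    """Expected raw score for a student who knows `known` answers outright and
    has no idea at all on `unknown` questions with k choices each."""
    score = Fraction(known)                 # known answers are always right
    if guesses:
        p = Fraction(1, k)                  # chance that a blind guess is right
        score += unknown * p                # expected lucky hits
        if formula:
            # formula scoring deducts 1/(k-1) for each expected wrong guess
            score -= unknown * (1 - p) * Fraction(1, k - 1)
    return score


# Hypothetical students: each knows 40 answers and is stumped by 10 five-choice questions.
for formula in (False, True):
    cautious = expected_score(40, 10, 5, guesses=False, formula=formula)
    risk_taker = expected_score(40, 10, 5, guesses=True, formula=formula)
    print("formula" if formula else "number-correct", cautious, risk_taker)
```

Under number-correct scoring the risk-taker expects 42 to the cautious student's 40; with the correction applied, both expect 40.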

The amount of the correction is almost always chosen to be $-\dfrac{1}{k-1}$, where $k$ is the number of answer choices. The logic for that number is that it makes the expected value of a random guess equal to the value of leaving the question blank: zero. On the current SAT, every multiple-choice question has 5 choices, so the correction is -0.25. If you guess randomly on 5 questions, the most likely outcome is 1 correct answer and 4 incorrect ones, for a raw score of 0, just as if you had omitted them.
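A short Python sketch can check both claims with exact fractions: under the 1/(k-1) correction a blind guess is worth zero on average, and for 5 blind guesses on 5-choice questions the most likely result is one right and four wrong. The name `guess_distribution` is just for illustration, and the sketch assumes every guess is a uniform random pick among the 5 choices.

```python
from fractions import Fraction
from math import comb

K = 5                                # answer choices per question on the current SAT
PENALTY = Fraction(1, K - 1)         # formula-scoring deduction per wrong answer: 1/4
P_RIGHT = Fraction(1, K)             # chance that a blind guess is right


def guess_distribution(n: int) -> dict:
    """Map each possible raw score from n blind guesses to its probability."""
    dist = {}
    for right in range(n + 1):
        wrong = n - right
        prob = comb(n, right) * P_RIGHT**right * (1 - P_RIGHT)**wrong  # binomial
        dist[right - wrong * PENALTY] = prob
    return dist


dist = guess_distribution(5)
expected = sum(raw * p for raw, p in dist.items())
most_likely = max(dist, key=dist.get)
print(expected, most_likely, dist[most_likely])  # 0 0 256/625
```

The most likely outcome (one right, four wrong, raw score 0) has probability 256/625, or about 41%, and the expected raw score over all outcomes is exactly 0.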

The argument for formula scoring taps into our sense of fairness. We have an intuitive sense that you shouldn't get an unearned benefit. But in the real world, it's far from clear that formula scoring actually serves to protect anyone. For one thing, it makes the game theory about guessing more complex. The optimal strategy with formula scoring is to guess if you can eliminate one or more incorrect answers. But the notion that there is a "penalty" can bias students against guessing when it is to their advantage to do so. With a number-correct scheme, the game theory is simple: always guess rather than leave blank. As long as that strategy is clearly conveyed to all test-takers, there's no solid reason to think that anyone is actually at a disadvantage.
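The guessing arithmetic is easy to tabulate. Here is a Python sketch, assuming the remaining choices are equally likely once some have been eliminated (real partial knowledge is rarely that tidy); `guess_value` is a hypothetical helper name. It compares the expected gain from guessing against the zero you get for a blank, under each scoring scheme.

```python
from fractions import Fraction


def guess_value(k: int, eliminated: int, formula: bool) -> Fraction:
    """Expected gain from guessing among the remaining choices after ruling out
    `eliminated` wrong answers, relative to leaving the question blank (0)."""
    p_right = Fraction(1, k - eliminated)     # uniform guess among what's left
    if not formula:
        return p_right                        # number-correct: no downside to guessing
    penalty = Fraction(1, k - 1)              # deduction is fixed by k, not by what's left
    return p_right - (1 - p_right) * penalty


for eliminated in range(4):
    print(eliminated,
          guess_value(5, eliminated, formula=True),
          guess_value(5, eliminated, formula=False))
```

Under formula scoring, a blind guess on a 5-choice question is worth exactly 0, and eliminating one choice raises that to only 1/16 of a point, against 1/4 under number-correct scoring. Under number-correct scoring the value is positive in every row, which is why the rule there reduces to "always guess."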

Whether or not you use formula scoring, there is always an optimal strategy, and there are always suboptimal ones. For the test-taker who follows the optimal guessing strategy, the raw scores with and without formula scoring are just linear transformations of each other. And if we're concerned about fairness, a scheme with simpler game theory is desirable because it minimizes differences among test-takers in their test wisdom.
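One way to see the linear relationship is to assume the test-taker answers every one of the n questions. That is the optimal strategy under number-correct scoring, and it is never worse in expectation under formula scoring, since a blind guess is worth zero on average. With R right and n - R wrong, the formula score is R - (n - R)/(k - 1) = kR/(k - 1) - n/(k - 1), or 1.25R - n/4 for five choices. The Python check below confirms this for a hypothetical 54-question section; the helper names are invented. If a test-taker omits questions, the formula score also depends on how many were omitted, so this simple relationship holds only under the stated assumption.

```python
from fractions import Fraction


def formula_score(right: int, wrong: int, k: int = 5) -> Fraction:
    """Formula score: one point per right answer, minus 1/(k-1) per wrong one."""
    return right - Fraction(wrong, k - 1)


def formula_from_right(right: int, n: int, k: int = 5) -> Fraction:
    """Closed form, assuming all n questions were answered (no omissions)."""
    return Fraction(k, k - 1) * right - Fraction(n, k - 1)


n = 54  # hypothetical number of questions in a section
for right in range(n + 1):
    # With no omissions, wrong = n - right, so both expressions must agree.
    assert formula_score(right, n - right) == formula_from_right(right, n)
print(formula_from_right(40, n))  # 73/2, i.e. 36.5
```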

From the test maker's point of view, the claims that formula scoring produces a more reliable test are tenuous. (Studies have indicated that the effect is small, if it exists at all.) Formula scoring also adds to the mathematical complexity of the models used to calculate score scales. It's notable that the ACT, the GRE, and many other major standardized tests do not use formula scoring. If you're interested in a more detailed account of the debate, this article, albeit old, gives a good survey of the arguments on both sides.
