28 Aug 2018

Who is really affected by the SAT security breach?

Submitted by Karl Hagen
College Board's response to the compromised August SAT is unfolding the way I predicted:

This statement strikes me as both typical and silly. We know the test was compromised. That doesn't require College Board's confirmation. What does require a response is why they departed from their previous practices.

In reusing this particular test, in this context, I submit that College Board has violated the Standards for Educational and Psychological Testing. These standards codify the practices set out jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. College Board is a member, and so these are its own self-proclaimed standards:

Standard 5.6: "Reasonable efforts should be made to assure the integrity of test scores by eliminating opportunities for test takers to attain scores by fraudulent means."

We may argue on the margins over what constitutes a "reasonable effort," but this isn't a close call: College Board's practices here were unreasonable. Their previous pattern had been to reuse tests originally given in the US on lower-volume domestic and international test dates. In this case, they took a test that had already been used internationally and reused it on a major test date in the US. To my knowledge, this is the first time they've done that. Given that tests administered internationally are frequently compromised, a reasonable person could have predicted problems even if the test hadn't been floating around on the open Internet for months. And since the test had in fact leaked, as even a cursory Google search would show, there is no excuse.

The "quality control" steps that College Board refers to in this announcement are statistical tests to see if they can identify cheaters. There are other, non-psychometric ways to detect cheating, namely direct observations of behavior in the testing room, but garden-variety cheating that goes on every test and that's separate from the issue here. As I explained briefly last time, these tests may identify some low-hanging fruit, but they are certainly not going to identify everyone who had access to this test ahead of time.

How much are the test takers affected?

As with the problems with the June SAT, I'm seeing a lot of outrage on social media over the test, not all of which is well-informed outrage. (You're right to be upset, but I want you to be upset for the right reasons.) It's important to note that the effect a leaked test has on various test takers is not uniform. So it's worth considering separately how different groups are affected by the security breach.

No Exposure

Most students will not have seen the test ahead of time. For these students, the test was just like any other. Unlike the June SAT, there don't appear to be any technical flaws with the test. Some people on social media have been fretting that the scores of these students will be depressed because of the cheaters, but that worry is misplaced. The score scale is not recalculated each administration. It was developed the first time the test was given (in June 2017) and hasn't changed since. That means that the individual scores of these students will still be valid. (In passing, I'd like to note that most of what you read on the SAT subreddit about score scales is wrong. I've written an explainer on how score scales really work that you can find here.)

However, what is threatened is the validity of comparing scores between different students. If you're one of these students, the 1400 that you get still reflects your honest work. But the scores of other students on the same test aren't necessarily valid, and most of those invalid scores won't be detected statistically. Because test scores are not the sole determining factor in US college admissions, it's impossible to know how severe the problem will be in practice, but it's easy to envision a scenario in which students with inflated scores take the admissions spots of honest students who were on the bubble but lost out because of the test-score differential.

Previously Exposed

Another group of students took the test twice: once in either June or October of last year, and then again this August, but did not see the test outside of that context. Many international students who took the test in October and then flew to the US for this test date will be in that category. I'm calling this group "exposed" because while they saw the questions previously, they did not get a chance to pore over them after the fact.

You might expect that these students would benefit from the repetition, but it turns out that they do not get any real advantage over unexposed test takers. This may seem counterintuitive, but there's been significant research on the subject. (See, for example, Feinberg and Haist (2015), "Repeat Testing on Credentialing Exams: Are Repeaters Misinformed or Uninformed?" Educational Measurement: Issues and Practice, v. 34.) Examinees may remember that they've seen the material before, but they still tend to make the same mistakes. In fact, even though repeat test takers of all sorts see, on average, a score increase, those who repeat on the same form see a smaller increase than those who repeat with a new form. See the article cited above for some explanation as to why that is.

In short, the students who, through no fault of their own, saw the test a second time won't get an unfair advantage, and they may even be slightly harmed.

Exposed and Studied

Students who were given the leaked test as part of their test preparation differ from the "exposed" group because they didn't merely take the test. Afterwards, they had a chance to look carefully at the problems and discuss the answer choices, either among themselves or with a tutor. This is the group that really threatens the validity of the scores. Because they had a chance to go over the items at their leisure and understand why each answer is right or wrong, their performance is unlike that of students who were merely exposed to the test. They have a much better chance of remembering many, if not all, of the details of the test. Studies such as the one I cited above do not apply to this scenario, and you cannot infer that there will be minimal impact. Indeed, there's another standard that directly addresses this case.

Standard 13.11: "In educational settings, test users should ensure that any test preparation activities and materials provided to students will not adversely affect the validity of test score inferences."

In the commentary on this standard, it is specifically stated that "[w]hen inappropriate test preparation activities occur, such as teaching items that are equivalent to those on the test, the validity of test score inferences is adversely affected."

This is the group I worry about most. It is potentially very large, and it does not consist only of the students from Asia who are taking the brunt of the complaints. Not only are there unscrupulous test-prep outfits in the US as well, but there are also plenty of students, studying on their own or in groups, who seek out every bit of authentic material they can find, regardless of the legality of its source. Those students didn't expect that they would get a test they'd already seen. They just wanted more practice. But regardless of their motives, their scores are not a trustworthy reflection of their true ability.

They're also unlikely to be caught by College Board's psychometric screens for cheating. Because they didn't know ahead of time that this particular test would be used, they didn't go into the test room having memorized the pattern of choices. They would be re-solving the test, using what they remembered to help. That is, most of them are still likely to make some mistakes, just fewer than they otherwise would have, and not necessarily the same ones on the leaked answer key. For those who've already taken the SAT or PSAT, there may be a suspicious score jump, and it's possible that College Board could ask such people to retake the test, but industry standard practice is not to cancel scores solely because of a large increase. There needs to be other information to support a suspicion. And in any case, this screening doesn't affect first-time test takers at all.
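The retake screen described above amounts to a simple rule: a large score jump only flags a record for review, and action requires independent corroborating evidence. A minimal sketch in Python, with an entirely hypothetical threshold and evidence flag (College Board's actual criteria are not public):

```python
# Hypothetical sketch of a score-jump screen. A large increase alone
# never cancels a score; it only flags the record, and further action
# depends on independent corroborating evidence. The threshold is an
# illustrative value, not an official cutoff.

def screen_retake(prior_score: int, new_score: int,
                  corroborating_evidence: bool,
                  jump_threshold: int = 250) -> str:
    """Classify a retake under the industry practice described above."""
    jump = new_score - prior_score
    if jump < jump_threshold:
        return "no action"
    if corroborating_evidence:
        return "score under review"
    return "flagged, but no action on jump alone"
```

Note that this screen, whatever its exact parameters, has nothing to compare against for first-time test takers, which is the point made above.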

As a result, there are likely to be a large number of invalid scores for this test that cannot be detected by College Board, making comparisons with the honest students fraught.

Mindless Cheating

The final category consists of the most blatant cheaters. In this group I'm imagining those who came into the test having memorized the answers and copied them down by rote, or who had the answers texted to them and snuck a look during the exam. Because the test was leaked, you don't need to assume any elaborate conspiracy involving stealing test booklets ahead of time for this to occur. All that has to happen is that a test taker on the East Coast identifies the test as one that was previously used, reports that fact to a co-conspirator during a break, and that person then simply looks up the key for the test and sends it to the cheaters.

I actually worry about this group least in a practical sense, because while their scores will be more grossly inflated than any other group's, (1) there are likely far fewer of them, and (2) they are much more likely to get caught than the previous group, particularly if they used the key from the leaked tests without modification. The particular key for this leaked test contains a pattern of mistakes that would provide a distinctive fingerprint for anyone who tried to use it.
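The fingerprint idea can be made concrete: because the leaked key itself contains errors, anyone who copied it would reproduce those exact wrong answers. A minimal sketch, with invented keys and responses (real forensic screening is far more elaborate than this):

```python
# Illustrative sketch: flag examinees whose wrong answers match the
# distinctive errors in a leaked, and itself mistaken, answer key.
# All keys and responses below are invented for illustration.

def leaked_key_match_rate(responses, official_key, leaked_key):
    """Fraction of the leaked key's erroneous items on which the
    examinee gave the leaked key's (wrong) answer."""
    error_items = [i for i, (o, l) in enumerate(zip(official_key, leaked_key))
                   if o != l]
    if not error_items:
        return 0.0
    matches = sum(1 for i in error_items if responses[i] == leaked_key[i])
    return matches / len(error_items)

official = "ABCDABCDAB"
leaked   = "ABCDABDCAB"   # two items keyed incorrectly (positions 7 and 8)
copier   = "ABCDABDCAB"   # reproduced the leaked key verbatim
honest   = "ABCDABCDAB"

leaked_key_match_rate(copier, official, leaked)  # 1.0
leaked_key_match_rate(honest, official, leaked)  # 0.0
```

An honest test taker has essentially no reason to match the leaked key's errors, so a high match rate across many erroneous items is exactly the kind of fingerprint described above.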

College Board has few good options here to clean up this mess. Invalidating the entire test day would harm a large number of honest students. Not doing so gives an unfair advantage to those who studied the test beforehand. We must not forget, though, that College Board got itself into this mess by making a moronic decision whose outcome was entirely predictable.

Update 8/29/18: In a late addition to this USA Today article on the topic, College Board says that it's going to increase the pace of test development and look harder for those spreading pirated material. I haven't seen an official announcement to this effect, but this is probably as close to a public acknowledgment of a problem with the August test as you're going to get. More tests are the best way to mitigate the problem. I just hope their quality control doesn't suffer too much. That's something they've struggled with ever since they revised the test in 2016, and a faster pace of development is only going to make that headache worse.