The chess world was rattled in late 2022 when Magnus Carlsen, the reigning world champion, accused Hans Niemann, a 19-year-old US grandmaster, of cheating with the help of a chess-playing artificial intelligence (AI) system. The accusation followed Niemann’s victory over Carlsen. Niemann insisted that he had beaten Carlsen fairly, although he also admitted to having cheated twice in online games, at the ages of 12 and 16.
A month later, a 72-page investigation report drafted by Chess.com claimed that Niemann had “likely cheated” more than a hundred times while playing online chess. But the report also said, “There is no direct evidence that proves Hans cheated at the September 4, 2022, game with Magnus.”
Cheating in chess has become a major problem, especially in the online era. Of the more than 500,000 accounts that Chess.com has closed for cheating, more than 500 belonged to titled players (a title is a mark of skill). By the beginning of 2024, the site expects to have closed more than a million accounts.
How can you tell whether a player has cheated? First, researchers build a statistical model from a database of millions of completed chess games. Then, using the fitted model, they estimate the probability that a human player’s move will coincide with the move a chess engine would choose.
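As a rough illustration of the second step, the Python sketch below asks how surprising a player’s engine-match count is. It is a deliberately simplified stand-in, not the model any investigator is known to use: it assumes the fitted model has already supplied an expected match rate for a player of that strength, and it treats moves as independent trials, which real games are not. The function names and all the numbers are hypothetical.

```python
import math


def engine_match_z_score(matches: int, moves: int, expected_rate: float) -> float:
    """How many standard deviations the observed engine-match count lies above
    the count expected under a (hypothetical) fitted model.

    Uses the normal approximation to a Binomial(moves, expected_rate) count,
    i.e. it assumes each move is an independent trial with the same match
    probability, a strong simplification for real games.
    """
    if moves <= 0 or not 0.0 < expected_rate < 1.0:
        raise ValueError("need moves > 0 and 0 < expected_rate < 1")
    mean = moves * expected_rate
    sd = math.sqrt(moves * expected_rate * (1.0 - expected_rate))
    return (matches - mean) / sd


def upper_tail_probability(z: float) -> float:
    """One-sided tail P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example: the model expects a 55% match rate for this player,
# and 260 of 400 moves matched the engine's top choice.
z = engine_match_z_score(260, 400, 0.55)
print(f"z = {z:.2f}, tail probability = {upper_tail_probability(z):.1e}")
```

With these made-up inputs, z is about 4.0 and the tail probability is well below 0.001. A result like that would flag a record for closer scrutiny; on its own it is not proof of cheating.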
This is somewhat like DNA crime-scene analysis for every chess player in the world. Chess engines such as Leela Chess Zero and Stockfish aren’t only stronger than their human counterparts; they also play differently. Stockfish has an Elo rating of more than 3,500, compared with Carlsen’s 2014 rating of 2,882, the highest a human has ever achieved. Engines’ playing styles can also seem to come from another planet, because engines develop their play very differently from the way humans develop theirs. So the likelihood of cheating is said to increase as the correlation between a player’s moves and an engine’s moves increases.
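To put the rating gap in perspective, the standard Elo model converts a rating difference into an expected score, E = 1 / (1 + 10^((R_B - R_A)/400)). The sketch below applies it to the two figures quoted above. Treat the result loosely: engine ratings are typically measured on engine-versus-engine rating lists rather than against human players, so the two numbers are not strictly on the same scale.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (win = 1, draw = 0.5, loss = 0)
    under the standard logistic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


carlsen_2014 = 2882   # Carlsen's record rating, as quoted in the text
stockfish = 3500      # "more than 3,500" per the text; not a FIDE rating
print(f"{elo_expected_score(carlsen_2014, stockfish):.3f}")  # prints 0.028
```

Under those assumptions, a player rated 2,882 would expect fewer than 3 points per 100 games against a 3,500-rated opponent.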
By feeding records of Niemann’s games into chess engines, some experts found that Niemann had played long runs of engine-recommended moves in tournament games and that his play often resembled a computer’s. Other experts countered that the over-the-board moves of many players could resemble an engine’s, since human players’ training, preparation and practice are now shaped by these engines as well.
The Carlsen-Niemann dispute may finally be decided in court: Niemann has sued Carlsen, Chess.com and chess prodigy Hikaru Nakamura, who also accused Niemann of cheating in online games, for $100 million for defamation. It would hardly be the first time statistics have been crucial to legal proceedings. In the USA, the UK and other countries there have been numerous cases in which statistical reasoning, primarily the calculation of probabilities, has been applied both well and badly.
The Sally Clark case
Using statistics in court requires the utmost caution and expertise. An infamous UK criminal case involving a woman named Sally Clark is a prime example of how misused statistics can lead to injustice.
Following the untimely deaths of two of her infant children from sudden infant death syndrome (SIDS) on separate occasions, Clark was accused of murder. A paediatrician testified that the probability of a SIDS death in a family like Clark’s, where the mother is older than 26, affluent and a nonsmoker, is 1 in 8,543. He then squared this figure to get the probability of two such deaths: (1/8,543)², or about 1 in 73 million. Clark was promptly convicted in 1999.
But the Royal Statistical Society disagreed, saying there was “no statistical basis” for the paediatrician’s figure. The paediatrician had wrongly treated the two deaths as independent, ignoring genetic and environmental factors that can make a second SIDS death in the same family more likely. The figure also invited the ‘prosecutor’s fallacy’: mistaking the probability of two SIDS deaths in one family for the probability that Clark was innocent. When Ray Hill, a mathematics professor at the University of Salford, examined additional data in 2002, he concluded that the chance of a second child dying of SIDS, given that a first child had died of SIDS, might be as high as 1 in 60. Clark’s conviction was overturned on appeal in 2003, and she was released.
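The arithmetic behind these figures is easy to reproduce. The Python sketch below uses exact fractions; pairing the trial’s 1-in-8,543 figure with Hill’s 1-in-60 estimate is an illustrative assumption, not a calculation taken from the case itself.

```python
from fractions import Fraction

# Figures quoted in the text; combining them like this is purely illustrative.
p_first = Fraction(1, 8543)        # one SIDS death in a family like the Clarks'

# Trial reasoning: assume independence, so P(second | first) = P(first).
p_both_independent = p_first * p_first

# Hill's upper estimate: P(second SIDS | first SIDS) could be as high as 1/60.
p_both_dependent = p_first * Fraction(1, 60)

print(f"assuming independence: 1 in {int(1 / p_both_independent):,}")  # 1 in 72,982,849
print(f"allowing dependence:   1 in {int(1 / p_both_dependent):,}")    # 1 in 512,580
```

Allowing for dependence makes a double SIDS death in such a family about 142 times more likely than the trial figure suggested. Neither number, however, is the probability that Clark was innocent; that would require weighing the competing explanations, including double murder, which is also extremely rare.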
In a 2011 paper, Norman Fenton, a professor of risk information management at Queen Mary University of London, wrote, “Most common fallacies of statistical reasoning can be avoided by applying Bayes’ theorem, a rule that allows the evidence to be weighted.”
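For reference, here is Bayes’ theorem for a hypothesis H (say, “the suspect is the source of the sample”) and evidence E (say, “the profiles match”), together with its odds form, which makes the weighting of evidence explicit: the prior odds are multiplied by the likelihood ratio. The prosecutor’s fallacy amounts to reading P(E | not H), the random match probability, as if it were P(not H | E).

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(E \mid H)}{P(E \mid \neg H)} \times \frac{P(H)}{P(\neg H)}
```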
Suppose a partial DNA profile from a crime-scene sample matches the corresponding portions of Swami’s profile, with a random match probability of 2 in 1,000. The prosecutor then proclaims that it is 99.8% likely that Swami committed the crime, because only 0.2% of people would have such a match. Suppose, however, that 10,000 people could have been at the crime scene. About 20 of them would be expected to match by chance, so Swami is just one of roughly 20 matching people. Instead of 99.8%, then, the probability that Swami is the source of the sample is only about 5%.
(Note that this method assumes that each of the 10,000 potential sources has an equal prior probability of having been the source.)
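The same calculation can be written out with Bayes’ theorem. The sketch below is a minimal illustration that hard-codes the assumptions of the example above (equal priors for all 10,000 people, and a true source that always matches); the function name is hypothetical.

```python
def posterior_source_probability(population: int, random_match_prob: float) -> float:
    """P(suspect is the source | profiles match) via Bayes' theorem.

    Assumptions taken from the example in the text: each of `population`
    people is equally likely a priori to be the source, the true source always
    matches, and anyone else matches by chance with `random_match_prob`.
    """
    prior = 1.0 / population
    numerator = 1.0 * prior                            # P(match | source) * P(source)
    false_match = random_match_prob * (1.0 - prior)    # P(match | not source) * P(not source)
    return numerator / (numerator + false_match)


fallacy = 1.0 - 0.002                                  # the prosecutor's "99.8%"
bayes = posterior_source_probability(10_000, 0.002)
print(f"prosecutor's fallacy: {fallacy:.1%}, Bayes: {bayes:.1%}")  # 99.8% vs 4.8%
```

The exact answer, about 4.8%, is roughly 1 in 21 rather than 1 in 20, because Swami’s own match sits alongside the roughly 20 expected chance matches among the other 9,999 people.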
At a lecture in July 2021, Lady Rose, a Justice of the UK Supreme Court, said, “There are some areas where humans are particularly fallible at making use of statistics to take rational decisions. An important one is in assessing risk and probability.”
Carlsen has said that cheating is “an existential threat” to chess. Against this backdrop, it might be tempting to think that the future of this 1,500-year-old game rests, at least in part, on the Carlsen-Niemann case, and specifically on how statistics are used and interpreted in it. But there are several ways to calculate and interpret those statistics, just as the case itself could swing either way.
For example, according to an analysis by an anonymous ChessBase user called gambit-man, Niemann has an unusually high number of games with 100% engine correlation. Niemann’s defence, however, might be that by other measures his play is far less computer-like than Carlsen’s has been in the recent past.
One such metric is centipawn loss. A centipawn is one-hundredth of a pawn, and a move’s centipawn loss measures how much worse the engine judges that move to be than its own top choice; averaged over many moves, a lower value indicates play closer to the engine’s. Another relevant setting is search depth: how many plies (half-moves, counting both players’ moves) ahead the engine looks when evaluating a position. Measured against the open-source engine Stockfish 15 at depth 18, Niemann’s and Carlsen’s average centipawn losses are 25.6 and 16.9, respectively. By this measure, Carlsen’s play is the closer of the two to the engine’s.
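Here is a minimal sketch of how an average centipawn loss could be computed, assuming an engine has already evaluated, for each position, both its own top move and the move actually played. The function name and the evaluations are invented for illustration, and real tools differ in details this sketch ignores, such as capping very large evaluations and handling forced mates.

```python
from typing import Sequence


def average_centipawn_loss(best_evals: Sequence[int], played_evals: Sequence[int]) -> float:
    """Average centipawn loss over a sequence of a player's moves.

    best_evals[i]   -- engine evaluation, in centipawns from the mover's point of
                       view, after the engine's top move in position i
    played_evals[i] -- engine evaluation after the move actually played there
    Each move's loss is the drop from the best evaluation, floored at zero
    (an assumed convention); matching the engine's choice costs nothing.
    """
    if not best_evals or len(best_evals) != len(played_evals):
        raise ValueError("need two non-empty sequences of equal length")
    losses = [max(0, best - played) for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)


# Hypothetical four-move sample: two engine matches, two inaccuracies.
best = [30, 45, -10, 120]
played = [30, 20, -10, 60]
print(average_centipawn_loss(best, played))  # 21.25
```

Because every move that matches the engine contributes zero, a lower average means play closer to the engine’s choices.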
So does Niemann win the argument or does Carlsen?
It’s hard to say. Perhaps we will never know for sure whether Niemann really cheated, because statistical analyses only suggest whether cheating may have occurred; they don’t deliver absolute verdicts. Experts will scrutinise every aspect of these analyses, including their statistical rationale, their propriety and their interpretation, and from that build equally plausible arguments and counter-arguments.
The only thing of which we can be reasonably certain is that whoever wins the case, an honest game of chess needn’t hang in the balance – but not for the reasons Carlsen is concerned about.
Atanu Biswas is professor of statistics, Indian Statistical Institute, Kolkata.