February 29, 2024 01:37 pm | Updated 02:16 pm IST

In a post-truth world, the ubiquity of AI-generated content has blurred the lines around authenticity. And with AI chatbots like ChatGPT getting better at word prediction, it has become difficult for users to sift through AI spam and determine whether a piece of text was written by a human or a machine. However, there are methods as well as tools that make it possible for users to determine, to an extent, whether a chatbot was used to create content. Here are some tools and methods that can be used to identify AI-generated text.

AI text detectors

Early last year, it looked like there would be a quick fix to the problem after a batch of AI text detectors was released. However, that promise faltered, and most of the tools turned out to be fairly easy to trick. Even so, some of these tools are capable of identifying AI-generated text to varying degrees.

GPT Zero, an app built by Edward Tian, a computer science student from Princeton, gained a fair amount of attention after Tian claimed that the app could efficiently detect whether an assignment essay was written by ChatGPT or a student. Another AI detection tool called Originality was released by edtech company Turnitin in April last year. The company had initially claimed 98% accuracy for the tool. But these claims soon fell flat.

Debora Weber-Wulff, a professor of media and computing at the University of Applied Sciences, HTW Berlin, worked with a team of researchers to test 14 such AI text detection tools, including GPT Zero, Turnitin and Compilatio, and found that they were not reliable. While the tools were good at recognising text written by a human (96% accuracy on average), they often struggled to pick up ChatGPT-generated text that had been slightly paraphrased or rearranged. When the text was produced directly by ChatGPT, they could detect it with 74% accuracy, but this dropped to 42% after small tweaks were made.

Backlash against these tools grew after educational institutions that relied on them made false accusations. In August last year, Turnitin’s tool falsely categorised a student’s assignment as AI-generated. The student, from Johns Hopkins, then had to present the study material they had used to complete the assignment along with their early drafts.

OpenAI, the maker of ChatGPT, also shuttered its AI text classifier in July last year, months after launching it, due to its low accuracy rate. Although the company has said that it is working on mechanisms to detect AI text, there’s no word on when it will release them. On its FAQ page, in response to the question of whether AI detectors work, OpenAI simply replied, “No, not in our experience.”

Watermarking

Watermarking is another possible approach to detecting AI-generated text. These watermarks wouldn’t be visible to the human eye but could be detected by a computer. If companies embedded them within large language models during training, systems could automatically detect AI text.

OpenAI is reportedly experimenting with this technology, but most of the details are still under wraps. During training, the team would simply insert some special words into the text that ChatGPT generates. The system would keep a ‘special list’ of all these extra words inserted on purpose. If a text has a high percentage of words from the special list, it is more likely to be AI-generated.
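As a rough illustration of the ‘special list’ idea, the share of listed words in a text can be computed in a few lines of Python. This is only a sketch under stated assumptions: the words in `SPECIAL_WORDS`, the function names and the 0.2 threshold are all hypothetical, since no actual list or detection rule has been made public.

```python
import re

# Hypothetical "special list"; any real list has not been made public.
SPECIAL_WORDS = {"notably", "moreover", "landscape", "crucial"}


def special_word_share(text: str, special_words: set = SPECIAL_WORDS) -> float:
    """Return the fraction of words in `text` that appear in the special list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in special_words)
    return hits / len(words)


def looks_watermarked(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose share of special-list words meets an assumed threshold."""
    return special_word_share(text) >= threshold


if __name__ == "__main__":
    sample = "Notably, the landscape is crucial"
    print(special_word_share(sample), looks_watermarked(sample))
```

A higher share only makes AI authorship more likely, as described above. It is not proof, because a human writer can use the same words by chance.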

Researchers at the University of Maryland who are working on a similar method have been continuously testing it. While the research paper on this hasn’t been published yet, tests done by the team showed that the watermarks could identify AI-generated text with near-perfect reliability.

Human detection

People can also train themselves to spot AI-generated text over time. Daphne Ippolito, a senior research scientist at Google Brain, has spoken about the distinctive patterns in AI text. These usually show up as an overuse of common words like ‘the’, ‘it’ or ‘is’ in place of more specific terms. Text written by humans has a character of its own and includes a much wider range of phrases and sentences. A single piece by a human writer can also mix different styles and use colloquial language, while AI-generated text is normally filled with jargon.
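The overuse of common words and the narrower range of vocabulary described here can be roughly measured. The Python sketch below is an illustration, not a method attributed to Ippolito: the function names are hypothetical, and the word list is simply the three examples given above.

```python
import re
from collections import Counter

# The three example words from the article; a fuller list is an open choice.
COMMON_WORDS = ("the", "it", "is")


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def common_word_share(text: str, common_words=COMMON_WORDS) -> float:
    """Return the fraction of tokens that are one of the common words."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[word] for word in common_words) / len(words)


def vocabulary_range(text: str) -> float:
    """Return distinct words divided by total words (a type-token ratio)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


if __name__ == "__main__":
    sample = "It is the cat"
    print(common_word_share(sample), vocabulary_range(sample))
```

Neither number means much on its own. They are only useful when comparing texts of similar length and subject, and even then they offer a hint rather than a verdict.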

Ippolito also said that, contrary to common perception, typos are a very “human” indicator in text. ChatGPT, on the other hand, produces text that is too polished. An AI tool also tends to repeat certain phrases or words when asked to explain a subject at length.

AI-generated text also usually has a more surface-level feel. If a text gives generic information that is easily available on the internet, there is a high chance that it is AI-generated. Text written by humans usually tends to offer more insight and perspective.

Because AI writing tools work by predicting words, the words they use tend to be very predictable. Human writing includes more words that surprise the reader. Plus, writers tend to be creative with their choices when writing an introduction to a piece or explaining a concept.
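One way to put a number on predictability is to measure how surprising each word is, given the word before it. The Python sketch below uses a toy word-pair model built from a small reference text. That is an assumption made purely for illustration, as are the function names. It is not how ChatGPT or any detector named in this article works, and a practical tool would likely rely on a far larger language model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def average_surprisal(text: str, reference: str) -> float:
    """Mean surprisal, in bits per word, of `text` under a word-pair model
    built from `reference`. Lower values mean more predictable text."""
    ref = tokenize(reference)
    # +1 reserves one slot for any word never seen in the reference.
    vocab_size = len(set(ref)) + 1
    contexts = Counter(ref[:-1])          # how often each word starts a pair
    pairs_seen = Counter(zip(ref, ref[1:]))

    words = tokenize(text)
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0

    total_bits = 0.0
    for previous, word in pairs:
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        prob = (pairs_seen[(previous, word)] + 1) / (contexts[previous] + vocab_size)
        total_bits += -math.log2(prob)
    return total_bits / len(pairs)


if __name__ == "__main__":
    reference = "the cat sat on the mat the cat sat on the rug"
    print(average_surprisal("the cat sat on the mat", reference))
    print(average_surprisal("the rug sat on the cat", reference))
```

In this example, the first sentence follows word pairs the model has already seen, so it scores lower than the second, which contains a pair the model has never seen.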