How do AI checkers actually work?
Okay, you little cheaters. This is the article you’ve been looking for. I know that probably 99.99% of you are here because you think this article will tell you how to bypass content checkers. I know that, because when I did a search on “How do AI checkers work?” I didn’t get technical explanations about how a technology works.
No, I got hundreds of YouTube videos by people who can’t be bothered to do their own writing, showing others how to cheat using AI, which involves jumping through every hoop possible to convince AI checkers that AI-generated text isn’t AI-generated text.
This process is called “humanizing” text, and cheaters use it to sprinkle a little biological special sauce into the cold and humanless verbiage generated by the great Landru in the sky.
Thomas Edison, way back before computers, artificial intelligence (AI), streaming, and Marvel movies, once said, “There is no expedient to which a man will not go to avoid the real labor of thinking.”
You wanna know how to “humanize” your AI text, do you? Write it yourself!
But hey, of the thousands of you visiting this article to help you scam your editor or your teacher, there are probably six of you who care about the technology. For you folks, let’s discuss the actual meat and potatoes of how an AI checker works.
The technologies used by AI checkers
Let’s start with the obvious. An AI isn’t going to have written text like I did above. I’ve added in a couple of pop-culture references, accused readers of being scammers, used a bunch of sarcasm, thrown in a relevant yet fairly random quote, and meandered a bit before getting to the point.
That wasn’t me trying to “humanize” some AI content. That’s just how I write, and thankfully, why many of you come back each week to read my columns. But it’s also what AI checkers tend to look for: writing that doesn’t match predictable patterns.
AI checkers today use a variety of techniques, starting with text analysis. As with prompts sent to a chatbot, the submitted text is broken down into tokens. Many checkers also normalize it, lowercasing the text and stripping punctuation and other non-essential indicators. They then use a technique called vectorization, which converts the text into arrays of numbers (vectors) that can be compared mathematically against the vectors of other text.
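For the six of you who care, here is a minimal sketch of that pipeline in Python. It is an illustration, not how any particular checker works: the function names (`tokenize`, `vectorize`, `cosine_similarity`) are made up for this example, and simple word counts stand in for the learned embeddings a real product would more likely use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Normalize to lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def vectorize(tokens):
    """Turn tokens into a sparse word-count vector (assumption: bag-of-words)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Compare two count vectors; 1.0 means same direction, 0.0 means no shared words."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    submitted = vectorize(tokenize("The great Landru wears a holographic toga."))
    reference = vectorize(tokenize("Landru, the great, wears a toga!"))
    print(round(cosine_similarity(submitted, reference), 3))
```

Notice that the punctuation and capitalization differences between the two sample sentences vanish after normalization, which is exactly why the two texts score as similar.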
Of course, all this normalization could remove clues of reprehensible human behavior, like using two spaces after a period or misusing the Oxford comma. But fortunately, AI checkers have more tools up their virtual sleeves — or, as in Landru’s case, under their holographic togas.
See that? AIs aren’t going to generate tangentially relevant callbacks to earlier bits of their shtick (as I just did with holographic togas). AI content checkers use contextual awareness, examining the context in which phrases appear in order to identify common phrasings and assign them weight. Uncommon contextual connections, like some of those above, count more heavily toward a verdict of human-written.
This also applies to semantic analysis of text, where AI checkers attempt to understand the meaning of text rather than just examine sequences of words. This allows them to balance contextual awareness with an understanding of what the writer is trying to say.
One reason that the OpenAI content detector (once it’s re-released) is expected to perform so well is that it will be able to run summaries of the meaning of a submitted piece against the entire ChatGPT knowledge base, in order to see if the output of the suspect text shows a degree of similarity to what ChatGPT itself would have produced.
From an algorithmic perspective, content checkers may use n-grams, which are contiguous sequences of n words (two-word bigrams, three-word trigrams, and so on), to extract context and meaning. Grammatical structure can also be examined to find patterns that reflect content written by an AI.
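To make n-grams concrete, here is a small Python sketch. The helper names (`word_ngrams`, `repeated_ngrams`) are hypothetical, and counting repeated phrases is just one simple signal I picked for illustration, not a documented feature of any specific checker.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words as a tuple."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_ngrams(text, n=2):
    """Map each n-gram that occurs more than once to its count."""
    counts = Counter(word_ngrams(text, n))
    return {gram: count for gram, count in counts.items() if count > 1}


if __name__ == "__main__":
    sample = "it is important to note that it is important to check"
    print(word_ngrams("the cat sat", 2))
    print(repeated_ngrams(sample, n=3))
```

A checker could compare counts like these against the phrase frequencies typical of known AI output, though how any given product weighs them is anybody's guess.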
Cross-checking the internet
Then there’s the comparison process, where whatever text is being checked gets compared to the entire internet. This can be done using traditional search algorithms, which look for exact matches, paraphrased text, and even fuzzy matches (near matches, synonyms, and rephrased content).
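Here is a rough Python sketch of fuzzy matching using the standard library's `difflib.SequenceMatcher`. The function names, the tiny in-memory `corpus`, and the 0.8 threshold are all assumptions for the example. It catches near matches where most words line up, but it knows nothing about synonyms or meaning, so real paraphrase detection would need something smarter.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two texts from 0.0 to 1.0 by how many words line up in order."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(candidate, corpus, threshold=0.8):
    """Return (document, score) for the closest text, or (None, score) below threshold."""
    best_doc, best_score = None, 0.0
    for doc in corpus:
        score = similarity(candidate, doc)
        if score > best_score:
            best_doc, best_score = doc, score
    if best_score >= threshold:
        return best_doc, best_score
    return None, best_score


if __name__ == "__main__":
    corpus = ["the quick brown dog", "unrelated text here"]
    print(best_match("the quick brown fox", corpus, threshold=0.7))
```

The catch is the `corpus`. A loop like this is only as good as the pool of text it gets to compare against.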
OpenAI also has an advantage here. Given that ChatGPT was trained on pretty much all accessible human knowledge, innuendo, fiction, and any database that stood still long enough, comparisons for similar text or text with similar base n-grams can be added to the content scoring process.
This kind of large-scale comparison favors OpenAI, Google, Microsoft, Meta, and other data-rich players in the AI field.
Other content checkers probably don’t have as vast a database for comparison. As a quick test, I copied a number of paragraphs from various articles I read today (some old, some new) and dumped them into plagiarism checkers. Almost all of the checkers failed to note that text I copied and pasted from other sites was, in fact, copied and pasted from other sites. So, clearly, for content checkers to be able to use online content comparison, they have to have a big enough pool to pull comparison data from.
Once the comparison is done, most content checkers will provide some kind of report to the user. Ideally, this would be more than just a numerical score or the phrase “likely human written.” Ideally, it would show the areas of the document the content checker deemed suspect so that evaluators can look further into aspects of the content that may have been generated by an AI.
We’re seeing improvement
Understand, though, that AI checkers are improving. When I first tested AI checkers in early 2023, most of them failed to differentiate human from AI-generated text. But, by mid-2024, about half of them got it right. So even the technology I’m talking about here will change over time.
That’s especially true because this is an arms race. As AI checkers get better, some AI services will sprinkle in human foibles and styles to help the cheaters cheat.
Then, AI checkers will get better and look for that.
Then, the AI cheating services will add more techniques.
And on it goes.
What about you? Are you a teacher or an editor trying to make sure the submission you got was written by the person who submitted it? Or are you a troublesome little cheater trying to find more ways to get out of work and create fake content? In either case, what has your experience been with AI content checkers? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.