5 ways to catch AI in its lies and fact-check its outputs for your research


filo/Getty Images

Sometimes, I think AI chatbots are modeled after teenagers. They can be very, very good. But other times, they tell lies. They make stuff up. They confabulate. They confidently give answers based on the assumption that they know everything there is to know, but they’re woefully wrong.

See what I mean? You can’t tell from the context above whether my descriptions refer to AIs or teenagers.

Also: How I tricked ChatGPT into telling me lies

While most of us know not to go to teenagers for important information and advice, we’re starting to rely on equally prevaricating AIs. To be fair, AIs aren’t bad; they’re just coded that way.

Last year, I gave you eight ways to reduce ChatGPT hallucinations:

  1. Avoid ambiguity and vagueness
  2. Avoid merging unrelated concepts
  3. Avoid describing impossible scenarios
  4. Avoid using fictional or fantastical entities
  5. Avoid contradicting known facts
  6. Avoid misusing scientific terms
  7. Avoid blending different realities, and
  8. Avoid assigning uncharacteristic properties

However, those tips were all things to avoid. I didn’t give you proactive tools for digging into prompt responses and guiding the AI to provide more productive responses. This technique is particularly important if you’re going to use an AI as a search engine replacement or as a tool for helping you research or write articles or papers.

Let’s dig into five key steps you can take to guide an AI to accurate responses.

1. Ask for references and sources

I previously wrote a guide on how to make ChatGPT provide sources and citations. Fortunately, ChatGPT is getting better at citing sources, particularly with the GPT-4o LLM and the web search capability in the $20-a-month version.

But ChatGPT won’t always volunteer those sources. If you’re doing research, always — always — ask for sources.

Then, test the sources and make sure they actually exist. Numerous times in my experience, ChatGPT cited sources that seemed absolutely perfect for what I was looking for. The only problem was that after I clicked through or searched for the named source by title, I discovered the entire source had been fabricated.

ChatGPT even chose real academic journals, made up author names, and then assigned compelling-sounding titles to the articles. Can you imagine how bad it would have been had I included those sources in my work without double-checking? I shudder to even think about it.

Also: OpenAI is working on a AI agent that can do tasks for you, like booking flights

So, ask for sources, check those sources, and call the AI out if it gives you a made-up answer.

2. Tell the AI to ‘show your work’

Early in my exploration of ChatGPT, I asked the tool to help me find a local mechanic. I sent it to Yelp and Google reviews to do sentiment analysis on the comments. At the time, it reached into those sites and gave me useful information.

I tried the test again recently and received another set of mechanic rankings. It actually told me, “Based on a comprehensive analysis of Yelp and Google reviews for independent car repair shops…”

But, ChatGPT lied.

The tool never looked at Yelp or Google reviews. This result is likely because sites (ZDNET included) have become more restrictive in allowing AIs to scrape their content. That’s fine. My argument is that the AI should have come back and said it couldn’t check Yelp or Google reviews. Instead, the AI just claimed it did.

Liar, liar, pants on fire.

When I asked it to show its work, the tool again said it had looked at Yelp and Google reviews. However, in the “show your work” response, the tool also displayed the source for the reviews it analyzed. This turned out to be a site named Birdeye Reviews.

Also: I’ve tested a lot of AI tools for work. These 4 actually help me get more done every day

Now, I have nothing against Birdeye Reviews. I’ve never used it. But that’s not the point. The point is ChatGPT said it had produced information based on Yelp and Google reviews.

“Show your work” is a powerful prompt. You can uncover all sorts of interesting information on the reasoning process the AI used to give you your results.

3. Cross-validate an AI’s responses across related questions

This process lets you and the AI explore a topic in-depth to see if the chatbot’s answers remain logically consistent and contextually relevant. We use this approach a lot in traditional research and engineering to help make sure we’re on the right track. It can be applied to the AI as well.

For example, let’s say you’re writing about car tires. You could ask, “What material is used in car tires?” The answer you might expect is “rubber.” But rubber isn’t just one substance. There’s rubber from trees, synthetic rubber, and materials that have the flexibility and strength of rubber but contain no rubber at all.

For example, both rubber and TPU (thermoplastic polyurethane) are flexible and feel like rubber. But synthetic rubber is made from petroleum-based monomers and TPU is made from a thermoplastic elastomer.

Also: Employees are hiding their AI use from their managers. Here’s why

You could ask the AI, “What kind of rubber is used in car tires?” or “Is real rubber still used in car tires?” This would lead to answers explaining how car tires use a mix of rubber types. You could go deeper down the rabbit hole by asking “Where else is rubber used in cars?” or “What kind of rubber is used in Lego car tires?”

The point of this practice is not so much to use all the responses in your paper as to explore how the AI deals with this class of questions and whether it loses the thread completely.

Additionally, because ChatGPT retains its knowledge while in a session, the more you ask, and the deeper you dive into a specific topic, the more you train the AI to stay within the context of the sphere of knowledge you’re looking for. This approach helps keep the AI from going off on its own and increases the chances of getting accurate answers.

4. Ask about recent events or time-sensitive information

Many chatbots have knowledge base cut-off dates. The free version of ChatGPT has a knowledge cut-off date of October 2023, meaning anything that happened in the world after that date will be unknown to the AI.

ChatGPT Plus, which also has October 2023 as its knowledge cut-off date, can also access information on the web. This ability can result in substantially more accurate information. The free version of ChatGPT can also access the web but in a more “limited” way. As a general rule, OpenAI doesn’t specify what “limited” means when talking about their free version. However, you can usually assume that limitation means fewer queries per session, fewer resources provided, and some features working intermittently.

Also: I downloaded Google’s Gemini app on the iPhone for free – and it nearly replaces Siri for me

For example, when I asked the free version of ChatGPT to list the NATO member nations, it returned a list of 31 countries. Yet when I asked ChatGPT Plus to list the NATO member nations, it returned a list of 32 countries. That’s because Sweden officially joined NATO in March of 2024.

The chatbots will generally tell you their cut-off dates if asked. But because there are added features (like web search in Plus), it’s best to try asking about events the AI would only know about if it had up-to-date information. 

There’s a trick to apply here as well. A few minutes later, I asked the free version of ChatGPT, “Can you use the web to look up who the current members of NATO Nations are?” I explicitly told it to “use the web.” I got back the up-to-date answer. So, if you’re using the free version, consider coaching it on where to look for information to get a better answer. 

This approach will help you ascertain the scope of the AI’s knowledge and determine whether you’re getting fairly current details or need to account for missing knowledge due to a fairly old cut-off date.

5. Ask follow-up questions and iteratively refine your query

I use this approach when I use ChatGPT to help me with my programming. I start with a simple query and refine and clarify it until I get some basic code. Once that approach works, I add another sentence or detail for what I want in my code. After a bunch of interactions that feel much more like a conversation than a coding session, I often have some useful code.

Even if you’re not coding, you can use this approach. Let’s say you’re working on a project related to cloud services. You might ask the AI, “Can you explain the different types of cloud services?”

In this example, you’d expect the answers to discuss services like email, web hosting, CRM, and other software-as-a-service categories. However, the AI responded with descriptions of SaaS (software as a service), PaaS (platform as a service), and IaaS (infrastructure as a service).

Also: Organizations face mounting pressure to accelerate AI plans, despite lack of ROI

Now you’ll know there are a few ways this question might be interpreted, so you could follow up with “Explain the different types of cloud services focusing solely on those that fall into the SaaS category.”

You could follow up with a question like, “Based on those SaaS categories, list three of the most popular commercial services in each category, along with the strengths and weaknesses of each.”

That would give you a much more detailed description, along with the features of each service. But let’s refine the approach using one more step. The AI in this scenario provided descriptions of each service, but you wanted to know how the services differ.

You could refine the query by adding the word “comparative” as in this prompt: “Based on those SaaS categories, list three of the most popular commercial services in each category, along with the COMPARATIVE strengths and weaknesses of each.”

Also: The best open-source AI models: All your free-to-use options explained

In my test, that last refinement resulted in tables comparing the features of each. I love tables. In fact, if the AI doesn’t return information as a table, my bonus tip is to ask the AI to present its results in a table. The tool often refactors its answers in interesting ways when it operates with that directive.

What are your favorite AI tactics?

What are your favorite best practices for ensuring reliable results when chatting with an AI? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.





Source link