
Artificial Intelligence | ChatGPT for Instructors

Information on Generative AI for Instructors

Fact Check

Fact-Checking is Always Needed


Why?
ChatGPT sometimes makes things up; the technical AI term for this is hallucination. ChatGPT hallucinates in part because these systems are probabilistic (they incorporate randomness when generating text), not deterministic (producing the same verified output every time for a given input). A probabilistic system can give a different, and sometimes incorrect, answer each time it is asked the same question.
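The probabilistic/deterministic distinction can be sketched in a few lines of Python. This is a toy illustration, not how ChatGPT actually works internally; the candidate words and their probabilities are invented for the example.

```python
import random

# Hypothetical next-word probabilities a model might assign after the
# prompt "The capital of Australia is" -- values are illustrative only.
next_word_probs = {
    "Canberra": 0.6,   # correct
    "Sydney": 0.3,     # plausible-sounding but wrong
    "Melbourne": 0.1,  # plausible-sounding but wrong
}

def sample_next_word(probs):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Because the choice is sampled rather than fixed, repeated runs can
# disagree -- and occasionally land on a confident-sounding wrong answer.
answers = [sample_next_word(next_word_probs) for _ in range(20)]
```

A deterministic lookup, by contrast, would return the same stored answer every time; the randomness in sampling is what makes occasional fabrication possible.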


Which Models are Less Prone to Hallucinations?
GPT-4 (the more capable model behind ChatGPT Plus and Microsoft Copilot) has improved and is less prone to hallucination. According to OpenAI, it's "40% more likely to produce factual responses than GPT-3.5 on our internal evaluations." But it's still not perfect. So verification of the output is still needed.

ChatGPT Often Makes up Fictional Sources
One area where ChatGPT often fabricates answers is when asked to produce a list of sources. See the Twitter thread "Why does ChatGPT make up fake academic papers?" for a useful explanation of why this happens. For guidance on handling a hallucinated citation, see: ChatGPT gave me citations that I can't find. What should I do?


Making Models More Truthful
There has been progress in making these systems more truthful by grounding them in external sources of knowledge. Examples include Microsoft Copilot and Perplexity AI, which ground their answers in internet search results. Those internet sources can still contain misinformation or disinformation, but both tools link to the sources they used, so you have a starting point for verification.

Scholarly Sources as Grounding
There are also systems that combine language models with scholarly sources. For example:

  • Elicit
    A research assistant using language models like GPT-3 to automate parts of researchers’ workflows. Currently, the main workflow in Elicit is Literature Review. If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table. 
  • Consensus
    An academic search engine, powered by AI but grounded in scientific research. It uses large language models (LLMs) and purpose-built search technology (vector search) to surface the most relevant papers, synthesizing both topic-level and paper-level insights. Everything is connected to real research papers; the source material comes from the Semantic Scholar database, which includes over 200M papers across all domains of science.