Although many responses produced by AI text generators are accurate, AI also often generates misinformation, and its answers are frequently a mixture of truth and fiction. If you are using AI-generated text for research, you will need to be able to verify its outputs. You can use many of the skills you'd already use to fact-check and think critically about human-written sources, but some of them will have to change. For instance, we can't check the information by evaluating the credibility of the source or the author, as we usually do. We have to use other methods, like lateral reading, which we'll explain below.
Remember, the AI is producing what it calculates to be the most likely series of words to answer your prompt. That does not mean it is giving you a definitive answer! If you choose to use AI, it's smart to treat it as a beginning and not an end. Being able to critically analyze the outputs AI gives you will be an increasingly crucial skill throughout your studies and your life after graduation.
A typical AI model isn't assessing whether the information it provides is correct. When it receives a prompt, its goal is to generate what it calculates to be the most likely string of words in response. Sometimes this results in a correct answer, but sometimes it doesn't, and the AI cannot tell the difference between the two. It's up to you to make that distinction.
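To see why fluent output and accurate output are not the same thing, consider the deliberately oversimplified sketch below, written in Python. It is not how ChatGPT or any real AI model works (real systems use neural networks trained on enormous datasets), but it captures the "most likely next words" idea. The tiny corpus and the continue_text function are invented purely for illustration.

    # Toy illustration only: continue a text by always choosing the word
    # that most often followed the previous word in a tiny made-up corpus.
    from collections import Counter

    # Hypothetical miniature "training data" standing in for billions of documents.
    corpus = (
        "vaccines are safe vaccines are safe vaccines are effective "
        "vaccines are not associated with autism"
    ).split()

    # Count which word tends to follow each word.
    next_word_counts = {}
    for current, following in zip(corpus, corpus[1:]):
        next_word_counts.setdefault(current, Counter())[following] += 1

    def continue_text(start_word, length=5):
        """Repeatedly append the word seen most often after the current one."""
        words = [start_word]
        for _ in range(length):
            options = next_word_counts.get(words[-1])
            if not options:
                break
            words.append(options.most_common(1)[0][0])  # most frequent continuation
        return " ".join(words)

    print(continue_text("vaccines"))
    # Prints "vaccines are safe vaccines are safe": fluent-sounding,
    # but nothing in this program checks whether any of it is true.

Notice that nothing in this sketch ever checks whether its output is true; it only continues with whatever words were most common in its "training" text. Real AI models are vastly more sophisticated, but they share this basic limitation.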
AI can be wrong in multiple ways:
Sometimes an AI will confidently return an incorrect answer. This could be a factual error or an unintentional omission of important information.
Sometimes, rather than simply being wrong, an AI will invent information that does not exist. Some people call this a “hallucination,” or, when the invented information is a citation, a “ghost citation.”
These are trickier to catch, because these inaccuracies often contain a mix of real and fake information. In one example, an AI tool cited a source that looks real but does not, in fact, exist. The authors are all real people, and they did write an article together in 2014, but the exact title does not exist, and no article on autism appears in the claimed issue of the Journal of Pediatrics. The closest real match is instead this article, from the journal Vaccine:
Taylor LE, Swerdfeger AL, Eslick GD. Vaccines are not associated with autism: an evidence-based meta-analysis of case-control and cohort studies. Vaccine. 2014 Jun 17;32(29):3623-9. doi: 10.1016/j.vaccine.2014.04.085. Epub 2014 May 9. PMID: 24814559.
When ChatGPT gives a URL for a source, it often makes up a fake URL, or uses a real URL that leads to something completely different. It’s key to double-check the answers AI gives you with a human-created source.
Currently, if you ask an AI to cite its sources, the sources it lists are very unlikely to be where the information actually came from. In fact, neither the AI nor its programmers can truly say where in its enormous training dataset a given piece of information originated.
Even an AI that provides real footnotes is not showing you where its information came from, just an assortment of webpages and articles roughly related to the topic of your prompt. If prompted again, the AI may provide the exact same answer but footnote different sources.
This matters because an important part of determining a human author’s credibility is seeing what sources they draw on for their argument. You can go to these sources to fact-check the information they provide, and you can look at their sources as a whole to get insight into the author’s process, potentially revealing a flawed or biased way of information-gathering.
You should treat AI outputs the way you would fact-check a text that provides no sources, like some online articles or social media posts: you'll determine their credibility by looking to outside, human-created sources (see lateral reading below).
AI can accidentally ignore instructions or interpret a prompt in a way you weren't expecting. A minor example is ChatGPT returning a 5-paragraph response when it was prompted to give a 3-paragraph response, or ignoring a direction to include citations throughout a piece of writing. More significantly, though, it can make interpretations that you might not catch. If you're not very familiar with the topic you're asking an AI-based tool about, you might not even realize that it's interpreting your prompt inaccurately.
The way you ask the question can also skew the response you get. Any assumptions you make in your prompt will likely be fed back to you by the AI; for example, a prompt asking why a particular treatment is harmful presumes that it is harmful, and the response will usually echo that premise.
If you cannot take AI-cited sources at face value, and neither you nor the AI's programmers can determine where the information is sourced from, how are you going to assess the validity of what the AI is telling you? Here you should use the most important method of analysis available to you: lateral reading. Lateral reading means leaving the AI output and consulting other sources to evaluate what the AI has provided in response to your prompt. You can think of this as "tabbed reading": moving laterally away from the AI's information to sources in other tabs, rather than just proceeding "vertically" down the page of AI output alone.
With AI, instead of asking “who’s behind this information?” we have to ask “who can confirm this information?”
Lateral reading can (and should) be applied to all online sources, but you will find fewer pieces of information to assess when working with AI. While you can typically reach a verdict about an online source by searching for its publication, funding organization, author, or title, none of these pieces of information is available to you when assessing AI output. As a result, it is critical that you read several sources outside the AI tool to determine whether credible, non-AI sources can confirm the information the tool returned.
Critical thinking about AI responses goes beyond determining whether the specific facts in the text are true or false. We also have to think about bias and viewpoint: two things we already keep in mind when reading human authors, and, perhaps surprisingly, two things we have to keep in mind with AI as well.
Any text implicitly contains a point of view, influenced by the ideologies and societal factors the author lives with. When we critically think about news articles, books, or social media posts out in the wild, we think about the author's viewpoint and how that might affect the content we're reading. These texts that all of us produce every day are the foundation of generative AI's training data. While AI text generators don't have their own opinions or points of view, they are trained on datasets full of human opinions and points of view, and sometimes those viewpoints surface in their answers.
AI can be explicitly prompted to support a particular point of view (for instance, "give a 6 sentence summary of vaccines from the perspective of someone who is anti-vax"). But even when not prompted in any particular way, AI is not delivering a "neutral" response. For many questions, there is not one "objective" answer, which means that for an AI tool to generate an answer, it must choose which viewpoints to represent in its response. It's also worth remembering that we can't know exactly how the AI decides what is worth including in its response and what is not.
AI also often replicates biases and bigotry found in its training data. Without explicit prompting from a human, it is very difficult to get an AI tool to acknowledge that people in positions of authority, like doctors or professors, can be women. AI image editing tools have edited users to appear white when prompted to make their headshots look "professional," and can sexualize or undress women, particularly women of color, when editing pictures of them for any purpose.
AI also replicates biases by omission. When asked for a short history of 19th-century medicine, ChatGPT includes only medical history from the Western world, unless specifically prompted with "in Asia". This is the case even if you ask in other languages, such as Chinese or Arabic, so the AI tool is not basing this response on the user's presumed region.
These are more obvious examples, but they also reveal the decision-making processes the AI uses to answer more complex or subtle questions. The associations an AI has learned from its training data are the basis of its "worldview," and we can't fully know all the connections the AI has made or why it has made them. Sometimes these connections lead it to decisions that reinforce bigotry or give us otherwise undesirable responses. When this happens in ways we can see, it raises the question: how is it showing up in ways that aren't as obvious?