
Generative AI & Legal Research

A guide for students and faculty on using generative AI as a tool for legal research and writing

Hallucinations

Hallucinations are outputs from LLMs that deviate from facts or contextual logic. Hallucinated information includes false facts, cases that do not exist, cases that do not cover the correct area of law or that have been superseded, and incorrect statements of the law. Hallucinations often appear correct because LLMs are designed to produce coherent sentences in response to the provided query. An LLM has no ability to understand the underlying reality the prompt refers to; it uses its statistical model to create the output it predicts will best satisfy the prompt.


LLMs can only mimic human language and reasoning based on their training data. Combating hallucinations requires persistent fact-checking and verification. LLMs are solid tools for analyzing documents or boosting your writing creativity, but they struggle to provide facts. And the more obscure the topic, the more likely the LLM is to hallucinate information to complete the prompt.

What causes hallucinations?

When given a prompt, an LLM synthesizes terabytes of training data to provide an answer. While processing and generating that answer, the LLM operates essentially in a "black box": because of the massive amount of information being processed, programmers cannot see how it evaluates its training data and forms a response to the query. Humans see only what is entered into the LLM and what the program returns. Unsurprisingly, the LLM sometimes misevaluates its training data and produces inconsistent or incorrect information; in other words, it hallucinates.

Hallucinations generally occur for three reasons:

1. Data Quality: LLMs are trained on large corpora of data that may contain inconsistencies. These inconsistencies can confuse the LLM, which generalizes from its training data without being able to evaluate the data's relevance or trustworthiness.

2. Generation Method: LLMs may hallucinate due to the complexity and randomness inherent in the process of generating new content. The AI uses a form of educated guesswork, drawing on patterns it has learned from its training data.

3. Input Context: Context refers to the information given to the model in the input prompt. Poorly crafted prompts can confuse the LLM and lead it to provide inaccurate information.


Input context is the only one of these factors users can control, so good prompt-writing habits and techniques are the easiest way to prevent hallucinations. For the other two, users should remember to fact-check all information provided by an LLM for accuracy, currency, and relevance.

