Day 22 - Why Do LLMs Hallucinate? The Role of Token Bias
Context:
I used to wonder about the term “hallucinations” in AI:
- How do they happen?
- What causes them?
- Is it just how the model works, or is there a deeper reason?
After reading and reflecting, I found two key concepts that explain the behaviour:
What I Learned:
- Token Bias:
- Models tend to choose certain words more often than others based on training patterns, not necessarily logic.
- When given a prompt, the model calculates the most likely next word. Sometimes, it picks a word because it’s “used to” it, not because it’s correct.
- Example: given the prompt “5 of the mangoes are smaller out of 10 mangoes”, the word “smaller” might nudge the model toward “subtract” or “minus,” even if that’s not what was intended (see the sketch after this list).
- Hallucination:
- The visible mistake in the output: the model confidently states something that is wrong or made up.
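To make the token-bias idea concrete, here is a minimal Python sketch. The probability table is invented purely for illustration (a real LLM derives these scores from its training data); it simply shows how greedy decoding always picks the statistically most likely next token, whether or not that token is logically correct.

```python
# Token bias in miniature: greedy decoding always takes the highest-probability
# token. The numbers below are made up for illustration, not from a real model.
next_token_probs = {
    "subtract": 0.46,  # "smaller" co-occurs with subtraction in lots of training text
    "compare": 0.31,   # the operation the prompt actually calls for
    "divide": 0.14,
    "add": 0.09,
}

def greedy_next_token(probs: dict[str, float]) -> str:
    """Return the single most likely next token (greedy decoding)."""
    return max(probs, key=probs.get)

prompt = "5 of the mangoes are smaller out of 10 mangoes, so we should"
print(f"{prompt} ... '{greedy_next_token(next_token_probs)}'")
# -> 'subtract': statistically likely, not necessarily correct
```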
Key Insight:
Statistical Pattern Matching → Token Bias → Hallucinations
Hallucinations are not random; they stem from how models learn and predict.
Why It Matters for QA / AI Testing:
- Understanding token bias helps testers design prompts that reduce ambiguity.
- Knowing the root cause of hallucinations helps when validating AI outputs against a known ground truth (see the check sketched after this list).
- Essential for testing AI in critical workflows where accuracy matters.
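As a small illustration of the validation point above, here is a sketch of a ground-truth check a tester might write. `ask_model` is a hypothetical placeholder for whatever LLM client is under test; the real point is computing the expected answer independently of the model and flagging any mismatch as a possible hallucination.

```python
def ask_model(prompt: str) -> str:
    # Hypothetical placeholder: in a real test this would call the model under test.
    return "5"  # pretend the model answered with a subtraction result

def check_mango_total() -> None:
    prompt = "5 of the mangoes are smaller out of 10 mangoes. How many mangoes are there in total?"
    expected = "10"                     # ground truth computed outside the model
    actual = ask_model(prompt).strip()
    if actual != expected:
        print(f"Possible hallucination: expected {expected}, got {actual}")
    else:
        print("Answer matches ground truth")

check_mango_total()
```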
My Takeaway:
Hallucinations aren’t magic errors — they’re a byproduct of statistical learning and probabilistic prediction.