Day 29 - How LLMs See Text: The Role of Tokenization
Context:
I always wondered how AI models like ChatGPT “read” text. It turns out they don’t read words the way humans do; they see numbers.
What I Learned:
- Tokenizer:
  - A core component of every LLM.
  - Splits text into tokens (words or pieces of words) and maps each token to a number the model can process.
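To make the idea concrete, here is a minimal sketch of subword tokenization: a greedy longest-match splitter over a tiny hand-made vocabulary. The vocabulary and IDs are invented for illustration; real LLM tokenizers use learned BPE merges over vocabularies of tens of thousands of pieces.

```python
# Toy greedy longest-match subword tokenizer (illustration only;
# real tokenizers learn their vocabularies from data).
TOY_VOCAB = {"token": 1, "iza": 2, "tion": 3, "test": 4, "ing": 5}

def tokenize(word: str) -> list[int]:
    """Greedily match the longest vocabulary piece at each position."""
    ids = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            piece = word[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[i:]!r}")
    return ids

print(tokenize("tokenization"))  # "token" + "iza" + "tion" -> [1, 2, 3]
print(tokenize("testing"))       # "test" + "ing" -> [4, 5]
```

Note that "tokenization" is one word but three tokens: this is exactly why token counts, not word counts, drive cost.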
- Why It Matters:
  - Every LLM API charges by tokens, not words.
  - More tokens = higher cost.
  - A prompt that costs $0.01 with one model might cost $0.015 with another, because the two models tokenize the same text differently.
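The cost comparison above can be sketched as simple arithmetic. The model names and per-token prices here are assumptions for illustration; always check each provider's pricing page for real numbers.

```python
# Hypothetical per-1K-token prices (assumed, not real pricing).
PRICE_PER_1K = {"model_a": 0.0005, "model_b": 0.00075}

def prompt_cost(token_count: int, model: str) -> float:
    """Cost of a prompt given its token count and a per-1K-token price."""
    return token_count / 1000 * PRICE_PER_1K[model]

# The same prompt can also produce *different* token counts per model,
# since each model's tokenizer splits text differently.
print(prompt_cost(20_000, "model_a"))  # 0.01
print(prompt_cost(20_000, "model_b"))  # 0.015
```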
- Efficiency Tip:
  - Writing shorter, well-structured prompts reduces token count and cost.
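A quick way to see the tip in action: compare a padded prompt against a concise one. The estimator below uses the common rule of thumb of roughly one token per four characters of English text; this is an assumption that varies by tokenizer and language, so treat the numbers as rough guides only.

```python
# Rough token estimate: ~1 token per 4 characters of English text
# (rule-of-thumb assumption; real counts depend on the tokenizer).
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

verbose = ("Could you please be so kind as to provide me with a short "
           "summary of the following text, if at all possible?")
concise = "Summarize the following text briefly:"

# The concise prompt asks for the same thing with far fewer tokens.
print(estimate_tokens(verbose), estimate_tokens(concise))
```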
- Resource:
  - OpenAI provides a great tool to visualize tokenization: Tokenizer Tool.
Why It Matters for QA / AI Testing:
- Tokenization impacts cost and performance in AI-driven testing workflows.
- Testers need to optimize prompts for efficiency without losing clarity.
- Understanding tokenization helps predict API usage and plan budgets.
My Takeaway:
LLMs don’t see words — they see tokens. Efficient prompt design saves cost and improves performance.