Day 30 - Context Window in LLMs: How Much Can AI Remember?
Context:
After learning about Tokenizers in my previous post, I wondered: Does an AI model remember the entire conversation in a chat? The answer is no — it has a memory limit called the Context Window.
What I Learned:
- Context Window:
  - The maximum number of tokens a model can process at once.
  - Acts as the model’s short-term memory for a single interaction.
- What Fits in the Context Window:
  - Input prompt
  - Model’s response
  - Chat history
- Fixed Size per Model (see the token-counting sketch after this list):
  - GPT-4o: 128,000 tokens
  - GPT-4-Turbo: 128,000 tokens
  - Claude 3 Opus/Sonnet: 200,000 tokens
  - Older models (e.g., GPT-3.5): 4,096 tokens
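To make the limits concrete, here is a minimal sketch that counts tokens with OpenAI's tiktoken library and checks the count against each model's window. The CONTEXT_LIMITS table simply restates the numbers above; note that Claude uses its own tokenizer, so the cl100k_base count is only an approximation for it.

```python
import tiktoken  # pip install tiktoken

# Context limits from the list above.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
    "claude-3-opus": 200_000,  # approximate: Claude uses its own tokenizer
    "gpt-3.5-turbo": 4_096,
}

def fits_in_context(text: str, model: str) -> bool:
    # cl100k_base is the encoding used by GPT-4-era OpenAI models.
    encoding = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(encoding.encode(text))
    print(f"{model}: {n_tokens} tokens / {CONTEXT_LIMITS[model]:,} limit")
    return n_tokens <= CONTEXT_LIMITS[model]

fits_in_context("Does an AI model remember the entire conversation?", "gpt-4-turbo")
```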
Pro Tip:
- Calculate cost from tokens. Example:
  - Total tokens: 470
  - GPT-4-Turbo input price: $10 per 1M tokens
  - Cost = (470 / 1,000,000) * $10 = $0.0047
  - The chat cost about half a cent! (The sketch below automates this.)
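A tiny sketch of the same arithmetic. The price table is illustrative (input-token prices in USD per 1M tokens, matching the example above); always check the provider's current pricing page before relying on it.

```python
# Illustrative input prices in USD per 1M tokens (from the example above).
PRICE_PER_1M_INPUT = {
    "gpt-4-turbo": 10.00,
}

def input_cost(total_tokens: int, model: str) -> float:
    # Cost = (tokens / 1M) * price per 1M tokens
    return (total_tokens / 1_000_000) * PRICE_PER_1M_INPUT[model]

print(f"${input_cost(470, 'gpt-4-turbo'):.4f}")  # $0.0047 -> about half a cent
```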
Why It Matters for QA / AI Testing:
- Context window limits affect how much history the model can consider when generating responses.
- Testers need to design prompts and workflows that fit within token limits for consistent results (one common truncation approach is sketched after this list).
- Understanding token-based pricing helps manage AI testing costs effectively.
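For QA workflows, one common way to stay inside the window is to keep only the most recent messages that fit a token budget. This is a hedged sketch, not any library's API: real chat APIs add a few tokens of per-message overhead that this simple version ignores, and the budget number is made up for the demo.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk backwards so recent turns survive
        n = len(enc.encode(msg["content"]))
        if used + n > max_tokens:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))

history = [
    {"role": "user", "content": "What is a context window?"},
    {"role": "assistant", "content": "The maximum number of tokens a model can process at once."},
    {"role": "user", "content": "So it forgets older messages?"},
]
# With a small budget, the oldest turn(s) may be dropped from the result.
print(trim_history(history, max_tokens=20))
```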
My Takeaway:
LLMs don’t have infinite memory — they work within a context window. Better prompt design and token awareness = smarter, cost-efficient AI usage.