Day 30 - Context Window in LLMs: How Much Can AI Remember?
Context:
After learning about Tokenizers in my previous post, I wondered: Does an AI model remember the entire conversation in a chat? The answer is no — it has a memory limit called the Context Window.
What I Learned:
- Context Window:
  - The maximum number of tokens a model can process at once.
  - Acts as the model’s short-term memory for a single interaction.
- What Fits in the Context Window:
  - Input prompt
  - Model’s response
  - Chat history
- Fixed Size per Model (see the token-counting sketch after this list):
  - GPT-4o: 128,000 tokens
  - GPT-4-Turbo: 128,000 tokens
  - Claude 3 Opus/Sonnet: 200,000 tokens
  - Older models (e.g., GPT-3.5): 4,096 tokens
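To make the limits concrete, here is a minimal sketch that counts tokens with OpenAI's tiktoken library and checks the count against each model's window. The CONTEXT_LIMITS table simply restates the numbers above; note that Claude uses its own tokenizer, so the cl100k_base count is only an approximation for it.

```python
import tiktoken  # pip install tiktoken

# Context limits from the list above.
CONTEXT_LIMITS = {
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
    "claude-3-opus": 200_000,  # approximate: Claude uses its own tokenizer
    "gpt-3.5-turbo": 4_096,
}

def fits_in_context(text: str, model: str) -> bool:
    # cl100k_base is the encoding used by GPT-4-era OpenAI models.
    encoding = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(encoding.encode(text))
    print(f"{model}: {n_tokens} tokens / {CONTEXT_LIMITS[model]:,} limit")
    return n_tokens <= CONTEXT_LIMITS[model]

fits_in_context("Does an AI model remember the entire conversation?", "gpt-4-turbo")
```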
Pro Tip:
- Calculate cost from tokens. Example:
  - Total tokens: 470
  - GPT-4-Turbo input price: $10 per 1M tokens
  - Cost = (470 / 1,000,000) * $10 = $0.0047
  - The chat cost about half a cent! (The sketch below automates this.)
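A tiny sketch of the same arithmetic. The price table is illustrative (input-token prices in USD per 1M tokens, matching the example above); always check the provider's current pricing page before relying on it.

```python
# Illustrative input prices in USD per 1M tokens (from the example above).
PRICE_PER_1M_INPUT = {
    "gpt-4-turbo": 10.00,
}

def input_cost(total_tokens: int, model: str) -> float:
    # Cost = (tokens / 1M) * price per 1M tokens
    return (total_tokens / 1_000_000) * PRICE_PER_1M_INPUT[model]

print(f"${input_cost(470, 'gpt-4-turbo'):.4f}")  # $0.0047 -> about half a cent
```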
Why It Matters for QA / AI Testing:
- Context window limits affect how much history the model can consider when generating responses.
- Testers need to design prompts and workflows that fit within token limits for consistent results (one common truncation approach is sketched after this list).
- Understanding token-based pricing helps manage AI testing costs effectively.
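For QA workflows, one common way to stay inside the window is to keep only the most recent messages that fit a token budget. This is a hedged sketch, not any library's API: real chat APIs add a few tokens of per-message overhead that this simple version ignores, and the budget number is made up for the demo.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk backwards so recent turns survive
        n = len(enc.encode(msg["content"]))
        if used + n > max_tokens:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))

history = [
    {"role": "user", "content": "What is a context window?"},
    {"role": "assistant", "content": "The maximum number of tokens a model can process at once."},
    {"role": "user", "content": "So it forgets older messages?"},
]
# With a small budget, the oldest turn(s) may be dropped from the result.
print(trim_history(history, max_tokens=20))
```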
My Takeaway:
LLMs don’t have infinite memory — they work within a context window. Better prompt design and token awareness = smarter, cost-efficient AI usage.