Day 3 - Understanding Quartiles in Pandas vs Traditional Math
Context:
While learning Pandas, I stumbled upon an interesting difference in how quartiles are calculated compared to what we learned in school.
What I Learned:
- In school, quartiles were calculated by sorting the data, splitting it into a lower and an upper half, and finding the median of each half by hand.
- Pandas uses describe(), which relies on NumPy’s percentile logic.
- NumPy applies linear interpolation (default = "linear"), making results slightly different for small or odd-sized datasets.
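To see the difference on a concrete case, here is a minimal sketch that compares the two approaches on a seven-value dataset. The helper name school_quartiles is my own, and it assumes the school convention of leaving the overall median out of both halves when the count is odd (some textbooks include it, which gives yet another answer).

```python
import statistics

import pandas as pd


def school_quartiles(values):
    """Median-of-halves quartiles (hypothetical helper, not a pandas API).

    Assumes at least two values. When the count is odd, the overall
    median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median
    upper = data[len(data) - half:]   # values above the median
    return statistics.median(lower), statistics.median(upper)


data = [1, 2, 3, 4, 5, 6, 7]

q1_school, q3_school = school_quartiles(data)

# describe() labels the quartile rows "25%" and "75%"
summary = pd.Series(data).describe()
q1_pandas, q3_pandas = summary["25%"], summary["75%"]

print("school:", q1_school, q3_school)   # school: 2 6
print("pandas:", q1_pandas, q3_pandas)   # pandas: 2.5 5.5
```

Both answers are valid quartiles; they simply follow different conventions, and the gap shrinks as the dataset grows.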
Why It Matters for QA / AI Testing:
- Statistical nuances can impact data validation and interpretation in AI models.
- Understanding how libraries compute metrics helps keep testing and reporting accurate.
- Helps avoid confusion when comparing manual calculations with automated outputs.
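When a test needs an expected value that matches the library output, one option is to reproduce the linear interpolation rule by hand. The sketch below is my own illustration of that rule (linear_quantile is a hypothetical helper, not part of Pandas or NumPy): it places the quantile at the fractional position q * (n - 1) in the sorted data and interpolates between the two neighbouring values.

```python
import math


def linear_quantile(values, q):
    """Quantile via linear interpolation (hypothetical helper for tests).

    Assumes a non-empty list and 0 <= q <= 1.
    """
    data = sorted(values)
    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)        # index of the neighbour below
    hi = math.ceil(pos)         # index of the neighbour above
    frac = pos - lo             # how far between the two neighbours
    return data[lo] + (data[hi] - data[lo]) * frac


data = [1, 2, 3, 4, 5, 6, 7]
q1 = linear_quantile(data, 0.25)
q3 = linear_quantile(data, 0.75)

print(q1, q3)   # 2.5 5.5
```

Writing the rule out this way makes it clear in a test report which convention the expected values follow.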
My Takeaway:
Pandas is optimized for large-scale statistics, but knowing these subtle differences is key for accurate analysis.