So, you’ve got a bunch of data—maybe it’s test scores, sales numbers, survey responses, or how many cups of coffee you drink per week (no judgment). The big question is: what do you do with all that information? That’s where descriptive statistics swoop in to save the day!
Descriptive statistics are like the CliffsNotes for your data. Instead of staring at hundreds or thousands of individual numbers, you use descriptive stats to summarize, simplify, and spot patterns. Think of them as the highlights reel of your dataset.
Here’s what descriptive statistics help you figure out:
- Where your data tends to hang out (Is it mostly low numbers? High numbers? Somewhere in the middle?)
- How spread out your data is (Are all the numbers close together or all over the place?)
- What shape your data takes (Is it lopsided, symmetrical, super pointy, or pancake-flat?)
In this two-part guide, we’re going to break descriptive stats into three bite-sized chunks:
- Measures of Central Tendency – This is just a fancy way of saying “what’s the average or typical value?”
- Measures of Dispersion – How much do the numbers vary from each other?
- Shape of the Distribution – What does your data look like when you plot it out?
Ready to turn chaos into clarity?
Measures of Central Tendency – Mean, Median and Mode
Imagine you’ve got a group of friends, and you want to figure out who’s the “average” one—maybe in terms of height, income, or number of Netflix shows binged in a week. That’s where measures of central tendency come in. These are stats that give you a quick idea of the “centre” of your data set—the spot where your numbers tend to hang out.
There are three main players in this central-tendency game: mean, median, and mode. Let’s break them down.
Mean – The Classic Average
The mean is what most people think of when they hear “average.” You just:
- Add up all your numbers.
- Divide by how many numbers you’ve got.
Best used when:
- Your data is symmetrical (no weird outliers)
- You want a precise mathematical centre
- Your data is interval or ratio (e.g., height, temperature, income)
In formula form: x̄ = Σxᵢ / n (add every value, then divide by the count, n).
Watch out!
The mean is sensitive to outliers. One ridiculously high or low value can throw it way off.
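Here’s a minimal sketch in Python (just the built-in statistics module, with made-up numbers) showing both the calculation and the outlier problem:

```python
import statistics

scores = [70, 72, 75, 76, 77]                 # hypothetical test scores
print(statistics.mean(scores))                # 74 -- a sensible centre

scores_with_outlier = scores + [300]          # one wild value sneaks in
print(statistics.mean(scores_with_outlier))   # ~111.67 -- one outlier drags the mean way up
```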
Median – The Middle Value
The median is the number that falls right in the middle when your data is sorted from smallest to largest.
How to find it:
- Odd number of values? Pick the middle one.
- Even number of values? Average the two middle numbers.
Best used when:
- You have ordinal, interval, or ratio data
- You want to ignore outliers or skewed values
- Your data has extreme highs or lows
Bonus tip:
The median is great for reporting income or property prices, where one billionaire shouldn’t ruin the average for everyone else.
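A quick sketch of the odd/even rule and that outlier resistance, again with invented numbers:

```python
import statistics

# Odd number of values: pick the middle one
print(statistics.median([3, 1, 7, 5, 9]))   # 5 (sorted: 1, 3, 5, 7, 9)

# Even number of values: average the two middle numbers
print(statistics.median([1, 3, 5, 7]))      # 4.0, i.e. (3 + 5) / 2

# Hypothetical incomes: one billionaire barely moves the median
incomes = [40_000, 50_000, 60_000, 70_000, 1_000_000_000]
print(statistics.median(incomes))           # 60000
print(statistics.mean(incomes))             # 200044000.0 -- the mean is ruined
```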
Mode – The Most Common Value
The mode is the value that shows up the most often. It’s the only measure you can use with categorical data (like favourite colours or pizza toppings).
Example:
Favourite pets: Dog, Cat, Dog, Fish, Dog → Mode = Dog
You can have:
- One mode (unimodal)
- Two modes (bimodal)
- More than two (multimodal)
- Or no mode (if no value repeats)
Best used when:
- You have nominal or ordinal data
- You want to know what’s most common
- You're dealing with survey data or labels
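The pet example from above, sketched in code; statistics.multimode also covers the bimodal and no-repeats cases:

```python
import statistics

pets = ["Dog", "Cat", "Dog", "Fish", "Dog"]
print(statistics.mode(pets))   # Dog -- works fine on categorical data

# multimode returns every value tied for most common
print(statistics.multimode(["red", "blue", "red", "blue", "green"]))   # ['red', 'blue'] (bimodal)
print(statistics.multimode([1, 2, 3]))   # [1, 2, 3] -- nothing repeats, so everything ties
```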
Measures of Dispersion – How Spread Out Is Your Data?
Okay, so you’ve found the “average” of your data—awesome! But here’s the catch: knowing the centre doesn’t tell you everything.
Imagine two classes both have an average test score of 75. In Class A, most students scored right around 75. In Class B, half bombed with 50s and the other half crushed it with 100s. Same mean—very different stories.
That’s why we need measures of dispersion. These tell us how spread out or clustered the data is.
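The two-classes story is easy to check in code. The scores below are invented to match it (same mean of 75, very different spread), and statistics.stdev is the standard deviation, which we’ll meet properly in a moment:

```python
import statistics

class_a = [73, 74, 75, 76, 77]           # everyone close to 75
class_b = [50, 50, 50, 100, 100, 100]    # half bombed, half crushed it

print(statistics.mean(class_a), statistics.mean(class_b))   # 75 75 -- identical centres
print(statistics.stdev(class_a))   # ~1.58  -- very consistent
print(statistics.stdev(class_b))   # ~27.39 -- all over the place
```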
Why Dispersion Matters
- Helps you understand variability
- Tells you how consistent the data is
- Crucial for comparing groups or spotting unusual data points
We’ll walk through the main ones:
- Range
- Interquartile Range (IQR)
- Variance
- Standard Deviation
- Coefficient of Variation (CV)
Range – From the Lowest to the Highest
The range is the simplest measure of dispersion. It tells you the distance between the smallest and largest data points in your dataset.
Range = Maximum value − Minimum value
Quartiles and Interquartile Range (IQR)
Quartiles are just special values that split your dataset into four equal parts—kind of like slicing a cake into four chunks, where each slice holds 25% of your data.
Here’s how it breaks down:
- Q1 (First Quartile): This is the 25th percentile—meaning 25% of the data is below this value.
- Q2 (Second Quartile): This is just the median (50th percentile).
- Q3 (Third Quartile): This is the 75th percentile—meaning 75% of the data is below this value.

The Interquartile Range (IQR) is the range between Q1 and Q3. It tells us how spread out the middle half of the data is. Because it focuses on the middle, it ignores extreme values (a.k.a. outliers) that might skew other measures like the range or standard deviation.
IQR = Q3 − Q1
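Here’s a sketch (with made-up numbers) that computes the range from the previous section alongside the quartiles and IQR; note that different tools use slightly different quartile conventions, so the exact cut points can vary a little:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]       # 100 is an outlier

print(max(data) - min(data))                  # 99 -- the range is wrecked by the outlier

q1, q2, q3 = statistics.quantiles(data, n=4)  # cut points at the 25th, 50th, and 75th percentiles
print(q1, q2, q3)                             # 4.0 9.0 14.0
print(q3 - q1)                                # 10.0 -- the IQR ignores that outlier entirely
```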
Variance – Measuring Squared Differences
Variance tells you how spread out your data is by looking at the average of the squared differences from the mean. But there's a twist: the formula depends on whether you're working with a sample or the entire population.
Sample vs Population
A sample means you’re working with only part of the data; a population means you have every value in the group.
- If every data point is close to the mean, the variance will be small.
- If the data points are all over the place, the variance will be large.
And since we square the differences, larger gaps between data points and the mean get a lot more emphasis.
Sample Variance: When You Have Only Part of the Data
s² = Σ(xᵢ − x̄)² / (n − 1), where x̄ is the sample mean and n is the number of values in the sample.
Population Variance (when you have every single value in the group)
σ² = Σ(xᵢ − μ)² / N, where μ is the population mean and N is the total number of values in the population.
Why are the denominators different?
Great question! It's all about bias and being fair when you’re working with just a sample.
- In a population, you're working with every value, so dividing by N (the actual count) gives you the real average.
- But in a sample, you're working with just a portion of the data, and using the sample mean makes the variance a bit too small. So, we divide by n-1 to correct for that. This is called Bessel’s correction, and it helps make the sample variance an unbiased estimate of the true population variance.
Think of it like this: we’re “stretching” the sample variance just a bit to make up for the fact that we don’t have all the data.
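Both flavours are one-liners in Python: statistics.pvariance divides by N, while statistics.variance applies Bessel’s correction and divides by n − 1 (numbers invented for illustration):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]     # mean = 5

print(statistics.pvariance(data))   # 4 -- population variance, divides by N = 8
print(statistics.variance(data))    # ~4.57 -- sample variance, divides by n - 1 = 7, so slightly bigger
```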
Standard Deviation – Making Variance Easy to Understand
Variance is useful, but it’s a little awkward to interpret because it's in squared units. For example, if your data is in kilograms, the variance is in kilograms squared, which doesn’t make much intuitive sense.
That’s where standard deviation comes in! It’s just the square root of the variance, so it puts the measure back into the original units of the data.
Standard deviation = √variance, so s = √s² for a sample and σ = √σ² for a population.
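Continuing the same made-up example, taking the square root brings the measure back into the original units:

```python
import statistics

weights_kg = [2, 4, 4, 4, 5, 5, 7, 9]     # same numbers as above, read as kilograms

print(statistics.pvariance(weights_kg))   # 4 -- in kg squared, hard to picture
print(statistics.pstdev(weights_kg))      # 2.0 -- back in kg: a typical distance from the mean
```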
Coefficient of Variation (CV)
The Coefficient of Variation (CV) is a measure that compares the standard deviation to the mean. It tells you how much variability there is relative to the average.
CV = (standard deviation ÷ mean) × 100%
Why is CV Important?
CV is a super handy tool when:
- You want to compare the spread of two different datasets with different units (e.g., comparing exam scores to blood pressure readings).
- You’re analysing risk vs. return (hello, finance!) — where a high CV can mean more volatility.
- You're working with multiplicative processes or skewed data, where standard deviation alone might not give the full picture.
It works best with ratio data (like height, weight, sales — where zero means none of the quantity exists).
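A sketch comparing two hypothetical datasets with completely different units; the raw standard deviations aren’t comparable, but the CVs are:

```python
import statistics

def cv(data):
    """Coefficient of variation as a percentage: (sample stdev / mean) * 100."""
    return statistics.stdev(data) / statistics.mean(data) * 100

exam_scores = [60, 70, 75, 80, 90]           # points
house_prices = [250_000, 300_000, 350_000]   # pounds

print(f"{cv(exam_scores):.1f}%")    # 14.9% -- relative spread of the scores
print(f"{cv(house_prices):.1f}%")   # 16.7% -- directly comparable, despite the units
```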
Relative Measures
While most statistics like mean, median, standard deviation, etc., describe the dataset as a whole, relative measures do something different:
They describe how a single data point relates to the rest of the data.
So instead of summarizing the entire dataset, relative measures answer questions about individual values, like:
- How far is this score from average?
- How well did this person do compared to others?
- Is this value unusually high or low?
Let’s take a closer look at the most common ones — these are all about individual data points:
Z-Score
How many standard deviations away from the mean is this value?
- If someone scores a z = +2, they’re 2 standard deviations above the average — that’s pretty impressive!
- A z = -1.5 means they’re 1.5 SDs below average.
Z-scores standardize different datasets so we can compare values from different contexts — like comparing a student’s test score in English vs. Math.
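Here’s the idea as a sketch: z = (x − mean) / standard deviation, with invented class results for two subjects:

```python
import statistics

def z_score(x, data):
    """How many standard deviations x sits above (+) or below (-) the mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

english = [55, 60, 65, 70, 75]   # hypothetical English results
maths = [40, 50, 60, 70, 80]     # hypothetical Maths results

# Raw scores live on different scales; z-scores put them on the same one
print(z_score(70, english))   # ~0.63 -- 0.63 SDs above the English average
print(z_score(75, maths))     # ~0.95 -- relatively the stronger performance
```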
Percentile
What percentage of the data falls below this point?
- If a child is in the 75th percentile for height, they’re taller than 75% of kids their age.
- Percentiles don’t tell you the average height — just where one child stands relative to the rest.
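A minimal sketch of the idea (real tools interpolate more carefully, but the intuition is just “what share of the data sits below this value?”):

```python
def percentile_of(value, data):
    """Percentage of the data points strictly below value."""
    below = sum(1 for x in data if x < value)
    return below / len(data) * 100

heights_cm = [95, 98, 100, 102, 104, 106, 108, 110]   # hypothetical heights of 8 children
print(percentile_of(106, heights_cm))                 # 62.5 -- taller than 62.5% of the group
```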
Quartiles, Deciles, etc.
These are just named percentiles.
For example:
- Q1 = 25th percentile
- Q3 = 75th percentile
- D9 = 90th percentile
T-Scores
T-scores are similar to z-scores but are specifically used when dealing with small sample sizes.
T = 10z + 50, where z is the value’s z-score.
While z-scores standardize data to have:
- Mean = 0
- Standard Deviation = 1
The T-score formula standardizes data to have:
- Mean = 50
- Standard Deviation = 10
Both Z-Scores and T-Scores are considered normalised scores.
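Since a T-score is just a rescaled z-score (T = 10z + 50), converting is a one-liner; the z values below echo the earlier examples (z = +2 and z = −1.5):

```python
def t_score(z):
    """Rescale a z-score to the T-score convention: mean 50, SD 10."""
    return 50 + 10 * z

print(t_score(0))      # 50 -- exactly average
print(t_score(2))      # 70 -- 2 SDs above average
print(t_score(-1.5))   # 35.0 -- 1.5 SDs below average
```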
Conclusion
Understanding the numbers behind the story is like holding a map to hidden treasure. We've unravelled measures of central tendency to find the "typical" value, explored dispersion to see how data spreads, and uncovered relative measures to add context to it all. These tools don’t just quantify; they bring meaning to chaos.
But here’s the kicker: numbers don’t just lie flat on a page; they have shape, patterns, and quirks that can reveal even more. Curious? Stick around for Part 2, where we’ll decode the fascinating shapes data can take and what they say about the world around us. Get ready to see data like never before!
If you’re starting your statistics or data science learning journey, check out our beginner’s course on statistics designed to cover fundamental concepts like these and more.