How are the mean, median and mode used to summarise the centre of a data set, and which is the most appropriate measure?
Calculate measures of central tendency, including the mean, median and mode, for both raw data and data presented in a frequency table
A focused answer to the HSC Maths Standard 2 dot point on the mean, median and mode. Finding all three from a raw list, the mean and mode from a frequency table, the mean from grouped data using class centres, and choosing the most appropriate measure when the data is skewed or has an outlier, with worked Australian examples.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to summarise a whole data set with a single "centre" value, and to know there are three of them: the mean, the median and the mode. You need to find all three from a raw list, find the mean and the mode from a frequency table, and estimate the mean from grouped data using class centres. Just as important is the judgement call NESA tests right beside the arithmetic: deciding which measure best describes a particular data set, because an outlier or a skewed shape can make the mean misleading. The calculations are short. The marks are won by setting out a clear table and by choosing the right measure with a one-line reason.
The answer
A measure of central tendency is a single number that stands for the middle, or typical value, of a data set. The three you must know are:
- Mean ($\bar{x}$): the arithmetic average, the sum of all values divided by how many there are.
- Median: the middle value once the data is put in order.
- Mode: the value that occurs most often.
Each answers "what is typical?" in a slightly different way, and they can give different numbers for the same data. The diagram below shows why that happens: when a data set has a long tail on one side (it is skewed), the mean is dragged toward the tail while the median stays near the bulk of the data.
The mean
The mean of a list is the total of all the values divided by the number of values:

$$\bar{x} = \frac{\sum x}{n}$$
For the marks there are values totalling , so . The symbol $\bar{x}$ (read "x-bar") is the standard name for the mean. The mean uses every value, which makes it the most informative measure when the data has no extreme values, but that same sensitivity means a single outlier can distort it.
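HSC questions are answered by hand or on a calculator, but the definition is easy to check with a few lines of code. Below is a minimal Python sketch, assuming the data arrives as a plain list of numbers. The function name `mean` and the marks are made up for illustration and are not the article's example:

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

# Made-up marks, for illustration only.
marks = [12, 15, 15, 18, 20]
print(mean(marks))  # 16.0
```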
The median
The median is the middle value of the ordered data. Always sort the data first. Then:
- if there is an odd number of values, the median is the single middle one (for $n$ values it is in position $\frac{n+1}{2}$),
- if there is an even number of values, the median is the average of the two middle values.
For (already ordered, ) the middle is the rd value, so the median is . For () the two middle values are the rd and th, and , so the median is . Because the median only cares about position, an outlier hardly moves it - which is its great strength.
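The two cases above translate directly into code. This is a hypothetical helper, not a syllabus requirement. Python lists count from 0, so position $\frac{n+1}{2}$ (counting from 1) is index `n // 2` when $n$ is odd:

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # always order the data first
    n = len(ordered)
    mid = n // 2                      # 0-based index
    if n % 2 == 1:
        # Odd n: the single middle value, position (n + 1) / 2 counting from 1.
        return ordered[mid]
    # Even n: average of positions n/2 and n/2 + 1 (counting from 1).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 2, 7, 4, 5]))      # 5
print(median([3, 8, 1, 6, 4, 10]))  # 5.0
```

The second list is unordered on purpose: sorting gives 1, 3, 4, 6, 8, 10, and the two middle values 4 and 6 average to 5.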
The mode
The mode is the value (or category) that occurs most often. For the value occurs three times, more than any other, so the mode is . A data set can have:
- no mode, if every value occurs once,
- two modes (bimodal) or more, if several values tie for the highest frequency.
The mode is the only measure of centre you can use for categorical data, such as the most popular ice-cream flavour, because you cannot add up or order categories.
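Counting is all the mode needs, which is why it also works for categories. Below is a sketch using the standard-library `collections.Counter`. The helper name `modes` and the data are illustrative. It follows the conventions above: an empty list means "no mode", and several values mean bimodal or multimodal data.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 3, 7]))                       # [3]
print(modes([1, 1, 4, 4, 6]))                          # [1, 4] (bimodal)
print(modes([1, 2, 3]))                                # [] (no mode)
print(modes(["vanilla", "choc", "vanilla", "mango"]))  # ['vanilla']
```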
The mean and mode from a frequency table
When data is given in a frequency table you do not write out every value. Instead you add an $fx$ column (each value multiplied by its frequency) and use:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

where $\sum f$ is the total frequency (how many data values there are altogether) and $\sum fx$ is the total of the $fx$ column. The mode is simply the value with the highest frequency, and the median is the value at the middle position, found by counting down the frequencies. The worked set below shows the full layout.
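The same layout can be sketched in Python, assuming the table is stored as a list of `(x, f)` pairs with the values in increasing order. The function name and the pets table are made up for illustration. The median is found by counting down a running total of the frequencies, as described above:

```python
def frequency_table_stats(table):
    """table: list of (x, f) pairs, x in increasing order (a hypothetical layout)."""
    sum_f = sum(f for _, f in table)
    sum_fx = sum(x * f for x, f in table)  # total of the fx column
    mean = sum_fx / sum_f

    top = max(f for _, f in table)
    modes = [] if top == 1 else [x for x, f in table if f == top]

    def value_at(position):  # position counted from 1
        running = 0
        for x, f in table:
            running += f  # count down the frequencies
            if running >= position:
                return x
        raise ValueError("position is beyond the total frequency")

    if sum_f % 2 == 1:
        median = value_at((sum_f + 1) // 2)
    else:
        median = (value_at(sum_f // 2) + value_at(sum_f // 2 + 1)) / 2
    return sum_f, sum_fx, mean, median, modes

# Made-up pets table: 4 students own 0 pets, 7 own 1, 6 own 2, 3 own 3.
pets = [(0, 4), (1, 7), (2, 6), (3, 3)]
print(frequency_table_stats(pets))  # (20, 28, 1.4, 1.0, [1])
```

Here $\sum f = 20$ is even, so the median averages the 10th and 11th values, and both fall in the row $x = 1$.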
Estimating the mean from grouped data
Sometimes data is grouped into class intervals (for example , ) and the individual values are lost. To estimate the mean you replace each class by its class centre, the midpoint of the interval:

$$\text{class centre} = \frac{\text{lower limit} + \text{upper limit}}{2}$$

Then you treat each class centre as the $x$ value and use the same frequency-table formula, $\bar{x} = \frac{\sum fx}{\sum f}$. The answer is an estimate, because every value in a class is assumed to sit at the centre.
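Below is a sketch of the class-centre method. It assumes each class is stored as a `(lower, upper, frequency)` triple using the interval's stated limits. The helper name and the waiting-time classes are hypothetical:

```python
def grouped_mean_estimate(classes):
    """classes: list of (lower, upper, frequency) for each class interval."""
    sum_f = 0
    sum_fx = 0
    for lower, upper, f in classes:
        centre = (lower + upper) / 2  # class centre = midpoint of the interval
        sum_f += f
        sum_fx += centre * f
    return sum_fx / sum_f  # an estimate, not the exact mean

# Made-up waiting-time classes (minutes), for illustration only.
waits = [(0, 4, 3), (5, 9, 8), (10, 14, 6), (15, 19, 3)]
print(round(grouped_mean_estimate(waits), 2))  # 9.25
```

For these classes the centres are 2, 7, 12 and 17. That gives $\sum fx = 185$ and $\sum f = 20$, so the estimate is $185 \div 20 = 9.25$ minutes.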
Choosing the most appropriate measure
Different data sets call for different measures. The deciding factors are outliers and skew:
- Use the mean when the data is roughly symmetric with no outliers, because it uses all the information.
- Use the median when there is an outlier or the data is skewed, because the median is not dragged toward extreme values. House prices and incomes are the classic examples: a few very high values inflate the mean, so the median is reported instead.
- Use the mode for categorical data, or when the most common value is what matters (a shoe shop cares about the most-sold size, not the "average" size).
The dot plot below makes the effect of an outlier visible. Most values cluster between and , but a single value of pulls the mean () to the right, well above the median and mode (both ), which stay with the cluster.
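The outlier effect can be reproduced with Python's standard-library `statistics.mean` and `statistics.median`. The values are made up and are not the dot plot's data. The first list is a tight cluster, and the second is the same cluster with one high value added:

```python
from statistics import mean, median

# Made-up data: a tight cluster, then the same cluster plus one high value.
cluster = [4, 5, 5, 5, 6, 6, 7]
with_outlier = cluster + [30]

print(round(mean(cluster), 2), median(cluster))  # 5.43 5
print(mean(with_outlier), median(with_outlier))  # 8.5 5.5
```

Adding the single value 30 raises the mean by about 3, while the median moves by only 0.5.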
How exam questions ask about central tendency
The wording maps straight to a method:
- "Find the mean / average" means $\bar{x} = \frac{\sum x}{n}$ for a list, or $\bar{x} = \frac{\sum fx}{\sum f}$ for a frequency table.
- "Find the median" means order the data first, then take the middle (average the middle two if $n$ is even). Forgetting to order is the classic lost mark.
- "Find the mode / the most common ..." means the value with the highest frequency; be ready to answer "no mode" or "bimodal".
- "Estimate the mean" from a grouped table signals class centres, and the word "estimate" is your cue to say the answer is approximate.
- "Which measure is most appropriate?" or "why is the mean misleading?" is asking about outliers and skew: name the outlier, and choose the median with a reason.
- "The mean is larger than the median - explain" wants you to identify a high outlier or a right (positive) skew pulling the mean up.
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2021 HSC-style (3 marks). The dot plot below records the number of siblings of students. The values are . (a) Find the mode. (b) Find the median. (c) Find the mean, correct to one decimal place.
Mode: (it occurs five times, more than any other value).
Median: with values, average the th and th ordered values, which are and , so the median is .
Mean: the total is , and .
Markers award one mark for each correct measure. A common error is reading the median off the dot plot as the tallest column (, the mode) instead of locating the middle position. The mean is higher than the median here because of the values and in the right tail, a point worth a one-line comment if the question asks about skew.
2022 HSC-style (4 marks). The grouped frequency table shows the masses, in kilograms, of parcels.

| Mass (kg) | | | | |
|---|---|---|---|---|
| Frequency | | | | |

(a) Write down the class centres. (b) Estimate the mean mass, correct to one decimal place. (c) Explain why your answer is only an estimate.
Class centres: (each is the midpoint of its interval).
Mean: , and , so kg.
Why an estimate: grouping discards the exact values and replaces every parcel in a class with the class centre, so the calculation assumes the values in each interval average out to the class centre (as they would if evenly spread), which is generally not exactly true.
Markers award a mark for the class centres, a mark for a correct $\sum fx$, a mark for the division, and a mark for a clear reason that the class centre stands in for the real values. Using the upper end of each interval instead of the centre is the most common error and loses the accuracy marks.
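To see concretely why a grouped mean is only an estimate, compare the exact mean of some raw values with the class-centre estimate after grouping them. The raw values and intervals below are invented for illustration and are not the parcel data:

```python
raw = [1, 4, 4, 6, 9, 9, 9, 12, 14, 18]  # made-up exact values
intervals = [(0, 4), (5, 9), (10, 14), (15, 19)]

exact = sum(raw) / len(raw)

sum_f = 0
sum_fx = 0
for lower, upper in intervals:
    f = sum(1 for v in raw if lower <= v <= upper)  # frequency of this class
    centre = (lower + upper) / 2                    # the class centre stands in for every value
    sum_f += f
    sum_fx += centre * f

estimate = sum_fx / sum_f
print(exact, estimate)  # 8.6 7.5
```

The estimate (7.5) falls below the exact mean (8.6) because most of the raw values sit above the centre of their class, and the grouped calculation cannot see that.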
2020 HSC-style (3 marks). A real-estate agent says the mean sale price of the seven houses she sold last month was $ million, but the median sale price was only $. (a) Explain how the mean can be so much larger than the median. (b) State which measure better represents a typical house price, with a reason.
Part (a): the data must contain one or more very expensive sales (high outliers) that lift the mean. The mean adds in the actual dollar value of every sale, so a single multi-million-dollar property pulls it well above the bulk of the data, whereas the median only depends on the middle sale's position and is barely affected.
Part (b): the median of $ better represents a typical house price, because it is not distorted by the extreme high sales; the mean of $ million overstates what most buyers actually paid.
Markers reward the identification of outlier/skew as the cause and a justified choice of the median, using the idea that the mean is sensitive to extreme values while the median is resistant. A bare answer with no reference to outliers earns little.
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation (2 marks). For the data set , find (a) the mean and (b) the median.
Part (a) - mean is the sum divided by how many. There are values, so add them and divide by :
so the mean is .
Part (b) - median is the middle of the ordered list. The data is already in order. With values the middle one is the rd: , so the median is . (Check: the median sits inside the data range to , as it must.)
Foundation (2 marks). For the data set , find (a) the mode and (b) the median.
Part (a) - mode is the most common value. The value appears twice and every other value appears once, so the mode is .
Part (b) - median with an even number of values is the average of the middle two. The list is already ordered and has values, so the two middle ones are the rd and th, namely and :
so the median is . (Note the median need not be one of the data values when $n$ is even.)
Foundation (2 marks). The table shows the number of pets owned by students. Find the mean number of pets, correct to two decimal places.

| Pets ($x$) | | | | |
|---|---|---|---|---|
| Frequency ($f$) | | | | |
Build the $fx$ column. Multiply each value by its frequency:
Add the columns. The total frequency is and the total of the $fx$ column is
Divide. The mean of a frequency table is $\bar{x} = \frac{\sum fx}{\sum f}$,
so the mean is pets. (Check: matches the students given, so no row was missed.)
Core (4 marks). Seven small businesses report their annual profit, in thousands of dollars: . (a) Find the mean. (b) Find the median. (c) Find the mode. (d) State, with a reason, which of the three is the most appropriate measure of the typical profit.
Part (a) - mean. Add the seven values and divide by 7:
so the mean profit is $.
Part (b) - median. The list is ordered and has 7 values, so the median is the 4th value: , giving a median of $.
Part (c) - mode. The only repeated value is , so the mode is $.
Part (d) - most appropriate measure. The value is an outlier: it sits far above the others and drags the mean up to , which is larger than what six of the seven businesses actually earned. The median of $ is the most appropriate measure of the typical profit, because it is not distorted by the one unusually high value. (Check: six of the seven profits are between and , so a "typical" figure near is far more representative than .)
Core (3 marks). Over a season a netball player records the number of goals she scores each game in a frequency table. Find the mean number of goals per game.

| Goals ($x$) | | | | | | |
|---|---|---|---|---|---|---|
| Frequency ($f$) | | | | | | |

Form the $fx$ column. Multiply each goal value by its frequency:
Total the two columns. The number of games is
and the total number of goals is
Divide. The mean is
so she averages goals per game. (Check: is the total goals over games, and is a sensible game average.)
Core (4 marks). The grouped frequency table shows the time, in minutes, that customers waited to be served. Using class centres, estimate the mean waiting time.

| Time (min) | | | | | |
|---|---|---|---|---|---|
| Frequency ($f$) | | | | | |
Find each class centre. The class centre is the midpoint of the interval, the average of its lower and upper ends:
Build the $fx$ column using the class centre as $x$:
Total the columns.
Divide. The estimated mean is
so the mean waiting time is about minutes. (This is an estimate, because grouping replaces every value in a class with its centre. Check: falls in the class, which holds the most customers, so it is sensible.)
Exam (6 marks). A cafe records the number of cups of coffee sold each hour during a -hour morning shift: . (a) Find the mean, median and mode. (b) The owner later realises one hour was mis-recorded: the busiest hour actually sold cups, not the that was the largest value. Replace that value with and recalculate the mean and the median. (c) Explain which measure changed more, and which measure better describes a typical hour's sales after the correction.
Part (a) - the three measures for the original data. Add the seven values:
The list is ordered with 7 values, so the median is the 4th value: , giving a median of . For the mode: and each appear twice, so the data is bimodal, with modes and .
Part (b) - replace the largest with . The new data set is . The new mean is
The new ordered list is , so the median is still the 4th value, .
Part (c) - which changed, and which is better. The mean jumped from to about , a rise of nearly , while the median did not change at all (it stayed ). The mean changed far more because it adds in the actual size of every value, so one large outlier () pulls it upward; the median only depends on the position of the middle value, so a single extreme value barely moves it. After the correction the median of better describes a typical hour, since six of the seven hours sold between and cups. (Check: the new mean is larger than of the data values, the tell-tale sign that an outlier has distorted it.)
Related dot points
- Organise, interpret and display data into appropriate tabular and graphical representations including frequency distribution tables, both ungrouped and grouped using class intervals and class centres, and cumulative frequency
A focused answer to the HSC Maths Standard 2 dot point on frequency tables. Tallying raw data into a frequency table, grouping data into class intervals, finding the class centre, and building the cumulative frequency column, with worked Australian examples and the totals checked so the cumulative frequency ends at the sample size.
- Calculate measures of spread, including the range, quartiles and interquartile range, and the population standard deviation using technology
A focused answer to the HSC Maths Standard 2 dot point on measures of spread. The range, the quartiles and interquartile range, the five-number summary, the population standard deviation from a calculator, and how to compare the spread of two data sets, with worked Australian examples.
- Determine outliers using the interquartile range, describe and interpret the shape and features of a distribution (symmetry, skewness, modality, centre, spread and outliers) and compare data displays using these features
A focused answer to the HSC Maths Standard 2 dot point on outliers and describing distributions. The 1.5 times IQR outlier test with lower and upper fences, telling symmetric from positively and negatively skewed data, unimodal versus bimodal shape, and writing a full describe-the-distribution answer covering shape, centre, spread and outliers, with worked Australian examples.
- Construct and interpret box-and-whisker plots and use them, including parallel (side-by-side) box plots, to compare data sets in terms of centre, spread, skewness and outliers
A focused answer to the HSC Maths Standard 2 dot point on box-and-whisker plots. Building a box plot from the five-number summary, flagging an outlier with the 1.5 times IQR rule, drawing parallel box plots, and comparing two groups by centre, spread, skew and outliers, with worked Australian examples.