
How are the mean, median and mode used to summarise the centre of a data set, and which is the most appropriate measure?

Calculate measures of central tendency, including the mean, median and mode, for both raw data and data presented in a frequency table

A focused answer to the HSC Maths Standard 2 dot point on the mean, median and mode. Finding all three from a raw list, the mean and mode from a frequency table, the mean from grouped data using class centres, and choosing the most appropriate measure when the data is skewed or has an outlier, with worked Australian examples.

Generated by Claude Opus 4.814 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to summarise a whole data set with a single "centre" value, and to know there are three of them: the mean, the median and the mode. You need to find all three from a raw list, find the mean and the mode from a frequency table, and estimate the mean from grouped data using class centres. Just as important is the judgement call NESA tests right beside the arithmetic: deciding which measure best describes a particular data set, because an outlier or a skewed shape can make the mean misleading. The calculations are short. The marks are won by setting out a clear \(fx\) table and by choosing the right measure with a one-line reason.

The answer

A measure of central tendency is a single number that stands for the middle, or typical value, of a data set. The three you must know are:

  • Mean (\(\bar{x}\)): the arithmetic average, the sum of all values divided by how many there are.
  • Median: the middle value once the data is put in order.
  • Mode: the value that occurs most often.

Each answers "what is typical?" in a slightly different way, and they can give different numbers for the same data. The diagram below shows why that happens: when a data set has a long tail on one side (it is skewed), the mean is dragged toward the tail while the median stays near the bulk of the data.

[Figure: Mean versus median in a right-skewed data set. A smooth hump rises quickly on the left and trails off in a long tail of high values to the right. The median is marked near the peak and the mean further right, pulled toward the tail, so the mean is greater than the median. Horizontal axis: value.]

The mean

The mean of a list is the total of all the values divided by the number of values:

\[\bar{x} = \frac{\text{sum of all values}}{\text{number of values}} = \frac{\sum x}{n}.\]

For the marks 4, 7, 7, 9, 13 there are \(n = 5\) values totalling 40, so \(\bar{x} = 40 \div 5 = 8\). The symbol \(\bar{x}\) (read "x-bar") is the standard name for the mean. The mean uses every value, which makes it the most informative measure when the data has no extreme values, but that same sensitivity means a single outlier can distort it.
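As a quick check of the arithmetic above, here is a minimal Python sketch of the mean formula. The function name `mean_of_list` is made up for this illustration; it is not part of the syllabus or of any library.

```python
def mean_of_list(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


marks = [4, 7, 7, 9, 13]
print(mean_of_list(marks))  # 40 / 5 = 8.0
```
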

The median

The median is the middle value of the ordered data. Always sort the data first. Then:

  • if there is an odd number of values, the median is the single middle one (for \(n\) values it is in position \(\tfrac{n+1}{2}\)),
  • if there is an even number of values, the median is the average of the two middle values.

For 4, 7, 7, 9, 13 (already ordered, \(n = 5\)) the middle is the 3rd value, so the median is 7. For 3, 5, 5, 6, 8, 9 (\(n = 6\)) the two middle values are the 3rd and 4th, 5 and 6, so the median is \(\tfrac{5+6}{2} = 5.5\). Because the median only cares about position, an outlier hardly moves it - which is its great strength.
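The odd and even rules can be sketched in a few lines of Python. This is an illustration only, and `median_of_list` is a hypothetical helper name. Note that it sorts first, which is the step most often forgotten by hand.

```python
def median_of_list(values):
    """Middle value of the ordered data (average of the middle two if n is even)."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # always order the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: position (n + 1) / 2, counting from 1
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle two


print(median_of_list([13, 4, 9, 7, 7]))     # 7 (input deliberately unsorted)
print(median_of_list([3, 5, 5, 6, 8, 9]))   # 5.5
```
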

The mode

The mode is the value (or category) that occurs most often. For 36, 37, 38, 38, 38, 39, 40, 40, 42 the value 38 occurs three times, more than any other, so the mode is 38. A data set can have:

  • no mode, if every value occurs once,
  • two modes (bimodal) or more, if several values tie for the highest frequency.

The mode is the only measure of centre you can use for categorical data, such as the most popular ice-cream flavour, because you cannot add up or order categories.
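A small Python sketch of the mode, covering the "no mode" and "bimodal" cases listed above. The helper name `modes_of_list` and the ice-cream data are invented for the example. It follows this page's convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter


def modes_of_list(values):
    """Return every value that ties for the highest frequency (empty list = no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                      # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes_of_list([36, 37, 38, 38, 38, 39, 40, 40, 42]))  # [38]
print(modes_of_list([12, 13, 15, 15, 16, 17, 17]))          # [15, 17] (bimodal)
print(modes_of_list([1, 2, 3]))                             # [] (no mode)
print(modes_of_list(["vanilla", "choc", "choc"]))           # ['choc'] (categorical data)
```
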

The mean and mode from a frequency table

When data is given in a frequency table you do not write out every value. Instead you add an \(fx\) column - each value \(x\) multiplied by its frequency \(f\) - and use:

\[\bar{x} = \frac{\sum fx}{\sum f}\]

where \(\sum f\) is the total frequency (how many data values there are altogether) and \(\sum fx\) is the total of the \(fx\) column. The mode is simply the value with the highest frequency, and the median is the value at the middle position, found by counting down the frequencies. The worked set below shows the full \(fx\) layout.
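The three frequency-table methods can be sketched in Python as follows. This assumes the table is stored as a list of (value, frequency) pairs. That storage choice and the function names are assumptions made for the illustration. The data is the pets table from the practice questions on this page.

```python
def table_mean(table):
    """Mean from (value, frequency) pairs: sum of fx divided by sum of f."""
    total_f = sum(f for _, f in table)
    total_fx = sum(x * f for x, f in table)
    return total_fx / total_f


def table_modes(table):
    """Value(s) with the highest frequency."""
    top = max(f for _, f in table)
    return [x for x, f in table if f == top]


def table_median(table):
    """Median found by counting down the cumulative frequencies."""
    rows = sorted(table)
    n = sum(f for _, f in rows)

    def value_at(position):           # position counts from 1
        running = 0
        for x, f in rows:
            running += f
            if position <= running:
                return x

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


pets = [(0, 5), (1, 8), (2, 4), (3, 3)]
print(table_mean(pets))    # 25 / 20 = 1.25
print(table_modes(pets))   # [1]
print(table_median(pets))  # 10th and 11th values are both 1, so 1.0
```
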

Estimating the mean from grouped data

Sometimes data is grouped into class intervals (for example 0 to 10, 10 to 20) and the individual values are lost. To estimate the mean you replace each class by its class centre, the midpoint of the interval:

\[\text{class centre} = \frac{\text{lower end} + \text{upper end}}{2}.\]

Then you treat each class centre as the value \(x\) and use the same frequency-table formula, \(\bar{x} = \dfrac{\sum fx}{\sum f}\). The answer is an estimate, because every value in a class is assumed to sit at the centre.
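A minimal Python sketch of the class-centre method, assuming each class is stored as a (lower end, upper end, frequency) triple. The layout and the name `estimated_mean` are choices made for this example. The data is the waiting-time table from the practice questions on this page.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data using class centres.

    Each class is a (lower_end, upper_end, frequency) triple.
    """
    total_f = sum(f for _, _, f in classes)
    # Class centre = (lower end + upper end) / 2, used in place of x.
    total_fx = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return total_fx / total_f


waiting_times = [(0, 10, 6), (10, 20, 10), (20, 30, 12), (30, 40, 8), (40, 50, 4)]
print(estimated_mean(waiting_times))  # 940 / 40 = 23.5
```

The result is only as good as the assumption that values sit at the class centre, so it should be reported as an estimate.
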

Choosing the most appropriate measure

Different data sets call for different measures. The deciding factors are outliers and skew:

  • Use the mean when the data is roughly symmetric with no outliers, because it uses all the information.
  • Use the median when there is an outlier or the data is skewed, because the median is not dragged toward extreme values. House prices and incomes are the classic examples: a few very high values inflate the mean, so the median is reported instead.
  • Use the mode for categorical data, or when the most common value is what matters (a shoe shop cares about the most-sold size, not the "average" size).

The dot plot below makes the effect of an outlier visible. Most values cluster between 2 and 7, but a single value of 20 pulls the mean (5.8) to the right, well above the median and mode (both 4), which stay with the cluster.

[Figure: Mean, median and mode on a skewed dot plot. A dot plot on a number line from 2 to 20: stacked dots cluster between 2 and 7, with one isolated dot (the outlier) at 20. The median and mode are marked together at 4, and the mean at about 5.8, sitting to the right because the value of 20 pulls the mean up.]
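The same effect can be checked numerically. The sketch below uses Python's standard `statistics` module and the business-profit data from the practice questions on this page (not the dot plot's data, which is not listed). It compares the mean and median with and without the outlier of 210.

```python
from statistics import mean, median

# Annual profits in thousands of dollars; 210 is the outlier.
profits = [48, 52, 55, 55, 58, 61, 210]
without_outlier = [p for p in profits if p != 210]

print("with outlier:   ", mean(profits), median(profits))
print("without outlier:", round(mean(without_outlier), 1), median(without_outlier))
```

Removing one value moves the mean from 77 to about 54.8, while the median stays at 55, which is why the median is preferred when an outlier is present.
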

How exam questions ask about central tendency

The wording maps straight to a method:

  • "Find the mean / average" means \(\dfrac{\sum x}{n}\) for a list, or \(\dfrac{\sum fx}{\sum f}\) for a frequency table.
  • "Find the median" means order the data first, then take the middle (average the middle two if \(n\) is even). Forgetting to order is the classic lost mark.
  • "Find the mode / the most common ..." means the value with the highest frequency; be ready to answer "no mode" or "bimodal".
  • "Estimate the mean" from a grouped table signals class centres, and the word "estimate" is your cue to say the answer is approximate.
  • "Which measure is most appropriate?" or "why is the mean misleading?" is asking about outliers and skew: name the outlier, and choose the median with a reason.
  • "The mean is larger than the median - explain" wants you to identify a high outlier or a right (positive) skew pulling the mean up.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2021 HSC-style, 3 marks. The dot plot below records the number of siblings of 20 students. The values are 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 9. (a) Find the mode. (b) Find the median. (c) Find the mean, correct to one decimal place.

Worked answer:

Mode: 1 (it occurs five times, more than any other value).

Median: with 20 values, average the 10th and 11th ordered values, which are 2 and 2, so the median is 2.

Mean: the total is \(0(3)+1(5)+2(4)+3(3)+4(2)+5+6+9 = 0+5+8+9+8+5+6+9 = 50\), and \(50 \div 20 = 2.5\).

Markers award one mark for each correct measure. A common error is reading the median off the dot plot as the tallest column (1, the mode) instead of locating the middle position. The mean is higher than the median here because of the values 6 and 9 in the right tail, a point worth a one-line comment if the question asks about skew.

2022 HSC-style, 4 marks. The grouped frequency table shows the masses, in kilograms, of 50 parcels.

| Mass (kg) | 0 to 4 | 4 to 8 | 8 to 12 | 12 to 16 |
| Frequency | 14 | 20 | 11 | 5 |

(a) Write down the class centres. (b) Estimate the mean mass, correct to one decimal place. (c) Explain why your answer is only an estimate.

Worked answer:

Class centres: 2, 6, 10, 14 (each is the midpoint of its interval).

Mean: \(\sum fx = 2(14)+6(20)+10(11)+14(5) = 28+120+110+70 = 328\), and \(\sum f = 50\), so \(\bar{x} = \frac{328}{50} = 6.56 \approx 6.6\) kg.

Why an estimate: grouping discards the exact values and replaces every parcel in a class with the class centre, so the calculation assumes the data is evenly spread within each interval, which is generally not exactly true.

Markers award a mark for the class centres, a mark for a correct \(\sum fx\), a mark for the division, and a mark for a clear reason that the class centre stands in for the real values. Using the upper end of each interval instead of the centre is the most common error and loses the accuracy marks.

2020 HSC-style, 3 marks. A real-estate agent says the mean sale price of the seven houses she sold last month was $1.2 million, but the median sale price was only $680 000. (a) Explain how the mean can be so much larger than the median. (b) State which measure better represents a typical house price, with a reason.

Worked answer:

Part (a): the data must contain one or more very expensive sales (high outliers) that lift the mean. The mean adds in the actual dollar value of every sale, so a single multi-million-dollar property pulls it well above the bulk of the data, whereas the median only depends on the middle sale's position and is barely affected.

Part (b): the median of $680 000 better represents a typical house price, because it is not distorted by the extreme high sales; the mean of $1.2 million overstates what most buyers actually paid.

Markers reward the identification of outlier/skew as the cause and a justified choice of the median, using the idea that the mean is sensitive to extreme values while the median is resistant. A bare answer with no reference to outliers earns little.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation, 2 marks. For the data set 4, 7, 7, 9, 13, find (a) the mean and (b) the median.

Worked solution:

Part (a) - mean is the sum divided by how many. There are 5 values, so add them and divide by 5:

\[\bar{x} = \frac{4 + 7 + 7 + 9 + 13}{5} = \frac{40}{5} = 8\]

so the mean is 8.

Part (b) - median is the middle of the ordered list. The data is already in order. With 5 values the middle one is the 3rd: \(4, 7, \mathbf{7}, 9, 13\), so the median is 7. (Check: the median 7 sits inside the data range 4 to 13, as it must.)

Foundation, 2 marks. For the data set 3, 5, 5, 6, 8, 9, find (a) the mode and (b) the median.

Worked solution:

Part (a) - mode is the most common value. The value 5 appears twice and every other value appears once, so the mode is 5.

Part (b) - median with an even number of values is the average of the middle two. The list is already ordered and has 6 values, so the two middle ones are the 3rd and 4th, namely 5 and 6:

\[\text{median} = \frac{5 + 6}{2} = \frac{11}{2} = 5.5\]

so the median is 5.5. (Note the median need not be one of the data values when \(n\) is even.)

Foundation, 2 marks. The table shows the number of pets owned by 20 students. Find the mean number of pets, correct to two decimal places.

| Pets (\(x\)) | 0 | 1 | 2 | 3 |
| Frequency (\(f\)) | 5 | 8 | 4 | 3 |

Worked solution:

Build the \(fx\) column. Multiply each value by its frequency:

\[0 \times 5 = 0, \quad 1 \times 8 = 8, \quad 2 \times 4 = 8, \quad 3 \times 3 = 9\]

Add the columns. The total frequency is \(\sum f = 5 + 8 + 4 + 3 = 20\) and the total of the \(fx\) column is

\[\sum fx = 0 + 8 + 8 + 9 = 25\]

Divide. The mean of a frequency table is

\[\bar{x} = \frac{\sum fx}{\sum f} = \frac{25}{20} = 1.25\]

so the mean is 1.25 pets. (Check: \(\sum f = 20\) matches the 20 students given, so no row was missed.)

Core, 4 marks. Seven small businesses report their annual profit, in thousands of dollars: 48, 52, 55, 55, 58, 61, 210. (a) Find the mean. (b) Find the median. (c) Find the mode. (d) State, with a reason, which of the three is the most appropriate measure of the typical profit.

Worked solution:

Part (a) - mean. Add the seven values and divide by 7:

\[\bar{x} = \frac{48 + 52 + 55 + 55 + 58 + 61 + 210}{7} = \frac{539}{7} = 77\]

so the mean profit is $77 000.

Part (b) - median. The list is ordered and has 7 values, so the median is the 4th value: \(48, 52, 55, \mathbf{55}, 58, 61, 210\), giving a median of $55 000.

Part (c) - mode. The only repeated value is 55, so the mode is $55 000.

Part (d) - most appropriate measure. The value 210 is an outlier - it sits far above the others and drags the mean up to 77, which is larger than what six of the seven businesses actually earned. The median of $55 000 is the most appropriate measure of the typical profit, because it is not distorted by the one unusually high value. (Check: six of the seven profits are between 48 and 61, so a "typical" figure near 55 is far more representative than 77.)

Core, 3 marks. Over a season a netball player records the number of goals she scores each game in a frequency table. Find the mean number of goals per game.

| Goals (\(x\)) | 0 | 1 | 2 | 3 | 4 | 5 |
| Frequency (\(f\)) | 3 | 6 | 8 | 5 | 2 | 1 |

Worked solution:

Form the \(fx\) column. Multiply each goal value by its frequency:

\[0\times3=0,\quad 1\times6=6,\quad 2\times8=16,\quad 3\times5=15,\quad 4\times2=8,\quad 5\times1=5\]

Total the two columns. The number of games is

\[\sum f = 3 + 6 + 8 + 5 + 2 + 1 = 25\]

and the total number of goals is

\[\sum fx = 0 + 6 + 16 + 15 + 8 + 5 = 50\]

Divide. The mean is

\[\bar{x} = \frac{\sum fx}{\sum f} = \frac{50}{25} = 2\]

so she averages 2 goals per game. (Check: \(\sum fx = 50\) is the total goals over 25 games, and \(50 \div 25 = 2\) is a sensible game average.)

Core, 4 marks. The grouped frequency table shows the time, in minutes, that 40 customers waited to be served. Using class centres, estimate the mean waiting time.

| Time (min) | 0 to 10 | 10 to 20 | 20 to 30 | 30 to 40 | 40 to 50 |
| Frequency (\(f\)) | 6 | 10 | 12 | 8 | 4 |

Worked solution:

Find each class centre. The class centre is the midpoint of the interval, the average of its lower and upper ends:

\[\tfrac{0+10}{2}=5,\quad \tfrac{10+20}{2}=15,\quad \tfrac{20+30}{2}=25,\quad \tfrac{30+40}{2}=35,\quad \tfrac{40+50}{2}=45\]

Build the \(fx\) column using the class centre as \(x\):

\[5\times6=30,\quad 15\times10=150,\quad 25\times12=300,\quad 35\times8=280,\quad 45\times4=180\]

Total the columns.

\[\sum f = 6+10+12+8+4 = 40, \qquad \sum fx = 30+150+300+280+180 = 940\]

Divide. The estimated mean is

\[\bar{x} = \frac{\sum fx}{\sum f} = \frac{940}{40} = 23.5\]

so the mean waiting time is about 23.5 minutes. (This is an estimate, because grouping replaces every value in a class with its centre. Check: 23.5 falls in the 20 to 30 class, which holds the most customers, so it is sensible.)

Exam, 6 marks. A cafe records the number of cups of coffee sold each hour during a 7-hour morning shift: 12, 13, 15, 15, 16, 17, 17. (a) Find the mean, median and mode. (b) The owner later realises one hour was mis-recorded: the busiest hour actually sold 50 cups, not 17. Replace one of the 17s with 50 and recalculate the mean and the median. (c) Explain which measure changed more, and which measure better describes a typical hour's sales after the correction.

Worked solution:

Part (a) - the three measures for the original data. Add the seven values and divide by 7:

\[\bar{x} = \frac{12 + 13 + 15 + 15 + 16 + 17 + 17}{7} = \frac{105}{7} = 15\]

The list is ordered with 7 values, so the median is the 4th value: \(12, 13, 15, \mathbf{15}, 16, 17, 17\), giving a median of 15. The values 15 and 17 each appear twice, more than any other value, so the data is bimodal, with modes 15 and 17.

Part (b) - replace one of the 17s with 50. The new data set is 12, 13, 15, 15, 16, 17, 50. The new mean is

\[\bar{x} = \frac{12 + 13 + 15 + 15 + 16 + 17 + 50}{7} = \frac{138}{7} \approx 19.71\]

The new ordered list is \(12, 13, 15, \mathbf{15}, 16, 17, 50\), so the median is still the 4th value, 15.

Part (c) - which changed, and which is better. The mean jumped from 15 to about 19.71, a rise of nearly 5, while the median did not change at all (it stayed 15). The mean changed far more because it adds in the actual size of every value, so one large outlier (50) pulls it upward; the median only depends on the position of the middle value, so a single extreme value barely moves it. After the correction the median of 15 better describes a typical hour, since six of the seven hours sold between 12 and 17 cups. (Check: the new mean 19.71 is larger than 6 of the 7 data values, the tell-tale sign that an outlier has distorted it.)
