How do you test whether a value is an outlier, name the shape of a data set, and write a full description of a distribution?

Determine outliers using the interquartile range, describe and interpret the shape and features of a distribution (symmetry, skewness, modality, centre, spread and outliers) and compare data displays using these features

A focused answer to the HSC Maths Standard 2 dot point on outliers and describing distributions. It covers the 1.5 times IQR outlier test with lower and upper fences, telling symmetric from positively and negatively skewed data, unimodal versus bimodal shape, and writing a full describe-the-distribution answer (shape, centre, spread and outliers), with worked Australian examples.

Generated by Claude Opus 4.814 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

NESA wants you to do two linked jobs. First, apply a definite rule to decide whether an extreme value is an outlier: the $1.5 \times \text{IQR}$ test, which builds a lower and an upper "fence" from the quartiles and flags anything beyond them. Second, describe a distribution in words, naming its shape (symmetric, or skewed left or right), its modality (one peak or two), its centre, its spread, and any outliers. Almost every Data Analysis question that shows a graph or a data set ends with "describe the distribution" or "is this an outlier", so these are among the most reliable marks in the module. The arithmetic is light; the marks are won by stating the fence you test against, showing the comparison, and using the right vocabulary for the shape.

The answer

There are two skills here, and they share one idea: the middle of the data is steady, and you measure everything relative to it. Outliers are found by stepping a fixed distance ($1.5 \times \text{IQR}$) out from the quartiles. Shape is read from how the data sits around its centre: balanced (symmetric) or lopsided (skewed).

The interquartile range, quickly

The interquartile range is the spread of the middle half of the data:

$$\text{IQR} = Q_3 - Q_1,$$

where $Q_1$ is the lower quartile (a quarter of the way through the ordered data) and $Q_3$ is the upper quartile (three quarters of the way through). The IQR ignores the extreme top and bottom quarters, so it is not distorted by a single wild value, which is exactly why the outlier test is built on it.
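As a rough illustration, the quartiles and IQR can be computed with a short Python sketch. The function names `quartiles` and `iqr` are hypothetical, and the sketch assumes the median-of-halves convention (for an odd number of values the overall median is left out of both halves). Other quartile conventions exist and can give slightly different values. The data is the clinic waiting-times set from a practice question further down this page.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves convention.

    Assumption: with an odd number of values, the overall median is
    excluded from both the lower and the upper half.
    """
    if len(data) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median position
    upper_half = ordered[-half:]   # values above the median position
    return median(lower_half), median(upper_half)


def iqr(data):
    """Interquartile range: Q3 minus Q1."""
    q1, q3 = quartiles(data)
    return q3 - q1


waiting_times = [3, 20, 22, 23, 24, 25, 26, 27, 28, 29, 46]
q1, q3 = quartiles(waiting_times)
print(q1, q3, iqr(waiting_times))  # 22 28 6
```

The extreme values 3 and 46 sit in the outer quarters, so they do not change the IQR of 6.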

The $1.5 \times \text{IQR}$ outlier test

An outlier is a value that lies unusually far from the rest of the data. The standard test draws two fences:

  • lower fence $= Q_1 - 1.5 \times \text{IQR}$,
  • upper fence $= Q_3 + 1.5 \times \text{IQR}$.

Any value below the lower fence or above the upper fence is an outlier. Everything between the fences is treated as ordinary. The number line below shows the fences built out from the quartiles, with two values flagged because they fall beyond them.

[Figure: The 1.5 times IQR outlier fences on a number line. A horizontal number line from zero to fifty. The lower quartile is at twenty-two and the upper quartile at twenty-eight. The lower fence sits one and a half interquartile ranges below the lower quartile at thirteen, and the upper fence the same distance above the upper quartile at thirty-seven. Most data dots lie between the fences. Two dots fall outside, one at three below the lower fence and one at forty-six above the upper fence, and each is labelled with its value as an outlier. Caption: Outlier if beyond a fence: below 13 or above 37.]

The two outliers in the diagram, at $3$ and $46$, are labelled with their values, not just coloured, so they are identifiable even in black and white. Notice the test treats high and low extremes the same way: always check both fences, because a question may hide a low outlier while you stare at an obvious high one.
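The fence rule can be written as a minimal Python sketch. The function names `fences` and `is_outlier` are hypothetical, and the quartiles are the ones from the diagram (22 and 28). A value sitting exactly on a fence is treated as ordinary here, since the rule flags only values below the lower fence or above the upper fence.

```python
def fences(q1, q3):
    """Return (lower_fence, upper_fence) for the 1.5 times IQR rule."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def is_outlier(value, q1, q3):
    """True if the value lies strictly beyond either fence."""
    lower, upper = fences(q1, q3)
    return value < lower or value > upper


lower, upper = fences(22, 28)
print(lower, upper)             # 13.0 37.0
print(is_outlier(3, 22, 28))    # True  (below the lower fence)
print(is_outlier(46, 22, 28))   # True  (above the upper fence)
print(is_outlier(30, 22, 28))   # False (between the fences)
```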

Shape: symmetry and skew

The shape of a distribution is how the data sits around its centre. There are three shapes you must name on sight:

  • Symmetric: the data is balanced about the centre, so the left and right halves are near mirror images. The mean and median are roughly equal.
  • Positively skewed (skewed to the right): most data is bunched at the low end with a long tail stretching to the right. The few large values pull the mean above the median.
  • Negatively skewed (skewed to the left): most data is bunched at the high end with a long tail stretching to the left. The few small values pull the mean below the median.

The skew is named for the direction the tail points, which trips up many students: a right-pointing tail is positive skew even though the bulk of the data is on the left. The three smooth curves below show the shapes side by side, with the mean and median marked so you can see how skew pulls them apart.

[Figure: Positively skewed, symmetric and negatively skewed distribution shapes. Three smooth distribution curves. The first is positively skewed with its peak to the left and a long tail to the right, where the mean lies to the right of the median. The second is symmetric and bell shaped, with the mean and median together at the centre. The third is negatively skewed with its peak to the right and a long tail to the left, where the mean lies to the left of the median. Caption: The tail names the skew. Mean chases the tail; median stays near the peak.]
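The mean-versus-median relationship can be checked numerically. The sketch below is only a rough cross-check, not a substitute for reading the shape from a graph. The function name `skew_hint` and its `tolerance` parameter are hypothetical, and the tolerance (how big a gap counts as "pulled apart") is a judgement call, not part of the syllabus. The data is the commuter travel-time set from an exam-level practice question further down this page.

```python
from statistics import mean, median


def skew_hint(data, tolerance=0.0):
    """Suggest a shape from the gap between the mean and the median.

    Assumption: a mean clearly above the median hints at positive skew,
    a mean clearly below it hints at negative skew.
    """
    gap = mean(data) - median(data)
    if gap > tolerance:
        return "positively skewed"
    if gap < -tolerance:
        return "negatively skewed"
    return "roughly symmetric"


commute_times = [22, 24, 25, 26, 28, 29, 30, 31, 32, 33, 35, 36, 38, 40, 72]
print(median(commute_times))                   # 31
print(skew_hint(commute_times, tolerance=1.0)) # positively skewed
```

Here the single long commute of 72 minutes drags the mean above the median of 31, which matches a tail pointing to the right.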

Modality: how many peaks

Modality counts the clear peaks in the data:

  • Unimodal: one clear peak (one mode). Most single-group data is unimodal.
  • Bimodal: two clear, separate peaks. Two peaks usually means two groups have been combined, for example heights of male and female students, or sales on weekdays versus weekends. When you see bimodal data, the useful comment is that the data may be better split and described as two groups.

A set with no clear peak (all bars about level) is sometimes called uniform, but unimodal and bimodal are the two you will name most.

Writing a full "describe the distribution" answer

When a question says "describe the distribution", markers expect a checklist, not a vibe. Cover four features, in this order:

  1. Shape - symmetric, positively skewed, or negatively skewed (and mention bimodal if there are two peaks).
  2. Centre - quote the median (preferred when the data is skewed or has an outlier) or the mean, with its value.
  3. Spread - quote the IQR (preferred when skewed) or the range, with its value.
  4. Outliers - state any outliers (ideally justified by the $1.5 \times \text{IQR}$ test) and whether they are kept.

A reliable sentence frame is: "The distribution is [shape], centred at [median] with a spread (IQR) of [value], and [has one outlier at .../ has no outliers]." Pairing median with IQR is the safe choice, because both resist outliers; pair mean with standard deviation only when the data is roughly symmetric.
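To make the sentence frame concrete, here is a small Python sketch that fills it in from the four features. The function name `describe` is hypothetical, and the values passed in are the ones worked out for the commuter travel times in an exam-level practice question further down this page.

```python
def describe(shape, median, iqr, outliers):
    """Fill in the describe-the-distribution sentence frame."""
    if not outliers:
        outlier_text = "has no outliers"
    elif len(outliers) == 1:
        outlier_text = f"has one outlier at {outliers[0]}"
    else:
        listed = ", ".join(str(x) for x in outliers)
        outlier_text = f"has outliers at {listed}"
    return (
        f"The distribution is {shape}, centred at a median of {median} "
        f"with a spread (IQR) of {iqr}, and {outlier_text}."
    )


print(describe("positively skewed", 31, 10, [72]))
# The distribution is positively skewed, centred at a median of 31
# with a spread (IQR) of 10, and has one outlier at 72.
```

In an exam answer you would also add the units and the context (for example "minutes" and "travel times").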

How exam questions ask about outliers and shape

The wording maps straight onto a method:

  • "Is [value] an outlier?" or "Determine whether ... is an outlier" - run the $1.5 \times \text{IQR}$ test: find the IQR, find the relevant fence, then state the comparison and conclusion.
  • "Show that [value] is an outlier" - the answer is already known, so the marks are entirely in the working: fence calculation plus the comparison.
  • "Describe the shape" or "What is the shape of the distribution?" - name symmetric, positive skew or negative skew (the tail names the skew), and add modality if there are two peaks.
  • "Describe the distribution" - the full four-part answer: shape, centre, spread, outliers.
  • "Which measure of centre is more appropriate?" - the median if the data is skewed or has an outlier, because it resists extreme values; otherwise the mean.
  • "Compare the two distributions" - compare like with like: centre against centre and spread against spread, using the median and IQR, then note shape and outliers.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2022 HSC-style (3 marks)

A data set has a lower quartile of $Q_1 = 18$ and an upper quartile of $Q_3 = 26$. The largest value is $44$. Determine whether $44$ is an outlier, showing your working.

Worked answer:

A full-mark response computes the IQR, $26 - 18 = 8$, then the upper fence, $Q_3 + 1.5 \times \text{IQR} = 26 + 1.5 \times 8 = 26 + 12 = 38$.

It then states the comparison explicitly: $44 > 38$, therefore $44$ is an outlier.

Markers award one mark for the IQR, one for correctly applying the $1.5 \times \text{IQR}$ rule to get the fence $38$, and one for the comparison and conclusion. A bare "yes" with no fence shown scores poorly, even if the answer is correct.

2021 HSC-style (4 marks)

The histogram of weekly earnings for a group of workers is bunched at the lower end with a long tail stretching to the right, where a few workers earn much more. (a) Name the shape of the distribution. (b) State whether the mean or the median is the larger measure of centre, and explain why. (c) State which measure better represents a typical worker, with a reason.

Worked answer:

Part (a): the distribution is positively skewed (skewed to the right) - the long tail points to the right.

Part (b): the mean is larger than the median, because the small number of very high earners in the right tail pull the mean up, while the median (the middle position) is barely affected.

Part (c): the median better represents a typical worker, because it is resistant to the few extreme high incomes that distort the mean.

Markers reward the correct shape name, the correct mean-versus-median direction WITH the tail reasoning, and a justified choice of the median for a typical value. Naming the skew the wrong way (a common slip) loses the part (a) and part (b) marks together.

2023 HSC-style (4 marks)

A set of $12$ daily maximum temperatures has a five-number summary with minimum $19$, $Q_1 = 24$, median $27$, $Q_3 = 30$ and maximum $41$. (a) Show that $41$ is an outlier. (b) Describe the distribution, referring to shape, centre and spread.

Worked answer:

Part (a): $\text{IQR} = 30 - 24 = 6$; upper fence $= 30 + 1.5 \times 6 = 30 + 9 = 39$; since $41 > 39$, the value $41$ is an outlier.

Part (b): shape is positively skewed (the upper tail is stretched by the high value); centre is a median of $27$ degrees (preferred over the mean because of the outlier); spread is an IQR of $6$ degrees for the middle half, with a full range of $41 - 19 = 22$ degrees inflated by the outlier.

Markers award the outlier test (fence plus comparison), then one mark each for a correctly justified shape, centre and spread. Quoting the median rather than the mean for the centre of a skewed set is part of what is rewarded here.
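When only a five-number summary is given, the same test can be run on the two extremes. The sketch below is one possible way to do it; the function name `extremes_beyond_fences` is hypothetical. It uses the temperature summary from this question. A summary only reveals the minimum and maximum, so this check cannot say whether other values also lie beyond a fence.

```python
def extremes_beyond_fences(minimum, q1, q3, maximum):
    """Test the minimum and maximum of a five-number summary
    against the 1.5 times IQR fences."""
    step = 1.5 * (q3 - q1)
    lower_fence = q1 - step
    upper_fence = q3 + step
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "low_outlier": minimum < lower_fence,
        "high_outlier": maximum > upper_fence,
    }


result = extremes_beyond_fences(minimum=19, q1=24, q3=30, maximum=41)
print(result["upper_fence"], result["high_outlier"])  # 39.0 True
print(result["lower_fence"], result["low_outlier"])   # 15.0 False
```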

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation (2 marks)

For a data set the lower quartile is $Q_1 = 13$ and the upper quartile is $Q_3 = 17$. (a) Find the interquartile range. (b) Find the lower and upper outlier fences using the $1.5 \times \text{IQR}$ rule.

Worked solution:

Part (a) - interquartile range. The IQR is the upper quartile minus the lower quartile:

$$\text{IQR} = Q_3 - Q_1 = 17 - 13 = 4$$

Part (b) - the two fences. The lower fence sits $1.5 \times \text{IQR}$ below $Q_1$ and the upper fence sits $1.5 \times \text{IQR}$ above $Q_3$. Here $1.5 \times 4 = 6$, so

$$\text{lower fence} = Q_1 - 1.5 \times \text{IQR} = 13 - 6 = 7$$

$$\text{upper fence} = Q_3 + 1.5 \times \text{IQR} = 17 + 6 = 23$$

Any value below $7$ or above $23$ would be flagged as an outlier. (Check: each fence sits one $1.5 \times \text{IQR}$ step outside its quartile, so $7$ should be $6$ below $Q_1 = 13$ and $23$ should be $6$ above $Q_3 = 17$, and they are.)

Foundation (2 marks)

A data set is positively skewed (skewed to the right). (a) State which is larger, the mean or the median. (b) State on which side the long tail of the data lies.

Worked solution:

Part (a) - mean versus median. In a positively skewed set the few large values in the tail pull the mean upward, while the median (a position, not a total) barely moves. So the mean is greater than the median.

Part (b) - the tail. Positive skew means the data is stretched out towards the high (positive) end, so the long tail points to the right. The bulk of the data is bunched on the left with a few large values trailing off to the right. (Memory hook: the skew is named for the direction the tail points, so "positive / right skew" has its tail on the right.)

Foundation (1 mark)

A histogram of student heights shows two clear, separate peaks. State the modality of the distribution and suggest what the two peaks might represent.

Worked solution:

Count the peaks. Two clear, separate peaks means the distribution is bimodal.

Interpret the peaks. Two peaks usually signals two groups combined into one data set. For heights, a plausible explanation is that the data mixes two subgroups, for example male and female students, each clustering around its own typical height. (A single-peak set is unimodal; no clear peak is sometimes called uniform.)

Core (3 marks)

The number of goals scored by a netball team across $11$ games, in order, is $8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 40$. The five-number summary gives $Q_1 = 11$ and $Q_3 = 17$. Use the $1.5 \times \text{IQR}$ rule to test whether $40$ is an outlier.

Worked solution:

Find the IQR. Subtract the quartiles:

$$\text{IQR} = Q_3 - Q_1 = 17 - 11 = 6$$

Find the upper fence. A high value is tested against the upper fence, $Q_3 + 1.5 \times \text{IQR}$. Here $1.5 \times 6 = 9$, so

$$\text{upper fence} = 17 + 9 = 26$$

Compare and conclude. The value $40$ is greater than the upper fence $26$, so

$$40 > 26$$

means $40$ is an outlier by the $1.5 \times \text{IQR}$ rule. (For completeness the lower fence is $11 - 9 = 2$, and the smallest value $8$ is above $2$, so there is no low outlier. State the fence you cross, then the comparison: that is the line markers reward.)

Core (3 marks)

The waiting times (in minutes) at a clinic for $11$ patients, in order, are $3, 20, 22, 23, 24, 25, 26, 27, 28, 29, 46$, with $Q_1 = 22$ and $Q_3 = 28$. Test both ends for outliers using the $1.5 \times \text{IQR}$ rule and list any outliers.

Worked solution:

Find the IQR. Subtract the quartiles:

$$\text{IQR} = Q_3 - Q_1 = 28 - 22 = 6$$

Find both fences. With $1.5 \times 6 = 9$:

$$\text{lower fence} = Q_1 - 1.5 \times \text{IQR} = 22 - 9 = 13$$

$$\text{upper fence} = Q_3 + 1.5 \times \text{IQR} = 28 + 9 = 37$$

Compare each extreme value. The smallest value is $3$ and the largest is $46$:

$$3 < 13 \quad \text{and} \quad 46 > 37$$

so both fall outside their fences. The outliers are $3$ minutes and $46$ minutes. (Check: every other value lies between $13$ and $37$, so exactly two points are flagged, one at each end. Always test both fences, not just the obvious big value.)
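The "test both ends" habit is easy to automate by checking every value against both fences in one pass. This is a minimal sketch using the waiting times and quartiles given in this question; the function name `find_outliers` is hypothetical.

```python
def find_outliers(data, q1, q3):
    """Return every value lying beyond either 1.5 times IQR fence."""
    step = 1.5 * (q3 - q1)
    lower_fence = q1 - step
    upper_fence = q3 + step
    return [x for x in data if x < lower_fence or x > upper_fence]


waiting_times = [3, 20, 22, 23, 24, 25, 26, 27, 28, 29, 46]
print(find_outliers(waiting_times, q1=22, q3=28))  # [3, 46]
```

Because every value is compared with both fences, a low outlier cannot be missed while attention is on the large value.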

Exam (5 marks)

The times (in minutes) for $15$ commuters to travel to work, in order, are $22, 24, 25, 26, 28, 29, 30, 31, 32, 33, 35, 36, 38, 40, 72$. For this data $Q_1 = 26$, the median is $31$ and $Q_3 = 36$. (a) Find the IQR. (b) Use the $1.5 \times \text{IQR}$ rule to test whether $72$ is an outlier. (c) Describe the distribution, commenting on shape, centre, spread and outliers.

Worked solution:

Part (a) - the IQR. Subtract the quartiles:

$$\text{IQR} = Q_3 - Q_1 = 36 - 26 = 10$$

Part (b) - test the value $72$. A high value is tested against the upper fence. With $1.5 \times 10 = 15$:

$$\text{upper fence} = Q_3 + 1.5 \times \text{IQR} = 36 + 15 = 51$$

Since

$$72 > 51$$

the value $72$ is an outlier.

Part (c) - describe the distribution. Work through the four features in order:

  • Shape: ignoring the outlier the values rise fairly evenly, but the long upper tail (one value far above the rest) makes the data positively skewed (skewed to the right).
  • Centre: the median is $31$ minutes; the median is the better measure of centre here because the outlier would inflate the mean.
  • Spread: the IQR is $10$ minutes (the middle half of commuters are within a $10$ minute band); the full range is $72 - 22 = 50$ minutes, stretched by the outlier.
  • Outliers: there is one outlier at $72$ minutes, well above the upper fence of $51$; this is a genuine value (a very long commute, perhaps a transport delay) rather than an error, so it should be kept but noted.

So the travel times are positively skewed with a median of $31$ minutes, an IQR of $10$ minutes, and one high outlier at $72$ minutes. (Check: the median $31$ sits inside the quartiles $26$ and $36$ as it must, and only the single value $72$ crosses a fence.)

Exam (6 marks)

A teacher records two class quiz results out of $20$. Class A, in order, is $4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20$ with $Q_1 = 7$, median $10$, $Q_3 = 13$. Class B, in order, is $12, 14, 15, 16, 16, 17, 17, 18, 18, 19, 20$ with $Q_1 = 15$, median $17$, $Q_3 = 18$. (a) Test Class A for outliers using the $1.5 \times \text{IQR}$ rule. (b) Describe the shape of each class. (c) Write one or two sentences comparing the two classes' centre and spread.

Worked solution:

Part (a) - test Class A for outliers. First the IQR:

$$\text{IQR}_A = Q_3 - Q_1 = 13 - 7 = 6$$

With $1.5 \times 6 = 9$, the fences are

$$\text{lower fence} = 7 - 9 = -2, \qquad \text{upper fence} = 13 + 9 = 22$$

The smallest value is $4$ and the largest is $20$, and

$$4 > -2 \quad \text{and} \quad 20 < 22$$

so both extremes lie inside the fences: Class A has no outliers.

Part (b) - shape of each class. For Class A the values are spread fairly evenly and the mean and median are close, so the shape is roughly symmetric. For Class B the marks bunch up near the top (the maximum is 2020) with a tail of lower marks trailing down to 1212, so Class B is negatively skewed (skewed to the left).

Part (c) - compare centre and spread. Class B has the higher centre (median $17$ versus $10$), so Class B performed better overall. Class B is also more consistent: its IQR is $Q_3 - Q_1 = 18 - 15 = 3$, smaller than Class A's IQR of $6$, so Class B's middle marks are more tightly clustered. (Check: comparing like with like, both statements use the median for centre and the IQR for spread, which is the safe pairing when a set may be skewed.)
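The like-with-like comparison can be laid out as a short Python sketch. The dictionary layout and the function name `centre_and_spread` are hypothetical; the marks and quartiles are the ones given in this question, with the quartiles taken as stated rather than recomputed.

```python
from statistics import median

# Marks and quartiles as given in the question.
class_a = {"marks": [4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20], "q1": 7, "q3": 13}
class_b = {"marks": [12, 14, 15, 16, 16, 17, 17, 18, 18, 19, 20], "q1": 15, "q3": 18}


def centre_and_spread(group):
    """Return (median, IQR) so both classes are measured the same way."""
    return median(group["marks"]), group["q3"] - group["q1"]


median_a, iqr_a = centre_and_spread(class_a)
median_b, iqr_b = centre_and_spread(class_b)

print(f"Class A: median {median_a}, IQR {iqr_a}")  # Class A: median 10, IQR 6
print(f"Class B: median {median_b}, IQR {iqr_b}")  # Class B: median 17, IQR 3
```

Putting both classes through the same function keeps the comparison to centre against centre and spread against spread.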
