How are dot plots and stem-and-leaf plots used to display a data set and reveal its clusters, gaps, outliers and shape?

Display and interpret numerical data using dot plots and stem-and-leaf plots, including back-to-back stem-and-leaf plots, and describe the clusters, gaps, outliers and shape of the data

A focused answer to the HSC Maths Standard 2 dot point on dot plots and stem-and-leaf plots. How to construct and read each display, how to build a back-to-back stem-and-leaf plot to compare two groups, and how to describe clusters, gaps, outliers and the shape of a distribution, with worked Australian examples.

Generated by Claude Opus 4.814 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to display a set of numerical data in two simple but powerful ways - the dot plot and the stem-and-leaf plot - and then to read meaning out of the picture. Building each display is the easy half. The marks live in the second half: describing the data's clusters (where values bunch up), gaps (empty stretches), outliers (lone values far from the rest) and shape (symmetric, or skewed to one side). You also need the back-to-back stem-and-leaf plot, which lines two groups up on a shared stem so you can compare them at a glance. These displays are the entry point to the whole Data Analysis module, because the words you learn here - cluster, outlier, skew - are the same words every later question wants.

The answer

Both displays keep every original data value visible while showing the overall pattern. A dot plot stacks one dot per value above a number line, so the height of each stack is that value's frequency. A stem-and-leaf plot splits each number into a stem (the leading digit or digits) and a leaf (the last digit), groups the leaves by stem, and so sorts the data while keeping every figure readable. Choose the dot plot for small data sets over a short range of whole numbers; choose the stem-and-leaf plot once the numbers spread over many values (typically two-digit data), where a dot plot would become a forest of stacks.

Figure: Dot plot of goals scored per game (one dot = one game). A horizontal number line runs from 0 to 7 goals. Above each value a column of dots is stacked, each dot one game. The stacks are: 0 goals, three dots; 1 goal, five dots; 2 goals, four dots; 3 goals, two dots. No dots sit above 4, 5 or 6 (labelled "gap (no data)"). A single dot sits above 7, separated from the main cluster by the gap, so it is an outlier (labelled "outlier"). The main cluster runs from 0 to 3 goals with the tallest stack above 1 goal.

In the dot plot above, the stack heights are the frequencies: the three games with 0 goals appear as a column of three dots, and the value 1 has the tallest stack (five dots), so 1 goal is the mode. The lone dot above 7, cut off from the rest by an empty stretch, is an outlier.

Constructing a dot plot

A dot plot is built on a horizontal number line. The steps are short:

  • Draw a number line that covers the full range of the data, marking every whole value from the smallest to the largest (including values with no data, so gaps show up).
  • For each data value, place one dot above its position on the line.
  • Stack repeated values directly on top of one another, keeping the dots the same size and evenly spaced, so the stack heights can be compared fairly.

Reading it back, the height of a stack is the frequency of that value, the tallest stack marks the mode, and you can spot clusters, gaps and outliers by eye. Dot plots suit small data sets of whole numbers over a short range - the number of pets per household, goals per game, siblings per student. Once the data spreads over dozens of values, switch to a stem-and-leaf plot.
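The construction steps above can be sketched in a few lines of Python. This is an illustration only, not part of the syllabus: the function name `text_dot_plot` is made up for this example, and the layout assumes single-digit whole numbers so the columns line up. The data are the goals per game from the dot plot described earlier.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a text dot plot: one '*' per value, stacked above a number line."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    axis = range(lo, hi + 1)              # every whole value, so gaps stay visible
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):   # draw the top of the stacks first
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))   # the number line itself
    return "\n".join(rows)

# Goals per game: 0 (3 games), 1 (5), 2 (4), 3 (2), 7 (1) -> 15 games in total
goals = [0] * 3 + [1] * 5 + [2] * 4 + [3] * 2 + [7]
print(text_dot_plot(goals))
```

Because the number line includes 4, 5 and 6 even though no game had those scores, the printed plot shows the empty stretch before the lone mark at 7.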

Constructing a stem-and-leaf plot

A stem-and-leaf plot (sometimes called a stem plot) splits each number into two parts. For two-digit numbers the stem is the tens digit and the leaf is the units digit, so 74 becomes stem 7, leaf 4. To build one:

  • List the stems in a vertical column, in order, from smallest to largest, drawing a vertical line to their right.
  • Write each value's leaf in the row of its stem.
  • Re-write each row with the leaves in ascending order (smallest first), which sorts the whole data set.
  • Add a key, for example 7 | 4 means 74, so a reader knows how to decode the plot.

The plot below shows the resting heart rates 58, 62, 65, 67, 71, 72, 72, 74, 75, 78, 80, 81, 84, 85, 88, 96 beats per minute.

Stem | Leaf
  5  | 8
  6  | 2 5 7
  7  | 1 2 2 4 5 8
  8  | 0 1 4 5 8
  9  | 6

Key: 5 | 8 means 58 beats per minute

Turned on its side, the rows of leaves are exactly the stacks of a dot plot, so the longest row (the 70s) is where the data clusters, and the shape of the data is the shape of the leaf lengths. Because every original value is preserved, you can read the smallest value (58), the largest (96) and the mode (72, the repeated leaf in the 70s) straight off the plot.
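As a rough sketch of the same construction in Python (the function name `stem_and_leaf` is hypothetical, and the code assumes non-negative whole numbers with the last digit as the leaf), the heart-rate data can be split and sorted like this:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted list of leaves."""
    rows = defaultdict(list)
    for v in sorted(values):              # sorting first keeps each row in order
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # list every stem in the range, so an empty row would show up as a gap
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

heart_rates = [58, 62, 65, 67, 71, 72, 72, 74, 75, 78, 80, 81, 84, 85, 88, 96]
plot = stem_and_leaf(heart_rates)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 5 | 8 means 58 beats per minute")
```

Counting the leaves in the result gives 16, matching the 16 heart rates, which is the same check recommended for hand-drawn plots.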

Back-to-back stem-and-leaf plots

A back-to-back stem-and-leaf plot compares two data sets that share the same stems. The stems sit in a central column; one group's leaves run to the left and the other group's to the right. The one thing to watch is direction: on the left side the leaves increase outward from the stem, so the smallest leaf sits nearest the stem and the row reads right to left. The example below compares two squads' push-up counts.

Figure: Back-to-back stem-and-leaf plot comparing two squads (push-ups in one minute: junior vs senior squad). A shared central stem column holds the tens digits 1 to 4. Junior squad leaves are written to the left of the stem and senior squad leaves to the right. The junior leaves cluster in the twenties while the senior leaves cluster in the thirties, showing the seniors completed more push-ups on average.

Junior      | Stem |  Senior
          8 |   1  |
8 7 5 4 2 1 |   2  |  6 9
          1 |   3  |  2 3 5 7 8
            |   4  |  1

Key: junior 5 | 2 = 25   senior 2 | 6 = 26

Reading it, both squads have 8 players. The junior leaves bunch in the 20s and the senior leaves bunch in the 30s, so the seniors completed more push-ups on average. Comparing two groups always comes back to the same two ideas: which group has the higher centre (typical value) and which has the larger spread (how stretched out the values are).
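The left-side direction rule is the part most easily got wrong, so here is a minimal Python sketch of it. The helper names `leaves_by_stem` and `back_to_back` are invented for this illustration, and the squad values are the ones read from the push-up plot above.

```python
def leaves_by_stem(values):
    """Group the units digits of whole numbers under their tens-digit stem."""
    rows = {}
    for v in sorted(values):
        rows.setdefault(v // 10, []).append(v % 10)
    return rows

def back_to_back(left, right):
    """Return text rows; left leaves are reversed so the smallest sits nearest the stem."""
    l_rows, r_rows = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l_rows), min(r_rows)), max(max(l_rows), max(r_rows)) + 1)
    left_text = {s: " ".join(str(d) for d in reversed(l_rows.get(s, []))) for s in stems}
    width = max(len(t) for t in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(d) for d in r_rows.get(s, []))
        # right-align the left leaves so they butt up against the stem column
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines

junior = [18, 21, 22, 24, 25, 27, 28, 31]
senior = [26, 29, 32, 33, 35, 37, 38, 41]
for line in back_to_back(junior, senior):
    print(line)
```

The only difference between the two sides is the `reversed(...)` call: both sets of leaves are sorted ascending, but the left side is written backwards so that it still increases outward from the stem.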

Describing clusters, gaps, outliers and shape

Once a display is drawn, a "describe the data" question wants a short paragraph hitting these features:

  • Cluster. A range where values bunch together. State where the data piles up ("most values lie between 1 and 3 goals").
  • Gap. A stretch of the scale with no data. Name it ("there is a gap from 3 to 7"); a gap before a lone value is what marks it as an outlier.
  • Outlier. A value sitting well away from the rest, usually past a gap. Name the value and say it is an outlier.
  • Shape. Describe the overall outline of the stacks or leaves:
    • Symmetric - the two sides mirror each other about a central peak.
    • Positively skewed (skewed to the right) - a peak on the left with a long tail trailing to the higher values.
    • Negatively skewed (skewed to the left) - a peak on the right with a long tail trailing to the lower values.
    • Bimodal - two separate peaks (two modes).

A useful tip on skew: the skew is named for the side the long tail points to, not the side of the peak. A long tail stretching towards the big numbers is positive skew.
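Spotting gaps can also be checked mechanically. The sketch below is an illustration under stated assumptions, not a NESA method: `find_gaps` is a made-up helper that works on whole-number data and reports each run of values with no data. Whether a value beyond a gap counts as an outlier is still a judgement you make from the plot.

```python
def find_gaps(values):
    """Return (first_empty, last_empty) pairs for whole-number stretches with no data."""
    present = sorted(set(values))
    gaps = []
    for a, b in zip(present, present[1:]):
        if b - a > 1:                     # at least one whole value is missing
            gaps.append((a + 1, b - 1))
    return gaps

# Goals per game from the dot plot example
goals = [0] * 3 + [1] * 5 + [2] * 4 + [3] * 2 + [7]
print(find_gaps(goals))   # [(4, 6)]
```

The code reports the empty values 4 to 6, which is the same gap the written description calls "a gap from 3 to 7" (named by the data values on either side).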

How exam questions ask about dot plots and stem-and-leaf plots

The command words map straight onto the method:

  • "Construct / draw a dot plot (or stem-and-leaf plot)" - build the display: a number line with stacked dots, or stems with sorted leaves and a key. For stem-and-leaf plots the key is worth a mark on its own, so never omit it.
  • "How many ... ?" - count dots or leaves; the total must match the number of items given (use this as your check).
  • "State the mode / median / range" - read them off: the tallest stack or repeated leaf is the mode; the values are already in order for the median and range.
  • "Identify any outliers" - name the lone value (or values) past the gap and call it an outlier.
  • "Describe the shape / distribution" - give the cluster, any gap, any outlier, and the shape word (symmetric or the direction of skew).
  • "Compare the two data sets" (a back-to-back plot) - address BOTH centre (which group is higher on average) AND spread (which group's values vary more), each justified from the plot.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2021 HSC-style (3 marks)
The dot plot below shows the number of times 20 people visited a gym in a week. The stacks are: 0 visits - 4 dots; 1 - 5; 2 - 4; 3 - 3; 4 - 2; 5 - 1; 9 - 1. (a) Find the number of people who visited at least 3 times. (b) Identify the outlier. (c) Describe the shape of the data.
Worked answer

Part (a): "at least 3" means 3 or more, so add the stacks at 3, 4, 5 and 9: 3 + 2 + 1 + 1 = 7 people. One mark for the correct total; a common slip is to omit the outlier at 9 or to misread "at least" as "more than".

Part (b): the value 9 is the outlier, sitting alone well above the rest with a clear gap after 5.

Part (c): the main body of the data is clustered at the low end (0 to 5) with a peak at 1 visit and a long tail trailing out to the right towards 9, so the distribution is positively skewed (skewed to the right) with an outlier at 9.

Markers reward the correct count, naming the outlier by its value, and a shape description that uses the words "cluster", "skew" (with the correct direction) and "outlier".

2022 HSC-style (4 marks)
A back-to-back stem-and-leaf plot compares the ages of members at two community clubs, where 3 | 4 means 34. Club A leaves (left): 1 | 8 9; 2 | 3 5 7; 3 | 1 6; 4 | 2. Club B leaves (right): 2 | 1 4; 3 | 0 5 5 8; 4 | 3 7; 5 | 2. (a) State how many members are in each club. (b) State the youngest member overall. (c) Compare the ages of the two clubs, referring to centre and spread.
Worked answer

Part (a): count leaves on each side. Club A has 2 + 3 + 2 + 1 = 8 members; Club B has 2 + 4 + 2 + 1 = 9 members.

Part (b): the youngest is 18 (Club A, the smallest leaf in the 10s row); Club B's youngest is 21, so 18 is the youngest overall.

Part (c): Club A's ages cluster in the late teens to thirties and Club B's cluster in the thirties to forties, so Club B has the higher centre (older typical member). For spread, Club A runs 18 to 42 (range 24) and Club B runs 21 to 52 (range 31), so Club B is also more spread out.

Markers reward both correct counts, the correct youngest value read from the right row, and a comparison that explicitly addresses BOTH centre (which club is older on average) AND spread (which club's ages vary more), each justified from the plot.
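To back up a centre-and-spread comparison with numbers, one option is to compute the median and range for each group. The sketch below does that for the two clubs, with the ages read off the plot in the question. Using the median as the measure of centre is a choice made for this illustration; the worked answer itself judges centre from where the leaves cluster.

```python
from statistics import median

# Ages read from the back-to-back plot in the question
club_a = [18, 19, 23, 25, 27, 31, 36, 42]
club_b = [21, 24, 30, 35, 35, 38, 43, 47, 52]

def summary(ages):
    """Count, centre (median) and spread (range) for one group."""
    return {"n": len(ages), "median": median(ages), "range": max(ages) - min(ages)}

a, b = summary(club_a), summary(club_b)
print("Club A:", a)
print("Club B:", b)
```

Club A's median is 26 and Club B's is 35, with ranges of 24 and 31, which agrees with the conclusion that Club B is both older on average and more spread out.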

2023 HSC-style (3 marks)
A student writes the stem-and-leaf plot for the data 32, 35, 41, 41, 44, 50, 58 but forgets the key and leaves one row blank. Their plot reads: 3 | 2 5; 4 | 1 1 4; 5 | 0 8. (a) Write a suitable key. (b) State the mode. (c) The student claims the data has a gap. Explain whether this is correct.
Worked answer

Part (a): a suitable key is 3 | 2 means 32 (any correct stem-leaf example earns the mark).

Part (b): the mode is 41, since the leaf 1 repeats in the 40s row (41 appears twice) and no other value repeats.

Part (c): there is no genuine gap. Every stem from 3 to 5 has at least one leaf, so the values run fairly continuously from 32 to 58 with no empty stretch and no isolated point. The student is incorrect; the data is reasonably evenly spread, not split by a gap.

Markers reward a valid key, the correct mode read from the repeated leaf, and a justified yes/no on the gap that refers to the rows all being populated (no empty stem, no isolated value).

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

foundation (2 marks)
A class records the number of siblings each of 15 students has: 0, 1, 1, 2, 2, 2, 1, 0, 3, 2, 1, 4, 2, 1, 2. (a) State the most common number of siblings. (b) State how many students have exactly 1 sibling.
Worked solution

Tally each value. Counting how many times each number appears:

0: 2,  1: 5,  2: 6,  3: 1,  4: 1

The counts add to 2 + 5 + 6 + 1 + 1 = 15, which matches the 15 students, so nothing is missed.

Part (a) - most common value. The tallest count is for 2 siblings (it occurs 6 times), so the most common number is 2.

Part (b) - students with exactly one sibling. The value 1 occurs 5 times, so 5 students have exactly 1 sibling. (On a dot plot these are the heights of the stacks above 2 and above 1.)

foundation (3 marks)
The dot plot shows the number of goals scored by a netball player in each game of a season. The stacks above the values are: 0 goals - 3 dots; 1 goal - 5 dots; 2 goals - 4 dots; 3 goals - 2 dots; 7 goals - 1 dot. (a) How many games were played? (b) Identify the outlier. (c) Describe the cluster.
Worked solution

Part (a) - total games. Each dot is one game, so add the stack heights:

3 + 5 + 4 + 2 + 1 = 15 games

Part (b) - the outlier. Most games sit between 0 and 3 goals, but one lonely dot sits out at 7 goals with a clear gap before it. The outlier is the game of 7 goals.

Part (c) - the cluster. The bulk of the data is bunched between 0 and 3 goals, so there is a cluster from 0 to 3 goals, with most games at 1 goal. (The gap from 3 to 7 is what makes the 7 stand out as the outlier.)

core (3 marks)
These are the resting heart rates (beats per minute) of 16 people: 58, 62, 65, 67, 71, 72, 72, 74, 75, 78, 80, 81, 84, 85, 88, 96. Construct a stem-and-leaf plot using the tens digit as the stem, then state the lowest and highest rates.
Worked solution

Choose the stem. The values run from the 50s to the 90s, so use the tens digit as the stem and the units digit as the leaf.

Sort each value into its row, leaves in order. Reading along each ten:

Stem | Leaf
  5  | 8
  6  | 2 5 7
  7  | 1 2 2 4 5 8
  8  | 0 1 4 5 8
  9  | 6

A key is essential: 5 | 8 means 58 beats per minute.

Read off the extremes. The first leaf in the top row gives the lowest rate, 58, and the last leaf in the bottom row gives the highest, 96. (Check the leaf count: 1 + 3 + 6 + 5 + 1 = 16, which matches the 16 people.)

core (3 marks)
A stem-and-leaf plot of test marks (out of 100) is shown, where 6 | 3 means 63. Stems and leaves: 4 | 2 8; 5 | 1 5 5 9; 6 | 0 3 3 4 7; 7 | 2 6 8; 8 | 5. (a) How many students sat the test? (b) State the modal mark. (c) State the range.
Worked solution

Part (a) - number of students. Count every leaf:

2 + 4 + 5 + 3 + 1 = 15 students

Part (b) - the modal mark. The mode is the value that appears most often. The leaves show two 5s in the 50s row (55 twice) and two 3s in the 60s row (63 twice). Both 55 and 63 occur twice and no value occurs more, so the data is bimodal with modes 55 and 63.

Part (c) - the range. The smallest mark is the first leaf of the top row, 42, and the largest is the single leaf of the bottom row, 85:

85 - 42 = 43

so the range is 43 marks. (A stem-and-leaf plot keeps every original value, which is why the mode and range can be read straight off it.)
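A quick way to check answers like these is to rebuild the list of marks from the plot and let Python count. This is only a checking aid: it uses `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value tied for most frequent, so it handles the two-mode case.

```python
from statistics import multimode

# Marks decoded from the plot: stem = tens digit, leaf = units digit
marks = [42, 48, 51, 55, 55, 59, 60, 63, 63, 64, 67, 72, 76, 78, 85]

modes = multimode(marks)                 # all values tied for most frequent
mark_range = max(marks) - min(marks)

print("students:", len(marks))
print("modes:", modes)
print("range:", mark_range)
```

The output confirms 15 students, the two modes 55 and 63, and a range of 43.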

core (4 marks)
Twenty households are surveyed for the number of devices connected to their home internet. The counts are: 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9, 14, 15. (a) Construct a frequency tally for each value from 2 to 9. (b) Identify any outliers and the gap that separates them. (c) Describe the shape of the main cluster.
Worked solution

Part (a) - tally the values 2 to 9. Counting each:

2: 1,  3: 2,  4: 3,  5: 4,  6: 3,  7: 2,  8: 1,  9: 2

These 18 values, plus the two large counts 14 and 15, give 18 + 2 = 20 households.

Part (b) - outliers and the gap. The values 14 and 15 sit far above the rest. There is a wide gap from 9 up to 14 where no household appears, so 14 and 15 are outliers separated from the main group by that gap.

Part (c) - shape of the main cluster. Ignoring the two outliers, the counts rise to a peak of 4 at the value 5 and then fall away roughly evenly on each side (1, 2, 3, 4, 3, 2, 1, 2). The main cluster is therefore roughly symmetric, mounded around 5 devices. (Plotting these as a dot plot makes the single mound and the two stray dots at 14 and 15 easy to see.)

exam (5 marks)
A coach records the number of push-ups completed in one minute by the players in two squads. Junior squad: 18, 21, 22, 24, 25, 27, 28, 31. Senior squad: 26, 29, 32, 33, 35, 37, 38, 41. (a) Construct a back-to-back stem-and-leaf plot using the tens digit as the stem (junior leaves on the left, senior on the right). (b) State the highest score in each squad. (c) Compare the two squads' performance, referring to centre and spread.
Worked solution

Part (a) - build the back-to-back plot. Use the tens digit as the shared stem. Junior leaves are written to the LEFT of the stem (read right to left, smallest nearest the stem); senior leaves to the RIGHT (read left to right):

Junior      | Stem |  Senior
          8 |   1  |
8 7 5 4 2 1 |   2  |  6 9
          1 |   3  |  2 3 5 7 8
            |   4  |  1

A key is required: for the seniors 2 | 6 means 26; for the juniors 5 | 2 means 25.

Part (b) - highest in each squad. Reading the largest value in each side, the junior best is 31 (the lone leaf 1 in the 30s row, left side) and the senior best is 41 (the leaf 1 in the 40s row, right side).

Part (c) - compare centre and spread. Both squads have 8 players. The senior values cluster in the 30s while the junior values cluster in the 20s, so the seniors have the higher centre (a higher typical number of push-ups). For spread, the juniors run from 18 to 31, a range of 31 - 18 = 13, and the seniors run from 26 to 41, a range of 41 - 26 = 15, so the spreads are similar with the seniors slightly more spread out. Overall the seniors perform better on average, by a margin of roughly ten push-ups. (Check: 8 leaves on each side, matching the 8 players per squad.)

exam (5 marks)
The waiting times (in minutes, rounded) at a medical centre are recorded for 20 patients: 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11, 12, 24, 26. (a) Describe the shape of the data and name the type of skew. (b) Identify the outliers. (c) Explain, with reference to the plot, why the mode of 7 minutes is a fairer summary of a typical wait than the mean would be.
Worked solution

Part (a) - shape and skew. Tallying the main group: the counts climb to a peak at 7 minutes (which occurs 4 times) and then trail off slowly towards the larger times. A distribution with a short side on the left and a long tail stretching to the right is positively skewed (skewed to the right).

Part (b) - the outliers. The values 24 and 26 minutes sit far above the main cluster, separated from it by a long gap after 12 minutes, so 24 and 26 are outliers.

Part (c) - why the mode is fairer here. The mean is pulled upward by the two large outliers (24 and 26), so it would report a "typical" wait longer than almost anyone actually experienced. The plot shows the data piled up between 4 and 12 minutes with a clear peak at 7, so the mode of 7 minutes describes the most common, and most representative, waiting time without being dragged up by the two unusual long waits. (This is the general rule: a long right tail or outliers make the mean an over-estimate of the centre, so a resistant measure is fairer.)
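The size of the pull on the mean can be checked directly. The short Python sketch below is an illustration rather than part of the expected exam answer; the cut-off of 12 minutes for the "main group" is taken from the gap identified in part (b).

```python
from statistics import mean, median, mode

waits = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11, 12, 24, 26]
main_group = [w for w in waits if w <= 12]   # everything before the gap

print("mean of all 20 waits:", mean(waits))        # 9.25
print("median:", median(waits))                    # 7.5
print("mode:", mode(waits))                        # 7
print("mean without 24 and 26:", mean(main_group)) # 7.5
```

With the two outliers included the mean is 9.25 minutes, a wait longer than 15 of the 20 patients actually had; without them it drops to 7.5, close to the mode of 7 and equal to the median of 7.5.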
