How do we display a single set of data and summarise its centre, spread and shape?
Display and summarise univariate data using frequency and cumulative-frequency tables, histograms, ogives, the mean and standard deviation, the five-number summary, box plots and outliers
A focused answer to the HSC Maths Advanced dot point on displaying and summarising one-variable data. Frequency and cumulative-frequency tables, histograms and ogives, the mean and standard deviation from data and a frequency table, the median, quartiles and IQR, the five-number summary, box and parallel box plots, the 1.5 IQR outlier rule and shape, with worked examples and traps.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to take a single set of numbers (univariate data), display it well, and summarise it numerically. Displaying means frequency and cumulative-frequency tables, histograms and ogives (cumulative-frequency polygons). Summarising means a measure of centre (the mean or the median), a measure of spread (the standard deviation or the interquartile range), and a sense of shape (symmetric or skewed). The five-number summary feeds the box plot, and the $1.5 \times \text{IQR}$ rule flags outliers. The single idea that ties it together is that one variable can be described by where its values sit (centre), how widely they vary (spread), and the overall pattern (shape), and that every display and every statistic is a tool for reading one of those three things.
The answer
Frequency and cumulative-frequency tables
A frequency table lists each value $x$ (or score) alongside its frequency $f$, the number of times it occurs. The cumulative frequency at a value is the running total of frequencies up to and including that value, so it answers "how many readings are this value or less". The final cumulative frequency must equal $n$, the total number of readings, which is a quick check that the table is right.
Data come in types, and the type decides the display. A variable is categorical if its values are labels (eye colour, suburb), and numeric if its values are numbers. A numeric variable is discrete if its values can be listed (the number of pets), and continuous if it is measured on a scale and can take any value in a range (height, time, mass). Continuous data are grouped before display, because almost every measured value is unique.
Here are the numbers of pets owned by Year 12 students, a discrete variable, as a frequency and cumulative-frequency table.
| Pets | Frequency | Cumulative frequency |
|---|---|---|
The cumulative-frequency column rises to , as it must. Reading it off: students own at most pet, and own at least .
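The running-total idea can be sketched in a few lines of Python. The table values below are hypothetical, chosen only for illustration (they are not the article's pet data), and the variable names are assumptions.

```python
from itertools import accumulate

# Hypothetical frequency table (illustrative values only):
# number of pets -> number of students owning that many.
values = [0, 1, 2, 3, 4]
freqs = [4, 9, 7, 3, 2]

# The cumulative-frequency column is the running total of the frequencies.
cum_freqs = list(accumulate(freqs))
n = sum(freqs)

# Quick check from the text: the last cumulative entry must equal n.
assert cum_freqs[-1] == n

print("Pets  Frequency  Cumulative")
for x, f, cf in zip(values, freqs, cum_freqs):
    print(f"{x:>4}  {f:>9}  {cf:>10}")

# "At most 1 pet" is the cumulative frequency at 1;
# "at least 2 pets" is everyone else.
at_most_1 = cum_freqs[values.index(1)]
at_least_2 = n - at_most_1
print(at_most_1, at_least_2)  # 13 12
```

Reading "at least" questions as $n$ minus a cumulative frequency avoids re-adding the tail of the frequency column.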
Grouped data, class centres and the modal class
When a variable is continuous, or when a discrete variable takes too many distinct values, group the data into class intervals of equal width. Each class is represented by its class centre, the midpoint of the interval, which stands in for every reading in that class. Grouping trades detail for clarity: you gain a readable overview but you lose the exact values, so any statistic computed from grouped data is an estimate. Never discard the original data.
The class that contains the most readings is the modal class. A boundary value (a reading sitting exactly on a class edge) must be assigned by a stated convention, usually into the upper class, and you note the convention if it matters.
Here are the commute times of Sydney workers, a continuous variable grouped into -minute classes. The notation to means .
| Commute (min) | Class centre | Frequency | Cumulative frequency |
|---|---|---|---|
| to | |||
| to | |||
| to | |||
| to | |||
| to | |||
| to |
The modal class is to minutes (frequency ), and the cumulative column reaches .
Histograms and ogives (cumulative-frequency polygons)
A histogram displays the frequency table as a bar chart in which the bars touch (no gaps), because the horizontal axis is a continuous numeric scale. For grouped data each bar is centred on its class interval; for ungrouped discrete data each bar is centred on the value.
An ogive (cumulative-frequency polygon) is the graph of cumulative frequency against the upper boundary of each class, joined by straight segments and started on the axis at the lower boundary of the first class. It rises from $0$ to $n$ in a characteristic S shape. Its real power is reading positions off it: go up to a cumulative frequency, across to the curve, and down to the value. The median sits at a cumulative frequency of $\frac{n}{2}$, the lower quartile at $\frac{n}{4}$ and the upper quartile at $\frac{3n}{4}$.
The diagram overlays the histogram and the ogive for the commute-time data, with frequency on the left axis and cumulative frequency on the right.
Reading the ogive at cumulative frequency (which is ) gives a median of about minutes; at () the lower quartile is about minutes, and at () the upper quartile is about minutes.
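Reading a value off an ogive is linear interpolation along one straight segment. The sketch below shows the idea with hypothetical class boundaries and frequencies (not the commute-time data); the function name `ogive_value` is an assumption made for this example.

```python
from itertools import accumulate

def ogive_value(boundaries, cum_freqs, target):
    """Value at which the ogive reaches cumulative frequency `target`.

    boundaries: class boundaries, lowest first (one more than the classes).
    cum_freqs:  cumulative frequency at each boundary, starting with 0.
    """
    for i in range(1, len(boundaries)):
        lo_cf, hi_cf = cum_freqs[i - 1], cum_freqs[i]
        if hi_cf > lo_cf and lo_cf <= target <= hi_cf:
            # Fraction of the way up this straight segment of the ogive.
            frac = (target - lo_cf) / (hi_cf - lo_cf)
            width = boundaries[i] - boundaries[i - 1]
            return boundaries[i - 1] + frac * width
    raise ValueError("target is outside the range of the ogive")

# Hypothetical grouped data: five classes of width 10.
boundaries = [0, 10, 20, 30, 40, 50]
freqs = [4, 10, 16, 8, 2]
cum_freqs = [0] + list(accumulate(freqs))  # [0, 4, 14, 30, 38, 40]
n = cum_freqs[-1]

median = ogive_value(boundaries, cum_freqs, n / 2)
q1 = ogive_value(boundaries, cum_freqs, n / 4)
q3 = ogive_value(boundaries, cum_freqs, 3 * n / 4)
print(q1, median, q3)
```

For these made-up numbers the median lands at 23.75, the lower quartile at about 16 and the upper quartile at 30. All three are estimates, because the ogive assumes readings are spread evenly within each class.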
Mean and standard deviation from data
The mean is the balance point of the data, $\bar{x} = \dfrac{\sum x}{n}$ for a raw list. The standard deviation $\sigma$ measures the typical distance of a reading from the mean. In this course you use the population standard deviation, which divides by $n$:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
On a calculator this is the $\sigma_x$ (or $\sigma_n$) key, not the $s_x$ (sample) key, and choosing the wrong one is the single most common error here. You are expected to put data into the calculator's statistics mode and read $\bar{x}$ and $\sigma_x$ off it rather than computing the sum of squares by hand, but knowing the formula tells you what the machine is doing.
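The same population-versus-sample choice appears in software. As a minimal sketch using Python's standard `statistics` module on a hypothetical data list, `pstdev` divides by $n$ (the version this course uses) while `stdev` divides by $n - 1$:

```python
import statistics

# Hypothetical raw data, chosen so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)          # sum of the data divided by n
sigma = statistics.pstdev(data)        # population SD: divides by n
s = statistics.stdev(data)             # sample SD: divides by n - 1

# By hand: the squared deviations from the mean 5 sum to 32,
# so the population variance is 32 / 8 = 4 and sigma is 2.
print(mean, sigma)    # 5.0 2.0
print(round(s, 3))    # 2.138
```

The sample value is always a little larger than the population value for the same data, which is a quick way to tell which key or function was used.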
Mean and standard deviation from a frequency table
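When data are tabulated, every value $x$ occurs $f$ times, so weight by frequency. With $n = \sum f$,

$$\bar{x} = \frac{\sum fx}{n}, \qquad \sigma^2 = \frac{\sum f(x - \bar{x})^2}{n} = \frac{\sum fx^2}{n} - \bar{x}^2$$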
The second form of the variance, $\dfrac{\sum fx^2}{n} - \bar{x}^2$, is the practical one: it needs only the two column totals $\sum fx$ and $\sum fx^2$, with no need to compute each squared deviation. For grouped data, use the class centre as $x$; the answer is then an estimate.
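A short Python sketch of the two column totals, using a hypothetical frequency table (illustrative values only), shows that the shortcut form of the variance agrees with the definition:

```python
from math import sqrt

# Hypothetical frequency table (not the article's data).
values = [0, 1, 2, 3, 4]
freqs = [4, 9, 7, 3, 2]

n = sum(freqs)                                           # n = sum of f
sum_fx = sum(f * x for x, f in zip(values, freqs))       # sum of f*x
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))  # sum of f*x^2

mean = sum_fx / n

# Shortcut form: needs only the two column totals.
var_shortcut = sum_fx2 / n - mean ** 2

# Definition: frequency-weighted squared deviations from the mean.
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

sigma = sqrt(var_shortcut)
print(n, sum_fx, sum_fx2)               # 25 40 96
print(mean, round(var_shortcut, 4))     # 1.6 1.28
print(round(sigma, 2))                  # 1.13
```

For grouped data the same code applies with `values` set to the class centres, and the results are then estimates.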
For the pet-ownership table above, , and adding the weighted columns gives and . So
Median, quartiles and the interquartile range
The median is the middle value once the data are ordered: the middle reading if $n$ is odd, or the average of the two middle readings if $n$ is even. The quartiles split the ordered list into quarters. To find them, split the ordered list at the median into a lower half and an upper half. If $n$ is odd, exclude the middle value from both halves. The lower quartile $Q_1$ is the median of the lower half and the upper quartile $Q_3$ is the median of the upper half.
The interquartile range is $\text{IQR} = Q_3 - Q_1$, the range of the middle $50\%$ of the data. Unlike the full range (maximum minus minimum), the IQR ignores the extreme values, so it is not distorted by one unusual reading; that is exactly why it pairs with the median for skewed data.
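The median-of-halves method can be sketched in Python as follows. The function name `quartiles` and the data are hypothetical. Note that library routines (and some calculators) may use other quartile conventions and give slightly different answers, so this sketch implements the rule stated above directly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) by the median-of-halves method.

    When n is odd the middle value is excluded from both halves.
    Assumes at least two readings.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

# Odd n: the median 12 is left out of both halves.
q1, med, q3 = quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18])
iqr = q3 - q1
print(q1, med, q3, iqr)   # 6.0 12 16.0 10.0

# Even n: the list splits cleanly into two halves of four.
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```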
The five-number summary and box plots
The five-number summary is the minimum, $Q_1$, the median ($Q_2$), $Q_3$, and the maximum. A box-and-whisker plot (box plot) draws it: a box from $Q_1$ to $Q_3$ with a line at the median, and whiskers reaching out to the extreme values. The box length is the IQR, and the whole plot shows the range.
The diagram below builds the box plot for daily maximum temperatures recorded at Sydney Observatory Hill (in degrees Celsius): . Their five-number summary (excluding the outlier from the whisker) is , with one value, , plotted separately.
Outliers by the 1.5 IQR rule
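An outlier is a reading that sits unusually far from the rest. The standard criterion in this course is based on the IQR: a value is an outlier if it lies more than $1.5 \times \text{IQR}$ below $Q_1$ or more than $1.5 \times \text{IQR}$ above $Q_3$. The two cut-offs

$$Q_1 - 1.5 \times \text{IQR} \quad \text{and} \quad Q_3 + 1.5 \times \text{IQR}$$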
are called the lower and upper fences. For the temperature data, , so the fences are and . The value exceeds , so it is an outlier; every other reading lies inside the fences. On the box plot an outlier is drawn as a separate dot beyond the whisker, and the whisker is shortened to stop at the most extreme reading that is not an outlier (here ). An outlier is flagged and discussed, not silently deleted.
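The fence calculation can be sketched in Python. The readings below are hypothetical (they are not the Observatory Hill data), and `quartiles` and `find_outliers` are names made up for this example; the quartiles use the median-of-halves method.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 by median-of-halves (middle value excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def find_outliers(data):
    """Return the lower fence, the upper fence and the readings outside them."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical temperature readings in degrees Celsius.
temps = [18, 19, 20, 21, 21, 22, 23, 24, 35]
lower_fence, upper_fence, outliers = find_outliers(temps)
print(lower_fence, upper_fence, outliers)   # 13.5 29.5 [35]

# The whiskers stop at the most extreme readings that are not outliers.
inside = [x for x in temps if lower_fence <= x <= upper_fence]
whisker_ends = (min(inside), max(inside))
print(whisker_ends)                         # (18, 24)
```

The function reports outliers rather than removing them, in line with the advice to flag and discuss an outlier instead of deleting it.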
Shape and skew
The shape of a distribution is read from the histogram, the box plot, or the relationship between the mean and the median.
- A symmetric distribution has the mean and median roughly equal, with the median centred in the box and whiskers of similar length.
- A right-skewed (positively skewed) distribution has a long tail of high values, which pulls the mean above the median; the median sits towards the left of the box and the right whisker is longer.
- A left-skewed (negatively skewed) distribution has a long low tail, the mean below the median, and a longer left whisker.
The rule of thumb, mean $>$ median means right skew and mean $<$ median means left skew, follows because the mean is dragged towards the long tail while the median is not.
Comparing distributions with parallel box plots
To compare two groups, draw their box plots on a common scale, one above the other: a parallel box plot. You compare centre (median against median), spread (IQR and range), and shape (skew and outliers). Below are daily rainfall totals on wet days in two suburbs (in millimetres). North is ; South is .
The two suburbs have almost the same median (North , South ) and nearly equal means ( against mm), yet they are very different: North has a wider box ( against ), a long right whisker and an outlier at mm, so its rainfall is more variable and right-skewed, while South is tight and roughly symmetric. This is the whole point of comparing displays: similar centres can hide very different spreads and shapes.
How exam questions ask about univariate statistics
- "Complete the frequency / cumulative-frequency table." Add the $f$ column to get $n$, then accumulate down the column; the last cumulative entry must equal $n$.
- "Find the mean and standard deviation" (raw data or a table). Enter the data in the calculator's statistics mode and read $\bar{x}$ and $\sigma_x$. For a table, enter the $x$ values against the $f$ values. Use $\sigma_x$ (population), not $s_x$.
- "Estimate the mean / standard deviation from the grouped data." Use the class centres as $x$, then proceed as for a frequency table; state that the answer is an estimate.
- "Use the ogive to find the median / quartiles." Read across from cumulative frequency $\frac{n}{2}$, $\frac{n}{4}$ and $\frac{3n}{4}$, then down to the value.
- "Find the five-number summary" or "draw a box plot." Order the data, find the median, then the quartiles of each half. Draw the box from $Q_1$ to $Q_3$ with the median marked.
- "Determine whether ... is an outlier." Compute the IQR, then the fences $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$, and compare the value to them.
- "Compare the two data sets." Use parallel box plots and compare centre, spread and shape in words, quoting the medians and IQRs.
- "Describe the shape / skew." Compare the mean and median, or look at the whisker lengths and the median's position in the box.
Edge cases worth knowing
- Population versus sample standard deviation. This course uses $\sigma$, which divides by $n$. The sample version $s$ divides by $n - 1$ and gives a slightly larger value; selecting it by mistake is the commonest standard-deviation error.
- Quartiles when $n$ is odd. Exclude the middle value (the median) from both halves before taking each half's median. Including it shifts the quartiles.
- Grouped statistics are estimates. Replacing each reading with its class centre means the grouped mean, standard deviation, median and quartiles only approximate the true values. The finer the classes, the better the estimate, but information is always lost.
- The ogive uses $\frac{n}{2}$, not $\frac{n+1}{2}$. When reading a median or quartiles off a cumulative-frequency curve you go up to $\frac{n}{4}$, $\frac{n}{2}$ and $\frac{3n}{4}$, because the curve treats the data as continuous.
- Outliers are flagged, not deleted. Identify an outlier and draw it separately, but only remove it with a stated reason (such as a recording error). A genuine extreme value is still part of the data.
- The median resists outliers; the mean does not. Adding one huge value barely moves the median but drags the mean towards it. For skewed data or data with outliers, the median and IQR describe the data more honestly than the mean and standard deviation.
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation (4 marks). A netballer scores these numbers of goals in games: . Find the five-number summary and the interquartile range, and state the shape suggested by the median's position in the box.
- Order the data
- Writing the scores in increasing order gives .
- Median ()
- With (even), the median is the average of the th and th values, .
- Quartiles
- Split into a lower half and an upper half . The lower quartile is the median of the lower half, , and the upper quartile is the median of the upper half, .
- Five-number summary and IQR
- Minimum , , median , , maximum . So , and the range is .
- Shape
- The median sits closer to than to , and the upper whisker (to ) is longer than the lower (to ), so the distribution is skewed to the right (positively skewed).
Core (4 marks). The table shows the number of after-school activities done weekly by $25$ students. Find the mean and the population standard deviation (to decimal places), and write down the median.
| Activities | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 3 | 5 | 8 | 6 | 2 | 1 |
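Totals from the table. With $n = \sum f = 25$, build $\sum fx$ and $\sum fx^2$:
$\sum fx = 3(5) + 5(6) + 8(7) + 6(8) + 2(9) + 1(10) = 177$.
$\sum fx^2 = 3(25) + 5(36) + 8(49) + 6(64) + 2(81) + 1(100) = 1293$.
- Mean
- $\bar{x} = \dfrac{177}{25} = 7.08$.
- Population standard deviation
- $\sigma^2 = \dfrac{1293}{25} - 7.08^2 = 51.72 - 50.1264 = 1.5936$, so $\sigma = \sqrt{1.5936} \approx 1.26$.
- Median
- The cumulative frequencies are $3, 8, 16, 22, 24, 25$. With $n = 25$ (odd) the median is the $13$th value, which falls in the jump from $8$ to $16$, so the median is $7$.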
In the calculator's statistics mode, enter the $x$ list against the $f$ list and read $\bar{x}$ and $\sigma_x$ directly; the by-hand columns are shown so the method is clear and checkable.
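As a cross-check on the activities table in this question, here is a minimal Python sketch that expands the table into a raw list and uses the standard `statistics` module. The variable names are assumptions; the values and frequencies are those in the table.

```python
import statistics

# Values and frequencies from the activities table in the question.
values = [5, 6, 7, 8, 9, 10]
freqs = [3, 5, 8, 6, 2, 1]

# Expand the table into a raw list: each value x repeated f times.
raw = [x for x, f in zip(values, freqs) for _ in range(f)]

n = len(raw)
mean = statistics.fmean(raw)
sigma = statistics.pstdev(raw)       # population SD, divides by n
med = statistics.median(raw)

print(n, round(mean, 2), round(sigma, 2), med)   # 25 7.08 1.26 7
```

Expanding the table is wasteful for large frequencies, but it keeps the check independent of the $\sum fx$ and $\sum fx^2$ column method.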
Core (3 marks). Ten recent sales in a Sydney suburb had these prices, in thousands of dollars: . Use the $1.5 \times \text{IQR}$ rule to decide whether any price is an outlier.
- Order and find the quartiles
- The data are already increasing. With (even), the lower half is and the upper half is . Each half has values, so is the middle of the lower half, , and is the middle of the upper half, . The median is .
- IQR
- (in thousands of dollars).
- Fences
- Lower fence . Upper fence .
- Decision
- Every value lies inside except , which is above the upper fence . So $1,450,000 is an outlier; the others are not. In a report it would be plotted as a separate dot beyond the right whisker, not dropped, since a genuine high sale still describes the market.
Exam (6 marks). The masses (kg) of $60$ dogs seen at a vet clinic are grouped below.
| Mass (kg) | to | to | to | to | to | to |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 7 | 15 | 18 | 12 | 6 | 2 |
(a) State the modal class. (b) Estimate the mean and population standard deviation (to decimal places). (c) Construct the cumulative-frequency column and estimate the median from it.
(a) Modal class. The highest frequency is $18$, so the modal class is to kg.
(b) Class centres and totals. Use the class centres as the representative value $x$ of each class, with $n = \sum f = 60$.
.
.
Estimated mean kg. Estimated variance , so estimated kg.
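(c) Cumulative frequency and median. Accumulating the frequencies at the upper boundaries gives $7, 22, 40, 52, 58, 60$. With $n = 60$, the median is read from the ogive at a cumulative frequency of $\frac{n}{2} = 30$. That cumulative value falls in the to class (cumulative rises from $22$ to $40$ there), so by linear interpolation

$$\text{median} \approx L + \frac{30 - 22}{40 - 22} \times w,$$

where $L$ is the lower boundary and $w$ the width of that class.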
The estimates from grouped data are close to the true values but not exact, because grouping replaces each reading with its class centre.