NSW Maths Advanced syllabus dot point

How do we display a single set of data and summarise its centre, spread and shape?

Display and summarise univariate data using frequency and cumulative-frequency tables, histograms, ogives, the mean and standard deviation, the five-number summary, box plots and outliers

A focused answer to the HSC Maths Advanced dot point on displaying and summarising one-variable data. Frequency and cumulative-frequency tables, histograms and ogives, the mean and standard deviation from data and a frequency table, the median, quartiles and IQR, the five-number summary, box and parallel box plots, the 1.5 IQR outlier rule and shape, with worked examples and traps.

Generated by Claude Opus 4.818 min answer. Updated 2026-06-21

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

NESA wants you to take a single set of numbers (univariate data), display it well, and summarise it numerically. Displaying means frequency and cumulative-frequency tables, histograms and ogives (cumulative-frequency polygons). Summarising means a measure of centre (the mean or the median), a measure of spread (the standard deviation or the interquartile range), and a sense of shape (symmetric or skewed). The five-number summary feeds the box plot, and the $1.5 \times \text{IQR}$ rule flags outliers. The single idea that ties it together is that one variable can be described by where its values sit (centre), how widely they vary (spread), and the overall pattern (shape), and that every display and every statistic is a tool for reading one of those three things.

The answer

Frequency and cumulative-frequency tables

A frequency table lists each value (or score) $x$ alongside its frequency $f$ , the number of times it occurs. The cumulative frequency at a value is the running total of frequencies up to and including that value, so it answers "how many readings are this value or less". The final cumulative frequency must equal $N = \sum f$ , the total number of readings, which is a quick check that the table is right.

Data come in types, and the type decides the display. A variable is categorical if its values are labels (eye colour, suburb), and numeric if its values are numbers. A numeric variable is discrete if its values can be listed ($0, 1, 2, \dots$ pets), and continuous if it is measured on a scale and can take any value in a range (height, time, mass). Continuous data are almost always grouped for display, because almost every measured value is unique.

Here are the numbers of pets owned by $40$ Year 12 students, a discrete variable, as a frequency and cumulative-frequency table.

Pets $x$	Frequency $f$	Cumulative frequency
$0$	$6$	$6$
$1$	$11$	$17$
$2$	$9$	$26$
$3$	$8$	$34$
$4$	$4$	$38$
$5$	$2$	$40$

The cumulative-frequency column rises to $40 = N$ , as it must. Reading it off: $17$ students own at most $1$ pet, and $40 - 26 = 14$ own at least $3$ .
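As a rough check on the table, the cumulative column can be rebuilt in a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the pet-ownership frequencies from the table above.

```python
from itertools import accumulate

# Pet-ownership table from the text: each value x and its frequency f.
pets = [0, 1, 2, 3, 4, 5]
freq = [6, 11, 9, 8, 4, 2]

# A running total of the frequencies gives the cumulative-frequency column.
cum_freq = list(accumulate(freq))
total = sum(freq)  # N

# Quick check from the text: the last cumulative entry must equal N.
assert cum_freq[-1] == total

# "At most 1 pet" is the cumulative frequency at x = 1.
at_most_1 = cum_freq[pets.index(1)]
# "At least 3 pets" is everyone not already counted up to x = 2.
at_least_3 = total - cum_freq[pets.index(2)]

print(cum_freq, at_most_1, at_least_3)
```

The two read-offs match the values quoted above: $17$ students with at most $1$ pet and $14$ with at least $3$.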

Grouped data, class centres and the modal class

When a variable is continuous, or when a discrete variable takes too many distinct values, group the data into class intervals of equal width. Each class is represented by its class centre, the midpoint of the interval, which stands in for every reading in that class. Grouping trades detail for clarity: you gain a readable overview but you lose the exact values, so any statistic computed from grouped data is an estimate. Never discard the original data.

The class that contains the most readings is the modal class. A boundary value (a reading sitting exactly on a class edge) must be assigned by a stated convention, usually into the upper class, and you note the convention if it matters.

Here are the commute times of $50$ Sydney workers, a continuous variable grouped into $10$ -minute classes. The notation $10$ to $20$ means $10 \le t < 20$ .

Commute (min)	Class centre $x$	Frequency $f$	Cumulative frequency
$10$ to $20$	$15$	$4$	$4$
$20$ to $30$	$25$	$10$	$14$
$30$ to $40$	$35$	$16$	$30$
$40$ to $50$	$45$	$11$	$41$
$50$ to $60$	$55$	$6$	$47$
$60$ to $70$	$65$	$3$	$50$

The modal class is $30$ to $40$ minutes (frequency $16$ ), and the cumulative column reaches $50 = N$ .
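The class centres, the modal class and the boundary convention can be sketched in Python as follows. The helper name `class_of` is hypothetical, and the code assumes the convention stated above, in which $10$ to $20$ means $10 \le t < 20$, so a boundary value goes into the upper class.

```python
# Commute-time classes from the text, stored as (lower, upper) with lower <= t < upper.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freq = [4, 10, 16, 11, 6, 3]

# The class centre is the midpoint of each interval.
centres = [(lo + hi) / 2 for lo, hi in classes]

# The modal class is the class with the largest frequency.
modal_class = classes[freq.index(max(freq))]


def class_of(t):
    """Return the class containing t; a boundary value falls in the upper class."""
    for lo, hi in classes:
        if lo <= t < hi:
            return (lo, hi)
    raise ValueError(f"{t} is outside the table")


print(centres, modal_class, class_of(30))
```

A reading of exactly $30$ minutes lands in the $30$ to $40$ class under this convention; a different convention would need the inequality changed.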

Histograms and ogives (cumulative-frequency polygons)

A histogram displays the frequency table as a bar chart in which the bars touch (no gaps), because the horizontal axis is a continuous numeric scale. For grouped data each bar is centred on its class interval; for ungrouped discrete data each bar is centred on the value.

An ogive (cumulative-frequency polygon) is the graph of cumulative frequency against the upper boundary of each class, joined by straight segments and started on the axis at the lower boundary of the first class. It rises from $0$ to $N$ in a characteristic S shape. Its real power is reading positions off it: go up to a cumulative frequency, across to the curve, and down to the value. The median sits at a cumulative frequency of $\frac{N}{2}$ , the lower quartile at $\frac{N}{4}$ and the upper quartile at $\frac{3N}{4}$ .

The diagram overlays the histogram and the ogive for the commute-time data, with frequency on the left axis and cumulative frequency on the right.

Reading the ogive at cumulative frequency $25$ (which is $\frac{N}{2}$) gives a median of about $37$ minutes; at $12.5$ ($\frac{N}{4}$) the lower quartile is about $28.5$ minutes, and at $37.5$ ($\frac{3N}{4}$) the upper quartile is about $47$ minutes.

Mean and standard deviation from data

The mean is the balance point of the data, $\bar{x} = \dfrac{\sum x}{n}$ for a raw list. The standard deviation measures the typical distance of a reading from the mean. In this course you use the population standard deviation, which divides by $n$ :

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}.$

On a calculator this is the $\sigma_n$ (or $\sigma_x$ ) key, not the $s_{n-1}$ (sample) key, and choosing the wrong one is the single most common error here. You are expected to put data into the calculator's statistics mode and read $\bar{x}$ and $\sigma_n$ off it rather than computing the sum of squares by hand, but knowing the formula tells you what the machine is doing.
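The difference between the two calculator keys can be seen with Python's standard `statistics` module, where `pstdev` divides by $n$ and `stdev` divides by $n - 1$. This is a sketch only: the eight readings are illustrative values chosen to keep the arithmetic clean, not data from this page.

```python
import statistics

# Illustrative readings (an assumption, not from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sigma_n = statistics.pstdev(data)  # population standard deviation: divides by n
s_n1 = statistics.stdev(data)      # sample standard deviation: divides by n - 1

print(mean, sigma_n, round(s_n1, 3))
```

Here the squared deviations sum to $32$, so the population value is $\sqrt{32/8} = 2$, while the sample value is $\sqrt{32/7}$, which is slightly larger.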

Mean and standard deviation from a frequency table

When data are tabulated, every value $x$ occurs $f$ times, so weight by frequency. With $N = \sum f$ ,

$\bar{x} = \frac{\sum fx}{N}, \qquad \sigma = \sqrt{\frac{\sum fx^2}{N} - \bar{x}^2}.$

The second form of the variance, $\dfrac{\sum fx^2}{N} - \bar{x}^2$ , is the practical one: it needs only the two column totals $\sum fx$ and $\sum fx^2$ , with no need to compute each squared deviation. For grouped data, use the class centre as $x$ ; the answer is then an estimate.
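For readers who want to see why this form agrees with the definition, here is a short derivation. It is a sketch that uses only $N = \sum f$ and $\sum fx = N\bar{x}$, and it starts from the frequency-weighted version of the definition, $\sigma^2 = \dfrac{\sum f(x - \bar{x})^2}{N}$.

$\dfrac{\sum f(x - \bar{x})^2}{N} = \dfrac{\sum fx^2 - 2\bar{x}\sum fx + \bar{x}^2 \sum f}{N} = \dfrac{\sum fx^2}{N} - 2\bar{x}^2 + \bar{x}^2 = \dfrac{\sum fx^2}{N} - \bar{x}^2.$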

For the pet-ownership table above, $N = 40$ , and adding the weighted columns gives $\sum fx = 79$ and $\sum fx^2 = 233$ . So

$\bar{x} = \frac{79}{40} = 1.975, \qquad \sigma = \sqrt{\frac{233}{40} - 1.975^2} = \sqrt{5.825 - 3.900625} \approx 1.39 \text{ pets}.$
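The same column totals can be checked with a short Python sketch. The variable names are illustrative; the cross-check at the end expands the table back into a raw list and uses the standard library's `statistics.pstdev`, which divides by $n$.

```python
from math import sqrt
import statistics

# Pet-ownership table from the text.
pets = [0, 1, 2, 3, 4, 5]
freq = [6, 11, 9, 8, 4, 2]

n = sum(freq)                                          # N
sum_fx = sum(f * x for x, f in zip(pets, freq))        # total of the fx column
sum_fx2 = sum(f * x * x for x, f in zip(pets, freq))   # total of the fx^2 column

mean = sum_fx / n
sigma = sqrt(sum_fx2 / n - mean ** 2)  # practical form of the variance

# Cross-check: list every reading individually and use the population formula.
raw = [x for x, f in zip(pets, freq) for _ in range(f)]
sigma_check = statistics.pstdev(raw)

print(n, sum_fx, sum_fx2, mean, round(sigma, 2))
```

Both routes give the same standard deviation, which is what entering the $x$ list against the $f$ list on a calculator should also return.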

Median, quartiles and the interquartile range

The median $Q_2$ is the middle value once the data are ordered: the middle reading if $n$ is odd, or the average of the two middle readings if $n$ is even. The quartiles split the ordered list into quarters. To find them, split the ordered list at the median into a lower half and an upper half. If $n$ is odd, exclude the middle value from both halves. The lower quartile $Q_1$ is the median of the lower half and the upper quartile $Q_3$ is the median of the upper half.

The interquartile range is $\text{IQR} = Q_3 - Q_1$ , the range of the middle $50\%$ of the data. Unlike the full range (maximum minus minimum), the IQR ignores the extreme values, so it is not distorted by one unusual reading; that is exactly why it pairs with the median for skewed data.
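The halving method described above can be sketched in Python. The function name `quartiles` and the seven scores are illustrative assumptions, not part of the syllabus. Library routines such as `statistics.quantiles` follow other conventions and can return slightly different quartiles, so the halves are built by hand here.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves method; needs at least 2 readings."""
    xs = sorted(data)            # always order the data first
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # for odd n this leaves out the middle value
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)


# Illustrative unordered list with odd n (not from the text).
scores = [7, 3, 9, 4, 12, 6, 10]
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Ordered, the scores are $3, 4, 6, 7, 9, 10, 12$: the median $7$ is left out of both halves, so $Q_1 = 4$, $Q_3 = 10$ and the IQR is $6$.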

The five-number summary and box plots

The five-number summary is the minimum, $Q_1$ , the median $Q_2$ , $Q_3$ , and the maximum. A box-and-whisker plot (box plot) draws it: a box from $Q_1$ to $Q_3$ with a line at the median, and whiskers reaching out to the extreme values. The box length is the IQR, and the whole plot shows the range.

The diagram below builds the box plot for $20$ daily maximum temperatures recorded at Sydney Observatory Hill (in degrees Celsius): $20, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 34$. Their five-number summary is $20, 22, 23, 24.5, 34$. Because $34$ is an outlier, the box plot's right whisker stops at $26$ and $34$ is plotted separately.

Outliers by the 1.5 IQR rule

An outlier is a reading that sits unusually far from the rest. The standard criterion in this course is based on the IQR: a value is an outlier if it lies more than $1.5 \times \text{IQR}$ below $Q_1$ or more than $1.5 \times \text{IQR}$ above $Q_3$ . The two cut-offs

$Q_1 - 1.5 \times \text{IQR} \quad \text{and} \quad Q_3 + 1.5 \times \text{IQR}$

are called the lower and upper fences. For the temperature data, $\text{IQR} = 24.5 - 22 = 2.5$ , so the fences are $22 - 1.5(2.5) = 18.25$ and $24.5 + 1.5(2.5) = 28.25$ . The value $34$ exceeds $28.25$ , so it is an outlier; every other reading lies inside the fences. On the box plot an outlier is drawn as a separate dot beyond the whisker, and the whisker is shortened to stop at the most extreme reading that is not an outlier (here $26$ ). An outlier is flagged and discussed, not silently deleted.
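The fence test for the temperature data can be sketched in Python. The quartiles are taken as given from the text ($Q_1 = 22$, $Q_3 = 24.5$), and the variable names are illustrative.

```python
# The 20 Sydney temperatures from the text, already ordered.
temps = [20, 21, 21, 21, 22, 22, 22, 22, 23, 23,
         23, 23, 24, 24, 24, 25, 25, 25, 26, 34]
q1, q3 = 22, 24.5  # quartiles quoted in the text

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A reading is an outlier if it lies beyond either fence.
outliers = [t for t in temps if t < lower_fence or t > upper_fence]

# Each whisker stops at the most extreme reading that is not an outlier.
inside = [t for t in temps if lower_fence <= t <= upper_fence]
whisker_ends = (min(inside), max(inside))

print(lower_fence, upper_fence, outliers, whisker_ends)
```

The outlier stays in the data set; the code only flags it and works out where the whiskers should end.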

Shape and skew

The shape of a distribution is read from the histogram, the box plot, or the relationship between the mean and the median.

A symmetric distribution has the mean and median roughly equal, with the median centred in the box and whiskers of similar length.
A right-skewed (positively skewed) distribution has a long tail of high values, which pulls the mean above the median; the median sits towards the left of the box and the right whisker is longer.
A left-skewed (negatively skewed) distribution has a long low tail, the mean below the median, and a longer left whisker.

The rule of thumb ($\text{mean} > \text{median}$ suggests right skew, $\text{mean} < \text{median}$ suggests left skew) follows because the mean is dragged towards the long tail while the median is not.

Comparing distributions with parallel box plots

To compare two groups, draw their box plots on a common scale, one above the other: a parallel box plot. You compare centre (median against median), spread (IQR and range), and shape (skew and outliers). Below are daily rainfall totals on $11$ wet days in two suburbs (in millimetres). North is $6, 8, 9, 11, 12, 14, 15, 17, 19, 22, 40$ ; South is $10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21$ .

The two suburbs have almost the same median (North $14$ , South $16$ ) and nearly equal means ( $15.7$ against $15.9$ mm), yet they are very different: North has a wider box ( $\text{IQR} = 10$ against $6$ ), a long right whisker and an outlier at $40$ mm, so its rainfall is more variable and right-skewed, while South is tight and roughly symmetric. This is the whole point of comparing displays: similar centres can hide very different spreads and shapes.
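The figures quoted in this comparison can be reproduced with a short Python sketch. The helper `summarise` is a hypothetical name, and it assumes the median-of-halves quartile method and the $1.5 \times \text{IQR}$ rule used elsewhere on this page.

```python
from statistics import mean, median

# Rainfall totals (mm) on 11 wet days, from the text.
north = [6, 8, 9, 11, 12, 14, 15, 17, 19, 22, 40]
south = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]


def summarise(data):
    """Centre, spread and outliers for one data set (median-of-halves quartiles)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]         # excludes the middle value when n is odd
    upper = xs[(n + 1) // 2:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    outliers = [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return {"mean": round(mean(xs), 1), "median": median(xs),
            "iqr": iqr, "outliers": outliers}


for name, data in (("North", north), ("South", south)):
    print(name, summarise(data))
```

The numbers are only the raw material for the comparison: the written answer still needs a sentence each on centre, spread and shape.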

Key fact

Describe one-variable data by centre, spread and shape. Centre: the mean $\bar{x} = \frac{\sum fx}{N}$ or the median. Spread: the population standard deviation $\sigma = \sqrt{\frac{\sum fx^2}{N} - \bar{x}^2}$ or the interquartile range $\text{IQR} = Q_3 - Q_1$ . Shape: symmetric or skewed, read from the histogram, the box plot, or whether the mean sits above ( $\text{right skew}$ ) or below ( $\text{left skew}$ ) the median. The five-number summary (min, $Q_1$ , $Q_2$ , $Q_3$ , max) builds the box plot, and a value is an outlier if it falls beyond $Q_1 - 1.5 \times \text{IQR}$ or $Q_3 + 1.5 \times \text{IQR}$ .

How exam questions ask about univariate statistics

"Complete the frequency / cumulative-frequency table." Add the $f$ column to get $N$ , then accumulate down the column; the last cumulative entry must equal $N$ .
"Find the mean and standard deviation" (raw data or a table). Enter the data in the calculator's statistics mode and read $\bar{x}$ and $\sigma_n$ . For a table, enter the $x$ values against the $f$ values. Use $\sigma_n$ (population), not $s_{n-1}$ .
"Estimate the mean / standard deviation from the grouped data." Use the class centres as $x$ , then proceed as for a frequency table; state that the answer is an estimate.
"Use the ogive to find the median / quartiles." Read across from cumulative frequency $\frac{N}{2}$ , $\frac{N}{4}$ and $\frac{3N}{4}$ , then down to the value.
"Find the five-number summary" or "draw a box plot." Order the data, find the median, then the quartiles of each half. Draw the box from $Q_1$ to $Q_3$ with the median marked.
"Determine whether ... is an outlier." Compute $\text{IQR}$ , then the fences $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$ , and compare the value to them.
"Compare the two data sets." Use parallel box plots and compare centre, spread and shape in words, quoting the medians and IQRs.
"Describe the shape / skew." Compare the mean and median, or look at the whisker lengths and the median's position in the box.

Exam technique

Always order the data first; most quartile and median errors come from working on an unordered list. State which statistic you are using and why: the median and IQR for skewed data or data with outliers (they resist extremes), the mean and standard deviation for roughly symmetric data. On the calculator pick $\sigma_n$ , the population standard deviation, not the sample $s_{n-1}$ key, and quote it to the accuracy the question asks (often $2$ decimal places). For grouped data, write the class-centre column explicitly and call your mean and standard deviation estimates. When asked to compare, never just list numbers: write a sentence comparing centre, then spread, then shape, each backed by a value. Test outliers with the fences and, if you find one, draw it as a separate dot and shorten the whisker.

Worked examples

These five examples cover the full toolkit: reading a cumulative-frequency table, computing summary statistics from a frequency table, building a five-number summary and box plot, applying the outlier rule, and reading an ogive.

Build a cumulative-frequency table and read it

A survey records the number of siblings of $30$ students: the value $0$ occurs $5$ times, $1$ occurs $11$ times, $2$ occurs $8$ times, $3$ occurs $4$ times and $4$ occurs $2$ times. Build the cumulative-frequency column and state how many students have at most $2$ siblings, and how many have at least $3$ .

Set up the columns: List $x = 0, 1, 2, 3, 4$ with $f = 5, 11, 8, 4, 2$ , and check the total $N = 5 + 11 + 8 + 4 + 2 = 30$ .
Accumulate: The running totals are $5$ , then $5 + 11 = 16$ , then $16 + 8 = 24$ , then $24 + 4 = 28$ , then $28 + 2 = 30$ . The final cumulative frequency is $30 = N$ , which confirms the column.
Read off the answers: "At most $2$ siblings" is the cumulative frequency at $x = 2$ , which is $24$ students. "At least $3$ " is the rest, $30 - 24 = 6$ students.

Mean and standard deviation from a frequency table

For the pet-ownership data ( $x = 0, \dots, 5$ with $f = 6, 11, 9, 8, 4, 2$ , total $N = 40$ ), find the mean and the population standard deviation.

Form the weighted totals: Compute $fx$ for each row: $0, 11, 18, 24, 16, 10$ , which sum to $\sum fx = 79$ . Compute $fx^2$ (that is $x \cdot fx$ ): $0, 11, 36, 72, 64, 50$ , which sum to $\sum fx^2 = 233$ .
Mean: $\bar{x} = \dfrac{\sum fx}{N} = \dfrac{79}{40} = 1.975$ pets.
Standard deviation: Using the practical form of the variance,
$\sigma^2 = \frac{\sum fx^2}{N} - \bar{x}^2 = \frac{233}{40} - 1.975^2 = 5.825 - 3.900625 = 1.924375,$

so $\sigma = \sqrt{1.924375} \approx 1.39$ pets. On a calculator, entering the $x$ list against the $f$ list and reading $\bar{x}$ and $\sigma_n$ returns the same values.

Five-number summary and box plot

Find the five-number summary of the $20$ Sydney temperatures $20, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 34$ and describe what the box plot shows.

Median: The data are already ordered, with $n = 20$ (even), so the median is the average of the $10$ th and $11$ th values, $\frac{23 + 23}{2} = 23$ .
Quartiles: The lower half is the first $10$ values $20, 21, 21, 21, 22, 22, 22, 22, 23, 23$ ; its median (average of its $5$ th and $6$ th, $22$ and $22$ ) is $Q_1 = 22$ . The upper half is the last $10$ values $23, 23, 24, 24, 24, 25, 25, 25, 26, 34$ ; its median (average of $24$ and $25$ ) is $Q_3 = 24.5$ .
Five-number summary: Minimum $20$ , $Q_1 = 22$ , median $23$ , $Q_3 = 24.5$ , maximum $34$ . The box runs from $22$ to $24.5$ (a tight $\text{IQR}$ of $2.5$ ), and the single far value $34$ stretches the maximum well beyond the box, signalling a likely outlier and right skew.

Apply the outlier rule

Using the temperature data above, test whether $34$ is an outlier, and state where the whisker ends.

Interquartile range: From the previous example, $\text{IQR} = Q_3 - Q_1 = 24.5 - 22 = 2.5$ .
Fences: Lower fence $= Q_1 - 1.5 \times \text{IQR} = 22 - 1.5(2.5) = 22 - 3.75 = 18.25$ . Upper fence $= Q_3 + 1.5 \times \text{IQR} = 24.5 + 3.75 = 28.25$ .
Compare: The value $34$ is greater than the upper fence $28.25$ , so $34$ is an outlier. No value is below $18.25$ , so there is no low outlier.
Whisker: With $34$ flagged as an outlier and plotted as a separate dot, the right whisker stops at the largest reading that is not an outlier, which is $26$ .

Read a median and quartiles from an ogive

For the grouped commute-time data ( $N = 50$ , with cumulative frequencies $4, 14, 30, 41, 47, 50$ at the class boundaries $20, 30, 40, 50, 60, 70$ ), estimate the median and the quartiles.

Locate the cumulative-frequency levels: The median is at $\frac{N}{2} = 25$ , the lower quartile at $\frac{N}{4} = 12.5$ and the upper quartile at $\frac{3N}{4} = 37.5$ .
Median: A cumulative frequency of $25$ falls in the $30$ to $40$ class (the cumulative total rises from $14$ to $30$ there), so by linear interpolation
$\text{median} \approx 30 + \frac{25 - 14}{16} \times 10 \approx 36.9 \text{ minutes}.$
Lower quartile: A cumulative frequency of $12.5$ falls in the $20$ to $30$ class (cumulative $4$ to $14$ ), so
$Q_1 \approx 20 + \frac{12.5 - 4}{10} \times 10 = 20 + 8.5 = 28.5 \text{ minutes}.$
Upper quartile: A cumulative frequency of $37.5$ falls in the $40$ to $50$ class (cumulative $30$ to $41$ ), so
$Q_3 \approx 40 + \frac{37.5 - 30}{11} \times 10 \approx 46.8 \text{ minutes},$

giving an estimated $\text{IQR} \approx 46.8 - 28.5 = 18.3$ minutes. Reading the same levels straight off the drawn ogive gives the same values to graph accuracy.
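The interpolation steps above amount to reading a piecewise-linear graph, which can be sketched in Python. The function name `ogive_value` is hypothetical, and the sketch assumes the ogive starts at cumulative frequency $0$ at the lower boundary of the first class ($10$ minutes).

```python
def ogive_value(boundaries, cum_freq, target):
    """Value at which the ogive reaches the cumulative frequency `target`.

    boundaries: class boundaries, starting with the lower boundary of the first class.
    cum_freq:   cumulative frequency at each boundary, starting at 0.
    """
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation along this straight segment of the ogive.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of the ogive")


# Commute-time data from the text.
boundaries = [10, 20, 30, 40, 50, 60, 70]
cum_freq = [0, 4, 14, 30, 41, 47, 50]
n = cum_freq[-1]

median_est = ogive_value(boundaries, cum_freq, n / 2)
q1_est = ogive_value(boundaries, cum_freq, n / 4)
q3_est = ogive_value(boundaries, cum_freq, 3 * n / 4)

print(round(median_est, 1), round(q1_est, 1), round(q3_est, 1))
```

These are still estimates, because the straight segments assume the readings are spread evenly within each class.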

Edge cases worth knowing

Population versus sample standard deviation. This course uses $\sigma_n$ , which divides by $n$ . The sample version $s_{n-1}$ divides by $n - 1$ and gives a slightly larger value; selecting it by mistake is the commonest standard-deviation error.
Quartiles when $n$ is odd. Exclude the middle value (the median) from both halves before taking each half's median. Including it shifts the quartiles.
Grouped statistics are estimates. Replacing each reading with its class centre means the grouped mean, standard deviation, median and quartiles only approximate the true values. The finer the classes, the better the estimate, but information is always lost.
The ogive uses $\frac{N}{2}$ , not $\frac{N+1}{2}$ . When reading a median or quartiles off a cumulative-frequency curve you go up to $\frac{N}{2}$ , $\frac{N}{4}$ and $\frac{3N}{4}$ , because the curve treats the data as continuous.
Outliers are flagged, not deleted. Identify an outlier and draw it separately, but only remove it with a stated reason (such as a recording error). A genuine extreme value is still part of the data.
The median resists outliers; the mean does not. Adding one huge value barely moves the median but drags the mean towards it. For skewed data or data with outliers, the median and IQR describe the data more honestly than the mean and standard deviation.

Common traps

Using the sample key $s_{n-1}$: The course requires the population standard deviation $\sigma_n$ (divide by $n$ ). Choosing the $n - 1$ key gives a wrong, slightly larger value.
Not ordering the data first: The median and quartiles are positions in the ordered list. Working on the raw, unordered list gives nonsense.
Mishandling the middle value for quartiles: When $n$ is odd, the median is excluded from both halves before finding $Q_1$ and $Q_3$ . Leaving it in shifts both quartiles.
Letting the cumulative column fall or reset: Cumulative frequency is a running total that never decreases and must finish at $N$. A cumulative entry that drops, or a final entry that is not $N$, signals an arithmetic slip.
Treating grouped statistics as exact: Estimates from class centres are approximations. State that they are estimates, and do not claim grouped values equal the true mean or median.
Deleting an outlier automatically: An outlier is identified by the $1.5 \times \text{IQR}$ rule and shown as a separate dot. It is removed only with a justified reason, not by default.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation (4 marks). A netballer scores these numbers of goals in $12$ games: $2, 5, 1, 3, 2, 7, 4, 2, 3, 1, 5, 0$. Find the five-number summary and the interquartile range, and state the shape suggested by the median's position in the box.

Worked solution

Order the data: Writing the $12$ scores in increasing order gives $0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 7$ .
Median ( $Q_2$ ): With $n = 12$ (even), the median is the average of the $6$ th and $7$ th values, $\frac{2 + 3}{2} = 2.5$ .
Quartiles: Split into a lower half $0, 1, 1, 2, 2, 2$ and an upper half $3, 3, 4, 5, 5, 7$ . The lower quartile is the median of the lower half, $\frac{1 + 2}{2} = 1.5$ , and the upper quartile is the median of the upper half, $\frac{4 + 5}{2} = 4.5$ .
Five-number summary and IQR: Minimum $0$ , $Q_1 = 1.5$ , median $2.5$ , $Q_3 = 4.5$ , maximum $7$ . So $\text{IQR} = Q_3 - Q_1 = 4.5 - 1.5 = 3$ , and the range is $7 - 0 = 7$ .
Shape: The median $2.5$ sits closer to $Q_1 = 1.5$ than to $Q_3 = 4.5$ , and the upper whisker (to $7$ ) is longer than the lower (to $0$ ), so the distribution is skewed to the right (positively skewed).

Core (4 marks). The table shows the number of after-school activities done weekly by $25$ students. Find the mean and the population standard deviation (to $2$ decimal places), and write down the median.

Activities $x$	$5$	$6$	$7$	$8$	$9$	$10$
Frequency $f$	$3$	$5$	$8$	$6$	$2$	$1$

Worked solution

Totals from the table. With $N = \sum f = 25$ , build $\sum fx$ and $\sum fx^2$ :

$\sum fx = 5(3) + 6(5) + 7(8) + 8(6) + 9(2) + 10(1) = 15 + 30 + 56 + 48 + 18 + 10 = 177$ .

$\sum fx^2 = 25(3) + 36(5) + 49(8) + 64(6) + 81(2) + 100(1) = 75 + 180 + 392 + 384 + 162 + 100 = 1293$ .

Mean: $\bar{x} = \dfrac{\sum fx}{N} = \dfrac{177}{25} = 7.08$ .
Population standard deviation: $\sigma^2 = \dfrac{\sum fx^2}{N} - \bar{x}^2 = \dfrac{1293}{25} - 7.08^2 = 51.72 - 50.1264 = 1.5936$ , so $\sigma = \sqrt{1.5936} \approx 1.26$ .
Median: The cumulative frequencies are $3, 8, 16, 22, 24, 25$ . With $N = 25$ (odd) the median is the $13$ th value, which falls in the jump from $8$ to $16$ , so the median is $7$ .

In the calculator's statistics mode, enter the $x$ list against the $f$ list and read $\bar{x} = 7.08$ and $\sigma_n \approx 1.26$ directly; the by-hand columns are shown so the method is clear and checkable.

Core (3 marks). Ten recent sales in a Sydney suburb had these prices, in thousands of dollars: $640, 680, 710, 720, 750, 760, 790, 820, 850, 1450$. Use the $1.5 \times \text{IQR}$ rule to decide whether any price is an outlier.

Worked solution

Order and find the quartiles: The data are already increasing. With $n = 10$ (even), the lower half is $640, 680, 710, 720, 750$ and the upper half is $760, 790, 820, 850, 1450$ . Each half has $5$ values, so $Q_1$ is the middle of the lower half, $710$ , and $Q_3$ is the middle of the upper half, $820$ . The median is $\frac{750 + 760}{2} = 755$ .
IQR: $\text{IQR} = Q_3 - Q_1 = 820 - 710 = 110$ (in thousands of dollars).
Fences: Lower fence $Q_1 - 1.5 \times \text{IQR} = 710 - 1.5(110) = 710 - 165 = 545$ . Upper fence $Q_3 + 1.5 \times \text{IQR} = 820 + 165 = 985$ .
Decision: Every value lies inside $[545, 985]$ except $1450$, which is above the upper fence $985$. So the \$1,450,000 sale is an outlier; the others are not. In a report it would be plotted as a separate dot beyond the right whisker, not dropped, since a genuine high sale still describes the market.

Exam (6 marks). The masses (kg) of $60$ dogs seen at a vet clinic are grouped below.

Mass (kg)	$0$ to $5$	$5$ to $10$	$10$ to $15$	$15$ to $20$	$20$ to $25$	$25$ to $30$
Frequency	$7$	$15$	$18$	$12$	$6$	$2$

(a) State the modal class. (b) Estimate the mean and population standard deviation (to $2$ decimal places). (c) Construct the cumulative-frequency column and estimate the median from it.

Worked solution

(a) Modal class. The highest frequency is $18$ , so the modal class is $10$ to $15$ kg.

(b) Class centres and totals. Use the class centres $2.5, 7.5, 12.5, 17.5, 22.5, 27.5$ as the representative value of each class, with $N = 60$ .

$\sum fx = 2.5(7) + 7.5(15) + 12.5(18) + 17.5(12) + 22.5(6) + 27.5(2) = 17.5 + 112.5 + 225 + 210 + 135 + 55 = 755$ .

$\sum fx^2 = 6.25(7) + 56.25(15) + 156.25(18) + 306.25(12) + 506.25(6) + 756.25(2) = 43.75 + 843.75 + 2812.5 + 3675 + 3037.5 + 1512.5 = 11925$ .

Estimated mean $\bar{x} = \dfrac{755}{60} \approx 12.58$ kg. Estimated variance $\sigma^2 = \dfrac{11925}{60} - 12.5833^2 = 198.75 - 158.3403 = 40.4097$ , so estimated $\sigma = \sqrt{40.4097} \approx 6.36$ kg.

(c) Cumulative frequency and median. Accumulating the frequencies at the upper boundaries $5, 10, 15, 20, 25, 30$ gives $7, 22, 40, 52, 58, 60$ . With $N = 60$ , the median is read from the ogive at a cumulative frequency of $\frac{N}{2} = 30$ . That cumulative value falls in the $10$ to $15$ class (cumulative rises from $22$ to $40$ there), so by linear interpolation

$\text{median} \approx 10 + \frac{30 - 22}{18} \times 5 = 10 + \frac{8}{18} \times 5 \approx 12.22 \text{ kg}.$

The estimates from grouped data are close to the true values but not exact, because grouping replaces each reading with its class centre.
