Year 12: Statistical Analysis

NSW Maths Advanced Syllabus dot point

How do we use the normal distribution and z-scores to compute probabilities and compare observations?

Use the normal distribution, z-scores, the empirical rule and the standard normal table to find probabilities and percentiles

A focused answer to the HSC Maths Advanced dot point on the normal distribution. Standardising with z-scores, the 68-95-99.7 empirical rule, computing probabilities and inverse-normal percentiles, with worked examples and exam traps.

Generated by Claude Opus. Reviewed by Better Tuition Academy. 9 min answer.


What this dot point is asking

NESA wants you to standardise normally distributed data using z-scores, apply the 68-95-99.7 empirical rule, find probabilities and percentiles using the standard normal, and interpret z-scores when comparing observations from different distributions.

The answer

The normal distribution

A continuous random variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, written $X \sim N(\mu, \sigma^2)$, if its pdf is

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right).$$

Key features:

  • The graph is a bell curve symmetric about $x = \mu$.
  • $\mu$ is the mean, median and mode.
  • $\sigma$ controls the spread: larger $\sigma$ gives a flatter, wider curve.
  • The total area under the curve is $1$.

The case $\mu = 0$, $\sigma = 1$ is the standard normal, denoted $Z \sim N(0, 1)$.
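The key features above can be checked numerically. The following is a minimal sketch, not part of the syllabus: the function name `normal_pdf` is made up for illustration, and it simply evaluates the pdf formula with Python's `math` module.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, evaluated from the pdf formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Height of the standard normal curve at its centre, x = mu = 0.
peak = normal_pdf(0.0)

# Symmetry: the curve has the same height one unit either side of the mean.
left, right = normal_pdf(-1.0), normal_pdf(1.0)

# A larger sigma lowers the peak, giving a flatter, wider curve.
flatter_peak = normal_pdf(0.0, sigma=2.0)

print(round(peak, 4), left == right, flatter_peak < peak)
```

The peak height $\frac{1}{\sigma\sqrt{2\pi}}$ halves when $\sigma$ doubles, which is the "flatter, wider" behaviour in numbers.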

z-scores

The z-score of a value $x$ measures how many standard deviations it is from the mean:

$$z = \frac{x - \mu}{\sigma}.$$

If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. Standardising turns any normal calculation into one about the standard normal.

z-scores let you compare observations from different distributions on the same scale. A higher z-score is "further above the mean in standard deviation units".
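As a quick illustration of standardising, here is a small sketch. The helper name `z_score` and the numbers fed into it are illustrative assumptions, not values from a past paper.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Illustrative values only: one observation above its mean, one below.
z_above = z_score(65.0, mu=50.0, sigma=10.0)
z_below = z_score(44.0, mu=50.0, sigma=4.0)

print(z_above, z_below)
```

The two observations come from different scales, yet their z-scores are directly comparable: one is 1.5 standard deviations above its mean, the other 1.5 below.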

The empirical rule (68-95-99.7)

For any normal distribution,

  • about $68\%$ of values lie within $1$ standard deviation of the mean ($\mu \pm \sigma$),
  • about $95\%$ within $2$ standard deviations ($\mu \pm 2 \sigma$),
  • about $99.7\%$ within $3$ standard deviations ($\mu \pm 3 \sigma$).

By symmetry, $P(0 \le Z \le 1) \approx 0.34$, $P(0 \le Z \le 2) \approx 0.475$, $P(0 \le Z \le 3) \approx 0.4985$.

Tail probabilities are what remains of each half, which has area $0.5$: $P(Z > 1) \approx 0.5 - 0.34 = 0.16$, $P(Z > 2) \approx 0.5 - 0.475 = 0.025$, $P(Z > 3) \approx 0.5 - 0.4985 = 0.0015$.
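The rounded figures in the rule can be compared with more exact values. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `within` is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal


def within(k: float) -> float:
    """P(-k <= Z <= k): area within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


# Central areas for k = 1, 2, 3 and the matching upper-tail areas P(Z > k).
central = {k: within(k) for k in (1, 2, 3)}
tails = {k: 1.0 - Z.cdf(k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(k, round(central[k], 4), round(tails[k], 4))
```

The exact central areas are about $0.6827$, $0.9545$ and $0.9973$, so the empirical rule is a rounding. In particular the exact $P(Z > 2)$ is closer to $0.023$ than to $0.025$.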

Computing probabilities

For $X \sim N(\mu, \sigma^2)$ and $a < b$:

$$P(a \le X \le b) = P\left( \frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma} \right).$$

In the exam, the empirical rule covers the common endpoints. For other endpoints, use the standard normal table or the calculator's normalcdf function.
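For endpoints the empirical rule does not cover, the calculation looks like the sketch below. It uses Python's `statistics.NormalDist` as a stand-in for a calculator's normalcdf; the function names and the numbers ($\mu = 20$, $\sigma = 4$, interval $15$ to $26$) are illustrative assumptions.

```python
from statistics import NormalDist


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed directly."""
    X = NormalDist(mu, sigma)
    return X.cdf(b) - X.cdf(a)


def prob_between_standardised(a: float, b: float, mu: float, sigma: float) -> float:
    """Same probability, found by standardising the endpoints first."""
    Z = NormalDist()  # standard normal N(0, 1)
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return Z.cdf(z_b) - Z.cdf(z_a)


# Illustrative values: the endpoints standardise to z = -1.25 and z = 1.5.
direct = prob_between(15.0, 26.0, mu=20.0, sigma=4.0)
standardised = prob_between_standardised(15.0, 26.0, mu=20.0, sigma=4.0)

print(round(direct, 4), round(standardised, 4))
```

Both routes give the same answer, which is the point of standardising: every normal probability reduces to one about $Z$.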

Inverse problems (percentiles)

To find the value $x$ such that $P(X \le x) = p$, find the corresponding $z$ from a table or invNorm, then transform: $x = \mu + z \sigma$. The 90th percentile of $Z$ is $z \approx 1.28$, the 95th is $z \approx 1.645$, the 97.5th is $z \approx 1.96$.
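The percentile values quoted above can be reproduced with `NormalDist.inv_cdf` from Python's standard library, which plays the role of invNorm here. The helper name `percentile` and the example distribution ($\mu = 60$, $\sigma = 5$) are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def percentile(p: float, mu: float, sigma: float) -> float:
    """Value x with P(X <= x) = p for X ~ N(mu, sigma^2), via x = mu + z*sigma."""
    z = Z.inv_cdf(p)
    return mu + z * sigma


# z-values for the 90th, 95th and 97.5th percentiles of Z.
z90, z95, z975 = (Z.inv_cdf(p) for p in (0.90, 0.95, 0.975))

# Illustrative values: 90th percentile of N(60, 25), so sigma = 5.
x90 = percentile(0.90, mu=60.0, sigma=5.0)

print(round(z90, 3), round(z95, 3), round(z975, 3), round(x90, 2))
```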

Worked examples

Direct use of the empirical rule

Heights of adult males in a city are normally distributed with $\mu = 175$ cm and $\sigma = 7$ cm. About what percentage of men are taller than $189$ cm?

$z = \frac{189 - 175}{7} = 2$. So we need $P(Z > 2) \approx 0.025$, or $2.5\%$.

Two-sided interval

For the same distribution, what percentage are between $168$ and $182$ cm?

These are $\mu \pm \sigma$, so by the empirical rule about $68\%$.

Mixed empirical-rule interval

For $X \sim N(50, 100)$ (so $\sigma = 10$), find $P(30 < X < 60)$.

$z_1 = \frac{30 - 50}{10} = -2$, $z_2 = \frac{60 - 50}{10} = 1$.

$P(-2 \le Z \le 1) = P(-2 \le Z \le 0) + P(0 \le Z \le 1) \approx 0.475 + 0.34 = 0.815$.
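To see how close the empirical-rule answer is, the same interval can be evaluated with Python's `statistics.NormalDist`. This is only a check, assuming Python 3.8 or later; in an exam the empirical-rule working is what the question is after.

```python
from statistics import NormalDist

# N(50, 100) is given by its variance, so the standard deviation is 10.
X = NormalDist(mu=50.0, sigma=10.0)

# More exact value of P(30 < X < 60).
exact = X.cdf(60.0) - X.cdf(30.0)

# Empirical-rule estimate from the working above.
empirical = 0.475 + 0.34

gap = exact - empirical

print(round(exact, 4), round(empirical, 3), round(gap, 4))
```

The two answers differ only in the third decimal place, because the rule rounds $0.4772$ down to $0.475$ and $0.3413$ down to $0.34$.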

Comparing two distributions with z-scores

Two students sit different tests. Alex scores $82$ on a test with $\mu = 70$, $\sigma = 8$. Sam scores $75$ on a test with $\mu = 60$, $\sigma = 10$. Who performed better relative to their cohort?

Alex: $z = \frac{82 - 70}{8} = 1.5$. Sam: $z = \frac{75 - 60}{10} = 1.5$.

Same z-score, so they performed equally well relative to their cohorts.
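The comparison can be sketched in a few lines of Python. The helper `z_score` is a made-up name, and the final step, which turns the shared z-score into a percentile rank, assumes the marks on each test are normally distributed, something the question does not state outright.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


alex_z = z_score(82.0, mu=70.0, sigma=8.0)
sam_z = z_score(75.0, mu=60.0, sigma=10.0)

# Assumption: marks on each test are normal, so a z-score maps to a percentile.
percentile_rank = 100.0 * NormalDist().cdf(alex_z)

print(alex_z, sam_z, round(percentile_rank, 1))
```

Under that assumption, a z-score of $1.5$ places each student above roughly $93\%$ of their cohort.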

Inverse normal

For $X \sim N(100, 225)$ (so $\sigma = 15$), find the value below which $95\%$ of the data lies.

The 95th percentile of $Z$ is $z \approx 1.645$, so $x = 100 + 1.645 \cdot 15 \approx 124.7$.

Common traps

Standardising with the wrong sign. $z = \frac{x - \mu}{\sigma}$. A value below the mean has a negative z-score. Do not drop the sign.

Confusing $\sigma$ and $\sigma^2$. $N(\mu, \sigma^2)$ uses the variance, but the empirical rule and z-score use $\sigma$. If a question gives $\sigma^2 = 25$, then $\sigma = 5$.
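The size of this mistake is easy to see in numbers. In the sketch below the variance of $25$ comes from the trap above, while the mean of $40$ and the observation $50$ are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

variance = 25.0
sigma = math.sqrt(variance)  # standard deviation is the square root: 5

# Hypothetical mean and observation, for illustration only.
mu, x = 40.0, 50.0

z_correct = (x - mu) / sigma    # divides by sigma
z_wrong = (x - mu) / variance   # the trap: divides by the variance instead

# NormalDist takes sigma as its second argument, not the variance.
X = NormalDist(mu, sigma)

print(z_correct, z_wrong, X.variance)
```

Dividing by the variance turns a value two standard deviations above the mean into one that looks barely above average.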

Forgetting symmetry. $P(Z \ge -1) = P(Z \le 1) \approx 0.84$. Use the symmetry of the bell curve rather than computing tails twice.

Adding empirical rule pieces incorrectly. $P(-1 \le Z \le 1) \approx 0.68$ is for the full two-sided interval. The one-sided half is $\frac{0.68}{2} = 0.34$. Do not double-count the central area.

Applying the empirical rule to non-normal data. The 68-95-99.7 rule is specific to the normal distribution. For other shapes you must use other methods.

In one sentence

For $X \sim N(\mu, \sigma^2)$, standardise with $z = (x - \mu) / \sigma$ to convert to the standard normal, then use the empirical rule, a table, or normalcdf or invNorm on a calculator to find probabilities and percentiles.

Past exam questions, worked

Real questions from past NESA papers on this dot point, with our answer explainer.

2022 HSC Q29 (4 marks): Test marks are normally distributed with mean $70$ and standard deviation $8$. Find the probability that a randomly chosen student scores between $62$ and $86$.

Standardise the endpoints with $z = \frac{x - \mu}{\sigma}$.

$z_1 = \frac{62 - 70}{8} = -1$, $z_2 = \frac{86 - 70}{8} = 2$.

By the empirical rule, $P(-1 \le Z \le 1) \approx 0.68$, so $P(0 \le Z \le 1) \approx 0.34$. Similarly $P(0 \le Z \le 2) \approx 0.475$.

$P(-1 \le Z \le 2) = P(-1 \le Z \le 0) + P(0 \le Z \le 2) \approx 0.34 + 0.475 = 0.815$.

Markers reward correct standardisation, splitting the interval at $0$, and applying the empirical rule values cleanly. A calculator's normalcdf gives $0.8186$ as a more precise answer.

2021 HSC Q28 (3 marks): A continuous variable is normally distributed with mean $\mu = 100$ and standard deviation $\sigma = 15$. Approximately what percentage of values lie between $85$ and $115$? Between $70$ and $130$?

$85$ and $115$ are one standard deviation either side of the mean, so by the empirical rule about $68\%$ of values lie in this range.

$70$ and $130$ are two standard deviations either side, so about $95\%$ of values lie in this range.

Markers expect explicit identification of how many $\sigma$ from the mean each endpoint is, and the corresponding empirical rule percentage.
