
QLD · Math Methods · Syllabus dot point

Topic 3: Continuous random variables, the normal distribution, and statistical inference

Apply the sampling distribution of the sample proportion $\hat{p}$ (mean $p$, standard deviation $\sqrt{p(1-p)/n}$) and construct approximate confidence intervals $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ for a population proportion

A focused answer to the QCE Maths Methods Unit 4 dot point on sample proportions and confidence intervals. The sampling distribution of $\hat{p}$, the normal approximation, the CI formula with standard $z^*$ values, and worked Paper 2 / PSMT examples.

Generated by Claude Opus · Reviewed by Better Tuition Academy · 9 min answer


What this dot point is asking

QCAA wants you to treat the sample proportion $\hat{p}$ as a random variable, apply the normal approximation to its sampling distribution, construct confidence intervals for a population proportion, and interpret the interval correctly. The dot point bridges Unit 3 binomial probability with Unit 4 statistical inference, and is heavily examined in PSMT and EA.

Sample proportion

If a population has true proportion $p$ of "successes" and a random sample of size $n$ is drawn with $X$ successes, the sample proportion is:

$$\hat{p} = \frac{X}{n}$$

Because $X$ is random, $\hat{p}$ is a random variable: it varies from sample to sample.

Sampling distribution of $\hat{p}$

Mean. $E(\hat{p}) = p$. The sample proportion is an unbiased estimator of $p$.

Standard deviation. $\text{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$.

Two takeaways:

  • SD is proportional to $1/\sqrt{n}$: quadruple $n$ to halve the SD.
  • SD is maximised at $p = 0.5$ and minimised (zero) at $p = 0$ or $p = 1$.
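The mean, the SD formula and the quadrupling rule can all be checked exactly from the binomial distribution of $X$. The sketch below is illustrative only: the function name `phat_mean_sd` and the values $p = 0.3$, $n = 50$ and $n = 200$ are assumptions chosen for the demonstration.

```python
from math import comb, sqrt

def phat_mean_sd(n, p):
    """Exact mean and SD of p-hat = X/n, where X ~ Binomial(n, p)."""
    # Binomial probabilities P(X = x) for x = 0..n
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * w for x, w in zip(range(n + 1), pmf))
    var = sum((x / n - mean) ** 2 * w for x, w in zip(range(n + 1), pmf))
    return mean, sqrt(var)

p = 0.3
mean_50, sd_50 = phat_mean_sd(50, p)
mean_200, sd_200 = phat_mean_sd(200, p)

print(mean_50, sd_50, sqrt(p * (1 - p) / 50))  # mean is p; SD matches the formula
print(sd_50 / sd_200)                          # close to 2: quadrupling n halves the SD
```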

Normal approximation

For large $n$:

$$\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$$

Conditions for the approximation (QCAA convention):

  • $np \geq 10$
  • $n(1-p) \geq 10$

When these conditions hold, $\hat{p}$ is approximately normal with mean $p$ and SD $\sqrt{p(1-p)/n}$. Standardise via $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$ to compute probabilities.
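As a sketch of the standardising step, the snippet below uses Python's standard-library `statistics.NormalDist` to find a probability for $\hat{p}$. The scenario ($p = 0.4$, $n = 150$ and the event $\hat{p} \leq 0.35$) is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 150                              # assumed population proportion and sample size
assert n * p >= 10 and n * (1 - p) >= 10     # conditions for the normal approximation

sd = sqrt(p * (1 - p) / n)    # SD of p-hat
z = (0.35 - p) / sd           # standardised value of p-hat = 0.35
prob = NormalDist().cdf(z)    # P(Z <= z), approximating P(p-hat <= 0.35)

print(round(sd, 4), round(z, 2), round(prob, 3))
```

Here the SD is 0.04, so $\hat{p} = 0.35$ sits 1.25 standard deviations below the mean and the probability is about 0.106.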

Confidence intervals

A confidence interval for a population proportion combines:

  • The point estimate $\hat{p}$ (centre).
  • The standard error: $\sqrt{\hat{p}(1-\hat{p})/n}$ (using $\hat{p}$ in place of unknown $p$).
  • The critical value $z^*$ for the confidence level.

The formula:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The margin of error is $\text{MoE} = z^* \times \text{SE}$.

Standard $z^*$ values

Level $z^*$
90% 1.6449 (round to 1.645)
95% 1.9600 (round to 1.96)
99% 2.5758 (round to 2.58)
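These critical values come from the standard normal distribution: a $C\%$ interval leaves $(100 - C)/2$ percent in each tail. A minimal sketch using the standard-library `statistics.NormalDist` follows; the helper name `z_star` is an assumption for illustration.

```python
from statistics import NormalDist

def z_star(level):
    """Two-sided critical value for a confidence level given as a decimal, e.g. 0.95."""
    # Leave (1 - level)/2 in each tail, so look up the upper cut-off.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_star(level):.4f}")
```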

Interpretation

A $C\%$ confidence interval has the long-run interpretation:

Approximately $C\%$ of intervals constructed by this procedure across repeated samples would contain the true population proportion.

This is NOT:

  • "There is a $C\%$ probability that $p$ is in this interval." (Once the interval is constructed, $p$ either is or is not in it.)
  • "$C\%$ of the population have proportions in this interval." (The interval is about the parameter, not about individuals.)

The correct language refers to the long-run success rate of the procedure.
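The long-run reading can be made concrete by computing the coverage of the procedure. For an assumed true $p$ (never known in practice), the sketch below adds up the binomial probabilities of every sample outcome whose interval contains $p$. The function name `coverage` and the values $p = 0.3$, $n = 400$ are illustrative assumptions.

```python
from math import comb, sqrt

def coverage(n, p, z=1.96):
    """Probability that p-hat +/- z*SE contains the true p, for X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        moe = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            # This sample outcome produces an interval that captures p.
            total += comb(n, x) * p**x * (1 - p)**(n - x)
    return total

print(round(coverage(400, 0.3), 3))  # expected to be near 0.95
```

The result is a property of the procedure over all possible samples, which is why the interpretation talks about repeated sampling and not about any single interval.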

Sample size design

To achieve a margin of error at most $E$ at $C\%$ confidence:

$$n \geq \frac{(z^*)^2 \hat{p}(1-\hat{p})}{E^2}$$

If $\hat{p}$ is unknown in advance, use $\hat{p} = 0.5$ (worst case) for a conservative design.

Always round up to the next integer (cannot sample a fractional person).
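A hedged sketch of the sample-size calculation is shown below. The function name `min_sample_size` and its parameters are assumptions for illustration. It uses exact fractions because floating-point arithmetic can nudge a bound that is exactly an integer slightly upward, which would make rounding up overshoot by one.

```python
from fractions import Fraction
from math import ceil

def min_sample_size(margin, z_star=1.96, p_guess=0.5):
    """Smallest integer n with z* * sqrt(p(1-p)/n) <= margin."""
    # Convert via str() so 0.02 is treated as exactly 1/50, not its binary float.
    e, z, p = (Fraction(str(v)) for v in (margin, z_star, p_guess))
    return ceil(z**2 * p * (1 - p) / e**2)

print(min_sample_size(0.05))                 # 384.16 rounds up to 385
print(min_sample_size(0.03, z_star=1.645))   # 90 percent confidence, margin 0.03
```

Leaving `p_guess` at its default of 0.5 gives the conservative worst-case design.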

Worked example

A study estimates the proportion of adults who exercise daily. A random sample of $n = 400$ adults gives 120 who do.

$\hat{p} = 120/400 = 0.30$.

SE: $\sqrt{0.30 \times 0.70 / 400} = \sqrt{0.000525} \approx 0.0229$.

For 95 percent: $z^* = 1.96$. Margin: $1.96 \times 0.0229 \approx 0.0449$.

CI: $(0.30 - 0.045, 0.30 + 0.045) = (0.255, 0.345)$.

Interpretation: approximately 95 percent of intervals constructed by this procedure would contain the true population proportion of adults who exercise daily.
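The same calculation can be sketched in a few lines of Python as a check on the hand working. The function name `proportion_ci` is an assumption for illustration.

```python
from math import sqrt

def proportion_ci(successes, n, z_star=1.96):
    """Approximate CI for a population proportion: p-hat +/- z* * SE."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error uses p-hat, not p
    moe = z_star * se                    # margin of error
    return p_hat - moe, p_hat + moe

lower, upper = proportion_ci(120, 400)
print(round(lower, 3), round(upper, 3))  # 0.255 0.345
```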

Trade-offs

Confidence vs precision. Higher confidence (99 percent) requires a wider interval. Lower confidence (90 percent) gives a narrower interval. To improve both, increase $n$.

Sample size economics. Doubling $n$ reduces SE by a factor of $\sqrt{2}$. Quadrupling $n$ halves SE. Returns diminish above about $n \approx 1000$ for opinion polling.
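Both trade-offs show up when the margin of error is tabulated across confidence levels and sample sizes. The sketch below holds an assumed $\hat{p} = 0.30$ fixed; the function name `margin` and the chosen values of $n$ are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin(p_hat, n, level):
    """Margin of error z* * sqrt(p-hat(1 - p-hat)/n) at the given confidence level."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.30   # assumed sample proportion, held fixed for the comparison
for n in (400, 800, 1600):
    row = [round(margin(p_hat, n, level), 4) for level in (0.90, 0.95, 0.99)]
    print(n, row)   # columns: 90, 95, 99 percent
```

Reading across a row, the margin grows with the confidence level; reading down a column, it shrinks by a factor of $\sqrt{2}$ each time $n$ doubles.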

When the normal approximation fails

For very small samples or proportions near 0 or 1, the approximation can be poor. QCAA conventions ($np \geq 10$ and $n(1-p) \geq 10$) ensure validity. Outside these conditions, an exact (binomial-based) interval would be needed, beyond Methods scope.
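To see how badly the approximation can behave when the conditions fail, the sketch below compares an exact binomial probability with its normal approximation. The helper `conditions_hold` and the case $n = 20$, $p = 0.05$ are assumptions for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def conditions_hold(n, p, threshold=10):
    """QCAA-style check: both np and n(1-p) must be at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

n, p = 20, 0.05                  # small sample, proportion near 0
print(conditions_hold(n, p))     # False, since np = 1

# Probability of no successes at all (p-hat = 0), computed two ways
exact = comb(n, 0) * (1 - p) ** n                      # exact binomial probability
approx = NormalDist(p, sqrt(p * (1 - p) / n)).cdf(0)   # normal approximation of P(p-hat <= 0)
print(round(exact, 3), round(approx, 3))
```

The two answers differ substantially here (roughly 0.36 exact against 0.15 approximate), which is the kind of error the conditions are designed to rule out.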

Common errors

Wrong $z^*$ for the level. 1.96 for 95 percent; 1.645 for 90 percent. Mixing them up gives wrong widths.

Probability misinterpretation. "The probability $p$ is in this interval" is wrong. Use long-run-procedure language.

Sample size not rounded up. $n = 384.1$ becomes $n = 385$, not 384.

$p$ vs $\hat{p}$ in formulas. In the sampling distribution SD: use $p$ (when known). In the CI standard error: use $\hat{p}$ (when $p$ unknown).

Forgetting worst-case $p = 0.5$. When no prior estimate of $p$ exists for sample-size design, use $p = 0.5$ to maximise $p(1-p) = 0.25$ for the most conservative $n$.

In one sentence

The sample proportion $\hat{p} = X/n$ has mean $p$ and SD $\sqrt{p(1-p)/n}$, and for large $n$ (with $np \geq 10$ and $n(1-p) \geq 10$) is approximately normal; an approximate $C\%$ confidence interval for the population proportion is $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ with $z^* = 1.645, 1.96, 2.58$ for the 90, 95, 99 percent levels, interpreted as the long-run procedure containing the true $p$ in approximately $C\%$ of repeated samples.

Past exam questions, worked

Real questions from past QCAA papers on this dot point, with our answer explainer.

2024 QCAA-style · P2 · 5 marks
A poll of 600 voters found 354 supported candidate X. (a) Compute a 95 percent confidence interval for the true proportion. (b) Interpret the interval. (c) Find the smallest sample size needed to achieve a 95 percent confidence interval of half-width 0.02, assuming the true proportion is around 0.5.

(a) Confidence interval.

$\hat{p} = 354/600 = 0.59$.

SE: $\sqrt{0.59 \times 0.41 / 600} = \sqrt{0.000403} \approx 0.0201$.

$z^* = 1.96$ for 95 percent.

Margin: $1.96 \times 0.0201 \approx 0.0394$.

CI: $(0.59 - 0.0394, 0.59 + 0.0394) = (0.5506, 0.6294)$, approximately $(0.551, 0.629)$.

(b) Interpretation. If many similar samples of 600 voters were drawn and 95 percent confidence intervals were constructed from each, approximately 95 percent of these intervals would contain the true population proportion of voters supporting candidate X. (Avoid: "there is a 95 percent probability the true proportion is in this interval", which is incorrect.)

(c) Sample size. Margin: $z^* \sqrt{p(1-p)/n} \leq 0.02$. With worst-case $p = 0.5$:

$1.96 \sqrt{0.25 / n} \leq 0.02$

$\sqrt{0.25 / n} \leq 0.02 / 1.96$

$0.25 / n \leq (0.02 / 1.96)^2 \approx 0.00010412$

$n \geq 0.25 \times (1.96 / 0.02)^2 = 0.25 \times 98^2 = 2401$.

Smallest $n = 2401$. The bound is exactly an integer here, so no rounding up is needed; carrying the rounded value 0.000104 through the division would overstate it slightly.

Markers reward correct SE formula, correct $z^*$ for 95 percent, the correct interpretation language, and rounding up to integer for sample size.
