

What is a sample proportion, and what is the sampling distribution of $\hat{p}$ for repeated samples from a population?

The sample proportion $\hat{p}$ as a random variable, the sampling distribution of $\hat{p}$ for repeated samples of size $n$ from a population with true proportion $p$, and the normal approximation for large $n$

A focused answer to the VCE Math Methods Unit 4 key-knowledge point on the sample proportion. Defines $\hat{p}$ as a random variable, gives its mean and standard deviation, sets out the normal-approximation conditions, and works through a Paper 2 estimation question.

Generated by Claude Opus. Reviewed by Better Tuition Academy. 9 min answer.


What this dot point is asking

VCAA wants you to treat the sample proportion $\hat{p}$ as a random variable, identify the mean and standard deviation of its sampling distribution, and apply the normal approximation to compute sample-proportion probabilities. The dot point is the statistical-inference precursor to confidence intervals.

What is a sample proportion

Suppose a population has a true proportion $p$ of "successes" (members with some characteristic: voters for party A, defective items, smokers, opinion-poll affirmatives). A random sample of $n$ items is drawn, and the number of successes in the sample is recorded as $X$.

The sample proportion is:

$$\hat{p} = \frac{X}{n}$$

Because $X$ is random (it depends on which $n$ items happen to be sampled), $\hat{p}$ is a random variable. It varies from sample to sample.

The sampling distribution of $\hat{p}$

Repeatedly drawing samples of size $n$ from the same population and computing $\hat{p}$ each time produces a distribution of $\hat{p}$-values: the sampling distribution of $\hat{p}$.
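The idea of repeated sampling can be sketched in Python. This is a minimal simulation. It assumes each sampled item is independently a success with probability $p$, as in sampling with replacement. The function name `simulate_phat`, the seed, the values $p = 0.6$ and $n = 150$, and the 5000 repetitions are illustrative choices, not part of the syllabus.

```python
import math
import random
import statistics

def simulate_phat(p, n, reps, seed=1):
    """Return `reps` simulated sample proportions, each from a sample of size n.

    Assumes every item is independently a success with probability p
    (sampling with replacement, or a very large population).
    """
    rng = random.Random(seed)  # fixed seed (illustrative) so runs are reproducible
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X = number of successes
        phats.append(x / n)                               # p-hat = X / n
    return phats

p, n = 0.6, 150  # illustrative values
phats = simulate_phat(p, n, reps=5000)
print("simulated mean:", round(statistics.fmean(phats), 4))
print("simulated SD:  ", round(statistics.stdev(phats), 4))
print("theory: mean", p, "SD", round(math.sqrt(p * (1 - p) / n), 4))
```

With these settings the simulated mean and SD should land close to $p = 0.6$ and $\sqrt{0.6 \times 0.4/150} = 0.04$. The exact simulated figures depend on the seed.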

Two facts about this distribution:

Mean of $\hat{p}$

$$E(\hat{p}) = p$$

The expected value of the sample proportion equals the population proportion, so the sample proportion is an unbiased estimator of $p$.
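Unbiasedness follows in one line from linearity of expectation. This assumes the binomial model $X \sim \text{Bin}(n, p)$ described under the conditions below, so that $E(X) = np$:

```latex
E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}\,E(X) = \frac{1}{n} \cdot np = p
```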

Standard deviation of $\hat{p}$

$$\text{SD}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$$
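The formula can be derived under the same binomial model, where $\text{Var}(X) = np(1-p)$. Dividing a random variable by $n$ divides its variance by $n^2$. A short derivation sketch:

```latex
\text{Var}(\hat{p}) = \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\,\text{Var}(X)
  = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n},
\qquad
\text{SD}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}
```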

Two interpretations:

  • The standard deviation is proportional to $1/\sqrt{n}$, so it falls as $n$ grows. Quadrupling the sample size halves the standard deviation.
  • The standard deviation is largest when $p = 0.5$. At $p = 0$ or $p = 1$, $\text{SD}(\hat{p}) = 0$ (there is no variability because every sample has the same proportion).
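Both interpretations can be checked numerically. The sketch below uses a hypothetical helper, `sd_phat`, which simply evaluates $\sqrt{p(1-p)/n}$. The grids of $n$ and $p$ values are arbitrary choices.

```python
import math

def sd_phat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling n halves the SD, because the SD is proportional to 1/sqrt(n).
for n in (25, 100, 400):
    print(n, round(sd_phat(0.5, n), 4))

# For fixed n, the SD peaks at p = 0.5 and is 0 at p = 0 or p = 1.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(sd_phat(p, 100), 4))
```

The first loop prints 0.1, 0.05 and 0.025, so each fourfold increase in $n$ halves the SD. The second loop is symmetric about $p = 0.5$. It peaks there at 0.05 and is 0 at both ends.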

Conditions for the formula

The formula assumes:

  • Independence. Each sample item is drawn independently. In practice, this requires either sampling with replacement, or sampling from a population large enough that each draw does not materially change the remaining proportion (typically, the population should be at least 10 times the sample size).
  • Identical distribution. Each sampled item has the same probability $p$ of being a success.

These are the conditions of the binomial distribution: $X \sim \text{Bin}(n, p)$.

The normal approximation

For large $n$, the sampling distribution of $\hat{p}$ is approximately normal:

$$\hat{p} \approx N\left(p, \frac{p(1 - p)}{n}\right)$$

(The second parameter is the variance. Equivalently, $\hat{p}$ has mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.)

When is "large $n$" large enough?

Standard conditions (VCAA cites both):

  • $np \geq 10$
  • $n(1 - p) \geq 10$

Some texts use 5 or 15 as the threshold; VCAA accepts any reasonable convention. The conditions ensure the binomial is well approximated by the normal.
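These checks can be wrapped in a small function. This is a sketch with a hypothetical name, `normal_approx_ok`. The default threshold of 10 follows the text. The optional population-size check encodes the "population at least 10 times the sample size" rule of thumb from the independence condition.

```python
def normal_approx_ok(n, p, threshold=10, population=None):
    """Check the rule-of-thumb conditions for the normal approximation to p-hat.

    threshold: minimum for both n*p and n*(1 - p) (10 here; some texts use 5 or 15).
    population: optional population size; if given, require it to be at least 10*n
    so that sampling without replacement is close enough to independent.
    """
    tol = 1e-9  # guards against float error, e.g. 100 * (1 - 0.9) comes out just below 10
    ok = n * p >= threshold - tol and n * (1 - p) >= threshold - tol
    if population is not None:
        ok = ok and population >= 10 * n
    return ok

print(normal_approx_ok(150, 0.6))                   # np = 90, n(1-p) = 60
print(normal_approx_ok(10, 0.1))                    # np = 1: too small
print(normal_approx_ok(150, 0.6, population=1000))  # population smaller than 10n
```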

Why the normal approximation works

For large $n$, $X \sim \text{Bin}(n, p)$ is approximately $N(np, np(1 - p))$ by the central limit theorem. Dividing by $n$ scales the mean by $1/n$ and the variance by $1/n^2$, so $\hat{p} = X/n$ is approximately $N(p, p(1-p)/n)$.
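To see the approximation at work, the sketch below compares two values of $P(\hat{p} \leq c)$. The first is the exact binomial value, $P(X \leq \lfloor cn \rfloor)$. The second is the normal approximation. It uses only the Python standard library (`math.comb` and `statistics.NormalDist`, Python 3.8+). The helper names and test cases are illustrative. One case meets the threshold-10 conditions; the other ($n = 10$, $p = 0.1$) clearly does not.

```python
import math
from statistics import NormalDist

def exact_cdf(n, p, c):
    """Exact P(p-hat <= c) = P(X <= floor(c*n)) for X ~ Bin(n, p)."""
    k_max = math.floor(c * n + 1e-9)  # tiny nudge so float error in c*n cannot drop a whole count
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

def normal_cdf(n, p, c):
    """Normal approximation: p-hat is approximately N(p, p(1-p)/n)."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n)).cdf(c)

for n, p, c in [(150, 0.6, 0.55), (10, 0.1, 0.0)]:
    print(n, p, c, round(exact_cdf(n, p, c), 4), round(normal_cdf(n, p, c), 4))
```

In the first case the two values are close. In the second, the exact probability is $0.9^{10} \approx 0.349$, but the normal approximation gives roughly 0.15. This gap is why the conditions matter.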

Computing sample-proportion probabilities

To find $P(\hat{p} \leq c)$, $P(\hat{p} \geq c)$ or $P(a \leq \hat{p} \leq b)$:

  1. Verify the normal-approximation conditions ($np \geq 10$ and $n(1 - p) \geq 10$).
  2. State the approximate distribution: $\hat{p} \approx N(p, p(1-p)/n)$.
  3. Standardise: $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$.
  4. Compute the probability using a calculator (normCdf) or a table.
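The four steps translate directly into code. The sketch below uses Python's `statistics.NormalDist` in place of a CAS normCdf. The function name `phat_probability` and its `lower`/`upper` parameters are hypothetical. The function refuses to run when the threshold-10 conditions fail, rather than silently returning a poor approximation.

```python
import math
from statistics import NormalDist

def phat_probability(n, p, lower=-math.inf, upper=math.inf):
    """Approximate P(lower <= p-hat <= upper) using the normal approximation."""
    # Step 1: check np >= 10 and n(1 - p) >= 10 (small tolerance for float rounding).
    if n * p < 10 - 1e-9 or n * (1 - p) < 10 - 1e-9:
        raise ValueError("np or n(1 - p) is below 10; use the binomial distribution")
    # Step 2: p-hat is approximately N(p, p(1 - p)/n); record its standard deviation.
    sd = math.sqrt(p * (1 - p) / n)
    # Step 3: standardise both endpoints (infinite endpoints stay infinite).
    z_lower = (lower - p) / sd
    z_upper = (upper - p) / sd
    # Step 4: evaluate with the standard normal CDF.
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(z_upper) - z.cdf(z_lower)
```

For example, `phat_probability(200, 0.4, 0.35, 0.45)` approximates $P(0.35 \leq \hat{p} \leq 0.45)$ for $n = 200$ and $p = 0.4$, giving about 0.851.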

Worked example

60 percent of the items a factory produces meet specification. A sample of $n = 150$ items is taken. Find the probability that the sample proportion meeting specification is at least 0.55.

Mean: $E(\hat{p}) = 0.60$.

SD: $\sqrt{0.6 \times 0.4 / 150} = \sqrt{0.0016} = 0.04$.

Conditions: $np = 90 \geq 10$, $n(1-p) = 60 \geq 10$. The normal approximation is valid.

Standardise: $z = (0.55 - 0.60)/0.04 = -1.25$.

$P(\hat{p} \geq 0.55) = P(Z \geq -1.25) \approx 0.8944$.
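The same arithmetic can be checked in a few lines. As before, `statistics.NormalDist` stands in for the calculator's normCdf:

```python
import math
from statistics import NormalDist

p, n, c = 0.60, 150, 0.55
sd = math.sqrt(p * (1 - p) / n)  # sqrt(0.0016) = 0.04
z = (c - p) / sd                 # (0.55 - 0.60) / 0.04 = -1.25
prob = 1 - NormalDist().cdf(z)   # P(Z >= -1.25)
print(round(sd, 4), round(z, 2), round(prob, 4))
```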

Sampling distribution shape and the central limit theorem

For small $n$, the sampling distribution of $\hat{p}$ is discrete (taking only the values $0, 1/n, 2/n, \ldots, 1$) and may be skewed if $p$ is far from 0.5. As $n$ grows, the distribution becomes both more concentrated (smaller SD) and more bell-shaped (better normal approximation). This is the central limit theorem in action.
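The discreteness and skew for small $n$ are easy to see by listing the exact distribution of $\hat{p}$. This sketch assumes $X \sim \text{Bin}(n, p)$ and uses a hypothetical helper, `phat_distribution`. The case $n = 5$, $p = 0.1$ is an arbitrary small-sample example.

```python
import math

def phat_distribution(n, p):
    """Exact distribution of p-hat = X/n for X ~ Bin(n, p), as (value, probability) pairs."""
    return [(k / n, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

for value, prob in phat_distribution(5, 0.1):
    print(f"p-hat = {value:.1f}: {prob:.5f}")
```

Only six values are possible. About 0.59 of the probability sits at $\hat{p} = 0$, with a thin tail to the right, which is far from a symmetric bell curve.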

VCE Methods does not require formal statement of the CLT, but the underlying intuition is the reason the normal approximation works.

Common errors

Confusing $\hat{p}$ with $p$. $p$ is the (often unknown) population proportion; $\hat{p}$ is the random, sample-based estimate. They are different objects with different statistical roles.

Wrong formula for SD. The correct formula is $\sqrt{p(1 - p)/n}$, not $\sqrt{p(1 - p)n}$ or $\sqrt{p/n}$. The factor of $n$ is in the denominator.

Applying the normal approximation without checking conditions. If $np$ or $n(1-p)$ is too small (below 10 by VCAA convention), the normal approximation is unreliable and the binomial distribution is needed instead.

Forgetting that $\hat{p}$ is a random variable. Treating $\hat{p}$ as a fixed number ignores the entire point of the sampling distribution. Probabilities require treating it as random.

Using $\hat{p}$ instead of $p$ in the SD formula. When the true population proportion $p$ is known, use $p$. When $p$ is unknown (the confidence-interval setting), substitute $\hat{p}$ as an estimate.

Calculator without set-up. Paper 2 expects the explicit standardisation set-up. A naked normCdf call without the distribution statement loses set-up marks.

In one sentence

The sample proportion $\hat{p} = X/n$ is a random variable whose sampling distribution has mean $p$ (the true population proportion) and standard deviation $\sqrt{p(1-p)/n}$; for large $n$ (with $np \geq 10$ and $n(1-p) \geq 10$), the sampling distribution is approximately normal, so probabilities about $\hat{p}$ can be computed by standardising to $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$.

Past exam questions, worked

Real questions from past VCAA papers on this dot point, with our answer explainer.

2024 VCAA Paper 2 (4 marks): A large city's voter list has 40 percent supporters of party A. A random sample of 200 voters is selected. (a) State the mean and standard deviation of the sample proportion $\hat{p}$ of supporters of party A. (b) Use the normal approximation to estimate the probability that $\hat{p}$ is greater than 0.45.

(a) Mean and standard deviation.

$E(\hat{p}) = p = 0.40$.

$\text{SD}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4 \times 0.6}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} \approx 0.0346$.

(b) Probability via normal approximation. With $np = 80$ and $n(1-p) = 120$ both larger than 10, the normal approximation is valid.

$\hat{p} \sim N(0.40, 0.0346^2)$ approximately.

Standardise, using the unrounded SD: $z = \frac{0.45 - 0.40}{\sqrt{0.0012}} \approx 1.4434$.

$P(\hat{p} > 0.45) = P(Z > 1.4434) \approx 0.0745$ (from normCdf).

So the probability is approximately $0.0745$, or 7.45 percent.
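Here is a quick numerical check of part (b), with `statistics.NormalDist` standing in for normCdf. Carrying the unrounded SD $\sqrt{0.0012}$ through the calculation reproduces $z \approx 1.4434$ and the final answer:

```python
import math
from statistics import NormalDist

p, n = 0.40, 200
sd = math.sqrt(p * (1 - p) / n)  # sqrt(0.0012), about 0.0346
z = (0.45 - p) / sd              # about 1.4434 when the SD is not rounded first
prob = 1 - NormalDist().cdf(z)   # P(Z > z)
print(round(sd, 4), round(z, 4), round(prob, 4))
```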

Markers reward the formula for $\text{SD}(\hat{p})$, checking the normal-approximation conditions, the standardisation, and a final probability with sensible decimal places.

2023 VCAA Paper 2 (3 marks): In a population, 30 percent of items are defective. A sample of $n$ items is taken. (a) Find the smallest value of $n$ for which the standard deviation of $\hat{p}$ is less than 0.02. (b) State an assumption needed for the formula for SD($\hat{p}$) to apply.

(a) Find $n$.

$\text{SD}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.3 \times 0.7}{n}} = \sqrt{\frac{0.21}{n}}$.

Set $\sqrt{\frac{0.21}{n}} < 0.02$. Square both sides: $\frac{0.21}{n} < 0.0004$. So $n > \frac{0.21}{0.0004} = 525$.

The smallest integer is $n = 526$.
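The same calculation can be done exactly in code. This sketch uses a hypothetical helper, `smallest_n`, and Python's `fractions.Fraction`. Exact fractions mean the boundary case $n = 525$, where the SD equals 0.02 exactly, is not misjudged by floating-point rounding. The sketch relies on the same squaring step as the working above.

```python
import math
from fractions import Fraction

def smallest_n(p, max_sd):
    """Smallest integer n with sqrt(p(1 - p)/n) < max_sd.

    Squaring (both sides positive) gives n > p(1 - p) / max_sd**2, so the answer
    is the floor of that bound plus one. Fractions keep the arithmetic exact.
    """
    p, max_sd = Fraction(p), Fraction(max_sd)  # decimal strings such as "0.3" convert exactly
    bound = p * (1 - p) / max_sd**2
    return math.floor(bound) + 1

print(smallest_n("0.3", "0.02"))  # p = 0.3, SD must be below 0.02
```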

(b) Assumption. The sample must be a simple random sample (each item chosen independently) from a population large enough that drawing one item does not materially change the proportion remaining. Equivalent assumption: the sample is drawn with replacement, or the sample size is much smaller than the population.

Markers reward the SD formula, the algebraic manipulation to isolate $n$, and an independence / random-sample condition.
