What is a sample proportion, and what is the sampling distribution of $\hat{p}$ for repeated samples from a population?
The sample proportion $\hat{p}$ as a random variable, the sampling distribution of $\hat{p}$ for repeated samples of size $n$ from a population with true proportion $p$, and the normal approximation for large $n$
A focused answer to the VCE Math Methods Unit 4 key-knowledge point on the sample proportion. Defines $\hat{p}$ as a random variable, gives its mean and standard deviation, sets out the normal-approximation conditions, and works through a Paper 2 estimation question.
What this dot point is asking
VCAA wants you to treat the sample proportion as a random variable, identify the mean and standard deviation of its sampling distribution, and apply the normal approximation to compute sample-proportion probabilities. The dot point is the statistical-inference precursor to confidence intervals.
What is a sample proportion
Suppose a population has a true proportion $p$ of "successes" (members with some characteristic: voters for party A, defective items, smokers, opinion-poll affirmatives). A random sample of $n$ items is drawn, and the number of successes in the sample is recorded as $X$.
The sample proportion is:
$\hat{p} = \dfrac{X}{n}$
Because $X$ is random (it depends on which items happen to be sampled), $\hat{p}$ is a random variable. It varies from sample to sample.
The sampling distribution of $\hat{p}$
Repeatedly drawing samples of size $n$ from the same population and computing $\hat{p}$ each time produces a distribution of $\hat{p}$-values: the sampling distribution of $\hat{p}$.
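A simulation makes the idea of repeated sampling concrete. The Python sketch below is illustrative only: the helper `simulate_phat`, the seed and the values $p = 0.4$, $n = 200$ are hypothetical choices, and each item is modelled as an independent success/failure trial. It records $\hat{p}$ for 5000 samples and summarises the resulting sampling distribution.

```python
import random
import statistics

def simulate_phat(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` independent samples of size n, where each item is a success
    with probability p, and return the sample proportion X/n of each sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # successes X in this sample
        phats.append(x / n)
    return phats

phats = simulate_phat(p=0.4, n=200, reps=5000)
mean_phat = statistics.mean(phats)   # should sit close to p = 0.4
sd_phat = statistics.pstdev(phats)   # should sit close to sqrt(0.4 * 0.6 / 200), about 0.0346
print(round(mean_phat, 4), round(sd_phat, 4))
```

Each printed summary describes the whole collection of $\hat{p}$-values, not any single sample.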
Two facts about this distribution:
Mean of $\hat{p}$
$E(\hat{p}) = p$
The expected value of the sample proportion equals the population proportion, so the sample proportion is an unbiased estimator of $p$.
Standard deviation of $\hat{p}$
$\text{SD}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
Two interpretations:
- The standard deviation falls in proportion to $1/\sqrt{n}$. Quadrupling the sample size halves the standard deviation.
- The standard deviation is largest when $p = 0.5$. At $p = 0$ or $p = 1$, $\text{SD}(\hat{p}) = 0$ (no variability, because every sample has the same proportion).
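Both interpretations can be checked numerically. This is a short sketch; `sd_phat` is a hypothetical helper name implementing $\text{SD}(\hat{p}) = \sqrt{p(1-p)/n}$.

```python
import math

def sd_phat(p: float, n: int) -> float:
    """SD of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling n (100 -> 400) halves the standard deviation.
print(sd_phat(0.5, 100), sd_phat(0.5, 400))

# For fixed n, the SD peaks at p = 0.5 and is zero at p = 0 and p = 1.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(sd_phat(p, 100), 4))
```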
Conditions for the formula
The formula assumes:
- Independence. Each sample item is drawn independently. In practice, this requires either sampling with replacement, or sampling from a population large enough that each draw does not materially change the remaining proportion (typically, the population should be at least 10 times the sample size).
- Identical distribution. Each sampled item has the same probability of being a success.
These are the conditions of the binomial distribution: $X \sim \text{Bi}(n, p)$.
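Under these conditions $X \sim \text{Bi}(n, p)$, with $E(X) = np$ and $\text{Var}(X) = np(1-p)$. Scaling by $1/n$, using $E(aX) = aE(X)$ and $\text{Var}(aX) = a^2\,\text{Var}(X)$, gives the mean and standard deviation of $\hat{p}$:

```latex
\begin{aligned}
E(\hat{p}) &= E\!\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p,\\[4pt]
\operatorname{Var}(\hat{p}) &= \operatorname{Var}\!\left(\frac{X}{n}\right) = \frac{\operatorname{Var}(X)}{n^{2}} = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n},\\[4pt]
\operatorname{SD}(\hat{p}) &= \sqrt{\frac{p(1-p)}{n}}.
\end{aligned}
```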
The normal approximation
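For large $n$, the sampling distribution of $\hat{p}$ is approximately normal:
$\hat{p} \sim N\left(p, \dfrac{p(1-p)}{n}\right)$ approximately.
(Equivalently, $\hat{p}$ has mean $p$ and standard deviation $\sqrt{\dfrac{p(1-p)}{n}}$.)
When is "large $n$" large enough?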
Standard conditions (VCAA cites both):
- $np \geq 10$
- $n(1-p) \geq 10$
Some texts use a different threshold (5 is a common alternative); VCAA accepts any reasonable convention. The conditions are rules of thumb for when the binomial distribution is well approximated by the normal.
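The check itself is a one-line test. In this sketch `normal_approx_ok` is a hypothetical name, and its `threshold` parameter is an assumed convenience for trying a cut-off other than 10.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """True when both np and n(1 - p) are at least `threshold`.

    The default of 10 reflects the np >= 10 and n(1 - p) >= 10 rule; the
    `threshold` parameter is a hypothetical convenience for other cut-offs."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(200, 0.4))  # np = 80, n(1 - p) = 120 -> True
print(normal_approx_ok(50, 0.05))  # np = 2.5 -> False
```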
Why the normal approximation works
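For large $n$, $X \sim \text{Bi}(n, p)$ is approximately $N(np, np(1-p))$ by the central limit theorem, because $X$ is a sum of $n$ independent success/failure indicators. Dividing by $n$ gives $\hat{p} = \dfrac{X}{n}$ approximately $N\left(p, \dfrac{p(1-p)}{n}\right)$.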
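The central limit theorem argument can be checked numerically. In this sketch the helper names and the values $p = 0.4$ and cut-off $0.45$ are illustrative assumptions. It compares the exact binomial probability $\Pr(\hat{p} \geq 0.45) = \Pr(X \geq 0.45n)$ with the plain normal approximation (no continuity correction) for increasing $n$; the gap between the two should shrink as $n$ grows.

```python
import math
from fractions import Fraction

def binom_upper_tail(n: int, p: float, k_min: int) -> float:
    """Exact Pr(X >= k_min) for X ~ Bi(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def normal_upper_tail(mean: float, sd: float, x: float) -> float:
    """Pr(Y >= x) for Y ~ N(mean, sd^2), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

p = 0.4
a = Fraction(45, 100)      # cut-off for p-hat, kept exact so a * n is exact
rows = []
for n in (20, 100, 500):   # n kept modest so math.comb(n, k) converts to a float
    exact = binom_upper_tail(n, p, math.ceil(a * n))  # p-hat >= a  <=>  X >= a * n
    approx = normal_upper_tail(p, math.sqrt(p * (1 - p) / n), float(a))
    rows.append((n, exact, approx))
    print(n, round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```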
Computing sample-proportion probabilities
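To find $\Pr(\hat{p} > a)$, $\Pr(\hat{p} < a)$ or $\Pr(a < \hat{p} < b)$:
- Verify the normal approximation conditions ($np \geq 10$ and $n(1-p) \geq 10$).
- State the approximate distribution: $\hat{p} \sim N\left(p, \dfrac{p(1-p)}{n}\right)$.
- Standardise: $Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$.
- Compute the probability using a calculator (normCdf) or a standard normal table.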
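The four steps translate directly into a small function. This is a sketch, not a prescribed method: `phat_prob_between` is a hypothetical name, and the standard normal CDF is built from Python's `math.erf` as a stand-in for a calculator's normCdf.

```python
import math

def phat_prob_between(p: float, n: int, lower: float = -math.inf,
                      upper: float = math.inf, threshold: float = 10) -> float:
    """Approximate Pr(lower < p-hat < upper) by the normal approximation.

    Mirrors the four steps: check conditions, state p-hat ~ N(p, p(1 - p)/n),
    standardise each bound, then evaluate the standard normal CDF."""
    # Step 1: both np and n(1 - p) must reach the threshold (10 by default).
    if n * p < threshold or n * (1 - p) < threshold:
        raise ValueError("normal approximation conditions not met")
    # Step 2: the approximate distribution has mean p and this standard deviation.
    sd = math.sqrt(p * (1 - p) / n)

    def phi(z: float) -> float:
        # Standard normal CDF built from math.erf (stand-in for normCdf).
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Steps 3 and 4: standardise the bounds and take the difference of CDF values.
    return phi((upper - p) / sd) - phi((lower - p) / sd)

# Pr(p-hat > 0.45) when p = 0.4 and n = 200.
print(round(phat_prob_between(0.4, 200, lower=0.45), 4))
```

Leaving `lower` or `upper` at its infinite default gives a one-sided probability.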
Worked example
A factory produces 60 percent of items meeting specification. A sample of items is taken. Find the probability that the sample proportion meeting spec is at least 0.55.
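Mean: $E(\hat{p}) = p = 0.6$.
SD: $\text{SD}(\hat{p}) = \sqrt{\dfrac{0.6 \times 0.4}{n}} = \sqrt{\dfrac{0.24}{n}}$.
Conditions: $np = 0.6n \geq 10$ and $n(1-p) = 0.4n \geq 10$, which hold whenever $n \geq 25$. Normal approximation valid.
Standardise: $\Pr(\hat{p} \geq 0.55) = \Pr\left(Z \geq \dfrac{0.55 - 0.6}{\sqrt{0.24/n}}\right)$.
$= \Pr\left(Z \geq -0.05\sqrt{n/0.24}\right)$, evaluated with normCdf. The standardised cut-off is negative, so the probability is greater than 0.5.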
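Running the same steps in Python needs a concrete sample size. The value $n = 100$ below is an assumption made only for illustration; with it, $np = 60$ and $n(1-p) = 40$, so the conditions hold.

```python
import math

p, a = 0.6, 0.55
n = 100  # ASSUMED sample size, for illustration only

# Step 1: conditions for the normal approximation.
if n * p < 10 or n * (1 - p) < 10:
    raise ValueError("np or n(1 - p) is below 10")
# Step 2: p-hat ~ N(p, p(1 - p)/n) approximately.
sd = math.sqrt(p * (1 - p) / n)
# Step 3: standardise the cut-off.
z = (a - p) / sd
# Step 4: Pr(p-hat >= 0.55) = Pr(Z >= z).
prob = 0.5 * math.erfc(z / math.sqrt(2))
print(round(sd, 4), round(z, 4), round(prob, 4))
```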
Sampling distribution shape and the central limit theorem
For small $n$, the sampling distribution of $\hat{p}$ is discrete (it can take only the values $0, \tfrac{1}{n}, \tfrac{2}{n}, \ldots, 1$) and may be skewed if $p$ is far from 0.5. As $n$ grows, the distribution becomes both more concentrated (smaller SD) and more bell-shaped (better normal approximation). This is the central limit theorem in action.
VCE Methods does not require a formal statement of the CLT, but the underlying intuition is the reason the normal approximation works.
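The shift from a discrete, skewed distribution towards a bell shape can be measured. The sketch below (hypothetical helper names) builds the exact distribution of $\hat{p}$ from binomial probabilities and computes its skewness (third standardised moment), a summary used here purely for illustration. For $p = 0.1$ the right skew shrinks as $n$ grows; for $p = 0.5$ the distribution is symmetric.

```python
import math

def phat_distribution(n: int, p: float) -> dict[float, float]:
    """Exact distribution of p-hat = X/n when X ~ Bi(n, p): value k/n -> probability."""
    return {k / n: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def skewness(dist: dict[float, float]) -> float:
    """Third standardised moment of a discrete distribution."""
    mean = sum(v * pr for v, pr in dist.items())
    var = sum((v - mean) ** 2 * pr for v, pr in dist.items())
    return sum((v - mean) ** 3 * pr for v, pr in dist.items()) / var ** 1.5

print(sorted(phat_distribution(10, 0.1)))  # the only attainable values: 0, 0.1, ..., 1
for n in (10, 100, 400):
    print(n, round(skewness(phat_distribution(n, 0.1)), 3))  # positive, shrinking with n
print(round(skewness(phat_distribution(50, 0.5)), 6))        # symmetric case
```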
Common errors
Confusing $p$ with $\hat{p}$. The parameter $p$ is the (unknown) population proportion; $\hat{p}$ is the random, sample-based estimate. They are different objects with different statistical roles.
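Wrong formula for SD. $\text{SD}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$, not $\sqrt{np(1-p)}$ (which is $\text{SD}(X)$) or $\dfrac{p(1-p)}{n}$ (which is the variance). The factor of $n$ is in the denominator, under the square root.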
Using the normal approximation without checking conditions. If $np$ or $n(1-p)$ is too small (below 10 by VCAA convention), the normal approximation is unreliable and the binomial distribution is needed instead.
Forgetting that $\hat{p}$ is a random variable. Treating $\hat{p}$ as a fixed number ignores the entire point of the sampling distribution. Probabilities require treating it as random.
Using $\hat{p}$ instead of $p$ in the SD formula. When the true population proportion $p$ is known, use $\sqrt{\dfrac{p(1-p)}{n}}$. When $p$ is unknown (the confidence-interval setting), substitute $\hat{p}$ as an estimate, giving $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.
Calculator without set-up. Paper 2 expects the explicit standardisation set-up. A naked normCdf call without the distribution statement loses set-up marks.
In one sentence
The sample proportion $\hat{p}$ is a random variable whose sampling distribution has mean $p$ (the true population proportion) and standard deviation $\sqrt{\dfrac{p(1-p)}{n}}$; for large $n$ (with $np \geq 10$ and $n(1-p) \geq 10$), the sampling distribution is approximately normal, so probabilities about $\hat{p}$ can be computed by standardising to $Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$.
Past exam questions, worked
Real questions from past VCAA papers on this dot point, with our answer explainer.
2024 VCAA Paper 2 (4 marks)
A large city's voter list has 40 percent supporters of party A. A random sample of 200 voters is selected. (a) State the mean and standard deviation of the sample proportion $\hat{p}$ of supporters of party A. (b) Use the normal approximation to estimate the probability that $\hat{p}$ is greater than 0.45.
(a) Mean and standard deviation.
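$E(\hat{p}) = p = 0.4$.
$\text{SD}(\hat{p}) = \sqrt{\dfrac{0.4 \times 0.6}{200}} = \sqrt{0.0012} \approx 0.0346$.
(b) Probability via normal approximation. With $np = 80$ and $n(1-p) = 120$ both larger than 10, the normal approximation is valid.
$\hat{p} \sim N(0.4, 0.0012)$ approximately.
Standardise: $\Pr(\hat{p} > 0.45) = \Pr\left(Z > \dfrac{0.45 - 0.4}{\sqrt{0.0012}}\right) \approx \Pr(Z > 1.443)$.
$\Pr(Z > 1.443) \approx 0.0745$ (from normCdf).
So $\Pr(\hat{p} > 0.45) \approx 0.0745$, or 7.45 percent.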
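The arithmetic in part (b) can be reproduced in a few lines of Python. This is only a numerical check, not part of the expected written answer, and the standard normal tail is computed with `math.erfc` rather than a CAS.

```python
import math

p, n, a = 0.4, 200, 0.45
conditions_ok = n * p >= 10 and n * (1 - p) >= 10   # np = 80, n(1 - p) = 120
sd = math.sqrt(p * (1 - p) / n)                     # sqrt(0.0012)
z = (a - p) / sd                                    # standardised cut-off
prob = 0.5 * math.erfc(z / math.sqrt(2))            # Pr(Z > z) = Pr(p-hat > 0.45)
print(conditions_ok, round(sd, 4), round(z, 3), round(prob, 4))
```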
Markers reward the formula for SD(), checking the normal-approximation conditions, the standardisation, and a final probability with sensible decimal places.
2023 VCAA Paper 2 (3 marks)
In a population, 30 percent of items are defective. A sample of $n$ items is taken. (a) Find the smallest value of $n$ for which the standard deviation of $\hat{p}$ is less than 0.02. (b) State an assumption needed for the formula for SD($\hat{p}$) to apply.
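(a) Find $n$.
$\text{SD}(\hat{p}) = \sqrt{\dfrac{0.3 \times 0.7}{n}} = \sqrt{\dfrac{0.21}{n}}$.
Set $\sqrt{\dfrac{0.21}{n}} < 0.02$. Square both sides: $\dfrac{0.21}{n} < 0.0004$. So $n > \dfrac{0.21}{0.0004} = 525$.
The smallest integer is $n = 526$.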
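The boundary matters here: at $n = 525$ the standard deviation equals 0.02 exactly, so the strict inequality forces $n = 526$. A sketch using exact fractions (the helper name is hypothetical) avoids floating-point rounding at that boundary.

```python
import math
from fractions import Fraction

def smallest_n_for_sd(p: Fraction, target: Fraction) -> int:
    """Smallest integer n with sqrt(p(1 - p)/n) strictly below target.

    Squaring (both sides are positive) gives p(1 - p)/n < target**2,
    i.e. n > p(1 - p)/target**2. Exact fractions keep a whole-number
    bound (525 here) from being nudged by rounding."""
    bound = p * (1 - p) / target**2
    return math.floor(bound) + 1   # smallest integer strictly greater than bound

print(smallest_n_for_sd(Fraction(3, 10), Fraction(2, 100)))  # 526
```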
(b) Assumption. The sample must be a simple random sample (each item chosen independently) from a population large enough that drawing one item does not materially change the proportion remaining. Equivalent assumption: the sample is drawn with replacement, or the sample size is much smaller than the population.
Markers reward the SD formula, the algebraic manipulation to isolate $n$, and an independence / random-sample condition.
Related dot points
- The normal distribution with mean $\mu$ and standard deviation $\sigma$, the standard normal $Z$, the use of the empirical 68/95/99.7 rule, and computation of normal probabilities and inverse probabilities using technology or standard tables
A focused answer to the VCE Math Methods Unit 4 key-knowledge point on the normal distribution. The pdf, the standardisation transformation $Z = (X - \mu)/\sigma$, the empirical rule, and the inverse-probability technique. Includes worked Paper 2 examples and standard CAS workflows.
- Approximate confidence intervals for a population proportion $p$ based on the sample proportion $\hat{p}$, including the standard 90, 95 and 99 percent intervals and their interpretation
A focused answer to the VCE Math Methods Unit 4 key-knowledge point on confidence intervals. The formula, the standard $z^*$ values for 90, 95 and 99 percent intervals, the correct interpretation language, and the relationship between sample size, margin of error and confidence level.
- Continuous random variables, their probability density functions, cumulative distribution functions, expected value (mean), variance and standard deviation, and computation of probabilities as definite integrals
A focused answer to the VCE Math Methods Unit 4 key-knowledge point on continuous random variables. Defines the probability density function and cumulative distribution function, computes mean and variance as definite integrals, and works through the conditions a pdf must satisfy and the standard Paper 2 set-up questions.