How does the sample proportion behave as a random variable, and how do we use its normal approximation?
Use the sample proportion $\hat{p}$ as a random variable, with mean $p$, variance $\frac{p(1-p)}{n}$ and standard deviation $\sqrt{\frac{p(1-p)}{n}}$, and apply its normal approximation
A focused answer to the HSC Maths Extension 1 dot point on sample proportions. What $\hat{p}$ is, why it is a random variable, its mean $p$, variance $\frac{p(1-p)}{n}$ and standard deviation $\sqrt{\frac{p(1-p)}{n}}$, the normal approximation $\hat{p} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$, computing probabilities about $\hat{p}$, the effect of sample size, and expected-range reasoning for polling and quality control.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to treat the sample proportion $\hat{p}$ as a random variable in its own right, know its mean $p$, its variance $\frac{p(1-p)}{n}$ and its standard deviation $\sqrt{\frac{p(1-p)}{n}}$, and use the normal approximation to compute probabilities about how close a survey's estimate is likely to be to the true value. This is the one slice of the statistics chapter that is uniquely Extension 1, and it is where binomial theory turns into real polling and quality-control reasoning.
The answer
What a sample proportion is
Run a binomial experiment: $n$ independent Bernoulli trials, each a success with probability $p$. Let $X$ count the successes. The population proportion $p$ is a fixed (usually unknown) number, for example the true fraction of Australians who support a policy. The sample proportion
$$\hat{p} = \frac{X}{n}$$
is the fraction of your sample that were successes. If you poll $n$ voters and $x$ of them say they support a party, your sample proportion is $\hat{p} = \frac{x}{n}$, an estimate of the true $p$.
The crucial idea is that $\hat{p}$ is itself a random variable. Run the survey again with a fresh random sample and you get a different $X$, hence a different $\hat{p}$. So $\hat{p}$ has its own distribution, mean and spread, and those are exactly what tell us how trustworthy a single survey is.
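To see the sample proportion behaving as a random variable, here is a minimal simulation sketch. The poll size, true support and number of repeats are hypothetical values chosen for illustration, and `simulate_sample_proportions` is an illustrative helper, not part of any syllabus material.

```python
import random

def simulate_sample_proportions(n, p, surveys, seed=0):
    """Run `surveys` independent polls of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(surveys):
        # X counts the successes in n Bernoulli(p) trials
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions

# Hypothetical poll: n = 200 voters, true support p = 0.4, repeated 5 times
for p_hat in simulate_sample_proportions(n=200, p=0.4, surveys=5):
    print(p_hat)
```

Each printed value is one survey's estimate. They differ from run to run of the survey but cluster around the true proportion.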
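Why the mean is $p$ and the variance is $\frac{p(1-p)}{n}$
These are not new results to memorise blindly; they fall straight out of the binomial mean and variance you already know, $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$, using the scaling rules $E(aX) = aE(X)$ and $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$ with the constant $a = \frac{1}{n}$:
$$E(\hat{p}) = E\!\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p$$
$$\mathrm{Var}(\hat{p}) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
The mean result, $E(\hat{p}) = p$, is the whole reason surveys work: on average the sample proportion equals the population proportion, so $\hat{p}$ is an unbiased estimator of $p$. The variance result is the whole reason large samples are better: the spread carries a factor of $\frac{1}{\sqrt{n}}$, so it shrinks as $n$ grows. Note the dimension check too: $\hat{p}$ is a proportion (a number between $0$ and $1$), so its SD is also a small fraction, never the large count-sized spread $\sqrt{np(1-p)}$ that $X$ has.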
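The mean and variance results can be checked numerically. The sketch below computes both directly from the binomial probabilities, without using the shortcut formulas, and compares them with $p$ and $\frac{p(1-p)}{n}$. The values $n = 50$ and $p = 0.3$ are hypothetical, and the function names are illustrative.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_hat_mean_and_variance(n, p):
    """Mean and variance of p_hat = X/n, summed directly over the binomial pmf."""
    values = [k / n for k in range(n + 1)]            # possible values of p_hat
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    variance = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, variance

n, p = 50, 0.3  # hypothetical values for illustration
mean, variance = p_hat_mean_and_variance(n, p)
print(mean, p)                     # direct mean vs the formula p
print(variance, p * (1 - p) / n)   # direct variance vs the formula p(1-p)/n
print(isclose(mean, p), isclose(variance, p * (1 - p) / n))
```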
The distribution of $\hat{p}$ is the binomial, restretched
Because $\hat{p} = \frac{X}{n}$, each value $k$ of $X$ corresponds to exactly one value $\frac{k}{n}$ of $\hat{p}$ and carries the same probability:
$$P\!\left(\hat{p} = \frac{k}{n}\right) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n$$
So the probability graph of $\hat{p}$ is just the probability graph of $X$ with the horizontal axis relabelled (squashed from the integers $0, 1, \dots, n$ onto the fractions $0, \frac{1}{n}, \frac{2}{n}, \dots, 1$). Nothing about the probabilities changes, only the scale of the axis. That is why every binomial tool you have still applies: to find a probability about $\hat{p}$ exactly, convert it into the matching statement about $X$ and sum binomial terms; to find it quickly for large $n$, use the normal approximation below.
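The "convert to a count, then sum binomial terms" move can be sketched as follows. The function name and the example ($n = 10$ tosses of a fair coin) are hypothetical. The small tolerance is an implementation choice to guard against floating-point rounding when turning a proportion back into a count.

```python
from math import ceil, comb, floor

def exact_prob_p_hat_between(lo, hi, n, p):
    """P(lo <= p_hat <= hi), computed as P(ceil(n*lo) <= X <= floor(n*hi))."""
    eps = 1e-9  # guards against n*lo landing a hair above an integer in floating point
    k_min = max(0, ceil(n * lo - eps))
    k_max = min(n, floor(n * hi + eps))
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, k_max + 1))

# Hypothetical example: 10 tosses of a fair coin, proportion of heads between 0.4 and 0.6.
# This is P(4 <= X <= 6) = (210 + 252 + 210) / 1024.
print(exact_prob_p_hat_between(0.4, 0.6, n=10, p=0.5))
```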
The normal approximation
For large $n$ the binomial $X$ is well approximated by a normal distribution (the central limit theorem). Dividing through by $n$ carries that approximation over to $\hat{p}$: it becomes approximately normal with the same mean and variance we just derived,
$$\hat{p} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$$
The validity conditions are the same rule of thumb (both $np$ and $n(1-p)$ sufficiently large) used for approximating $X$ itself, because $\hat{p}$ and $X$ have identical probabilities. The accompanying figure shows the sampling distribution of $\hat{p}$ for a national poll: a bell curve centred exactly on the true value $p$, with standard deviation $\sqrt{\frac{p(1-p)}{n}}$. The shaded central band, a few standard deviations wide, is the "expected range" almost every poll will fall inside.
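As a rough sketch of the normal approximation in use, the code below standardises both endpoints of an interval for the sample proportion and reads the probability from the standard normal CDF (built from `math.erf`). The poll size, true proportion and interval are hypothetical, and the function names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def approx_prob_p_hat_between(lo, hi, n, p):
    """Normal approximation to P(lo <= p_hat <= hi), no continuity correction."""
    sd = sqrt(p * (1 - p) / n)   # standard deviation of p_hat
    z_lo = (lo - p) / sd         # standardise the lower endpoint
    z_hi = (hi - p) / sd         # standardise the upper endpoint
    return normal_cdf(z_hi) - normal_cdf(z_lo)

# Hypothetical poll: n = 1000, true support p = 0.5, estimate within 0.03 of p
print(approx_prob_p_hat_between(0.47, 0.53, n=1000, p=0.5))
```

With these illustrative numbers the endpoints sit about $1.9$ standard deviations from the centre, so the result is a little under $0.95$.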
The effect of sample size
The mean of $\hat{p}$ is always $p$, no matter the sample size, so a bigger sample does not move the centre, it sharpens it. Since
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \frac{\sqrt{p(1-p)}}{\sqrt{n}}$$
the spread is inversely proportional to $\sqrt{n}$. To halve the standard deviation you must quadruple the sample. The accompanying overlay fixes $p$ and stacks the sampling distributions for three sample sizes, each four times the previous one: each fourfold jump in $n$ halves the standard deviation and the curve becomes correspondingly taller and tighter around $p$.
This is the link to confidence-style "expected range" reasoning. Because $\hat{p}$ is roughly normal, about $95\%$ of the time a single survey's $\hat{p}$ lands within $2$ standard deviations of $p$ (more precisely $1.96$ SD). Read backwards, that interval is the "margin of error": if a poll reports a value of $\hat{p}$, you can say the true value $p$ is very likely within about two standard deviations of it. The smaller you want that margin, the larger $n$ has to be, and because of the $\sqrt{n}$, shrinking the margin by half costs four times the sample.
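The quadruple-to-halve rule is easy to tabulate. In this sketch the proportion $0.5$ and the sample sizes $100$, $400$ and $1600$ are hypothetical illustration values, and the margin uses the $1.96$ multiplier for the central $95\%$.

```python
from math import sqrt

def sd_p_hat(n, p):
    """Standard deviation of the sample proportion for sample size n."""
    return sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical true proportion
for n in (100, 400, 1600):  # each sample is four times the previous one
    sd = sd_p_hat(n, p)
    margin = 1.96 * sd      # half-width of the central 95% band
    print(f"n = {n:5d}  SD = {sd:.4f}  95% margin = {margin:.4f}")
```

Mathematically the standard deviations here are $0.05$, $0.025$ and $0.0125$: each fourfold increase in the sample halves the spread and the margin.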
Exact versus approximate
For small $n$, do not reach for the normal curve: the values of $\hat{p}$ are too few and chunky. Instead convert the question about $\hat{p}$ into the matching question about $X$ and sum binomial terms exactly. For large $n$ (both $np$ and $n(1-p)$ large enough to pass the rule of thumb), the normal approximation replaces a long sum with one or two $z$-lookups. A continuity correction of $\frac{1}{2n}$ (adjacent $\hat{p}$ values are $\frac{1}{n}$ apart, so the half-step is $\frac{1}{2n}$) can be applied, but for the large samples typical of polling it is negligible and is usually dropped, which is the convention this page follows.
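To get a feel for how much the continuity correction matters, the sketch below compares three answers for the same interval: the exact binomial sum, the plain normal approximation, and the normal approximation with each endpoint widened by the half-step $\frac{1}{2n}$. The values $n = 400$, $p = 0.5$ and the count range $180$ to $220$ are hypothetical, and `compare_methods` is an illustrative helper.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def compare_methods(k_lo, k_hi, n, p):
    """(exact, plain normal, corrected normal) for P(k_lo/n <= p_hat <= k_hi/n)."""
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_lo, k_hi + 1))
    sd = sqrt(p * (1 - p) / n)
    half_step = 1 / (2 * n)        # half the gap between adjacent p_hat values
    lo, hi = k_lo / n, k_hi / n
    plain = normal_cdf((hi - p) / sd) - normal_cdf((lo - p) / sd)
    corrected = (normal_cdf((hi + half_step - p) / sd)
                 - normal_cdf((lo - half_step - p) / sd))
    return exact, plain, corrected

# Hypothetical case: n = 400, p = 0.5, sample proportion between 0.45 and 0.55
exact, plain, corrected = compare_methods(180, 220, n=400, p=0.5)
print(f"exact = {exact:.4f}  plain = {plain:.4f}  corrected = {corrected:.4f}")
```

The half-step shrinks like $\frac{1}{n}$ while the standard deviation shrinks only like $\frac{1}{\sqrt{n}}$, so the gap between the plain and corrected answers narrows as $n$ grows.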
How exam questions ask about sample proportions
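The wording is the tell. Map the phrase to the move:
- "Write down / find the sample proportion": just compute $\hat{p} = \frac{x}{n}$ from the given count. A mark opener.
- "State the mean and standard deviation of $\hat{p}$": quote $E(\hat{p}) = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ (note the square root, and that the SD is a small fraction, not $\sqrt{np(1-p)}$).
- "Show that $E(\hat{p}) = p$ / $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$": derive from $\hat{p} = \frac{X}{n}$, using $X \sim B(n, p)$ and the scaling rules $E(aX) = aE(X)$, $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$.
- "Use the normal approximation to find $P(\hat{p} \le a)$" or "estimate the probability the sample proportion is between ...": confirm $np$ and $n(1-p)$ are large enough, write $\hat{p} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$, standardise with $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$, look up.
- "within $d$ of the true value": this is $P(|\hat{p} - p| \le d)$, a symmetric interval $p - d \le \hat{p} \le p + d$; standardise both ends and use $P(-z \le Z \le z) = 2P(Z \le z) - 1$.
- "What sample size is needed so the estimate is within ... with probability ...": set $z\sqrt{\frac{p(1-p)}{n}} \le d$ and solve for $n$, using the worst case $p = 0.5$ if $p$ is unknown.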
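The sample-size recipe in the last bullet can be sketched in a few lines. Squaring $z\sqrt{\frac{p(1-p)}{n}} \le d$ gives $n \ge \left(\frac{z}{d}\right)^2 p(1-p)$, and the answer is rounded up. The margin $0.03$ and multiplier $1.96$ below are hypothetical illustration values, and `required_sample_size` is an illustrative name.

```python
from math import ceil, sqrt

def required_sample_size(margin, z, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin; p defaults to the worst case 0.5."""
    # Rearranged: n >= (z / margin)^2 * p(1-p), then round up to a whole person.
    return ceil((z / margin) ** 2 * p * (1 - p))

# Hypothetical requirement: within 0.03 of the true value, z = 1.96 (central 95%)
n = required_sample_size(margin=0.03, z=1.96)
print(n)                              # 1068
print(1.96 * sqrt(0.25 / n))          # margin achieved with n, just under 0.03
print(1.96 * sqrt(0.25 / (n - 1)))    # one fewer respondent, just over 0.03
```

Using $p = 0.5$ maximises $p(1-p)$, so the resulting $n$ is large enough whatever the true proportion turns out to be.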
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation, 3 marks. A market-research firm surveys randomly chosen Sydney commuters and finds that used a train at least once last week. Write down the sample proportion $\hat{p}$. If the true population proportion is , state the mean, variance and standard deviation of $\hat{p}$ for a sample of this size.
- Sample proportion
- With successes out of ,
- Mean
- .
- Variance
- With ,
- Standard deviation
- .
So this one survey produced an estimate of , and the estimator is centred on the true value with a spread of about .
Foundation, 3 marks. A fair coin is tossed times and $\hat{p}$ is the proportion of heads. List the possible values of $\hat{p}$, then find its mean and standard deviation.
- Possible values
- can be , so takes the values
- Mean
- $E(\hat{p}) = p = 0.5$, because the coin is fair.
- Standard deviation
- With and ,
Note how few values has and how large the spread is: with only trials a single survey tells you very little about .
Core, 4 marks. A streaming service estimates that a new show has a true national audience share of . A ratings panel of households is sampled. Using the normal approximation to $\hat{p}$, estimate the probability that the panel's sample proportion lies between and . Use and .
Set up the model. Here , , . Check the conditions: and , so the normal approximation is valid.
Parameters of the approximating normal.
So .
Standardise both endpoints.
Read the probability.
That uses , so
Answer. About , so roughly an chance the panel's share lands in .
Core, 4 marks. A bottling plant claims its true defective rate is . A quality inspector samples bottles. Find the standard deviation of the sample proportion $\hat{p}$, then use the normal approximation to estimate the probability that the inspector observes a sample defective rate greater than . Use .
- Conditions
- and , so the approximation is valid.
- Standard deviation
- With ,
- Standardise
- For ,
Read the probability.
Answer. About . So even though the claimed rate is only , there is roughly a chance a clean batch shows a sample rate above purely by sampling variation, worth remembering before raising an alarm.
Exam, 5 marks. A polling company wants to estimate the proportion $p$ of voters supporting a referendum. It requires a chance that its sample proportion $\hat{p}$ falls within of the true value $p$. Taking the worst case $p = 0.5$, and using for the central of a normal distribution, find the smallest sample size $n$ the company should use.
- Translate the requirement
- "Within of with probability " means the half-width of the central interval of must be at most :
- Substitute the worst case
- The product $p(1-p)$ is largest at $p = 0.5$, where $p(1-p) = 0.25$. Using $p = 0.5$ gives the most demanding (largest) $n$, which is safe for any true $p$:
- Solve for
- Square both sides:
- Answer
- The company needs voters (round up to guarantee the bound). This is the textbook "margin of error " sample size for a national poll, and it explains why such polls quote samples of roughly to people.
Exam, 5 marks. Two opinion polls estimate the same true support level . Poll A samples voters; Poll B samples . (a) Compare the standard deviations of $\hat{p}$ for the two polls. (b) Using the normal approximation, estimate for each poll the probability that $\hat{p}$ falls within of $p$. Use and .
(a) Standard deviations. With , so .
Because Poll B's sample is four times Poll A's and the SD has $\sqrt{n}$ in the denominator, quadrupling the sample halves the standard deviation: $\sigma_B = \frac{1}{2}\sigma_A$.
(b) Probability within for Poll A.
Probability within for Poll B.
Answer. Poll A has about a chance of landing within of the truth; Poll B about . Halving the standard deviation sharply tightens the estimate, which is why larger samples are worth the cost.
Related dot points
- Define the binomial distribution $B(n, p)$, state its probability mass function, and find its mean and variance
A focused answer to the HSC Maths Extension 1 dot point on the binomial distribution. The pmf $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, mean $np$, variance $np(1-p)$, and standard situations that fit the model.
- Use the normal approximation to approximate binomial probabilities for large $n$
A focused answer to the HSC Maths Extension 1 dot point on the normal approximation of the binomial. The rule of thumb on $np$ and $n(1-p)$, continuity correction, standardising and computing approximate probabilities, with worked examples.
- Compute exact probabilities for the binomial distribution including $P(X = k)$, $P(X \le k)$, $P(X \ge k)$, and use complementary counting
A focused answer to the HSC Maths Extension 1 dot point on computing binomial probabilities. Exact pmf values, cumulative sums, complements (at least, at most), and standard problem patterns, with worked examples.
- Define a Bernoulli random variable, compute its mean and variance, and recognise scenarios that fit the model
A focused answer to the HSC Maths Extension 1 dot point on Bernoulli trials. The definition, mean $p$, variance $p(1-p)$, and the role of Bernoulli trials as the building block of the binomial distribution.