
What is a sample proportion, and what is the sampling distribution of $\hat{p}$ for repeated samples from a population?

The sample proportion $\hat{p}$ as a random variable, the sampling distribution of $\hat{p}$ for repeated samples of size $n$ from a population with true proportion $p$, and the normal approximation for large $n$

A focused answer to the VCE Math Methods Unit 4 key-knowledge point on the sample proportion. Defines $\hat{p}$ as a random variable, gives its mean and standard deviation, sets out the normal-approximation conditions, and works through a Paper 2 estimation question.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. What is a sample proportion
  3. The sampling distribution of $\hat{p}$
  4. The normal approximation
  5. Computing sample-proportion probabilities
  6. Sampling distribution shape and the central limit theorem
  7. Examples in context
  8. Try this

What this dot point is asking

VCAA wants you to treat the sample proportion $\hat{p}$ as a random variable, identify the mean and standard deviation of its sampling distribution, and apply the normal approximation to compute sample-proportion probabilities. The dot point is the statistical-inference precursor to confidence intervals.

What is a sample proportion

Suppose a population has a true proportion $p$ of "successes" (members with some characteristic: voters for party A, defective items, smokers, opinion-poll affirmatives). A random sample of $n$ items is drawn, and the number of successes in the sample is recorded as $X$.

The sample proportion is:

$$\hat{p} = \frac{X}{n}$$

Because $X$ is random (it depends on which $n$ items happen to be sampled), $\hat{p}$ is a random variable. It varies from sample to sample.
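To make the sample-to-sample variation concrete, here is a minimal simulation sketch. The population proportion `p_true = 0.6`, the sample size `n = 50`, the seed, and the helper name `sample_proportion` are illustrative assumptions, not values from this dot point.

```python
import random

def sample_proportion(p, n, rng):
    """Draw one random sample of size n and return the proportion of successes."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(1)    # fixed seed so the run is repeatable
p_true, n = 0.6, 50       # illustrative values (assumed)

# Five separate samples give five (usually different) values of p_hat.
estimates = [sample_proportion(p_true, n, rng) for _ in range(5)]
print(estimates)
```

Each printed value is one realisation of the random variable $\hat{p}$; none of them has to equal `p_true`.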

The sampling distribution of $\hat{p}$

Repeatedly drawing samples of size $n$ from the same population and computing $\hat{p}$ each time produces a distribution of $\hat{p}$-values: the sampling distribution of $\hat{p}$.

Two facts about this distribution:

Mean of $\hat{p}$

$$E(\hat{p}) = p$$

The expected value of the sample proportion equals the population proportion. The sample proportion is an unbiased estimator of $p$.
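Unbiasedness can be checked informally by simulation. The sketch below assumes an illustrative population with `p_true = 0.3`, samples of size `n = 100`, and 5000 repeats; these numbers are arbitrary choices made for the demonstration.

```python
import random
from statistics import mean

rng = random.Random(42)                 # fixed seed for repeatability
p_true, n, repeats = 0.3, 100, 5000     # illustrative values (assumed)

def draw_p_hat():
    """Simulate one sample of size n and return its sample proportion."""
    return sum(rng.random() < p_true for _ in range(n)) / n

p_hats = [draw_p_hat() for _ in range(repeats)]

# The average of many simulated p_hat values should sit close to p_true.
print(round(mean(p_hats), 3))
```

Individual samples miss $p$ in both directions, but the misses average out, which is what "unbiased" means.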

Standard deviation of $\hat{p}$

$$\text{SD}(\hat{p}) = \sqrt{\frac{p (1 - p)}{n}}$$

Two interpretations:

  • The standard deviation is proportional to $1/\sqrt{n}$. Quadrupling the sample size halves the standard deviation.
  • For fixed $n$, the standard deviation is largest when $p = 0.5$. At $p = 0$ or $p = 1$, $\text{SD}(\hat{p}) = 0$ (no variability because every sample has the same proportion).
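Both interpretations can be checked numerically. This is a small sketch; the function name `sd_p_hat` and the values of $p$ and $n$ tried are assumptions chosen for illustration.

```python
from math import sqrt

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Quadrupling n (100 -> 400) halves the standard deviation.
print(sd_p_hat(0.3, 100), sd_p_hat(0.3, 400))

# For fixed n, the standard deviation peaks at p = 0.5 and is 0 at p = 0 or p = 1.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(sd_p_hat(p, 100), 4))
```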

Conditions for the formula

The formula assumes:

  • Independence. Each sample item is drawn independently. In practice, this requires either sampling with replacement, or sampling from a population large enough that each draw does not materially change the remaining proportion (typically, the population should be at least 10 times the sample size).
  • Identical distribution. Each sampled item has the same probability $p$ of being a success.

These are the conditions of the binomial distribution: $X \sim \text{Bin}(n, p)$.
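Because $X$ is binomial, the exact distribution of $\hat{p} = X/n$ can be tabulated and its mean and standard deviation compared with the formulas. The sketch below assumes illustrative values `n = 20` and `p = 0.3`; the helper name `p_hat_pmf` is hypothetical.

```python
from math import comb, sqrt

def p_hat_pmf(n, p):
    """Return {k/n: P(p_hat = k/n)} for X ~ Bin(n, p)."""
    return {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 20, 0.3    # illustrative values (assumed)
pmf = p_hat_pmf(n, p)

# Mean and variance computed directly from the exact distribution.
exact_mean = sum(v * prob for v, prob in pmf.items())
exact_var = sum((v - exact_mean) ** 2 * prob for v, prob in pmf.items())

print(round(exact_mean, 6), round(sqrt(exact_var), 6))   # from the pmf
print(p, round(sqrt(p * (1 - p) / n), 6))                 # from the formulas
```

The two printed lines should agree up to floating-point rounding, since the formulas are exact consequences of the binomial model rather than approximations.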

The normal approximation

For large $n$, the sampling distribution of $\hat{p}$ is approximately normal:

$$\hat{p} \approx N\left( p, \frac{p (1 - p)}{n} \right)$$

(Equivalently, $\hat{p}$ has mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.)

When is "large $n$" large enough?

Standard conditions (VCAA cites both):

  • $n p \geq 10$
  • $n (1 - p) \geq 10$

Some texts use $5$ or $15$ as the threshold; VCAA accepts any reasonable convention. The conditions ensure the binomial is well-approximated by the normal.
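The two conditions are easy to wrap in a helper. This is a sketch only: the function name `normal_approx_ok` is hypothetical, and the default `threshold=10` simply mirrors the convention stated above (pass 5 or 15 for other texts).

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) reach the chosen threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(150, 0.6))   # np = 90 and n(1 - p) = 60
print(normal_approx_ok(20, 0.1))    # np = 2, so the condition fails
```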

Why the normal approximation works

$X \sim \text{Bin}(n, p)$ for large $n$ is approximately $N(n p, n p (1 - p))$ by the central limit theorem. Dividing by $n$ gives $\hat{p} = X / n$ approximately $N(p, p(1-p)/n)$.
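The "dividing by $n$" step can be written out in full. The derivation below assumes the standard scaling rules $E(aX) = a E(X)$ and $\text{Var}(aX) = a^2 \text{Var}(X)$, which are not stated in the text above.

```latex
\begin{aligned}
E(\hat{p}) &= E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{n p}{n} = p, \\
\text{Var}(\hat{p}) &= \text{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2} \text{Var}(X) = \frac{n p (1 - p)}{n^2} = \frac{p (1 - p)}{n}, \\
\text{SD}(\hat{p}) &= \sqrt{\frac{p (1 - p)}{n}}.
\end{aligned}
```

Rescaling a normal variable by a constant leaves it normal, so only the mean and variance change.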

Computing sample-proportion probabilities

To find $P(\hat{p} \leq c)$, $P(\hat{p} \geq c)$ or $P(a \leq \hat{p} \leq b)$:

  1. Verify the normal approximation conditions ($n p \geq 10$ and $n (1 - p) \geq 10$).
  2. State the approximate distribution: $\hat{p} \approx N(p, p(1-p)/n)$.
  3. Standardise: $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$.
  4. Compute the probability using a calculator (normCdf) or a table.
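The four steps can be sketched as one function. Python's standard-library `statistics.NormalDist` stands in here for the calculator's normCdf; the function name `p_hat_probability` and its parameters are hypothetical, chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_hat_probability(p, n, lower=None, upper=None, threshold=10):
    """Approximate P(lower <= p_hat <= upper) using the normal approximation.

    Leave lower or upper as None for a one-sided probability.
    """
    # Step 1: verify the normal approximation conditions.
    if n * p < threshold or n * (1 - p) < threshold:
        raise ValueError("normal approximation conditions not met")
    # Step 2: state the approximate distribution N(p, p(1 - p)/n).
    dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
    # Steps 3 and 4: cdf() standardises internally and returns the area to the left.
    area_below_lower = 0.0 if lower is None else dist.cdf(lower)
    area_below_upper = 1.0 if upper is None else dist.cdf(upper)
    return area_below_upper - area_below_lower

# P(p_hat > 0.6) when p = 0.5 and n = 100.
print(round(p_hat_probability(0.5, 100, lower=0.6), 4))
```

In an exam the standardisation should still be written out by hand; the function only mirrors the arithmetic.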

Worked example

At a factory, 60 percent of items produced meet specification. A sample of $n = 150$ items is taken. Find the probability that the sample proportion meeting specification is at least 0.55.

Mean: $E(\hat{p}) = 0.60$.

SD: $\sqrt{0.6 \times 0.4 / 150} = \sqrt{0.0016} = 0.04$.

Conditions: $n p = 90 \geq 10$, $n (1-p) = 60 \geq 10$. Normal approximation valid.

Standardise: $z = (0.55 - 0.60) / 0.04 = -1.25$.

$P(\hat{p} \geq 0.55) = P(Z \geq -1.25) \approx 0.8944$.
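As a cross-check on the worked example, the normal-approximation answer can be compared with the exact binomial probability. This comparison is an addition for illustration and is not part of the expected exam working; the variable names are assumptions.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

p, n, c = 0.6, 150, 0.55

# Normal approximation, as in the worked example.
approx = 1 - NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf(c)

# Exact binomial: p_hat >= 0.55 means X >= 82.5, so X >= 83.
k_min = ceil(c * n)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(round(approx, 4))   # 0.8944
print(round(exact, 4))
```

The two values should be close, which is what the conditions $n p \geq 10$ and $n(1-p) \geq 10$ are meant to guarantee.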

Sampling distribution shape and the central limit theorem

For small $n$, the sampling distribution of $\hat{p}$ is discrete (taking only values $0, 1/n, 2/n, \ldots, 1$) and may be skewed if $p$ is far from 0.5. As $n$ grows, the distribution becomes both more concentrated (smaller SD) and more bell-shaped (better normal approximation). This is the central limit theorem in action.
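One way to see the shape change is to measure skewness directly from the exact binomial probabilities. In this sketch, skewness means the third standardised moment (0 for a symmetric distribution); the value `p = 0.1`, the sample sizes, and the helper name `p_hat_skewness` are assumptions for illustration, and skewness itself is beyond what VCE Methods requires.

```python
from math import comb, sqrt

def p_hat_skewness(n, p):
    """Third standardised moment of p_hat, computed from the exact binomial pmf."""
    values = [k / n for k in range(n + 1)]
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mu = sum(v * q for v, q in zip(values, probs))
    sd = sqrt(sum((v - mu) ** 2 * q for v, q in zip(values, probs)))
    return sum(((v - mu) / sd) ** 3 * q for v, q in zip(values, probs))

p = 0.1    # far from 0.5, so small samples are visibly skewed (assumed value)
for n in (10, 40, 160):
    sd = sqrt(p * (1 - p) / n)
    # Columns: sample size, SD of p_hat, skewness of p_hat.
    print(n, round(sd, 4), round(p_hat_skewness(n, p), 3))
```

Both printed columns shrink as $n$ grows: the distribution tightens around $p$ and becomes more symmetric.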

VCE Methods does not require formal statement of the CLT, but the underlying intuition is the reason the normal approximation works.

Examples in context

Example 1. Opinion poll variability. In a region $55\%$ favour a proposal ($p = 0.55$). A poll of $n = 400$ people gives a sample proportion $\hat p$ with mean $E(\hat p) = 0.55$ and standard deviation $\sqrt{\frac{0.55 \times 0.45}{400}} \approx \sqrt{0.000619} \approx 0.0249$. To find the chance the poll shows under $50\%$ support: standardise $z = \frac{0.50 - 0.55}{0.0249} \approx -2.01$, so $P(\hat p < 0.50) = P(Z < -2.01) \approx 0.022$.

Example 2. Choosing a sample size for precision. A manufacturer with a $20\%$ return rate ($p = 0.2$) wants $\text{SD}(\hat p) \le 0.025$. Solving $\sqrt{\frac{0.2 \times 0.8}{n}} \le 0.025$ gives $\frac{0.16}{n} \le 0.000625$, so $n \ge 256$. A sample of at least $256$ products achieves the target precision.
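The sample-size calculation in Example 2 can be sketched in code. Exact fractions are used here because the bound can land exactly on an integer, where floating-point rounding could push the answer off by one. The function name `min_sample_size` and the `strict` flag are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def min_sample_size(p, target_sd, strict=False):
    """Smallest integer n with SD(p_hat) <= target_sd (or < target_sd if strict).

    p and target_sd are passed as strings so Fraction keeps them exact.
    """
    p, target_sd = Fraction(p), Fraction(target_sd)
    bound = p * (1 - p) / target_sd**2    # need n >= bound (or n > bound if strict)
    return floor(bound) + 1 if strict else ceil(bound)

print(min_sample_size("0.2", "0.025"))   # 256
```

Set `strict=True` when the question asks for a standard deviation strictly less than the target, since a bound that is exactly an integer then has to be stepped up by one.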

Try this

Q1. A population has $p = 0.3$. For a sample of $n = 100$, find the mean and standard deviation of $\hat{p}$. [2 marks]

  • Cue. $E(\hat p) = 0.3$; $\text{SD}(\hat p) = \sqrt{\frac{0.3 \times 0.7}{100}} = \sqrt{0.0021} \approx 0.0458$.

Q2. For $p = 0.5$ and $n = 100$, find $P(\hat{p} > 0.6)$ using the normal approximation. [3 marks]

  • Cue. $\text{SD} = \sqrt{\frac{0.25}{100}} = 0.05$; $z = \frac{0.6 - 0.5}{0.05} = 2$; $P(Z > 2) \approx 0.0228$.

Q3. Find the smallest $n$ so that $\text{SD}(\hat{p}) < 0.03$ when $p = 0.4$. [3 marks]

  • Cue. $\frac{0.24}{n} < 0.0009 \Rightarrow n > 266.7$, so $n = 267$.

Exam-style practice questions

Practice questions written in the style of VCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 VCAA Paper 2 (4 marks). On a large city's voter list, 40 percent are supporters of party A. A random sample of 200 voters is selected. (a) State the mean and standard deviation of the sample proportion $\hat{p}$ of supporters of party A. (b) Use the normal approximation to estimate the probability that $\hat{p}$ is greater than 0.45.

(a) Mean and standard deviation.

$E(\hat{p}) = p = 0.40$.

$\text{SD}(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4 \times 0.6}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} \approx 0.0346$.

(b) Probability via normal approximation. With $n p = 80$ and $n(1-p) = 120$ both larger than 10, the normal approximation is valid.

$\hat{p} \sim N(0.40, 0.0346^2)$ approximately.

Standardise: $z = \frac{0.45 - 0.40}{0.0346} \approx 1.4434$.

$P(\hat{p} > 0.45) = P(Z > 1.4434) \approx 0.0745$ (from normCdf).

So approximately $0.0745$ or 7.45 percent.
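The arithmetic in this answer can be reproduced with a short check. Python's `statistics.NormalDist` is used as a stand-in for normCdf, and the unrounded standard deviation is carried through; the variable names are assumptions for this sketch.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200

sd = sqrt(p * (1 - p) / n)        # standard deviation of p_hat
z = (0.45 - p) / sd               # standardised value of 0.45
prob = 1 - NormalDist().cdf(z)    # P(Z > z) for the standard normal

print(round(sd, 4), round(z, 4), round(prob, 4))
```

Carrying the unrounded standard deviation through matters: dividing by the rounded 0.0346 shifts $z$ slightly in the third decimal place.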

Markers reward the formula for $\text{SD}(\hat{p})$, checking the normal-approximation conditions, the standardisation, and a final probability with sensible decimal places.

2023 VCAA Paper 2 (3 marks). In a population, 30 percent of items are defective. A sample of $n$ items is taken. (a) Find the smallest value of $n$ for which the standard deviation of $\hat{p}$ is less than 0.02. (b) State an assumption needed for the formula for $\text{SD}(\hat{p})$ to apply.

(a) Find $n$.

$\text{SD}(\hat{p}) = \sqrt{\frac{p (1 - p)}{n}} = \sqrt{\frac{0.3 \times 0.7}{n}} = \sqrt{\frac{0.21}{n}}$.

Set $\sqrt{\frac{0.21}{n}} < 0.02$. Square both sides: $\frac{0.21}{n} < 0.0004$. So $n > \frac{0.21}{0.0004} = 525$.

Because the inequality is strict, the smallest integer is $n = 526$.

(b) Assumption. The sample must be a simple random sample (each item chosen independently) from a population large enough that drawing one item does not materially change the proportion remaining. Equivalent assumption: the sample is drawn with replacement, or the sample size is much smaller than the population.

Markers reward the SD formula, the algebraic manipulation to isolate $n$, and an independence / random-sample condition.
