How is the sample mean distributed across repeated samples, and why does the central limit theorem make it approximately normal?
The distribution of the sample mean as a random variable, its mean and standard deviation (the standard error), the effect of sample size, and the central limit theorem giving the approximate normality of $\bar{X}$ for large samples
A focused answer to the VCE Specialist Mathematics Unit 4 key-knowledge point on the distribution of the sample mean: its mean and standard error, the effect of sample size, and the central limit theorem, with a verified worked example.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
VCAA wants you to understand that the sample mean $\bar{X}$ is itself a random variable with its own distribution, to know its mean and standard deviation (the standard error), to see how increasing the sample size sharpens that distribution, and to state and use the central limit theorem that makes $\bar{X}$ approximately normal for large samples. This underpins confidence intervals and hypothesis testing.
The sample mean is a random variable
Take a random sample of size $n$ from a population. The sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
depends on which sample is drawn, so it varies from sample to sample: it is a random variable with its own distribution, called the sampling distribution of the mean.
Mean and standard error
Suppose the population has mean $\mu$ and standard deviation $\sigma$, and the observations are independent. Using the rules for linear combinations of independent random variables:
$$\operatorname{E}(\bar{X}) = \mu, \qquad \operatorname{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$
The mean of $\bar{X}$ equals the population mean, so $\bar{X}$ is an unbiased estimator of $\mu$. Its standard deviation, called the standard error, is $\sigma/\sqrt{n}$, smaller than the population standard deviation by a factor of $\sqrt{n}$.
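The linear-combination step can be written out in full. This is a sketch that assumes the observations, labelled $X_1, \dots, X_n$ here for illustration, are independent and each has mean $\mu$ and variance $\sigma^2$.

```latex
\begin{aligned}
\operatorname{E}(\bar{X})
  &= \operatorname{E}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n}\sum_{i=1}^{n}\operatorname{E}(X_i)
   = \frac{1}{n}\cdot n\mu
   = \mu, \\[4pt]
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^{2}}{n^{2}}
   = \frac{\sigma^{2}}{n}, \\[4pt]
\operatorname{sd}(\bar{X})
  &= \sqrt{\frac{\sigma^{2}}{n}}
   = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

Independence is what allows the variances to add in the second line; the expectation line holds even without it.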
The effect of sample size
Because the standard error is $\sigma/\sqrt{n}$, larger samples give a sample mean clustered more tightly around $\mu$. The shrinking is by $\sqrt{n}$, not $n$: to halve the standard error you must quadruple the sample size. This diminishing return is why very precise estimates need large samples.
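The square-root rule is easy to check numerically. The following is a minimal Python sketch; the helper name `standard_error` and the value sigma = 80 are illustrative assumptions, not part of the syllabus statement.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean of n independent observations."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)


sigma = 80.0  # illustrative population standard deviation (assumed value)

# Each step quadruples n, so each standard error is half the previous one.
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
# prints 16.0, 8.0 and 4.0 alongside n = 25, 100 and 400
```

Going from n = 25 to n = 400 multiplies the sample size by 16 but only divides the standard error by 4.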
The central limit theorem
The central limit theorem states that, for a sufficiently large sample size $n$, the distribution of the sample mean is approximately normal with mean $\mu$ and variance $\sigma^2/n$:
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right) \quad \text{(approximately)},$$
regardless of the shape of the population distribution. So even if the population is skewed or discrete, the sample mean of a large sample behaves normally. This is what lets us use normal-based methods (confidence intervals, hypothesis tests) for the mean. If the population is itself normal, $\bar{X}$ is exactly normal for any $n$.
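A short simulation can make the theorem concrete. This is a sketch under stated assumptions: the population is exponential with rate 1 (a skewed distribution with mean 1 and standard deviation 1), and the sample size, number of trials and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40           # sample size (assumed for illustration)
trials = 20_000  # number of repeated samples (assumed for illustration)
mu = 1.0         # mean of an exponential population with rate 1
sigma = 1.0      # standard deviation of the same population

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

se = sigma / math.sqrt(n)  # theoretical standard error
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a normal distribution about 68% of values lie within one
# standard deviation of the mean, so this proportion should be near 0.68.
within_one_se = sum(abs(m - mu) <= se for m in sample_means) / trials

print("theoretical standard error:", round(se, 4))
print("mean of sample means:", round(mean_of_means, 4))
print("sd of sample means:", round(sd_of_means, 4))
print("proportion within one standard error:", round(within_one_se, 3))
```

The simulated mean and standard deviation of the sample means should sit close to 1 and to sigma / sqrt(n), and the proportion within one standard error should be close to the normal value, even though individual observations are strongly skewed.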
Examples in context
Example 1. Quadrupling the sample size from $n$ to $4n$ changes the standard error from $\sigma/\sqrt{n}$ to $\sigma/\sqrt{4n} = \sigma/(2\sqrt{n})$, halving it.
Example 2. For a normal population, $\bar{X}$ is exactly normal for any $n$, so no large-sample approximation is needed.
Try this
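Q1. State the mean and standard error of $\bar{X}$ for samples of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$. [2 marks]
- Cue. Mean $\mu$, standard error $\sigma/\sqrt{n}$.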
Q2. By what factor must $n$ increase to halve the standard error? [1 mark]
- Cue. A factor of 4.
Q3. State the central limit theorem in one sentence. [2 marks]
- Cue. For large $n$, $\bar{X}$ is approximately normal with mean $\mu$ and standard error $\sigma/\sqrt{n}$, whatever the population shape.
Exam-style practice questions
Practice questions written in the style of VCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2025 VCAA (1 mark)
The volume of water, V mL, consumed by a student during a school day is normally distributed with a mean of 1000 mL and a standard deviation of 80 mL. Write down the mean and standard deviation of the sampling distribution for the average volume of water consumed by randomly selected samples of 25 students. Give your answers in millilitres.
For the sample mean X-bar of a sample of size n drawn from a population with mean mu and standard deviation sigma, the sampling distribution has mean mu and standard deviation (the standard error) sigma / sqrt(n).
Here mu = 1000, sigma = 80 and n = 25.
Mean of the sampling distribution: E(X-bar) = mu = 1000 mL.
Standard deviation of the sampling distribution: sigma / sqrt(n) = 80 / sqrt(25) = 80 / 5 = 16 mL.
So the average volume for samples of 25 has mean 1000 mL and standard deviation 16 mL.
2023 VCAA (2 marks)
The waiting time, Xw minutes, for a train is normally distributed with a mean of 8 minutes and a standard deviation of 3 minutes. The probability that, for 12 randomly chosen work days, the average waiting time is between 7 minutes 45 seconds and 8 minutes 30 seconds is equivalent to Pr(a < Z < b), where Z ~ N(0, 1). Find the values of a and b.
Because the population is normal, the sample mean of n = 12 waiting times is exactly normal, with mean 8 and standard error sigma / sqrt(n) = 3 / sqrt(12) = sqrt(3) / 2 (about 0.8660).
Convert each bound to a z-score using z = (x - mu) / standard error. First write the times as decimals: 7 minutes 45 seconds = 7.75 minutes and 8 minutes 30 seconds = 8.5 minutes.
Lower bound: a = (7.75 - 8) / (sqrt(3) / 2) = (-0.25) / 0.8660 = -sqrt(3) / 6, which is about -0.289.
Upper bound: b = (8.5 - 8) / (sqrt(3) / 2) = 0.5 / 0.8660 = sqrt(3) / 3, which is about 0.577.
So a = -sqrt(3) / 6 (about -0.289) and b = sqrt(3) / 3 (about 0.577).
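The two z-scores can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the final probability is an extra that the question does not ask for.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 12  # values from the question
se = sigma / sqrt(n)         # standard error of the average waiting time

# Standardise the bounds 7 min 45 s = 7.75 min and 8 min 30 s = 8.5 min.
a = (7.75 - mu) / se
b = (8.5 - mu) / se
print(round(a, 3), round(b, 3))  # -0.289 0.577

# The same probability computed two ways: on the Z scale and on the X-bar scale.
Z = NormalDist()  # standard normal, mean 0 and standard deviation 1
prob_z = Z.cdf(b) - Z.cdf(a)
xbar = NormalDist(mu, se)
prob_xbar = xbar.cdf(8.5) - xbar.cdf(7.75)
print(prob_z, prob_xbar)
```

Both probabilities agree, which confirms that standardising with the standard error (not the population standard deviation) is the correct conversion.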
2023 VCAA (1 mark)
A company accountant knows that the amount owed on any individual unpaid invoice is normally distributed with a mean of $800 and a standard deviation of $200. What is the probability, correct to three decimal places, that in a random sample of 16 unpaid invoices the total amount owed is more than $13 500? A. 0.087 B. 0.191 C. 0.413 D. 0.587 E. 0.809
The answer is B.
The total T of 16 independent invoices is normally distributed. Its mean is 16 × 800 = 12 800, and its variance is 16 × 200^2 = 640 000, so its standard deviation is sqrt(640 000) = 800.
Standardise the value 13 500: z = (13 500 - 12 800) / 800 = 700 / 800 = 0.875.
Then Pr(T > 13 500) = Pr(Z > 0.875) = 1 - 0.8092 = 0.191 (to three decimal places), which is option B.
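The calculation can be confirmed with a few lines of Python. This is a sketch using the standard-library `statistics.NormalDist`, with the mean of 800 and standard deviation of 200 per invoice taken from the worked answer; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 16                             # number of invoices in the sample
mean_each, sd_each = 800.0, 200.0  # per-invoice mean and standard deviation

# A sum of n independent normal amounts is normal, with mean n * mean_each
# and standard deviation sqrt(n) * sd_each.
total = NormalDist(mu=n * mean_each, sigma=sqrt(n) * sd_each)

p = 1 - total.cdf(13_500)  # Pr(T > 13 500)
print(round(p, 3))  # 0.191
```

Note that the standard deviation of the total grows by sqrt(n), whereas the standard deviation of the sample mean shrinks by sqrt(n).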