
NSW Maths Standard 2 syllabus dot point

How are data collected and sampled so that the results fairly represent the whole population?

Investigate sampling techniques, including census, simple random, systematic and stratified sampling, and identify the target population and sources of bias in data collection

A focused answer to the HSC Maths Standard 2 dot point on data collection and sampling. Census versus sample, defining the target population, simple random, systematic and stratified sampling, choosing a sample size, designing a stratified sample with correct proportions, and spotting bias in a survey, with worked Australian examples.

Generated by Claude Opus 4.8. 14 min answer. Updated 2026-06-21.

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to understand how data is gathered before any of it is graphed or summarised, because a conclusion is only as trustworthy as the data behind it. You need to decide when to survey the whole population (a census) and when to survey only part of it (a sample), name and apply the three sampling methods in the syllabus (simple random, systematic and stratified), choose and justify a sample size, and recognise the ways a sample can become biased so that it no longer represents the population. The arithmetic is light - mostly finding a fraction of each group for a stratified sample - so most of the marks reward clear reasoning: identifying the target population, justifying census versus sample, and explaining exactly why a particular method is or is not fair.

The answer

Almost every study starts with one decision: do you collect data from everyone, or from a representative few? A census collects data from every member of the population. A sample collects data from a chosen subset and uses it to estimate what the whole population is like. The group you ultimately want your results to describe is the target population, and the whole point of sampling is to pick a sample that mirrors that population, so that what is true of the sample is also true of the population.

Census versus sample

A census surveys the entire population. It gives an exact answer with no sampling error, but it is usually practical only when the population is small and easy to reach. The Australian Bureau of Statistics runs a national Census every five years, which shows both the value (a complete picture) and the cost (a huge, expensive national exercise).

A sample surveys only part of the population. You use a sample when:

the population is too large to survey everyone (for example, all Australian voters),
a census would be too expensive or too slow, or
the testing is destructive, so testing every item would destroy the whole product (matches, light globes, crash-tested cars).

The trade-off is sampling error: because you did not ask everyone, the sample result is an estimate, not the exact truth. A good sampling method and a large enough sample keep that error small.

The three sampling methods

NESA names three ways to choose a sample. Each aims to be fair, but they differ in how they do it.

Simple random sampling. Every member of the population has an equal chance of being chosen, like drawing names from a hat. In practice you number every member and use a random number generator. It is fair and unbiased, but you need a full list of the population, and by chance it may under-represent a small subgroup.
Systematic sampling. You order the population, choose a random starting point, then select every $k$th member. The step $k$ is the population size divided by the sample size. For a sample of $50$ from $1000$ people, $k = 1000 \div 50 = 20$, so you take every $20$th person. It is quick and easy, but it fails if the list has a repeating pattern that lines up with $k$.
Stratified sampling. You split the population into strata (non-overlapping groups such as year levels, age bands or suburbs), then take the same fraction from each stratum, so every group is represented in its correct proportion. This is the fairest method when the population has clear subgroups of different sizes, but it needs you to know the size of each stratum.
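
The first two methods can be sketched in a few lines of Python. This is a minimal illustration rather than syllabus content: the function names `simple_random_sample` and `systematic_sample` and the numbered list `people` are assumptions made for the example, which reuses the figures above (a sample of $50$ from $1000$).

```python
import random

def simple_random_sample(population, sample_size, rng):
    """Every member has an equal chance of selection (names from a hat)."""
    return rng.sample(population, sample_size)

def systematic_sample(population, sample_size, rng):
    """Random start, then every k-th member of the ordered list."""
    k = len(population) // sample_size   # step = population size / sample size
    start = rng.randrange(k)             # random starting point among the first k
    return population[start::k][:sample_size]

rng = random.Random(2024)                # fixed seed so the run is repeatable
people = list(range(1, 1001))            # hypothetical members numbered 1 to 1000

srs = simple_random_sample(people, 50, rng)
systematic = systematic_sample(people, 50, rng)

print(len(srs), len(systematic))         # 50 50
print(systematic[1] - systematic[0])     # 20, the step k
```

Both functions need a full list of the population, which is the practical requirement noted above for simple random sampling.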

Designing a stratified sample

Stratified sampling is the method NESA examines most, because it involves a calculation. The idea is simple: each stratum should contribute to the sample in the same proportion as it appears in the population. The tool is the sampling fraction:

$$\text{sampling fraction} = \frac{\text{sample size}}{\text{population size}}.$$

Multiply each stratum's size by this fraction to get the number to take from that stratum. In effect the population splits into strata, and each stratum passes the same fraction of itself into the sample.

The final check is the habit to build: the numbers you take from the strata must add up to the sample size you wanted. If they do not, recheck the fraction and the arithmetic.
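
One way to see why this check always works is to write the method in symbols. The symbols are introduced here for illustration only: $N$ is the population size, $n$ the sample size, $N_i$ the size of stratum $i$ and $n_i$ the number taken from it.

$$n_i = N_i \times \frac{n}{N}, \qquad \sum_i n_i = \frac{n}{N} \sum_i N_i = \frac{n}{N} \times N = n.$$

Because every stratum is multiplied by the same fraction and the strata together make up the whole population, the parts must total $n$. A mismatch therefore points to a slip in the fraction or the arithmetic, not to a flaw in the method.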

Sample size

A bigger sample generally gives a more reliable estimate, because random sampling error shrinks as the sample grows. But a bigger sample also costs more time and money, so you balance accuracy against cost. There is no single "correct" size in Standard 2; what matters is that the sample is large enough to be reliable and is chosen by a fair method. A small sample, or a large sample chosen by a biased method, is worse than a modest sample chosen well.
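
The effect of sample size on sampling error can be seen in a small simulation. This is a sketch under stated assumptions, not syllabus content: the population of 10 000 people with a 40% "yes" share, the 200 repeated trials and the name `average_error` are all made up for illustration.

```python
import random

rng = random.Random(1)                       # fixed seed so the run is repeatable
# Hypothetical population: 10 000 people, 40% of whom answer "yes" (coded 1).
population = [1] * 4000 + [0] * 6000
true_share = sum(population) / len(population)

def average_error(sample_size, trials=200):
    """Mean distance between the sample share and the true share."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)   # simple random sample
        total += abs(sum(sample) / sample_size - true_share)
    return total / trials

for n in (25, 100, 400):
    print(n, round(average_error(n), 3))
```

The printed error should fall as the sample size rises from 25 to 400. That is the accuracy side of the accuracy-versus-cost balance described above.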

Bias: when a sample stops representing the population

Bias is any systematic tendency for a sample to differ from the population, so that it consistently over- or under-states the truth. Bias is not bad luck; it is a flaw built into how the data was collected, and a larger sample does not fix it. The common sources are:

Selection (coverage) bias. The method leaves part of the population out or over-includes another part - surveying only shoppers at one shopping centre, or sampling one suburb to speak for a whole town.
Self-selection (voluntary-response) bias. Only people who choose to respond are counted - phone-in polls and click-to-vote web surveys, where people with strong opinions are far more likely to take part.
Timing or convenience bias. Surveying at a time or place that misses whole groups - a weekday-morning survey misses full-time workers and students; sampling only your friends is a convenience sample.
Question-wording bias. A leading or loaded question pushes people toward an answer - "Do you support the wasteful new tax?" invites a no.

The cure for bias is a fair, random method applied to the whole target population, with neutral wording.

How exam questions ask about sampling

The wording tells you which part of the dot point is being tested:

"Would a census or a sample be more appropriate? Justify your answer." Decide on size, cost and whether testing is destructive, then give a one-line reason.
"Identify / name the sampling method." Match the description: a random start then every $k$th is systematic; equal chance for all is simple random; same fraction from each group is stratified.
"Calculate how many should be selected from each ..." This is a stratified-sample calculation: find the sampling fraction, multiply each stratum by it, and check the parts sum to the sample size.
"Identify the (target) population." State the entire group the results are meant to describe (all students at the school, all residents of the town).
"Explain why this method is biased" or "give a source of bias." Name the type of bias and say which group is over- or under-represented and why the result cannot be generalised.

Census or sample, a stratified design, and spotting bias

Three worked examples covering the three things exams ask: choosing census or sample with a reason, designing a stratified sample with correct proportions, and identifying bias in a survey method.

Choose a census or a sample, with justification

A car manufacturer wants to know (i) the average paint thickness on every car of a model leaving the factory this year, and (ii) how many kilometres a tyre lasts before it is no longer roadworthy. For each, decide whether a census or a sample is appropriate and justify it.

Part (i) paint thickness: Measuring paint thickness does not damage the car, but the model has many thousands of cars produced over a year, so measuring every one is slow and costly for little extra benefit. A sample chosen by a fair method estimates the average accurately at a fraction of the effort.
Part (ii) tyre life: Finding how long a tyre lasts means wearing it out completely, so the test is destructive - a tested tyre is ruined and cannot be sold. Testing every tyre would destroy all of them, so a sample is the only option.
Answer: Both call for a sample: the first because a census would be needlessly expensive, the second because the test is destructive.

Design a stratified sample with the correct proportions

A high school has $1200$ students: $360$ in Year $7$, $300$ in Year $8$, $240$ in Year $9$, $180$ in Year $10$ and $120$ in Year $11$. The school council surveys a stratified sample of $60$ students. Find how many to survey from each year group.

Find the sampling fraction.

$$\frac{60}{1200} = \frac{1}{20}$$

so one in every twenty students is chosen (divide each year group by $20$).

Multiply each stratum by the fraction.

$$\text{Year } 7: 360 \times \tfrac{1}{20} = 18, \qquad \text{Year } 8: 300 \times \tfrac{1}{20} = 15,$$

$$\text{Year } 9: 240 \times \tfrac{1}{20} = 12, \qquad \text{Year } 10: 180 \times \tfrac{1}{20} = 9,$$

$$\text{Year } 11: 120 \times \tfrac{1}{20} = 6.$$

Check the total. $18 + 15 + 12 + 9 + 6 = 60$, which is the required sample size, so the split is correct. Survey $18$ Year $7$s, $15$ Year $8$s, $12$ Year $9$s, $9$ Year $10$s and $6$ Year $11$s. Notice each year group keeps its share: Year $7$ is $360$ of $1200$ students (three-tenths) and gets $18$ of the $60$ in the sample (also three-tenths).
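
The same three steps (fraction, multiply, check) can be written as a short Python sketch. The function name `stratified_allocation` and the dictionary layout are assumptions made for this illustration; the figures are the school's from the example above.

```python
from fractions import Fraction

def stratified_allocation(strata, sample_size):
    """Return the sampling fraction and the number to take from each stratum."""
    population = sum(strata.values())
    fraction = Fraction(sample_size, population)   # sample size / population size
    counts = {name: round(size * fraction) for name, size in strata.items()}
    # The habit from the text: the parts must add to the sample size.
    if sum(counts.values()) != sample_size:
        raise ValueError("strata counts do not add to the sample size")
    return fraction, counts

school = {"Year 7": 360, "Year 8": 300, "Year 9": 240, "Year 10": 180, "Year 11": 120}
fraction, counts = stratified_allocation(school, 60)

print(fraction)   # 1/20
print(counts)     # {'Year 7': 18, 'Year 8': 15, 'Year 9': 12, 'Year 10': 9, 'Year 11': 6}
```

The sketch uses exact fractions so that $\tfrac{1}{20}$ is not blurred by decimal rounding. If the rounded counts ever fail to total the sample size, it raises an error instead of guessing how to adjust them.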

Identify the bias in a survey method

To estimate the average number of hours per week that all residents of a city exercise, a researcher stands outside a gym on a Saturday morning and surveys people as they leave.

Identify the population and the sample: The target population is all residents of the city. The sample is people leaving one gym on a Saturday morning.
Name and explain the bias: This is selection (convenience) bias. People leaving a gym are far more active than the average resident, so the sample over-represents exercisers and the estimated weekly hours will be too high. People who never exercise - exactly the ones who pull the average down - have no chance of being surveyed.
State the consequence: Because the flaw is in how the sample was chosen, surveying more people outside the same gym would not fix it. A fair estimate needs a random sample drawn from all residents, not just gym-goers.
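
The point that a bigger sample does not repair this flaw can be illustrated with a small simulation. Everything in it is hypothetical and chosen only for illustration: a city of 10 000 residents, 2000 gym-goers who exercise 5 to 10 hours a week, and 8000 others who exercise 0 to 2 hours.

```python
import random

rng = random.Random(7)                       # fixed seed so the run is repeatable
# Hypothetical city: weekly exercise hours for each resident.
gym_goers = [rng.uniform(5, 10) for _ in range(2000)]
others = [rng.uniform(0, 2) for _ in range(8000)]
city = gym_goers + others

def mean(values):
    return sum(values) / len(values)

true_mean = mean(city)
results = {}
for n in (50, 500):
    biased = mean(rng.sample(gym_goers, n))  # only people leaving the gym
    fair = mean(rng.sample(city, n))         # random sample of all residents
    results[n] = (biased, fair)
    print(n, round(true_mean, 2), round(biased, 2), round(fair, 2))
```

In this set-up the gym-only estimate sits well above the true city average at both sample sizes. The random-sample estimate is the one that can be expected to land near the true average.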

Common traps

Calling a survey of "everyone in a group" a census when that group is not the target population: A census covers the whole target population. Asking every member of a club is a census of the club, but only a sample of the wider community the club sits in.
Forgetting the destructive-testing case: If testing destroys the item (matches, globes, tyres, crash tests), a census is impossible, so a sample is forced - this is a favourite exam reason.
Getting the sampling fraction upside down: The fraction is sample $\div$ population, not population $\div$ sample. For $60$ from $1200$ it is $\tfrac{1}{20}$, so you multiply each stratum by $\tfrac{1}{20}$ (or divide by $20$), never by $20$.
Not checking the strata add to the sample size: Always confirm the parts total the required sample; if they do not, the fraction or the arithmetic is wrong.
Thinking a bigger sample removes bias: A larger sample shrinks random sampling error but does nothing about bias: a million voluntary online votes are still self-selected and unrepresentative.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2021 HSC-style, 3 marks. A company has $1500$ employees: $900$ at its Sydney office, $400$ at its Melbourne office and $200$ at its Brisbane office. Management surveys a stratified sample of $75$ employees. Calculate how many employees should be surveyed from each office.

Worked answer:

Find the sampling fraction first: $\frac{75}{1500} = \frac{1}{20}$. One mark is for this fraction (or equivalent, such as dividing each office by $20$).

Apply it to each stratum: Sydney $900 \times \tfrac{1}{20} = 45$; Melbourne $400 \times \tfrac{1}{20} = 20$; Brisbane $200 \times \tfrac{1}{20} = 10$. One mark for the three correct numbers.

The final mark is for a correct method shown and a check that the parts total the sample: $45 + 20 + 10 = 75$. A common error is multiplying by $20$ instead of by $\tfrac{1}{20}$, or not checking the total - markers look for the fraction stated and the sum verified.

2022 HSC-style, 4 marks. A local newspaper prints a survey form and invites readers to fill it in and mail it back. From the replies it reports that $82\%$ of the town opposes a proposed shopping centre. (a) Identify the target population. (b) Name the type of bias present. (c) Explain why the $82\%$ figure may not represent the views of the town. (d) Suggest a better sampling method.

Worked answer:

(a) The target population is all residents (or all eligible voters) of the town - one mark.

(b) Self-selection (voluntary-response) bias, because only readers who chose to return the form are counted - one mark.

(c) Markers reward an explanation that those who reply are not representative: people with strong (often negative) views are far more likely to bother replying, and only newspaper readers see the form, so the sample is skewed and cannot be generalised - one mark.

(d) A simple random or stratified sample of residents drawn from the whole town (for example using the electoral roll and a random selection) - one mark. The strongest answers name a random method AND tie it back to covering the whole target population.

2023 HSC-style, 3 marks. For each situation, state whether a census or a sample is more appropriate and give a reason. (a) A school wants the opinion of its $250$ students on a new uniform. (b) A factory wants to know the average lifetime of the batteries it produces. (c) The government wants the average household size across Australia.

Worked answer:

One mark each for a correct choice WITH a valid reason.

(a) Census: the population is small and easily reached ($250$ students in one school), so surveying everyone is feasible and exact.

(b) Sample: testing a battery's lifetime is destructive (the battery is used up), so a census is impossible.

(c) Sample: the population (every Australian household) is far too large and costly to survey in full for this purpose. Markers do not award the mark for the choice alone - the reason must match the situation (size/cost for (a) and (c), destructive testing for (b)).

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation, 2 marks. A school principal wants to know the favourite sport of every student at a school of $480$ students. (a) Would a census or a sample be more appropriate? (b) Give one reason for your choice.

Worked solution:

Part (a) - decide census or sample. A census asks every member of the population; a sample asks a chosen part of it. The population here is small and easy to reach (one school, $480$ students), so a census is appropriate.

Part (b) - justify the choice. Because the whole population is small and accessible (the students are all in the one place), surveying everyone is quick, cheap and gives an exact answer with no sampling error. (If the population had been every student in NSW, a sample would have been the sensible choice instead.)

Foundation, 2 marks. A manufacturer tests whether each match in a box of matches lights. Explain why a sample, not a census, must be used here.

Worked solution:

Identify what testing does to the item. Striking a match to test it uses the match up, so the test is destructive: a tested match can no longer be sold.

Conclude. A census would destroy every match in production, leaving nothing to sell, so the manufacturer must test only a sample and use the result to judge the whole batch. (Destructive testing - matches, light globes, crash-testing cars - always forces a sample.)

Foundation, 3 marks. A factory produces $2000$ light globes a day. A quality inspector tests every $50$th globe coming off the line. (a) Name this sampling method. (b) How many globes are tested each day? (c) State one advantage of this method.

Worked solution:

Part (a) - name the method. Choosing every $50$th item in order is systematic sampling: a starting point is chosen and then every $k$th member is selected.

Part (b) - count the sample. Testing every $50$th of $2000$ globes gives

$$\frac{2000}{50} = 40$$

so $40$ globes are tested each day.

Part (c) - one advantage. It is simple and quick to run on a production line (you just count off the items), and it spreads the sample evenly across the whole day's output. (A risk to note: if a fault repeats every $50$th globe, systematic sampling could miss it or hit it every time.)

Core, 4 marks. A gym has $800$ members: $400$ aged under $30$, $280$ aged $30$ to $50$, and $120$ aged over $50$. Management surveys a stratified sample of $40$ members. How many members should be chosen from each age group?

Worked solution:

Find the sampling fraction. A stratified sample takes the same fraction from each group (stratum). The fraction is the sample size over the population:

$$\frac{40}{800} = \frac{1}{20}$$

so $\tfrac{1}{20}$ of each group is chosen (equivalently, divide each group by $20$).

Apply the fraction to each stratum.

$$\text{under } 30: \quad 400 \times \frac{1}{20} = 20$$

$$30 \text{ to } 50: \quad 280 \times \frac{1}{20} = 14$$

$$\text{over } 50: \quad 120 \times \frac{1}{20} = 6$$

Check the parts add to the sample size. $20 + 14 + 6 = 40$, which matches the required sample of $40$, so the split is correct. Choose $20$ members aged under $30$, $14$ aged $30$ to $50$ and $6$ aged over $50$.

Core, 3 marks. A radio station asks listeners to phone in to vote on whether a new freeway should be built. Of those who call, $78\%$ say no. (a) Identify the type of bias in this survey. (b) Explain why the $78\%$ may not reflect the views of the whole community.

Worked solution:

Part (a) - name the bias. The sample is made up only of people who chose to ring in, so this is self-selection (voluntary-response) bias.

Part (b) - explain it. People who phone a talkback line are not a random cross-section of the community: those with strong feelings (especially those opposed) are far more likely to bother calling, and listeners of that one station are not all residents. The $78\%$ therefore measures the views of motivated callers, not the whole community, so it cannot be generalised. (A fair result would need a random sample of all affected residents.)

Exam, 5 marks. A council wants to gauge support for a new library among the $9000$ residents of a town. The town has three suburbs: Northvale ($4500$ residents), Riverside ($3000$) and Hilltop ($1500$). The council will survey a stratified sample of $300$ residents. (a) Explain why a sample rather than a census is sensible here. (b) Calculate how many residents to survey from each suburb. (c) The council's first plan was to survey shoppers at the Northvale shopping centre one weekday morning. Give two reasons this plan would be biased.

Worked solution:

Part (a) - census or sample. With $9000$ residents a census would be slow and expensive to organise and process, and a well-designed sample of $300$ can estimate the level of support accurately, so a sample is the sensible choice.

Part (b) - stratified split. The sampling fraction is

$$\frac{300}{9000} = \frac{1}{30}$$

so take $\tfrac{1}{30}$ of each suburb:

$$\text{Northvale}: \quad 4500 \times \frac{1}{30} = 150$$

$$\text{Riverside}: \quad 3000 \times \frac{1}{30} = 100$$

$$\text{Hilltop}: \quad 1500 \times \frac{1}{30} = 50$$

Check: $150 + 100 + 50 = 300$, the required total, so survey $150$ from Northvale, $100$ from Riverside and $50$ from Hilltop.

Part (c) - two reasons the first plan is biased. First, sampling only at the Northvale shopping centre over-represents Northvale residents and under-represents Riverside and Hilltop, so the sample is not representative of the whole town (a coverage problem). Second, surveying on a weekday morning misses people who are at work, at school or away during those hours, so groups such as full-time workers and students are systematically left out (selection/timing bias). Either of these means the result would not generalise to all $9000$ residents.

Exam, 5 marks. A school of $1200$ students is made up of $360$ in Year $7$, $300$ in Year $8$, $240$ in Year $9$, $180$ in Year $10$ and $120$ in Year $11$. The Student Representative Council surveys a stratified sample of $80$ students. (a) Find the number surveyed from each year group. (b) The SRC instead suggests just surveying its own $80$ members. Identify the target population and explain why the SRC plan would give biased results. (c) Suggest a fairer method that is simpler to run than full stratified sampling, and state one drawback of it.

Worked solution:

Part (a) - stratified split. The sampling fraction is

$$\frac{80}{1200} = \frac{1}{15}$$

so take $\tfrac{1}{15}$ of each year group:

$$\text{Year } 7: \quad 360 \times \frac{1}{15} = 24$$

$$\text{Year } 8: \quad 300 \times \frac{1}{15} = 20$$

$$\text{Year } 9: \quad 240 \times \frac{1}{15} = 16$$

$$\text{Year } 10: \quad 180 \times \frac{1}{15} = 12$$

$$\text{Year } 11: \quad 120 \times \frac{1}{15} = 8$$

Check: $24 + 20 + 16 + 12 + 8 = 80$, the required sample, so survey $24$, $20$, $16$, $12$ and $8$ students from Years $7$ to $11$ respectively.

Part (b) - target population and bias. The target population is all $1200$ students at the school. Surveying only the $80$ SRC members samples a special subgroup (students engaged enough to be elected to the council), not a cross-section of the school, so their views are unlikely to match the whole student body. This is selection bias, and the result cannot be generalised to all students.

Part (c) - a fairer, simpler method. A simple random sample of $80$ students drawn from the full roll (for example by assigning every student a number and using a random number generator) is fair and easier to organise than splitting by year. One drawback is that, by chance, a random sample might pick very few students from a small year group, so the year groups may not be represented in the right proportions the way stratified sampling guarantees.
