
How are data collected and sampled so that the results fairly represent the whole population?

Investigate sampling techniques, including census, simple random, systematic and stratified sampling, and identify the target population and sources of bias in data collection

A focused answer to the HSC Maths Standard 2 dot point on data collection and sampling. Census versus sample, defining the target population, simple random, systematic and stratified sampling, choosing a sample size, designing a stratified sample with correct proportions, and spotting bias in a survey, with worked Australian examples.

Generated by Claude Opus 4.8
14 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to understand how data is gathered before any of it is graphed or summarised, because a conclusion is only as trustworthy as the data behind it. You need to decide when to survey the whole population (a census) and when to survey only part of it (a sample), name and apply the three sampling methods in the syllabus (simple random, systematic and stratified), choose and justify a sample size, and recognise the ways a sample can become biased so that it no longer represents the population. The arithmetic is light - mostly finding a fraction of each group for a stratified sample - so most of the marks reward clear reasoning: identifying the target population, justifying census versus sample, and explaining exactly why a particular method is or is not fair.

The answer

Almost every study starts with one decision: do you collect data from everyone, or from a representative few? A census collects data from every member of the population. A sample collects data from a chosen subset and uses it to estimate what the whole population is like. The group you ultimately want your results to describe is the target population, and the whole point of sampling is to pick a sample that mirrors that population, so that what is true of the sample is also true of the population.

Census versus sample

A census surveys the entire population. It gives an exact answer with no sampling error, but it is usually only practical when the population is small and easy to reach. The Australian Bureau of Statistics runs a national Census every five years, which shows both the value (a complete picture) and the cost (it is a huge, expensive national exercise).

A sample surveys only part of the population. You use a sample when:

  • the population is too large to survey everyone (for example, all Australian voters),
  • a census would be too expensive or too slow, or
  • the testing is destructive, so testing every item would destroy the whole product (matches, light globes, crash-tested cars).

The trade-off is sampling error: because you did not ask everyone, the sample result is an estimate, not the exact truth. A good sampling method and a large enough sample keep that error small.

The three sampling methods

NESA names three ways to choose a sample. Each aims to be fair, but they differ in how they do it.

  • Simple random sampling. Every member of the population has an equal chance of being chosen, like drawing names from a hat. In practice you number every member and use a random number generator. It is fair and unbiased, but you need a full list of the population, and by chance it may under-represent a small subgroup.
  • Systematic sampling. You order the population, choose a random starting point, then select every $k$th member. The step $k$ is the population size divided by the sample size. For a sample of 50 from 1000 people, $k = 1000 \div 50 = 20$, so you take every 20th person. It is quick and easy, but it fails if the list has a repeating pattern that lines up with $k$.
  • Stratified sampling. You split the population into strata (non-overlapping groups such as year levels, age bands or suburbs), then take the same fraction from each stratum, so every group is represented in its correct proportion. This is the fairest method when the population has clear subgroups of different sizes, but it needs you to know the size of each stratum.
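
The first two methods can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical list of ID numbers 1 to 1000 standing in for the population; the function names and the fixed seed are choices made for this sketch, not part of the syllabus.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance: draw n members without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start within the first k members, then every k-th member."""
    k = len(population) // n      # step = population size / sample size
    start = rng.randrange(k)      # random starting index from 0 to k - 1
    return population[start::k][:n]

rng = random.Random(2024)         # fixed seed so the run is repeatable
people = list(range(1, 1001))     # hypothetical ID numbers 1..1000

srs = simple_random_sample(people, 50, rng)
sys_sample = systematic_sample(people, 50, rng)
step = len(people) // 50

print(step)                       # 20
print(len(srs), len(sys_sample))  # 50 50
```

Both samples contain 50 people, but the systematic sample is always evenly spaced 20 apart, while the simple random sample can cluster anywhere in the list.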

Designing a stratified sample

Stratified sampling is the method NESA examines most, because it involves a calculation. The idea is simple: each stratum should contribute to the sample in the same proportion as it appears in the population. The tool is the sampling fraction:

$$\text{sampling fraction} = \frac{\text{sample size}}{\text{population size}}$$

Multiply each stratum's size by this fraction to get the number to take from that stratum. The diagram below traces the whole process - the population splits into strata, and each stratum passes the same fraction of itself into the sample.

[Figure: Designing a stratified sample from three age strata. A gym of 800 members is split into three age strata: 400 aged under 30, 280 aged 30 to 50, and 120 aged over 50. A stratified sample of 40 takes the same fraction ($\times \tfrac{1}{20}$) from every stratum, giving 20, 14 and 6 members. The sample bar has the same proportions as the population bar.]

Sampling fraction $= 40 \div 800 = \tfrac{1}{20}$, so $400 \times \tfrac{1}{20} = 20$, $280 \times \tfrac{1}{20} = 14$, $120 \times \tfrac{1}{20} = 6$, and $20 + 14 + 6 = 40$ ✓

The check at the end is the habit to build: the numbers you take from the strata must add up to the sample size you wanted. If they do not, recompute the fraction.
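
The same calculation can be written as a short Python sketch. It uses exact fractions so the arithmetic matches the hand working; the function name `stratified_allocation` is made up for this illustration, and it assumes (as in every example on this page) that each stratum divides evenly.

```python
from fractions import Fraction

def stratified_allocation(strata, sample_size):
    """Return the sampling fraction and the number to take from each stratum."""
    population = sum(strata.values())
    fraction = Fraction(sample_size, population)   # sample size / population size
    allocation = {name: size * fraction for name, size in strata.items()}
    # The habit from the text: the parts must add up to the sample size.
    assert sum(allocation.values()) == sample_size
    return fraction, allocation

gym = {"Under 30": 400, "30 to 50": 280, "Over 50": 120}
fraction, allocation = stratified_allocation(gym, 40)

print(fraction)                    # 1/20
for name, count in allocation.items():
    print(name, count)             # 20, 14 and 6 in turn
```

If a stratum did not divide evenly, the result would be a fraction rather than a whole number and would need rounding, which this sketch deliberately does not attempt.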

Sample size

A bigger sample generally gives a more reliable estimate, because random sampling error shrinks as the sample grows. But a bigger sample also costs more time and money, so you balance accuracy against cost. There is no single "correct" size in Standard 2; what matters is that the sample is large enough to be reliable and is chosen by a fair method. A small sample, or a large sample chosen by a biased method, is worse than a modest sample chosen well.
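
A small simulation can make the first claim concrete. The sketch below assumes a hypothetical population of 10 000 people in which exactly 40% would answer "yes", then compares how far simple random samples of 25 and of 400 land from that true value on average. The population, the sample sizes and the number of trials are all invented for illustration.

```python
import random

def mean_abs_error(population, n, trials, rng):
    """Average distance between the sample proportion and the true proportion."""
    true_p = sum(population) / len(population)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, n)      # simple random sample of size n
        total += abs(sum(sample) / n - true_p)
    return total / trials

rng = random.Random(1)                          # fixed seed so the run is repeatable
population = [1] * 4000 + [0] * 6000            # hypothetical: 40% say "yes"

errors = {n: mean_abs_error(population, n, 200, rng) for n in (25, 400)}
for n, err in errors.items():
    print(n, round(err, 3))
```

The larger sample should show a noticeably smaller average error, which is the sense in which random sampling error shrinks as the sample grows.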

Bias: when a sample stops representing the population

Bias is any systematic tendency for a sample to differ from the population, so that it consistently over- or under-states the truth. Bias is not bad luck; it is a flaw built into how the data was collected, and a larger sample does not fix it. The common sources are:

  • Selection (coverage) bias. The method leaves part of the population out or over-includes another part - surveying only shoppers at one shopping centre, or sampling one suburb to speak for a whole town.
  • Self-selection (voluntary-response) bias. Only people who choose to respond are counted - phone-in polls and click-to-vote web surveys, where people with strong opinions are far more likely to take part.
  • Timing or convenience bias. Surveying at a time or place that misses whole groups - a weekday-morning survey misses full-time workers and students; sampling only your friends is a convenience sample.
  • Question-wording bias. A leading or loaded question pushes people toward an answer - "Do you support the wasteful new tax?" invites a no.

The cure for bias is a fair, random method applied to the whole target population, with neutral wording.
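
To see why a larger sample does not fix bias, here is a small arithmetic sketch of self-selection. The reply rates are invented for illustration: it assumes half of the people opposed bother to reply but only one in ten of those in favour do.

```python
from fractions import Fraction

def reported_percent_opposed(opposed, in_favour, rate_opposed, rate_in_favour):
    """Percentage opposed among the people who choose to reply."""
    replies_opposed = opposed * rate_opposed
    replies_in_favour = in_favour * rate_in_favour
    return 100 * replies_opposed / (replies_opposed + replies_in_favour)

# Hypothetical reply rates: opponents are five times as likely to respond.
rate_opposed, rate_in_favour = Fraction(1, 2), Fraction(1, 10)

# In both towns the true figure is 40% opposed.
small = reported_percent_opposed(400, 600, rate_opposed, rate_in_favour)
large = reported_percent_opposed(40_000, 60_000, rate_opposed, rate_in_favour)

print(round(float(small), 1))   # 76.9
print(small == large)           # True
```

The survey reports about 77% opposed when the truth is 40%, and collecting a hundred times as many replies leaves the figure unchanged, because the flaw is in who replies, not in how many.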

How exam questions ask about sampling

The wording tells you which part of the dot point is being tested:

  • "Would a census or a sample be more appropriate? Justify your answer." Decide on size, cost and whether testing is destructive, then give a one-line reason.
  • "Identify / name the sampling method." Match the description: a random start then every $k$th is systematic; equal chance for all is simple random; same fraction from each group is stratified.
  • "Calculate how many should be selected from each ..." This is a stratified-sample calculation: find the sampling fraction, multiply each stratum by it, and check the parts sum to the sample size.
  • "Identify the (target) population." State the entire group the results are meant to describe (all students at the school, all residents of the town).
  • "Explain why this method is biased" or "give a source of bias." Name the type of bias and say which group is over- or under-represented and why the result cannot be generalised.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2021 HSC-style (3 marks)
A company has 1500 employees: 900 at its Sydney office, 400 at its Melbourne office and 200 at its Brisbane office. Management surveys a stratified sample of 75 employees. Calculate how many employees should be surveyed from each office.

Find the sampling fraction first: $\frac{75}{1500} = \frac{1}{20}$. One mark is for this fraction (or equivalent, such as dividing each office by 20).

Apply it to each stratum: Sydney $900 \times \tfrac{1}{20} = 45$; Melbourne $400 \times \tfrac{1}{20} = 20$; Brisbane $200 \times \tfrac{1}{20} = 10$. One mark for the three correct numbers.

The final mark is for a correct method shown and a check that the parts total the sample: $45 + 20 + 10 = 75$. A common error is multiplying by 20 instead of by $\tfrac{1}{20}$, or not checking the total - markers look for the fraction stated and the sum verified.

2022 HSC-style (4 marks)
A local newspaper prints a survey form and invites readers to fill it in and mail it back. From the replies it reports that 82% of the town opposes a proposed shopping centre. (a) Identify the target population. (b) Name the type of bias present. (c) Explain why the 82% figure may not represent the views of the town. (d) Suggest a better sampling method.

(a) The target population is all residents (or all eligible voters) of the town - one mark.

(b) Self-selection (voluntary-response) bias, because only readers who chose to return the form are counted - one mark.

(c) Markers reward an explanation that those who reply are not representative: people with strong (often negative) views are far more likely to bother replying, and only newspaper readers see the form, so the sample is skewed and cannot be generalised - one mark.

(d) A simple random or stratified sample of residents drawn from the whole town (for example using the electoral roll and a random selection) - one mark. The strongest answers name a random method AND tie it back to covering the whole target population.

2023 HSC-style (3 marks)
For each situation, state whether a census or a sample is more appropriate and give a reason. (a) A school wants the opinion of its 250 students on a new uniform. (b) A factory wants to know the average lifetime of the batteries it produces. (c) The government wants the average household size across Australia.

One mark each for a correct choice WITH a valid reason.

(a) Census: the population is small and easily reached (250 students in one school), so surveying everyone is feasible and exact.

(b) Sample: testing a battery's lifetime is destructive (the battery is used up), so a census is impossible.

(c) Sample: the population (every Australian household) is far too large and costly to survey in full for this purpose. Markers do not award the mark for the choice alone - the reason must match the situation (size/cost for (a) and (c), destructive testing for (b)).

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

Foundation (2 marks)
A school principal wants to know the favourite sport of every student at a school of 480 students. (a) Would a census or a sample be more appropriate? (b) Give one reason for your choice.

Part (a) - decide census or sample. A census asks every member of the population; a sample asks a chosen part of it. The population here is small and easy to reach (one school, 480 students), so a census is appropriate.

Part (b) - justify the choice. Because the whole population is small and accessible (the students are all in the one place), surveying everyone is quick, cheap and gives an exact answer with no sampling error. (If the population had been every student in NSW, a sample would have been the sensible choice instead.)

Foundation (2 marks)
A manufacturer tests whether each match in a box of matches lights. Explain why a sample, not a census, must be used here.

Identify what testing does to the item. Striking a match to test it uses the match up, so the test is destructive: a tested match can no longer be sold.

Conclude. A census would destroy every match in production, leaving nothing to sell, so the manufacturer must test only a sample and use the result to judge the whole batch. (Destructive testing - matches, light globes, crash-testing cars - always forces a sample.)

Foundation (3 marks)
A factory produces 2000 light globes a day. A quality inspector tests every 50th globe coming off the line. (a) Name this sampling method. (b) How many globes are tested each day? (c) State one advantage of this method.

Part (a) - name the method. Choosing every 50th item in order is systematic sampling: a starting point is chosen and then every $k$th member is selected.

Part (b) - count the sample. Testing every 50th of 2000 globes gives

$$\frac{2000}{50} = 40$$

so 40 globes are tested each day.

Part (c) - one advantage. It is simple and quick to run on a production line (you just count off the items), and it spreads the sample evenly across the whole day's output. (A risk to note: if a fault repeats every 50th globe, systematic sampling could miss it or hit it every time.)
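
The risk in that last bracket can be checked with a short sketch. It assumes a hypothetical fault that affects exactly every 50th globe (positions 50, 100, ..., 2000) and compares two possible starting points for the inspector; the starting points 50 and 17 are arbitrary choices for illustration.

```python
def systematic_positions(population_size, k, start):
    """1-based positions selected: start, start + k, start + 2k, ..."""
    return list(range(start, population_size + 1, k))

# Hypothetical fault: every 50th globe off the line is faulty.
faulty = set(range(50, 2001, 50))

results = {}
for start in (50, 17):
    tested = systematic_positions(2000, 50, start)
    found = sum(1 for position in tested if position in faulty)
    results[start] = (len(tested), found)
    print(start, len(tested), found)
# start 50 -> 40 tested, 40 faulty; start 17 -> 40 tested, 0 faulty
```

Either way 40 globes are tested, but the inspector would conclude that every globe is faulty or that none are, depending only on where the count began. Neither matches the true fault rate of 1 in 50.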

Core (4 marks)
A gym has 800 members: 400 aged under 30, 280 aged 30 to 50, and 120 aged over 50. Management surveys a stratified sample of 40 members. How many members should be chosen from each age group?

Find the sampling fraction. A stratified sample takes the same fraction from each group (stratum). The fraction is the sample size over the population:

$$\frac{40}{800} = \frac{1}{20}$$

so $\tfrac{1}{20}$ of each group is chosen (equivalently, divide each group by 20).

Apply the fraction to each stratum.

$$\text{under } 30: \quad 400 \times \frac{1}{20} = 20$$

$$30 \text{ to } 50: \quad 280 \times \frac{1}{20} = 14$$

$$\text{over } 50: \quad 120 \times \frac{1}{20} = 6$$

Check the parts add to the sample size. $20 + 14 + 6 = 40$, which matches the required sample of 40, so the split is correct. Choose 20 members aged under 30, 14 aged 30 to 50 and 6 aged over 50.

Core (3 marks)
A radio station asks listeners to phone in to vote on whether a new freeway should be built. Of those who call, 78% say no. (a) Identify the type of bias in this survey. (b) Explain why the 78% may not reflect the views of the whole community.

Part (a) - name the bias. The sample is made up only of people who chose to ring in, so this is self-selection (voluntary-response) bias.

Part (b) - explain it. People who phone a talkback line are not a random cross-section of the community: those with strong feelings (especially those opposed) are far more likely to bother calling, and listeners of that one station are not all residents. The 78% therefore measures the views of motivated callers, not the whole community, so it cannot be generalised. (A fair result would need a random sample of all affected residents.)

Exam (5 marks)
A council wants to gauge support for a new library among the 9000 residents of a town. The town has three suburbs: Northvale (4500 residents), Riverside (3000) and Hilltop (1500). The council will survey a stratified sample of 300 residents. (a) Explain why a sample rather than a census is sensible here. (b) Calculate how many residents to survey from each suburb. (c) The council's first plan was to survey shoppers at the Northvale shopping centre one weekday morning. Give two reasons this plan would be biased.

Part (a) - census or sample. With 9000 residents a census would be slow and expensive to organise and process, and a well-designed sample of 300 can estimate the level of support accurately, so a sample is the sensible choice.

Part (b) - stratified split. The sampling fraction is

$$\frac{300}{9000} = \frac{1}{30}$$

so take $\tfrac{1}{30}$ of each suburb:

$$\text{Northvale}: \quad 4500 \times \frac{1}{30} = 150$$

$$\text{Riverside}: \quad 3000 \times \frac{1}{30} = 100$$

$$\text{Hilltop}: \quad 1500 \times \frac{1}{30} = 50$$

Check: $150 + 100 + 50 = 300$, the required total, so survey 150 from Northvale, 100 from Riverside and 50 from Hilltop.

Part (c) - two reasons the first plan is biased. First, sampling only at the Northvale shopping centre over-represents Northvale residents and under-represents Riverside and Hilltop, so the sample is not representative of the whole town (a coverage problem). Second, surveying on a weekday morning misses people who are at work, at school or away during those hours, so groups such as full-time workers and students are systematically left out (selection/timing bias). Either of these means the result would not generalise to all 9000 residents.

Exam (5 marks)
A school of 1200 students is made up of 360 in Year 7, 300 in Year 8, 240 in Year 9, 180 in Year 10 and 120 in Year 11. The Student Representative Council surveys a stratified sample of 80 students. (a) Find the number surveyed from each year group. (b) The SRC instead suggests just surveying its own 80 members. Identify the target population and explain why the SRC plan would give biased results. (c) Suggest a fairer method that is simpler to run than full stratified sampling, and state one drawback of it.

Part (a) - stratified split. The sampling fraction is

$$\frac{80}{1200} = \frac{1}{15}$$

so take $\tfrac{1}{15}$ of each year group:

$$\text{Year } 7: \quad 360 \times \frac{1}{15} = 24$$

$$\text{Year } 8: \quad 300 \times \frac{1}{15} = 20$$

$$\text{Year } 9: \quad 240 \times \frac{1}{15} = 16$$

$$\text{Year } 10: \quad 180 \times \frac{1}{15} = 12$$

$$\text{Year } 11: \quad 120 \times \frac{1}{15} = 8$$

Check: $24 + 20 + 16 + 12 + 8 = 80$, the required sample, so survey 24, 20, 16, 12 and 8 students from Years 7 to 11 respectively.

Part (b) - target population and bias. The target population is all 1200 students at the school. Surveying only the 80 SRC members samples a special subgroup (students engaged enough to be elected to the council), not a cross-section of the school, so their views are unlikely to match the whole student body. This is selection bias, and the result cannot be generalised to all students.

Part (c) - a fairer, simpler method. A simple random sample of 80 students drawn from the full roll (for example by assigning every student a number and using a random number generator) is fair and easier to organise than splitting by year. One drawback is that, by chance, a random sample might pick very few students from a small year group, so the year groups may not be represented in the right proportions the way stratified sampling guarantees.
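
The drawback in part (c) can be seen by drawing one simple random sample and counting how many students land in each year. This is a sketch only: the roll is a made-up list with one entry per student, and the seed is arbitrary, so a different seed would give different counts.

```python
import random
from collections import Counter

year_sizes = {7: 360, 8: 300, 9: 240, 10: 180, 11: 120}

# Hypothetical roll: one entry per student, recording only the year group.
roll = [year for year, size in year_sizes.items() for _ in range(size)]

rng = random.Random(7)              # arbitrary fixed seed for a repeatable run
sample = rng.sample(roll, 80)       # simple random sample of 80 students
counts = Counter(sample)

# Stratified targets for comparison: 1/15 of each year group.
stratified = {year: size * 80 // 1200 for year, size in year_sizes.items()}

for year in year_sizes:
    print(year, counts[year], stratified[year])
```

The stratified column is always 24, 20, 16, 12 and 8. The simple random column still adds to 80 but usually drifts away from those targets, and the drift matters most for the smallest year group.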
