How are data collected and sampled so that the results fairly represent the whole population?
Investigate sampling techniques, including census, simple random, systematic and stratified sampling, and identify the target population and sources of bias in data collection
A focused answer to the HSC Maths Standard 2 dot point on data collection and sampling. Census versus sample, defining the target population, simple random, systematic and stratified sampling, choosing a sample size, designing a stratified sample with correct proportions, and spotting bias in a survey, with worked Australian examples.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to understand how data is gathered before any of it is graphed or summarised, because a conclusion is only as trustworthy as the data behind it. You need to decide when to survey the whole population (a census) and when to survey only part of it (a sample), name and apply the three sampling methods in the syllabus (simple random, systematic and stratified), choose and justify a sample size, and recognise the ways a sample can become biased so that it no longer represents the population. The arithmetic is light - mostly finding a fraction of each group for a stratified sample - so most of the marks reward clear reasoning: identifying the target population, justifying census versus sample, and explaining exactly why a particular method is or is not fair.
The answer
Almost every study starts with one decision: do you collect data from everyone, or from a representative few? A census collects data from every member of the population. A sample collects data from a chosen subset and uses it to estimate what the whole population is like. The group you ultimately want your results to describe is the target population, and the whole point of sampling is to pick a sample that mirrors that population, so that what is true of the sample is also (approximately) true of the population.
Census versus sample
A census surveys the entire population. It gives an exact answer with no sampling error, but it is only practical when the population is small and easy to reach. The Australian Bureau of Statistics runs a national Census every five years, which shows both the value (a complete picture) and the cost (it is a huge, expensive national exercise).
A sample surveys only part of the population. You use a sample when:
- the population is too large to survey everyone (for example, all Australian voters),
- a census would be too expensive or too slow, or
- the testing is destructive, so testing every item would destroy the whole product (matches, light globes, crash-tested cars).
The trade-off is sampling error: because you did not ask everyone, the sample result is an estimate, not the exact truth. A good sampling method and a large enough sample keep that error small.
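To see the trade-off in miniature, here is a short Python sketch. It assumes a made-up population of 10,000 household sizes (random hypothetical data, not real ABS figures). The census averages every value, so its answer is exact; the simple random sample of 200 gives an estimate that is usually close but not exact, and the gap between the two is the sampling error.

```python
import random

# Hypothetical population: 10,000 household sizes from 1 to 6 (made-up data).
rng = random.Random(42)
population = [rng.randint(1, 6) for _ in range(10_000)]

def mean(values):
    return sum(values) / len(values)

# Census: every member is used, so this is the exact population mean.
census_mean = mean(population)

# Sample: 200 members chosen by simple random sampling (without replacement).
sample = rng.sample(population, 200)
sample_mean = mean(sample)

# Sampling error: how far the sample estimate lands from the true value.
sampling_error = sample_mean - census_mean

print(f"census mean:    {census_mean:.3f}")
print(f"sample mean:    {sample_mean:.3f}")
print(f"sampling error: {sampling_error:+.3f}")
```

Changing the seed gives a different sample and a different (small) error each time, which is exactly why a sample result is an estimate rather than the exact truth.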
The three sampling methods
NESA names three ways to choose a sample. Each aims to be fair, but they differ in how they do it.
- Simple random sampling. Every member of the population has an equal chance of being chosen, like drawing names from a hat. In practice you number every member and use a random number generator. It is fair and unbiased, but you need a full list of the population, and by chance it may under-represent a small subgroup.
- Systematic sampling. You order the population, choose a random starting point among the first $k$ members, then select every $k$th member. The step $k$ is the population size $N$ divided by the sample size $n$, so $k = \dfrac{N}{n}$: for a sample of $n$ from $N$ people you take every $k$th person. It is quick and easy, but it fails if the list has a repeating pattern that lines up with $k$.
- Stratified sampling. You split the population into strata (non-overlapping groups such as year levels, age bands or suburbs), then take the same fraction from each stratum, so every group is represented in its correct proportion. This is the fairest method when the population has clear subgroups of different sizes, but it needs you to know the size of each stratum.
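The first two methods can be sketched in Python, assuming every member has been numbered from 1 to $N$ (here a hypothetical roll of 1,000 students and a sample of 50). The helper names `simple_random_sample` and `systematic_sample` are illustrative only; `random.sample` and `random.randrange` are standard-library calls. The systematic sketch assumes $N$ is a multiple of $n$ so that the step $k = N/n$ is a whole number.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Every member has the same chance of selection (drawn without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng=random):
    """Random start among the first k members, then every kth member after it.

    Assumes the population size is a multiple of n, so k = N / n is a whole number.
    """
    N = len(population)
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                    # the step: population size / sample size
    start = rng.randrange(k)      # random index from 0 to k - 1
    return population[start::k]   # start, start + k, start + 2k, ...

# Hypothetical roll: students numbered 1 to 1000, sample of 50.
roll = list(range(1, 1001))
rng = random.Random(7)
srs = simple_random_sample(roll, 50, rng)
sys_sample = systematic_sample(roll, 50, rng)
print(sorted(srs)[:5])
print(sys_sample[:5])   # consecutive selections are k = 20 apart
```

Stratified sampling uses the same random tools, but first needs the number to take from each stratum, which is the calculation covered next.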
Designing a stratified sample
Stratified sampling is the method NESA examines most, because it involves a calculation. The idea is simple: each stratum should contribute to the sample in the same proportion as it appears in the population. The tool is the sampling fraction:
$$\text{sampling fraction} = \frac{\text{sample size}}{\text{population size}}$$
Multiply each stratum's size by this fraction to get the number to take from that stratum. The diagram below traces the whole process - the population splits into strata, and each stratum passes the same fraction of itself into the sample.
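The check at the end is the habit to build: the numbers you take from the strata must add up to the sample size you wanted. If they do not, recheck the fraction and each multiplication; when a stratum's share is not a whole number, round so that the parts still total the sample size.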
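The calculation can be sketched in Python as below. The strata sizes (600, 300 and 100 with a sample of 50) are hypothetical, and the rounding rule for shares that are not whole numbers (give leftover places to the largest remainders, which keeps the total exact) is an assumption of this sketch rather than a NESA requirement. `fractions.Fraction` keeps the sampling fraction exact, so no decimal rounding creeps in.

```python
from fractions import Fraction
import math

def stratified_allocation(strata, sample_size):
    """Return the sampling fraction and how many to take from each stratum."""
    population = sum(strata.values())
    fraction = Fraction(sample_size, population)   # sample size / population size

    # Exact (possibly fractional) share for each stratum.
    exact = {name: size * fraction for name, size in strata.items()}
    counts = {name: math.floor(share) for name, share in exact.items()}

    # Assumed rounding rule: hand any leftover places to the largest remainders,
    # so the parts always add back up to the sample size.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1

    assert sum(counts.values()) == sample_size     # the parts must total the sample
    return fraction, counts

# Hypothetical example: three year groups.
fraction, counts = stratified_allocation({"Year 10": 600, "Year 11": 300, "Year 12": 100}, 50)
print(fraction)   # 1/20
print(counts)     # {'Year 10': 30, 'Year 11': 15, 'Year 12': 5}
```

When the shares are whole numbers (as here) the rounding step does nothing; it only matters for splits such as 333, 333 and 334 with a sample of 100, where the floors total 99 and the last place goes to the stratum with the largest remainder.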
Sample size
A bigger sample generally gives a more reliable estimate, because random sampling error shrinks as the sample grows. But a bigger sample also costs more time and money, so you balance accuracy against cost. There is no single "correct" size in Standard 2; what matters is that the sample is large enough to be reliable and is chosen by a fair method. A small sample, or a large sample chosen by a biased method, is worse than a modest sample chosen well.
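A quick simulation shows random sampling error shrinking as the sample grows. It assumes a hypothetical population of 10,000 people in which exactly 30% support a proposal, draws many simple random samples of each size, and reports the average gap between the sample proportion and the true 30%. The function name `average_error` is illustrative only.

```python
import random
import statistics

# Hypothetical population: 10,000 people, 30% of whom support a proposal (1 = support).
population = [1] * 3_000 + [0] * 7_000
true_p = sum(population) / len(population)   # 0.3

def average_error(n, trials=500, seed=1):
    """Mean absolute gap between a random-sample proportion and the true proportion."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        errors.append(abs(sum(sample) / n - true_p))
    return statistics.mean(errors)

for n in (25, 100, 400):
    print(n, round(average_error(n), 3))
```

Each fourfold increase in sample size roughly halves the typical error, which is why extra accuracy gets expensive: doubling the precision needs about four times the sample.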
Bias: when a sample stops representing the population
Bias is any systematic tendency for a sample to differ from the population, so that it consistently over- or under-states the truth. Bias is not bad luck; it is a flaw built into how the data was collected, and a larger sample does not fix it. The common sources are:
- Selection (coverage) bias. The method leaves part of the population out or over-includes another part - surveying only shoppers at one shopping centre, or sampling one suburb to speak for a whole town.
- Self-selection (voluntary-response) bias. Only people who choose to respond are counted - phone-in polls and click-to-vote web surveys, where people with strong opinions are far more likely to take part.
- Timing or convenience bias. Surveying at a time or place that misses whole groups - a weekday-morning survey misses full-time workers and students; sampling only your friends is a convenience sample.
- Question-wording bias. A leading or loaded question pushes people toward an answer - "Do you support the wasteful new tax?" invites a no.
The cure for bias is a fair, random method applied to the whole target population, with neutral wording.
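The claim that a larger sample does not fix bias can be checked with a simulation. This sketch assumes a hypothetical town of 20,000 residents, 40% of whom oppose a proposal, and made-up phone-in response rates (10% of opponents call, 2% of supporters do). The voluntary-response poll collects around a thousand replies yet still lands far from the truth, while a simple random sample of the same size comes close.

```python
import random

rng = random.Random(2024)

# Hypothetical town of 20,000 residents: 40% oppose a proposal (True = opposes).
residents = [True] * 8_000 + [False] * 12_000
rng.shuffle(residents)
true_share = sum(residents) / len(residents)   # 0.40

# Assumed response rates for a phone-in poll: opponents are far keener to call.
RESPONSE_RATE = {True: 0.10, False: 0.02}

# Voluntary response: each resident decides for themselves whether to take part.
callers = [r for r in residents if rng.random() < RESPONSE_RATE[r]]
phone_in_share = sum(callers) / len(callers)

# Simple random sample of the same size, drawn from the whole town.
random_sample = rng.sample(residents, len(callers))
random_share = sum(random_sample) / len(random_sample)

print(f"true share opposed:           {true_share:.2f}")
print(f"phone-in poll ({len(callers)} callers): {phone_in_share:.2f}")
print(f"random sample (same size):    {random_share:.2f}")
```

Raising the response rates would give the phone-in poll many more callers, but the same lopsided mix of callers, so its estimate would stay biased.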
How exam questions ask about sampling
The wording tells you which part of the dot point is being tested:
- "Would a census or a sample be more appropriate? Justify your answer." Decide on size, cost and whether testing is destructive, then give a one-line reason.
- "Identify / name the sampling method." Match the description: a random start then every $k$th is systematic; equal chance for all is simple random; same fraction from each group is stratified.
- "Calculate how many should be selected from each ..." This is a stratified-sample calculation: find the sampling fraction, multiply each stratum by it, and check the parts sum to the sample size.
- "Identify the (target) population." State the entire group the results are meant to describe (all students at the school, all residents of the town).
- "Explain why this method is biased" or "give a source of bias." Name the type of bias and say which group is over- or under-represented and why the result cannot be generalised.
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2021 HSC-style (3 marks). A company has employees: at its Sydney office, at its Melbourne office and at its Brisbane office. Management surveys a stratified sample of employees. Calculate how many employees should be surveyed from each office.
Worked answer:
Find the sampling fraction first: . One mark is for this fraction (or equivalent, such as dividing each office by ).
Apply it to each stratum: Sydney ; Melbourne ; Brisbane . One mark for the three correct numbers.
The final mark is for a correct method shown and a check that the parts total the sample: . A common error is multiplying by instead of by , or not checking the total - markers look for the fraction stated and the sum verified.
2022 HSC-style (4 marks). A local newspaper prints a survey form and invites readers to fill it in and mail it back. From the replies it reports that of the town opposes a proposed shopping centre. (a) Identify the target population. (b) Name the type of bias present. (c) Explain why the figure may not represent the views of the town. (d) Suggest a better sampling method.
Worked answer:
(a) The target population is all residents (or all eligible voters) of the town - one mark.
(b) Self-selection (voluntary-response) bias, because only readers who chose to return the form are counted - one mark.
(c) Markers reward an explanation that those who reply are not representative: people with strong (often negative) views are far more likely to bother replying, and only newspaper readers see the form, so the sample is skewed and cannot be generalised - one mark.
(d) A simple random or stratified sample of residents drawn from the whole town (for example using the electoral roll and a random selection) - one mark. The strongest answers name a random method AND tie it back to covering the whole target population.
2023 HSC-style (3 marks). For each situation, state whether a census or a sample is more appropriate and give a reason. (a) A school wants the opinion of its students on a new uniform. (b) A factory wants to know the average lifetime of the batteries it produces. (c) The government wants the average household size across Australia.
Worked answer:
One mark each for a correct choice WITH a valid reason.
(a) Census: the population is small and easily reached ( students in one school), so surveying everyone is feasible and exact.
(b) Sample: testing a battery's lifetime is destructive (the battery is used up), so a census is impossible.
(c) Sample: the population (every Australian household) is far too large and costly to survey in full for this purpose. Markers do not award the mark for the choice alone - the reason must match the situation (size/cost for (a) and (c), destructive testing for (b)).
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation (2 marks). A school principal wants to know the favourite sport of every student at a school of students. (a) Would a census or a sample be more appropriate? (b) Give one reason for your choice.
Worked solution:
Part (a) - decide census or sample. A census asks every member of the population; a sample asks a chosen part of it. The population here is small and easy to reach (one school, students), so a census is appropriate.
Part (b) - justify the choice. Because the whole population is small and accessible (the students are all in the one place), surveying everyone is quick, cheap and gives an exact answer with no sampling error. (If the population had been every student in NSW, a sample would have been the sensible choice instead.)
Foundation (2 marks). A manufacturer tests whether each match in a box of matches lights. Explain why a sample, not a census, must be used here.
Worked solution:
Identify what testing does to the item. Striking a match to test it uses the match up, so the test is destructive: a tested match can no longer be sold.
Conclude. A census would destroy every match in production, leaving nothing to sell, so the manufacturer must test only a sample and use the result to judge the whole batch. (Destructive testing - matches, light globes, crash-testing cars - always forces a sample.)
Foundation (3 marks). A factory produces light globes a day. A quality inspector tests every th globe coming off the line. (a) Name this sampling method. (b) How many globes are tested each day? (c) State one advantage of this method.
Worked solution:
Part (a) - name the method. Choosing every th item in order is systematic sampling: a starting point is chosen and then every th member is selected.
Part (b) - count the sample. Testing every th of globes gives
so globes are tested each day.
Part (c) - one advantage. It is simple and quick to run on a production line (you just count off the items), and it spreads the sample evenly across the whole day's output. (A risk to note: if a fault repeats every th globe, systematic sampling could miss it or hit it every time.)
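The risk in part (c) can be made concrete with a small Python sketch. It assumes a hypothetical line of 1,000 globes in which a faulty machine head makes every 10th globe faulty, and an inspector whose made-up step of 10 happens to match that period. Depending only on the random start, the sample shows either no faults at all or nothing but faults, even though the true fault rate is 10%.

```python
# Hypothetical production line of 1,000 globes where a faulty machine head
# makes every 10th globe faulty (positions 9, 19, 29, ... counting from 0).
line = [(i % 10 == 9) for i in range(1_000)]   # True = faulty
true_fault_rate = sum(line) / len(line)         # 0.10

k = 10   # assumed inspection step that happens to match the fault's period

for start in (0, 9):
    tested = line[start::k]
    rate = sum(tested) / len(tested)
    print(f"start at {start}: fault rate in sample = {rate:.2f}")
```

This prints a fault rate of 0.00 for the first start and 1.00 for the second; neither is anywhere near the true 0.10, which is why a repeating pattern that lines up with the step defeats systematic sampling.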
Core (4 marks). A gym has members: aged under , aged to , and aged over . Management surveys a stratified sample of members. How many members should be chosen from each age group?
Worked solution:
Find the sampling fraction. A stratified sample takes the same fraction from each group (stratum). The fraction is the sample size over the population:
so of each group is chosen (equivalently, divide each group by ).
Apply the fraction to each stratum.
Check the parts add to the sample size. , which matches the required sample of , so the split is correct. Choose members aged under , aged to and aged over .
Core (3 marks). A radio station asks listeners to phone in to vote on whether a new freeway should be built. Of those who call, say no. (a) Identify the type of bias in this survey. (b) Explain why the may not reflect the views of the whole community.
Worked solution:
Part (a) - name the bias. The sample is made up only of people who chose to ring in, so this is self-selection (voluntary-response) bias.
Part (b) - explain it. People who phone a talkback line are not a random cross-section of the community: those with strong feelings (especially those opposed) are far more likely to bother calling, and listeners of that one station are not all residents. The therefore measures the views of motivated callers, not the whole community, so it cannot be generalised. (A fair result would need a random sample of all affected residents.)
Exam (5 marks). A council wants to gauge support for a new library among the residents of a town. The town has three suburbs: Northvale ( residents), Riverside () and Hilltop (). The council will survey a stratified sample of residents. (a) Explain why a sample rather than a census is sensible here. (b) Calculate how many residents to survey from each suburb. (c) The council's first plan was to survey shoppers at the Northvale shopping centre one weekday morning. Give two reasons this plan would be biased.
Worked solution:
Part (a) - census or sample. With residents a census would be slow and expensive to organise and process, and a well-designed sample of can estimate the level of support accurately, so a sample is the sensible choice.
Part (b) - stratified split. The sampling fraction is
so take of each suburb:
Check: , the required total, so survey from Northvale, from Riverside and from Hilltop.
Part (c) - two reasons the first plan is biased. First, sampling only at the Northvale shopping centre over-represents Northvale residents and under-represents Riverside and Hilltop, so the sample is not representative of the whole town (a coverage problem). Second, surveying on a weekday morning misses people who are at work, at school or away during those hours, so groups such as full-time workers and students are systematically left out (selection/timing bias). Either of these means the result would not generalise to all residents.
Exam (5 marks). A school of students is made up of in Year , in Year , in Year , in Year and in Year . The Student Representative Council surveys a stratified sample of students. (a) Find the number surveyed from each year group. (b) The SRC instead suggests just surveying its own members. Identify the target population and explain why the SRC plan would give biased results. (c) Suggest a fairer method that is simpler to run than full stratified sampling, and state one drawback of it.
Worked solution:
Part (a) - stratified split. The sampling fraction is
so take of each year group:
Check: , the required sample, so survey , , , and students from Years to respectively.
Part (b) - target population and bias. The target population is all students at the school. Surveying only the SRC members samples a special subgroup (students engaged enough to be elected to the council), not a cross-section of the school, so their views are unlikely to match the whole student body. This is selection bias, and the result cannot be generalised to all students.
Part (c) - a fairer, simpler method. A simple random sample of students drawn from the full roll (for example by assigning every student a number and using a random number generator) is fair and easier to organise than splitting by year. One drawback is that, by chance, a random sample might pick very few students from a small year group, so the year groups may not be represented in the right proportions the way stratified sampling guarantees.
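The drawback in part (c) can be seen by repeating a simple random sample many times. This sketch uses made-up year-group sizes (not the ones in the question) totalling 800 students, with a sample of 40, so the smallest group (80 students) has a proportional share of exactly 4. Stratified sampling would always take those 4, but the simple random samples give that group anywhere from none to roughly double its share.

```python
import random

# Hypothetical school: year-group sizes made up for illustration (800 students).
years = {"Year 8": 200, "Year 9": 190, "Year 10": 180, "Year 11": 150, "Year 12": 80}
roll = [year for year, size in years.items() for _ in range(size)]   # one entry per student
n = 40                                                               # sample size
fair_share = years["Year 12"] * n / len(roll)                        # proportional share

rng = random.Random(3)
year12_counts = []
for _ in range(1_000):
    sample = rng.sample(roll, n)          # simple random sample from the whole roll
    year12_counts.append(sample.count("Year 12"))

print("proportional share of Year 12:", fair_share)
print("fewest Year 12 in a random sample:", min(year12_counts))
print("most Year 12 in a random sample:", max(year12_counts))
```

On average the random samples do give the small group about its fair share; the problem is the spread from one sample to the next, which stratified sampling removes by design.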
Related dot points
- Classify data relating to a single random variable as categorical (nominal or ordinal) or numerical (discrete or continuous), and select and use an appropriate graphical display for the data type
A focused answer to the HSC Maths Standard 2 dot point on classifying data. Categorical (nominal versus ordinal) against numerical (discrete versus continuous), the questions that decide each branch, and choosing an appropriate display for each data type, with a classification tree and worked Australian examples.
- Display and interpret numerical data using dot plots and stem-and-leaf plots, including back-to-back stem-and-leaf plots, and describe the clusters, gaps, outliers and shape of the data
A focused answer to the HSC Maths Standard 2 dot point on dot plots and stem-and-leaf plots. How to construct and read each display, how to build a back-to-back stem-and-leaf plot to compare two groups, and how to describe clusters, gaps, outliers and the shape of a distribution, with worked Australian examples.
- Calculate measures of central tendency, including the mean, median and mode, for both raw data and data presented in a frequency table
A focused answer to the HSC Maths Standard 2 dot point on the mean, median and mode. Finding all three from a raw list, the mean and mode from a frequency table, the mean from grouped data using class centres, and choosing the most appropriate measure when the data is skewed or has an outlier, with worked Australian examples.
- Calculate measures of spread, including the range, quartiles and interquartile range, and the population standard deviation using technology
A focused answer to the HSC Maths Standard 2 dot point on measures of spread. The range, the quartiles and interquartile range, the five-number summary, the population standard deviation from a calculator, and how to compare the spread of two data sets, with worked Australian examples.