Inquiry Question 3: How is the integrity of a scientific investigation judged?

Process, analyse and interpret quantitative and qualitative data, including identifying and accounting for sources of error and uncertainty

A focused answer to the HSC Investigating Science Module 5 dot point on data analysis. Covers means and ranges, error bars, significant figures, random vs systematic error, outliers, and worked exam-style practice questions.

Generated by Claude Opus 4.8. 8 min answer. Updated 2026-07-03.

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

NESA wants you to process raw quantitative data into summary statistics, represent data with appropriate graphs and tables, account for measurement error and uncertainty, and identify outliers. Quantitative data analysis is examined in nearly every Investigating Science paper.

The answer

Processing data turns raw measurements into evidence that can be interpreted against a hypothesis. The standard steps:

Tabulate raw data.
Calculate summary statistics (mean, median, range, standard deviation).
Identify outliers and decide whether to exclude.
Estimate uncertainty.
Graph the result with error bars.
Interpret in light of the hypothesis.

Summary statistics

Mean. Sum of values divided by count. The most common measure of central tendency for normally distributed data.

$$\bar{x} = \frac{\sum x_i}{n}$$

Median: Middle value when data is ordered. Less affected by outliers than the mean.
Range: Difference between maximum and minimum. A simple measure of spread.
Standard deviation: A more rigorous measure of spread that quantifies how tightly values cluster around the mean.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
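As a minimal sketch, the four summary statistics above can be computed with Python's standard `statistics` module. The data are the five fall times for the 0.12 m$^2$ parachute from the worked example in this answer; the function name `summarise` is an illustrative assumption, not part of any syllabus material.

```python
import statistics

def summarise(values):
    """Return the mean, median, range and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # statistics.stdev divides by n - 1, matching the formula for s
        "stdev": statistics.stdev(values),
    }

times = [1.42, 1.45, 1.41, 1.44, 1.43]  # fall times in seconds
summary = summarise(times)
for name, value in summary.items():
    print(f"{name}: {value:.3f} s")
```

Note that `statistics.pstdev` would divide by $n$ instead of $n - 1$; the sample version is used here because it matches the formula given above.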

Uncertainty

Every measurement has uncertainty arising from instrument resolution and natural variation. Standard ways to report:

Absolute uncertainty. $24.7 \pm 0.1$ cm.
Percentage uncertainty. $\frac{0.1}{24.7} \times 100 = 0.4\%$.

For multiple measurements, the uncertainty is usually estimated as half the range or as the standard error of the mean (the standard deviation divided by $\sqrt{n}$).
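The three estimates described in this section can be sketched in a few lines of Python. The helper names are illustrative assumptions; the single reading is the $24.7 \pm 0.1$ cm example above, and the repeats are the four non-outlier rod readings from the exam-style question later in this answer.

```python
import math
import statistics

def percentage_uncertainty(value, absolute):
    """Absolute uncertainty expressed as a percentage of the measured value."""
    return absolute / value * 100

def half_range(values):
    """Half the spread between the largest and smallest repeat."""
    return (max(values) - min(values)) / 2

def standard_error(values):
    """Sample standard deviation divided by the square root of the count."""
    return statistics.stdev(values) / math.sqrt(len(values))

print(round(percentage_uncertainty(24.7, 0.1), 1))  # 0.4

repeats = [24.6, 24.8, 24.7, 24.7]  # cm
print(round(half_range(repeats), 2))  # 0.1
print(round(standard_error(repeats), 3))
```

Half the range is quick and suits small data sets done by hand; the standard error shrinks as more repeats are taken, so it is usually the smaller of the two.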

Significant figures

Report data with significant figures appropriate to the precision of the instrument.

A ruler graduated to 1 mm reads to $\pm 0.5$ mm; report measurements to 1 decimal place in cm.
A digital thermometer reading to 0.1 degrees Celsius reports to that resolution; report no extra digits.

When calculating, the answer cannot be more precise than the least precise input. $24.7$ cm $\times 3.1$ cm $= 76.57$ cm$^2$, which rounds to $77$ cm$^2$ (two significant figures, matching the $3.1$).
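Rounding to a set number of significant figures is not built into Python, so the sketch below uses a small helper, `round_sig`, which is a hypothetical name introduced for illustration. It reproduces the area calculation above.

```python
import math

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 76.57 -> 1, 0.0123 -> -2
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

area = 24.7 * 3.1  # cm^2, unrounded product
print(round_sig(area, 2))  # 77.0
```

Python's `round` works on binary floating-point values, so a value sitting exactly on a rounding boundary may not round the way it would by hand; treat the helper as a check on working, not a replacement for it.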

Random and systematic error

Random error. Unpredictable variation between repeated measurements, caused by chance fluctuations. Magnitude varies; direction varies. Random error reduces with averaging.

Systematic error. Consistent bias in one direction caused by miscalibration, methodological flaw or biased observer. Systematic error does not reduce with averaging.

Property	Random error	Systematic error
Direction	Random	Consistent
Effect	Reduces precision	Reduces accuracy
Reduction	Replication, averaging	Calibration, instrument correction
Example	Stopwatch reading by hand	Balance not zeroed
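The contrast in the table can be illustrated with a short sketch. The readings below are invented for illustration (they are not measured data), and the 0.10 g zero offset is an assumed value: adding the same offset to every reading shifts the mean away from the true value but leaves the spread unchanged.

```python
import statistics

true_mass = 5.00  # g, hypothetical true value
# Invented readings scattered around the true value (random error only)
random_only = [4.98, 5.02, 4.99, 5.01, 5.00]
# The same readings from a balance that was not zeroed: each is 0.10 g high
zero_offset = 0.10
with_bias = [r + zero_offset for r in random_only]

for label, readings in [("random only", random_only), ("with bias", with_bias)]:
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)
    print(f"{label}: mean = {mean:.2f} g, s = {spread:.3f} g, "
          f"offset from true value = {abs(mean - true_mass):.2f} g")
```

Averaging pulls the first mean close to the true value, but no amount of averaging removes the offset in the second set; only zeroing the balance does.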

Outliers

An outlier is a data point well outside the cluster of the others. A common rule of thumb flags values more than 2 to 3 standard deviations from the mean. Options:

Investigate. Check whether the value is a transcription error, a faulty instrument or a real but rare observation.
Repeat the measurement if possible.
Exclude with justification. Document the reason for exclusion. Do not silently drop outliers; that is selective reporting.
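A minimal sketch of the standard-deviation rule, using the pendulum times from Q1 in the Try this section. The function name `flag_outliers` and its default threshold are assumptions made for illustration.

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > threshold * s]

periods = [1.40, 1.42, 1.41, 1.40, 1.39, 2.10, 1.41]  # seconds
print(flag_outliers(periods))       # [2.1]
print(flag_outliers(periods, 3.0))  # []
```

With only seven readings the suspect value inflates the standard deviation itself, so the stricter threshold of 3 flags nothing here. The rule is a prompt to investigate a value, not a verdict on whether to exclude it.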

Graphing

Choice of graph.

Line graph. Continuous independent variable (time, temperature).
Bar graph. Categorical independent variable (treatment groups).
Scatter plot. Investigating correlation between two continuous variables.

Error bars. Vertical lines showing the range or uncertainty around each data point. Mandatory for any quantitative graph in Investigating Science.

Axes. Labelled with quantity and unit. Independent variable on the x-axis, dependent variable on the y-axis. Origin clearly marked.

Interpreting in light of the hypothesis

A finding is meaningful when:

The treatment effect is larger than the uncertainty in the measurement.
The result is reproducible across replicates.
Alternative explanations (confounders, instrument bias) can be ruled out.

A treatment difference smaller than the error bars is not evidence of effect.

Worked example

A class measures the time for a parachute to fall 2 m at five different surface areas (0.04, 0.08, 0.12, 0.16, 0.20 m$^2$). Each is repeated 5 times.

For the 0.12 m$^2$ parachute, results are 1.42, 1.45, 1.41, 1.44, 1.43 s.

Mean: $(1.42 + 1.45 + 1.41 + 1.44 + 1.43) / 5 = 1.43$ s.
Range: $1.45 - 1.41 = 0.04$ s.
Uncertainty: Approximately $\pm 0.02$ s.
Reported value: $1.43 \pm 0.02$ s.
Comparison: For the 0.16 m$^2$ parachute, the mean is $1.65 \pm 0.03$ s. The difference between $1.43$ and $1.65$ s is $0.22$ s, far larger than the combined uncertainty of about $0.05$ s, so the difference is significant. The hypothesis that larger surface area increases fall time is supported.
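The comparison step can be sketched as a small check. The function name is hypothetical, and the combined uncertainty is taken as the simple sum of the two uncertainties, which is what the 0.05 s figure above implies.

```python
def difference_exceeds_uncertainty(mean_a, unc_a, mean_b, unc_b):
    """Compare the gap between two means with the sum of their uncertainties."""
    difference = abs(mean_a - mean_b)
    combined = unc_a + unc_b  # simple sum, as in the worked example
    return difference > combined, difference, combined

supported, diff, combined = difference_exceeds_uncertainty(1.43, 0.02, 1.65, 0.03)
print(supported, round(diff, 2), round(combined, 2))  # True 0.22 0.05
```

A `True` result only says the gap is larger than the measurement uncertainty; reproducibility and alternative explanations still need to be addressed before the hypothesis is treated as supported.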

Examples in context

Example 1. Australian Bureau of Statistics Sydney population recount. After the 2021 Census, the ABS identified that an early estimate of Sydney's population had to be adjusted downward by roughly 11,000 people once dwelling-occupancy data was reconciled across local government areas. The original estimate sat outside the 95 per cent confidence interval implied by post-enumeration survey calibration, marking it as a potential outlier rather than a real demographic shift. The ABS published a revision with a clear uncertainty range (plus or minus 0.4 per cent for state-level totals). The case illustrates good practice: investigate the outlier (overcount in temporary accommodation), document the cause, and reissue with corrected uncertainty rather than silently dropping the figure.

Example 2. Cape Grim CO2 baseline uncertainty. Atmospheric CO2 at Cape Grim is measured by non-dispersive infrared spectroscopy with quoted uncertainty of plus or minus 0.1 parts per million for hourly averages. To distinguish real annual growth (now about 2.4 ppm per year) from random noise, scientists average tens of thousands of clean-air observations and calculate the standard error of the annual mean. Outliers in the raw record (often from local biomass smoke or instrument drift) are flagged and excluded with a documented filter. Error bars on the published time series show plus or minus 0.2 ppm at the 95 per cent level, allowing the growth trend to be clearly distinguished from year-to-year variability.

Try this

Q1. In a class of 30 students measuring the period of a pendulum, one student records times of 1.40, 1.42, 1.41, 1.40, 1.39, 2.10, 1.41 seconds. Identify the outlier using a simple criterion and justify whether it should be excluded. [3 marks]

Cue. 2.10 s is more than 2 standard deviations above the mean. Investigate (likely a mis-start of the stopwatch); document and exclude with justification.

Q2. Annual rainfall at Bourke is reported as 340 plus or minus 60 mm for the 2010s decade and 290 plus or minus 50 mm for the 2020s decade. Comment on whether this constitutes evidence of a drying trend. [3 marks]

Cue. Difference (50 mm) is less than the combined uncertainty (about 78 mm). Difference not significant from this dataset alone; longer time series needed.
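The figure of about 78 mm is consistent with adding the two uncertainties in quadrature, which assumes they are independent; the question does not state this, so treat the working below as one reasonable sketch.

$$\sqrt{60^2 + 50^2} = \sqrt{6100} \approx 78 \text{ mm}, \qquad 340 - 290 = 50 \text{ mm} < 78 \text{ mm}$$

A simple sum of the uncertainties ($60 + 50 = 110$ mm) leads to the same conclusion.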

Q3. A student measures the mass of 10 fresh leaves on a balance reading to 0.01 g. (a) State how to report the uncertainty in a single measurement. (b) Explain how to estimate the uncertainty in the mean. (c) Identify one source of systematic error in this experiment. [2+2+2 marks]

Cue. (a) plus or minus 0.005 g (half the smallest division). (b) Standard error: s divided by square root of n. (c) Balance not zeroed; surface moisture left on the leaves adding to every recorded mass.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 HSC, 5 marks. A student measured the length of a metal rod five times and obtained: 24.6, 24.8, 24.7, 25.4, 24.7 cm. Process this data set and discuss sources of error.

A 5-mark answer needs identification of an outlier, calculation of mean and uncertainty, and analysis of sources of error.

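Inspection: The value 25.4 cm lies 0.7 cm above the mean of the other four readings (24.7 cm), more than 3 of their standard deviations (about 0.08 cm), and is a probable outlier. Confirm by repeating the measurement; if that is not possible, exclude it from the mean with justification.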
Excluding the outlier: values 24.6, 24.8, 24.7, 24.7 cm.
Mean: $(24.6 + 24.8 + 24.7 + 24.7) / 4 = 24.7$ cm.
Range: $24.8 - 24.6 = 0.2$ cm.
Uncertainty: $\pm 0.1$ cm (half the range, or the resolution of the ruler).
Reported value: $24.7 \pm 0.1$ cm.
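One way to check the outlier claim, sketched here under the assumption that the standard deviation is calculated from the four remaining readings only:

$$s = \sqrt{\frac{(-0.1)^2 + (0.1)^2 + 0^2 + 0^2}{4 - 1}} = \sqrt{\frac{0.02}{3}} \approx 0.082 \text{ cm}$$

$$\frac{25.4 - 24.7}{0.082} > 8$$

The suspect reading sits more than 8 standard deviations from the mean of the others, well beyond the 2 to 3 standard deviation rule of thumb.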

Sources of error.

Random error. Slight variations in ruler alignment, parallax, ruler reading. Reduced by replication.
Systematic error. Ruler calibration drift, room temperature affecting metal length, observer bias. Reduced by calibration.
Outlier (25.4 cm) likely reflects a misread or transcription error.

Markers reward identification of the outlier, mean, range, uncertainty and a distinction between random and systematic error.

2022 HSC, 3 marks. Explain the difference between random and systematic error, with an example of each.

A 3-mark answer needs both definitions and clear examples.

Random error: Unpredictable variation in measurements caused by chance fluctuations in the measurement process. Magnitude and direction vary between measurements. Random error affects precision and is reduced by averaging multiple repeats.
Example: Reading a thermometer while the mercury is still settling, giving slightly different values each time you check. Repeats give 22.1, 22.3, 21.9, 22.2 degrees Celsius. The mean is reliable even though individual readings vary.
Systematic error: Consistent bias in measurements that shifts all readings in the same direction. Caused by miscalibration, faulty equipment or methodological flaws. Systematic error affects accuracy and is not reduced by averaging.
Example: A balance reads 0.1 g too high on every measurement because it was not zeroed. Every recorded mass is 0.1 g greater than the true value, regardless of how many replicates are taken.
Reduction: Random error: average multiple measurements. Systematic error: calibrate or correct the bias.

Markers reward both definitions, examples that show the direction of error, and the reduction method for each.