How are errors identified, quantified and discussed in a Unit 4 AoS 3 investigation?

Evaluate the validity, reliability, precision and accuracy of the student-designed investigation, identify sources of error, and propose improvements grounded in the data

A focused VCE Biology Unit 4 AoS 3 answer on evaluating the investigation. Defines validity, reliability, precision and accuracy in VCAA's sense; categorises sources of error (random, systematic, gross); walks through worked examples of error analysis on enzyme and ecology investigations.

Generated by Claude Opus 4.8. 9 min answer. Updated 2026-07-05

Reviewed by: AI editorial process; not yet individually human-reviewed

What this sub-topic is asking

VCAA's Key Science Skill 5 expects you to evaluate the data and method. The evaluation has to use VCAA's four terms (validity, reliability, precision, accuracy) correctly, identify the specific sources of error in your investigation, and link the limitations back to the conclusion you can defend. This page covers the four terms, the error categories, and how to write the evaluation section that markers reward.

The answer

The evaluation is the most rewarded section of the poster after the discussion. Many investigations have plausible methods and reasonable conclusions; the ones that score in the top band have evaluations that name specific limitations, quantify their impact where possible, and propose realistic improvements.

The four VCAA terms

Validity: Whether the method actually tests the hypothesis. A pH investigation with no buffering is not a valid test of enzyme activity at different pH (the pH drifts during the reaction). A measurement of "how fast plants grow" by counting leaves is not a valid measure of biomass (leaf area, dry mass, or height would be more valid). Validity is about the link between what you measured and the construct you wanted.
Reliability: The consistency of the measurement. If repeating the procedure produces similar results, the measurement is reliable. Reliability is improved by replication (multiple trials per condition), standardised method, and controlling variables. A single trial per condition cannot be assessed for reliability.
Precision: How close repeated measurements are to one another. High precision means the values cluster tightly. Note that precision is independent of accuracy: a balance reading 50.000 g for a 25 g mass is precise (consistent to 0.001 g) but inaccurate.
Accuracy: How close a measurement is to the true value. Accuracy is improved by calibration (zeroing balances, calibrating thermometers, validating pH probes with reference buffers) and by minimising systematic bias.
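The distinction between precision and accuracy can be made concrete with a small calculation. The sketch below is illustrative only: the repeat readings are invented, the 25 g true mass comes from the balance example, and `describe` is a hypothetical helper name. It treats the distance of the mean from the true value as a measure of accuracy and the sample standard deviation as a measure of precision.

```python
from statistics import mean, stdev

def describe(readings, true_value):
    """Return (bias, spread) for repeated readings of one known mass."""
    bias = mean(readings) - true_value  # accuracy: how far the mean sits from the true value
    spread = stdev(readings)            # precision: sample standard deviation of the repeats
    return bias, spread

TRUE_MASS = 25.0  # grams, taken from the balance example

# Hypothetical repeat readings, invented for illustration
precise_but_inaccurate = [50.000, 50.001, 49.999]
accurate_but_imprecise = [24.0, 25.0, 26.0]

for label, readings in [
    ("precise but inaccurate", precise_but_inaccurate),
    ("accurate on average but imprecise", accurate_but_imprecise),
]:
    bias, spread = describe(readings, TRUE_MASS)
    print(f"{label}: bias = {bias:+.3f} g, spread (SD) = {spread:.3f} g")
```

A small spread with a large bias is the signature of a precise but inaccurate instrument; a large spread with a bias near zero is the reverse.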

The three error categories

Random error: Unpredictable fluctuations: human reaction time when starting a stopwatch, small variations in mixing, small variations in lighting or temperature. Reduced by averaging over many replicates. Random error affects precision.
Systematic error: A consistent bias in one direction: an uncalibrated balance reading 0.5 g high, a stopwatch that runs slow, a pH probe that reads 0.2 units high. Reduced by calibration and by checking instruments against a known standard. Systematic error affects accuracy.
Gross error: A one-off mistake: misreading the instrument, contaminating a sample, transposing a number when recording. Reduced by carefully recording at the time of measurement and by double-checking. Gross errors usually appear as outliers that should be investigated, not discarded silently.
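Why averaging helps with random error but not with systematic error can be shown numerically. This is a rough sketch under stated assumptions: the scatter values are invented, the 0.5 g offset is the uncalibrated-balance example from the list, and the 25 g true value is assumed known only for the sake of the illustration. The standard error of the mean (sample SD divided by the square root of the number of readings) stands in for the uncertainty left by random error.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(readings):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(readings) / sqrt(len(readings))

TRUE_VALUE = 25.0  # grams, assumed known for the illustration
OFFSET = 0.5       # grams, the uncalibrated-balance bias from the text

# Hypothetical random scatter around the true value (invented numbers)
scatter = [-0.2, 0.1, 0.3, -0.1, -0.3, 0.2, 0.0, -0.1, 0.1]
readings = [TRUE_VALUE + OFFSET + s for s in scatter]

for n in (3, 9):
    subset = readings[:n]
    print(
        f"n = {n}: mean = {mean(subset):.3f} g, "
        f"standard error = {standard_error(subset):.3f} g, "
        f"bias = {mean(subset) - TRUE_VALUE:+.3f} g"
    )
```

With more replicates the standard error shrinks, but the mean stays about 0.5 g above the true value. Only calibration removes that offset.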

Anomalies and outliers

A data point that lies far from the rest of the data needs investigation, not silent removal. The standard approach:

Check the logbook for any recorded oddity in that trial (apparatus problem, contamination, timing miss).
If a cause is identified, document it, then either exclude the point or rerun the trial, and explain the decision.
If no cause is identified, retain the point and acknowledge it in the discussion. Strong investigations may apply a statistical outlier rule (for example the 1.5 x IQR rule) and report the result of the test.

Quietly deleting a data point because it disagrees with the hypothesis is a research-integrity issue. The logbook trail should make any exclusion defensible.
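For reference, the 1.5 x IQR rule flags any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below is one possible implementation, not a prescribed method: `iqr_outliers` is a hypothetical helper, the bubble counts are invented, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`. Other quartile conventions give slightly different fences, and the rule needs more data points than a typical three-replicate design provides.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values lying more than 1.5 x IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical bubble counts per 30 s from eight repeat trials
counts = [21, 23, 22, 24, 22, 23, 41, 22]
print(iqr_outliers(counts))  # the flagged values still need a logbook check
```

A flagged value is a prompt to investigate, not permission to delete.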

Quantifying error where possible

A higher-band evaluation quantifies error rather than describing it qualitatively. Examples:

A manual stopwatch contributes approximately +/- 0.2 seconds per timing; over a 30-second measurement that is around 0.7 percent.
An analytical balance reads to +/- 0.001 g; on a 10 g sample that is 0.01 percent.
A 10 mL graduated pipette typically has a tolerance of +/- 0.05 mL; on a 10 mL aliquot that is 0.5 percent.
A pH probe is typically +/- 0.05 pH units; on a 4 to 8 pH range the relative error is small but the consequence for an enzyme assay can still be substantial.

Propagating these into the final result (or at least noting the dominant error source) is what separates an evaluation from a list of caveats.
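The percentage figures above, and the idea of a dominant error source, can be checked with a few lines of arithmetic. This is a minimal sketch: the uncertainties and measured values are the ones quoted in the examples, while combining them in quadrature is an assumption that holds only if the sources are independent and the final result is a product or quotient of these quantities.

```python
from math import sqrt

def percent_uncertainty(absolute, measured):
    """Absolute uncertainty expressed as a percentage of the measured value."""
    return 100 * absolute / measured

# (absolute uncertainty, measured value) pairs from the examples in the text
sources = {
    "stopwatch (s)": (0.2, 30),
    "balance (g)": (0.001, 10),
    "pipette (mL)": (0.05, 10),
}

percents = {name: percent_uncertainty(a, m) for name, (a, m) in sources.items()}
for name, p in percents.items():
    print(f"{name}: {p:.2f} %")

# The dominant source is the one with the largest percentage uncertainty
dominant = max(percents, key=percents.get)

# Assumption: independent sources combined in quadrature (root sum of squares)
combined = sqrt(sum(p ** 2 for p in percents.values()))
print(f"dominant source: {dominant}; combined: {combined:.2f} %")
```

In this illustration the stopwatch dominates, so improving the balance would barely change the overall uncertainty.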

Worked example

An enzyme assay reports peak catalase rate at pH 7 with three replicates per pH (2, 4, 6, 7, 8, 10).

Validity. The buffered conditions test the catalase rate at the intended pH (good validity). The substrate concentration is held constant across pH (good). Mass of liver homogenate is the same in each tube (good). Measurement of oxygen production by bubble counting is a proxy for rate; it is valid for relative comparisons but not for absolute kinetic constants. Note this limitation.
Reliability. Three replicates per pH allow a mean and standard deviation, so reliability is assessable. Standard deviation at pH 7 is small relative to the mean; at pH 2 it is large because rate is near zero (random counting noise dominates). Reliability is reasonable for the middle pH range and weaker at the extremes.
Precision. Bubble counts can vary by +/- 2 bubbles per 30 seconds in the same tube, partly because the bubbles vary in size. The precision is acceptable for relative comparisons across pH.
Accuracy. Without a reference standard rate, accuracy is hard to evaluate. The pH probe was calibrated with pH 4 and pH 7 buffer before measurement; bias is therefore expected to be small in the middle range but may grow at pH 2 and pH 10.
Dominant error sources. (1) bubble counting introduces random error of roughly +/- 5 to 10 percent at high rate; (2) temperature drift during a 30-minute set of trials (room temperature could vary by 1 to 2 degrees) introduces systematic error of perhaps 5 to 10 percent on enzyme rate (Q10 effect); (3) liver freshness changes across the trial day and affects activity.
Improvements. Use a gas-collection cylinder with volume measurement (replaces bubble counting); add a water-bath at 25 degrees Celsius to control temperature; use the same liver sample for all pH on the same day; add more replicates (5 instead of 3) at the extremes.
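The size of the temperature-drift effect can be estimated from the Q10 relationship, in which the rate changes by a factor of Q10 raised to the power of (temperature change / 10). The sketch below is a rough estimate only: the Q10 values of 1.5 and 2 are assumed for illustration and were not measured for this catalase preparation, and `q10_rate_change` is a hypothetical helper name.

```python
def q10_rate_change(delta_t, q10=2.0):
    """Percent change in rate for a temperature change of delta_t degrees Celsius."""
    return (q10 ** (delta_t / 10) - 1) * 100

# Assumed Q10 values; the real value for a given enzyme preparation may differ
for q10 in (1.5, 2.0):
    for delta_t in (1, 2):
        change = q10_rate_change(delta_t, q10)
        print(f"Q10 = {q10}, drift of {delta_t} degrees: about {change:.1f} % change in rate")
```

Under these assumptions a drift of 1 to 2 degrees changes the rate by a few percent up to roughly 15 percent, which is the same order as the estimate in the worked example and supports adding a water-bath.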

Common traps

Confusing precision and accuracy: They are independent. A balance reading 50.000 g for a 25 g mass is precise but inaccurate. A balance reading 24, 25, 26 g for the same mass is accurate on average but imprecise.
Saying "human error" without identifying the actual error: Markers reject "human error" as a non-answer. Identify what the human did wrong (stopwatch timing variation, pipetting variation, observation lag) and quantify if you can.
Recommending more replicates as the only improvement: Always say "more replicates" plus at least one specific method change (better equipment, better controls, different measurement).
Dropping anomalous data points silently: Document the anomaly. Investigate. Either explain why you excluded or retain with discussion. Silent removal is a research-integrity issue.
Treating validity and reliability as interchangeable: A method can be reliable (consistent) and invalid (measuring the wrong thing). Or valid (measuring the right thing) and unreliable (inconsistent results). Address them separately.
Confusing systematic and random error: Random error spreads results around the mean; systematic error shifts the mean. Different fixes (averaging vs calibration).

Examples in context

Example 1. A bioinformatics investigation has no instrument error but still needs evaluation. A sequence-comparison investigation using UniProt has no apparatus precision issues. Its limitations are validity (does percent sequence identity actually measure relatedness, given that some proteins are highly conserved across distant species?) and sample bias (the species sampled may not represent the diversity of the protein family). The evaluation should address these specifically rather than transplant lab-style error talk.

Example 2. The catalase pH curve as a teaching exemplar. Enzyme kinetics investigations are typical Unit 4 AoS 3 candidates. The error analysis on a pH curve is a good test of the evaluation skill because the dominant errors (temperature drift, pH buffer accuracy, enzyme freshness, bubble-counting variation) are specific and quantifiable. Many examiner reports across years cite this kind of investigation as one where the evaluation can be done at top-band depth.

Try this

Q1. Distinguish between validity and reliability with a biological example of each. [4 marks]

Cue. Validity = method tests the hypothesis (e.g. using leaf area to measure plant growth is more valid than counting leaves). Reliability = consistency of measurement on repeats (e.g. five replicate trials at the same condition give similar results).

Q2. A student investigates the effect of light intensity on the rate of photosynthesis using oxygen bubble counting from elodea. Identify two sources of systematic error and one source of random error, and propose one improvement for each. [6 marks]

Cue. Systematic: light from the room contributing to the measured intensity (improvement: cover the apparatus); temperature drift during the trial day (improvement: use a water-bath). Random: bubble-counting variation due to different bubble sizes (improvement: use volumetric oxygen measurement or a photosynthesis chamber).

Q3. A student reports peak enzyme activity at pH 7 based on three trials per pH. The pH 6 mean is higher than expected and the standard deviation is large. Critique the data and recommend next steps. [4 marks]

Cue. Large SD at pH 6 reduces confidence in the mean. Check the logbook for any methodological oddity at pH 6 (buffer freshness, liver freshness, timing). Recommend repeating pH 6 with additional replicates and verified buffer; consider whether the pH 6 buffer was correctly mixed.

Exam-style practice questions

Practice questions written in the style of VCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2023 VCAA, 1 mark
Students completed their investigation and analysed their results. They suggested their results were affected by systematic errors. Systematic errors
A. result in a spread of readings.
B. affect the precision of a measurement.
C. are easy to identify and eliminate.
D. cause readings to differ from the true value by a consistent amount each time.

The answer is D.

A systematic error shifts every reading in the same direction by the same (or proportional) amount, for example an uncalibrated balance that always reads 0.2 g high. Because the offset is consistent, it affects accuracy (how close readings are to the true value), not precision (how close repeated readings are to each other).

Why the others are wrong: A and B describe random error, which produces scatter and reduces precision. C is false because systematic errors are often hard to detect: the data still look consistent and repeatable, so you usually need calibration against a known standard to reveal them.

Exam tip: repeating the experiment does NOT remove a systematic error - it just gives you a consistently wrong mean. Only fixing the cause (recalibration, corrected technique) removes it.

2023 VCAA, 1 mark
Students designed a controlled experiment. After they had performed the experiment, another group of students gave them feedback suggesting that they should modify the experiment to improve the accuracy of their results. A change that the first group of students could make to improve the accuracy of their results could include
A. ignoring outlying results.
B. repeating the experiment many times.
C. carefully calibrating the equipment used.
D. having many people take the measurements.

The answer is C.

Accuracy is how close a measurement is to the true value. Calibrating the equipment against a known standard removes systematic offsets and brings readings closer to the true value, so it improves accuracy.

Why the others are wrong: B (repeating many times) improves reliability and the precision of the mean, but if there is a systematic error the mean is still inaccurate. A (ignoring outliers) is poor practice unless an outlier has a documented cause, and it does not address accuracy. D (many people measuring) tends to introduce more variation between observers, which can reduce precision rather than improve accuracy.

Watch the wording: VCAA separates accuracy (closeness to true value) from reliability/precision (consistency of repeats). The verb in the stem tells you which one to target.

2017 VCAA, 1 mark
During the experiment, the student measured the varying pH levels using a digital pH meter. The student calibrated the meter using a pH 7 buffer solution. The reason the student calibrated the pH meter was to
A. ensure a random error would not influence the results.
B. eliminate the effect of all uncontrolled variables.
C. enable the use of the instrument with precision.
D. allow the pH to be measured accurately.

The answer is D.

Calibrating against a known pH 7 buffer sets the instrument's reference point so its readings match true pH values. This targets a systematic error in the instrument and therefore improves the accuracy (closeness to the true value) of every subsequent measurement.

Why the others are wrong: A is incorrect because calibration corrects a consistent (systematic) offset, not the random scatter between repeats. B is too broad - calibration only addresses the instrument, not every uncontrolled variable in the experiment. C confuses precision (consistency of repeats) with accuracy; calibration does not make the readings more closely clustered, it makes them closer to the true value.