What is the difference between reliability and validity in HSC Investigating Science?

Reliability is the consistency of repeated measurements: a reliable investigation produces similar results when repeated under the same conditions. Validity is whether the investigation tests what it claims to test: a valid design isolates one independent variable, holds controlled variables constant and uses a control group. An investigation can be reliable without being valid (consistently measuring the wrong thing), but it cannot be valid without being reliable.

How do I tell the difference between accuracy and precision?

Accuracy is how close a measurement is to the true or accepted value, and it is degraded by systematic error such as a balance that is not zeroed. Precision is how close repeated measurements are to each other, regardless of the true value, and it is degraded by random error. A thermometer reading consistently 2 degrees too high is precise but not accurate. The two properties are independent.

What are independent, dependent and controlled variables?

The independent variable is the one factor the researcher deliberately changes, usually set at several levels. The dependent variable is the factor measured to see how it responds. Controlled variables are the factors held constant across all groups so that any change in the dependent variable can be attributed to the independent variable rather than to a confounder.

What makes a good hypothesis in Investigating Science?

A good hypothesis is a testable, falsifiable statement that predicts a relationship between the independent and dependent variables. It should be specific and measurable, for example 'increasing caffeine concentration increases Daphnia heart rate', not vague like 'caffeine affects animals'. A hypothesis that cannot be disproved by any conceivable observation is not scientific.

How is Module 5 examined in the HSC?

Module 5 underpins the whole paper. Expect multiple choice on the four data-quality terms, short-answer questions asking you to identify variables and design a valid method, and questions asking you to process a small data set, identify outliers, calculate a mean and uncertainty, and distinguish random from systematic error. Use the precise vocabulary that markers reward, and support your answers with worked examples.

HSC Investigating Science Module 5 Scientific Investigations: deep-dive 2026 guide

Deep-dive on HSC Investigating Science Module 5 Scientific Investigations. Hypotheses, independent, dependent and controlled variables, reliability, validity, accuracy, precision, error, and how to design and evaluate a depth study using the methods NESA examiners reward.

Generated by Claude Opus 4.8 | 14 min read | NESA-INS-MOD-5 | Updated 2026-06-30

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section

How Module 5 fits into HSC Investigating Science
Inquiry questions and hypotheses
Variables: the structure of a valid experiment
Replication, randomisation and blinding
The four data-quality terms
Processing data: error and uncertainty
Primary versus secondary data
Peer review and reproducibility
Risk assessment and ethics
Designing and evaluating a depth study
Common HSC Module 5 examiner traps
Check your knowledge

How Module 5 fits into HSC Investigating Science

Module 5, Scientific Investigations, is the foundation of the whole course. It is where you learn the language and the logic of the scientific method: how to form a hypothesis, design a valid investigation, collect and process data, and judge the quality of evidence. Modules 6, 7 and 8 then apply this machinery to technologies, claims and society, so a marker who sees you misuse "reliable" or "valid" in Module 7 will assume you never mastered Module 5.

NESA examines Module 5 across every section of the paper. The four data-quality terms (reliability, validity, accuracy, precision) appear in multiple choice almost every year, variable identification appears in short answer, and a data-processing question that asks you to find a mean, an uncertainty and an outlier is a recurring 4 to 5 mark item. This guide works through the concepts in the order an investigation actually uses them.

Inquiry questions and hypotheses

A scientific investigation begins with an inquiry question and a hypothesis.

An inquiry question frames what you want to find out, for example "Does caffeine concentration affect the heart rate of Daphnia?"

A hypothesis is a testable, falsifiable prediction of the relationship between the independent and dependent variables. A strong hypothesis is:

Specific and measurable. "Increasing caffeine concentration increases Daphnia heart rate" beats "caffeine affects animals".
Falsifiable. There must be a possible result that would prove it wrong.
Directional where justified. State the expected direction of the effect when prior knowledge supports it.

Variables: the structure of a valid experiment

A controlled experiment changes one thing, measures the result, and holds everything else constant. Three variable types structure this design.

Independent variable (IV). The one factor the researcher deliberately changes, normally set at several levels (for example 0, 0.1, 0.5 and 1.0 per cent caffeine).
Dependent variable (DV). The factor measured to see how it responds (Daphnia heart rate in beats per minute).
Controlled variables. The factors held constant across all groups (water temperature, light, organism age and size, acclimatisation time, the observer counting beats).

A control group is a sample treated identically except that it receives no treatment, or a placebo. The control establishes the baseline against which treated groups are compared. Without it you cannot attribute a result to the IV.

A valid experiment isolates one independent variable, measures one dependent variable, holds controlled variables constant, and compares treated groups against a control group.

Replication, randomisation and blinding

A single measurement per condition cannot account for variability, so investigations use:

Replication. Multiple individuals per condition (often 5 to 10) and repeating the whole experiment (often three times) to average out random variation.
Randomisation. Assigning subjects to groups by chance to reduce selection bias.
Blinding. Preventing the subject (single-blind) or both the subject and the researcher (double-blind) from knowing the group assignment, which controls for the placebo effect and for observer expectancy bias. Double-blind is the gold standard in clinical trials.
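
Randomisation can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name `assign_groups`, the labels D01 to D20, the group size and the seed are all assumptions made for the example. It shuffles the subjects and then deals them into the treatment groups so that chance, not the researcher, decides who goes where.

```python
import random

def assign_groups(subjects, group_names, seed=None):
    """Randomly assign subjects to groups of (near-)equal size."""
    rng = random.Random(seed)      # seeded generator so the allocation can be reproduced
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    # Deal the shuffled subjects out like cards, cycling through the groups
    for i, subject in enumerate(shuffled):
        groups[group_names[i % len(group_names)]].append(subject)
    return groups

# Hypothetical example: 20 Daphnia labelled D01..D20, four caffeine levels
daphnia = [f"D{i:02d}" for i in range(1, 21)]
levels = ["0%", "0.1%", "0.5%", "1.0%"]
allocation = assign_groups(daphnia, levels, seed=1)
for level, members in allocation.items():
    print(level, members)
```

Recording the seed in the method is one way to let another researcher reproduce the same allocation.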

The four data-quality terms

This is the most heavily tested vocabulary in the course. The four terms are independent properties: an investigation can be strong on some and weak on others.

Validity. Whether the investigation tests what it claims to test. Threatened by confounders, sampling bias and instruments that measure the wrong thing. Improved by better design and more controls.
Reliability. The consistency of repeated measurements. Threatened by random error and inconsistent technique. Improved by more replicates and a standardised procedure.
Accuracy. How close a measurement is to the true or accepted value. Threatened by systematic error and calibration drift. Improved by calibrating against a reference standard.
Precision. How close repeated measurements are to each other, regardless of the true value. Threatened by random error and low-resolution instruments. Improved by better technique and finer instruments.

The classic mental model is a dartboard. High accuracy and high precision put every dart in the bullseye. High precision but low accuracy gives a tight cluster off-centre (a systematic error). High accuracy but low precision scatters the darts around the bullseye so that only the average is right. Low on both scatters the darts widely and off-centre.

Accuracy is closeness to the true value (the bullseye); precision is closeness of repeats to each other. A systematic error moves the whole cluster off-centre; random error scatters it.
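
The dartboard model can be turned into a small Python check. This is a sketch under stated assumptions: the function `classify`, the 50.0 g reference mass, the four sets of readings and the two tolerances of 0.5 g are all hypothetical, chosen only to show that accuracy is judged from the mean against the true value while precision is judged from the spread of the repeats.

```python
import statistics

def classify(readings, true_value, accuracy_tol, precision_tol):
    """Label a set of repeated readings using two hypothetical tolerances."""
    mean = statistics.mean(readings)
    spread = max(readings) - min(readings)
    accurate = abs(mean - true_value) <= accuracy_tol   # mean close to the true value
    precise = spread <= precision_tol                   # repeats close to each other
    if accurate and precise:
        return "accurate and precise"
    if precise:
        return "precise but not accurate"
    if accurate:
        return "accurate on average but not precise"
    return "neither accurate nor precise"

TRUE_MASS = 50.0  # g, hypothetical reference mass
data_sets = {
    "bullseye cluster":   [50.0, 50.1, 49.9, 50.0],
    "off-centre cluster": [52.0, 52.1, 51.9, 52.0],
    "centred scatter":    [48.0, 52.0, 49.0, 51.0],
    "off-centre scatter": [53.0, 57.0, 52.0, 56.0],
}
for name, readings in data_sets.items():
    print(name, "->", classify(readings, TRUE_MASS, accuracy_tol=0.5, precision_tol=0.5))
```

In an exam there is no fixed tolerance; you justify the classification by comparing the mean with the accepted value and by commenting on the spread.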

Processing data: error and uncertainty

Once data is collected you must process it. NESA expects you to:

Inspect for outliers. A value far from the others (commonly more than 2 to 3 standard deviations) may be a misread or transcription error. Note it, and either re-measure or justify excluding it.
Calculate a mean of the valid replicates.
State an uncertainty, often half the range of repeated readings or the resolution of the instrument, written as the mean plus or minus the uncertainty.
Use sensible significant figures, matching the precision of the instrument.

Two kinds of error must be distinguished:

Random error. Unpredictable scatter that varies in size and direction between readings (parallax, settling instruments). It degrades precision and is reduced by averaging many repeats.
Systematic error. A consistent bias in the same direction (an un-zeroed balance, a miscalibrated thermometer). It degrades accuracy, is not reduced by averaging, and is removed only by calibration.
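
A short simulation can illustrate why averaging helps with one kind of error and not the other. It is only a sketch: the reference mass, the 0.30 g offset, the 0.20 g scatter and the function name `simulate_readings` are hypothetical values chosen for the example, not measurements. The random error is drawn from a normal distribution and the systematic error is a fixed offset added to every reading.

```python
import random
import statistics

TRUE_MASS = 50.00   # g, hypothetical reference mass
OFFSET = 0.30       # g, hypothetical un-zeroed balance (systematic error)
NOISE_SD = 0.20     # g, hypothetical reading-to-reading scatter (random error)

def simulate_readings(n, seed=0):
    """Return n simulated balance readings containing both kinds of error."""
    rng = random.Random(seed)   # seeded so the simulation is repeatable
    return [TRUE_MASS + OFFSET + rng.gauss(0, NOISE_SD) for _ in range(n)]

# More repeats make the mean settle down, but it settles on the biased value
for n in (3, 30, 3000):
    mean = statistics.mean(simulate_readings(n))
    print(f"n={n:5d}  mean={mean:.3f} g  error of mean={mean - TRUE_MASS:+.3f} g")

# Only a calibration step (here, subtracting the known offset) removes the bias
calibrated_mean = statistics.mean(simulate_readings(3000)) - OFFSET
print(f"calibrated mean={calibrated_mean:.3f} g")
```

As the number of repeats grows, the error of the mean approaches the offset rather than zero, which is the behaviour described above for systematic error.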

Worked example: processing a data set

A student measures the length of a metal rod five times: 24.6, 24.8, 24.7, 25.4, 24.7 cm.

Inspect: The value 25.4 cm sits well above the cluster around 24.7 and is a probable outlier (likely a misread). Flag it and exclude after noting why.
Mean of the valid four: $(24.6 + 24.8 + 24.7 + 24.7)/4 = 24.7$ cm.
Range: $24.8 - 24.6 = 0.2$ cm, so the uncertainty is about $\pm 0.1$ cm (half the range, consistent with the ruler resolution).
Report: $24.7 \pm 0.1$ cm.
Sources of error: Random error from parallax and ruler alignment (reduce by averaging repeats); systematic error from a miscalibrated ruler or thermal expansion of the metal (reduce by checking the ruler against a reference standard and measuring at a constant temperature); the 25.4 cm outlier is most likely a misread or transcription error.
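
The same processing steps can be sketched in Python. The outlier rule here is an assumption made for the illustration, not a NESA requirement: a reading is flagged when it lies more than three median absolute deviations (MAD) from the median. A median-based rule is used because, with only five readings, one extreme value inflates the standard deviation enough to hide itself. The function name `process_readings` and the parameter `mad_multiplier` are hypothetical.

```python
import statistics

def process_readings(readings, mad_multiplier=3.0):
    """Flag outliers with a median-based rule, then summarise the remaining readings."""
    median = statistics.median(readings)
    deviations = [abs(x - median) for x in readings]
    mad = statistics.median(deviations)   # median absolute deviation
    # If mad is 0 (most readings identical) the rule cannot be applied, so nothing
    # is flagged and the judgement is left to the investigator.
    outliers = [x for x in readings if mad > 0 and abs(x - median) > mad_multiplier * mad]
    valid = [x for x in readings if x not in outliers]
    mean = statistics.mean(valid)
    half_range = (max(valid) - min(valid)) / 2   # uncertainty as half the range
    return {"outliers": outliers, "valid": valid, "mean": mean, "uncertainty": half_range}

rod = process_readings([24.6, 24.8, 24.7, 25.4, 24.7])
print(f"outliers: {rod['outliers']}")
print(f"result: {rod['mean']:.1f} +/- {rod['uncertainty']:.1f} cm")
```

For the rod data this flags 25.4 cm and reports 24.7 +/- 0.1 cm, matching the hand calculation. A flagged value still needs a written justification before it is excluded.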

Primary versus secondary data

A complete investigation distinguishes the data you collect yourself from data you draw on.

Primary data is collected first-hand by the investigator through observation or experiment. It is current and tailored to the inquiry question but limited by your equipment and time.
Secondary data is sourced from others, such as databases, published studies or Bureau of Meteorology records. It can be large-scale and long-term but you cannot vouch for how it was collected.

A depth study often combines both: primary data from your own measurements, validated against secondary data from a reputable source.

Peer review and reproducibility

The credibility of an investigation does not rest on the author alone. Peer review is the scrutiny of a method and findings by independent experts before publication, which filters out weak design and overstated claims. Reproducibility is the ability of independent researchers to repeat the methodology and obtain consistent results. A finding that cannot be reproduced is treated with caution regardless of how striking it first appeared. Together they are the self-correcting machinery that distinguishes science from a single unverified report.

Risk assessment and ethics

Before any first-hand investigation you must complete a risk assessment: identify hazards (chemicals, heat, biological material, electrical equipment), assess the likelihood and severity, and state control measures (personal protective equipment, ventilation, safe disposal). Investigations involving animals or humans require ethical approval, informed consent, and consideration of animal welfare under the relevant codes. Markers reward students who name a specific hazard and its specific control rather than writing "be careful".

Designing and evaluating a depth study

The Module 5 skills come together in the depth study, an extended student investigation. A strong depth study has a predictable shape:

Inquiry question and falsifiable hypothesis.
Variable table. IV with its levels, DV with its units and instrument, and a thorough list of controlled variables.
Method with enough detail and replication for reproducibility, plus a control group.
Risk assessment naming specific hazards and controls.
Data processing with means, uncertainties and outlier handling.
Evaluation that explicitly judges the investigation's validity, reliability, accuracy and precision, and proposes improvements.

The evaluation is where marks are won. Do not just assert "my experiment was reliable". State why: "Three replicates per concentration produced readings within 2 beats per minute of each other, indicating high reliability; however validity was limited because room temperature was not actively controlled, a confounder that future work should hold constant with a water bath."

Worked example: evaluating an investigation's method

A student investigates whether the brand of antacid affects how much acid it neutralises. They add one tablet of each of three brands to 50 mL of 0.1 mol/L hydrochloric acid, measure the final pH once per brand, and conclude that Brand A is best because it gave the highest pH.

Validity: Partly valid. The IV (brand) and DV (final pH) are clear, and acid volume and concentration are controlled. But tablet mass was not controlled: if Brand A tablets are heavier, more antacid was added, confounding the comparison. The design should standardise the mass of antacid, not the number of tablets.
Reliability: Low. Only one measurement per brand means random error cannot be detected or averaged out. At least three replicates per brand are required.
Accuracy: Depends on the pH meter being calibrated against buffer solutions of known pH before use. If uncalibrated, all readings carry a systematic error.
Improvements: Standardise antacid mass; run at least three replicates per brand; calibrate the pH meter; control temperature. The conclusion that Brand A is best is not yet supported because the comparison is confounded and unreplicated.

Common HSC Module 5 examiner traps

Using "reliable" when you mean "valid", or "accurate" when you mean "precise". Markers penalise loose vocabulary.
Claiming more replicates improve validity. Replication improves reliability and precision; validity is about design.
Confusing controlled variables with the control group.
Forgetting to state an uncertainty or to handle an outlier in a data-processing question.
Writing "wear safety goggles, be careful" instead of naming a specific hazard and a specific control.

Check your knowledge

A mix of definitional, design and data-processing questions covering this topic. Answer all under exam conditions, then check against the solutions block.

Define reliability and validity, and explain the relationship between them with an example of an investigation that is reliable but not valid. (4 marks)
A researcher investigates whether caffeine concentration affects the heart rate of Daphnia. Identify the independent, dependent and at least three controlled variables, and state the role of the control group. (5 marks)
Distinguish between accuracy and precision. Using the data sets below, classify each as accurate, precise, both or neither, given a true value of 1.00 g/mL. (a) 1.00, 1.00, 1.01, 1.00. (b) 0.85, 0.86, 0.85, 0.86. (c) 0.90, 1.05, 0.95, 1.10. (5 marks)
A student measures a reaction time five times: 0.42, 0.41, 0.43, 0.58, 0.42 s. (a) Identify and justify any outlier. (b) Calculate the mean of the valid values. (c) State the uncertainty and report the value correctly. (d) Identify one random and one systematic source of error. (6 marks)
Explain the difference between primary and secondary data, giving one advantage and one limitation of each. (4 marks)
A class investigates whether a new fertiliser increases tomato yield but omits a control group. Explain why a control group is essential and what the omission prevents the class from concluding. (4 marks)
Distinguish between random and systematic error, state which data-quality property each affects, and describe how each is reduced. (4 marks)
A depth study tests whether light intensity affects the rate of photosynthesis in pondweed, measured by bubbles of oxygen per minute. The student takes one reading at each of three light intensities and concludes that higher light always increases photosynthesis. Evaluate the validity and reliability of this design and propose two improvements. (6 marks)

Solutions

Q1: Reliability is the consistency of repeated measurements: a reliable investigation yields similar results when repeated under the same conditions. Validity is whether the investigation tests what it claims to test, isolating the independent variable with controlled variables held constant and a control group. Relationship: an investigation can be reliable without being valid because it can consistently measure the wrong thing, but it cannot be valid without being reliable. Example: comparing plant growth in sun and shade using a different species in each condition, measured with a ruler that reads consistently 1 cm too long, gives reliable (consistent) readings but is not valid, because species variation is a confounder and the design no longer tests the effect of light alone. The ruler's offset is a separate problem: it is a systematic error that reduces accuracy.
Q2: Independent variable: caffeine concentration applied to the Daphnia (for example 0, 0.1, 0.5, 1.0 per cent). Dependent variable: Daphnia heart rate in beats per minute, counted under a microscope. Controlled variables (any three): water temperature (about 18 degrees Celsius), light intensity, age and size of Daphnia, acclimatisation time before counting, and the observer doing the counts. Control group: Daphnia in pond water with zero caffeine, treated identically in every other way, establishing the baseline heart rate against which treated groups are compared.
Q3: Accuracy is how close a measurement is to the true or accepted value; precision is how close repeated measurements are to each other regardless of the true value. (a) Both accurate (close to 1.00) and precise (tightly clustered). (b) Precise (tight cluster) but not accurate (a systematic offset of about 0.15 below the true value). (c) Accurate on average (mean about 1.00) but not precise (wide scatter from random error).
Q4: (a) The value 0.58 s is well above the cluster near 0.42 s and is a probable outlier, likely a slow response or misread; flag and exclude after noting why. (b) Mean of the valid four: $(0.42 + 0.41 + 0.43 + 0.42)/4 = 0.42$ s. (c) Range of the valid values is $0.43 - 0.41 = 0.02$ s, so uncertainty is about $\pm 0.01$ s; report as $0.42 \pm 0.01$ s. (d) Random error: variation in human reaction and timing (reduced by averaging repeats). Systematic error: a stopwatch that starts late or an observer who consistently anticipates the signal (reduced by calibration or an electronic gate).
Q5: Primary data is collected first-hand by the investigator through observation or experiment; secondary data is sourced from others such as databases or published studies. Primary data advantage: tailored exactly to the inquiry question and current; limitation: constrained by your equipment, time and sample size. Secondary data advantage: can be large-scale and long-term (for example decades of Bureau of Meteorology records); limitation: you cannot verify how it was collected or control its quality.
Q6: A control group is a sample treated identically to the experimental groups except that it receives no fertiliser (or a placebo of equal volume). It establishes a baseline yield in the absence of treatment, isolating the fertiliser's effect from confounders such as soil quality, watering, sunlight and weather. Without a control, any high yield could be due to favourable conditions rather than the fertiliser, so the class cannot conclude that the fertiliser caused the result. Correlation without a baseline comparison is not evidence of effect.
Q7: Random error is unpredictable scatter that varies in size and direction between readings (parallax, a settling instrument); it degrades precision and is reduced by averaging many repeats. Systematic error is a consistent bias in the same direction (an un-zeroed balance, a miscalibrated meter); it degrades accuracy, is not reduced by averaging, and is removed by calibrating against a reference standard.
Q8: Validity: partly valid because the IV (light intensity) and DV (bubbles per minute) are clear, but the conclusion "higher light always increases photosynthesis" overreaches. At very high intensity the rate plateaus or falls, and confounders such as water temperature (lamps heat the water) and dissolved CO2 are not controlled, so the relationship is not isolated. Reliability: low, because a single reading at each intensity cannot detect or average random error. Improvements (any two): take at least three replicate readings per intensity and average them; control temperature with a heat shield or water bath; control CO2 by using fresh sodium bicarbonate solution; test more intensities to reveal whether the rate plateaus rather than assuming a linear relationship.
