Inquiry Question 3: How is the integrity of a scientific investigation judged?
Plan, source and acknowledge primary and secondary data appropriate to the investigation
A focused answer to the HSC Investigating Science Module 5 dot point on primary and secondary data. Covers the distinction, sourcing and acknowledging secondary data, evaluating source quality, and worked exam-style practice questions.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to define primary and secondary data, identify which is appropriate for a given investigation, and evaluate the reliability of secondary data sources. Source evaluation is examined regularly because it underpins evidence-based reasoning.
The answer
Scientific investigations use two distinct types of data, and most strong investigations combine them.
Primary data
Data collected directly by the investigator through observation, measurement, experiment or survey. The investigator controls how the data is gathered.
Examples.
- A student measures the pH of soil samples from three sites.
- A CSIRO scientist records seabird counts on Lord Howe Island.
- A clinical trial team measures patient outcomes in a new drug trial.
Strengths. Targeted to the investigator's hypothesis, full control over methodology, known provenance.
Limitations. Limited by the investigator's time, budget, equipment and access. Cannot retrospectively gather historical data.
Secondary data
Data collected by other researchers and accessed through published sources. The investigator did not gather the data themselves.
Examples.
- Bureau of Meteorology climate records dating back over a century.
- ABS census data on Australian population demographics.
- Peer-reviewed journal articles reporting laboratory measurements.
- AIHW health statistics (Australian Institute of Health and Welfare).
- Genome databases such as NCBI GenBank.
Strengths. Allows investigation of large-scale patterns, long time series, rare events or expensive measurements the investigator cannot replicate. Provides context for primary findings.
Limitations. The investigator did not control the methodology and must rely on the original researchers' reported methods. Risk of unreliable or biased sources if not evaluated carefully.
Sourcing secondary data
Trusted sources for HSC investigations, most of them Australian:
- CSIRO (csiro.au): research from Australia's national science agency, much of it published in peer-reviewed journals.
- Bureau of Meteorology (bom.gov.au): authoritative climate and weather data.
- Australian Bureau of Statistics (abs.gov.au): population and economic data.
- Australian Institute of Health and Welfare (aihw.gov.au): national health statistics.
- NHMRC (nhmrc.gov.au): medical research guidelines and outcomes.
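- PubMed and Google Scholar: search tools for locating scholarly literature (PubMed covers biomedical research; Google Scholar covers all disciplines). Confirm that each result is peer-reviewed before relying on it.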
Acknowledging sources
Every secondary source used in an investigation must be cited. Standard formats include:
- APA referencing for journal articles: Author, A. (Year). Title. Journal, Volume(Issue), Pages.
- In-text citation when paraphrasing or quoting.
- Bibliography or reference list at the end of the report.
Failure to acknowledge sources is academic misconduct and a form of intellectual theft.
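The journal-article pattern above can be sketched in code. This is a minimal illustration, assuming plain-text output (so the italics APA applies to the journal name and volume are omitted); the `JournalArticle` class, the function names and the example article are all hypothetical placeholders, not a real reference.

```python
from dataclasses import dataclass


@dataclass
class JournalArticle:
    """Hypothetical container for the fields an APA journal reference needs."""
    authors: list[str]  # each already written as "Surname, A."
    year: int
    title: str
    journal: str
    volume: int
    issue: int
    pages: str


def join_authors(authors: list[str]) -> str:
    """Join author names with commas and an ampersand before the last one."""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_reference(a: JournalArticle) -> str:
    """Build: Author, A. (Year). Title. Journal, Volume(Issue), Pages."""
    return (
        f"{join_authors(a.authors)} ({a.year}). {a.title}. "
        f"{a.journal}, {a.volume}({a.issue}), {a.pages}."
    )


def format_in_text(a: JournalArticle) -> str:
    """Build the matching in-text citation from the authors' surnames."""
    surnames = [name.split(",")[0] for name in a.authors]
    if len(surnames) == 1:
        who = surnames[0]
    elif len(surnames) == 2:
        who = f"{surnames[0]} & {surnames[1]}"
    else:
        who = f"{surnames[0]} et al."
    return f"({who}, {a.year})"


# Placeholder article, invented purely to show the output shape.
example = JournalArticle(
    authors=["Nguyen, T.", "Smith, J."],
    year=2020,
    title="Example article title",
    journal="Example Journal",
    volume=12,
    issue=3,
    pages="45-67",
)

print(format_reference(example))
print(format_in_text(example))
```

Keeping the reference-list entry and the in-text citation in one record means the two cannot drift apart, which is the point of pairing them in a report.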
Evaluating secondary data quality
The CRAAP test, or a similar checklist, is a widely used framework.
| Criterion | What to check |
|---|---|
| Currency | When was it published? Is it recent enough for the investigation? |
| Relevance | Does it answer the inquiry question directly? |
| Authority | Who is the author? What is their expertise and institution? |
| Accuracy | Is the methodology described? Was it peer reviewed? |
| Purpose | Why was it written? Is there a conflict of interest? |
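
One way to apply the table consistently across several sources is to record a yes/no judgement for each criterion. The sketch below is an assumption about how a student might organise that record: the CRAAP test itself does not define a numeric score, and the function name and the example judgements are hypothetical.

```python
CRAAP_CRITERIA = ("currency", "relevance", "authority", "accuracy", "purpose")


def evaluate_source(answers: dict[str, bool]) -> tuple[int, list[str]]:
    """Return (number of criteria passed, list of criteria that failed)."""
    missing = [c for c in CRAAP_CRITERIA if c not in answers]
    if missing:
        # Force a judgement on every criterion rather than silently skipping one.
        raise ValueError(f"No judgement recorded for: {', '.join(missing)}")
    failed = [c for c in CRAAP_CRITERIA if not answers[c]]
    return len(CRAAP_CRITERIA) - len(failed), failed


# Hypothetical judgements for an anonymous blog post with no stated method.
blog_post = {
    "currency": True,
    "relevance": True,
    "authority": False,
    "accuracy": False,
    "purpose": True,
}

passed, failed = evaluate_source(blog_post)
print(f"Passed {passed} of {len(CRAAP_CRITERIA)}; failed: {failed}")
```

The list of failed criteria matters more than the count: a source that fails on authority and accuracy needs corroboration no matter how current or relevant it is.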
Worked example of combined use
Investigating drought trends in the Murray-Darling Basin:
- Primary data. A student measures rainfall, soil moisture and vegetation cover at a single site over three months.
- Secondary data. BOM rainfall records for the basin from 1900 to 2024, MDBA water-level data, peer-reviewed papers on climate-driven streamflow change.
The primary data anchors the investigation in real measurements; the secondary data provides historical context and the scale required to detect long-term trends.
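To make the idea of historical context concrete, a student could express their own measurement as a number of standard deviations from the long-term record. The sketch below assumes this is a reasonable first comparison; the function name is hypothetical and every number in it is invented for illustration, not taken from BOM or any other dataset.

```python
from statistics import mean, stdev


def deviations_from_record(observed: float, historical: list[float]) -> float:
    """Return how far `observed` sits from the historical mean, in standard deviations."""
    return (observed - mean(historical)) / stdev(historical)


# Invented values: rainfall totals (mm) for the same three-month window in past years.
historical_totals = [100, 120, 80, 110, 90]
# Invented value: the student's own three-month total at their site (mm).
student_total = 60

z = deviations_from_record(student_total, historical_totals)
print(f"Student total is {z:.2f} standard deviations from the historical mean")
```

A strongly negative result suggests an unusually dry period relative to the record, but a single site over three months cannot establish a trend on its own, which is why the secondary record is needed.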
Examples in context
Example 1. Black Summer 2019-20 smoke chemistry. After the Black Summer bushfires, CSIRO researchers studying smoke impacts on Sydney air combined primary data (filter samples collected from rooftop sites at Macquarie University and at the CSIRO Aspendale lab) with secondary data (NSW EPA hourly PM2.5 readings going back five years, BOM weather records for wind back-trajectories, satellite aerosol optical depth from the NASA MODIS sensor). The primary filter samples revealed novel organic compounds not previously catalogued, while the EPA and BOM secondary data established the duration, geographic extent and meteorological drivers of the smoke event. Neither dataset alone could have answered the inquiry question; their combination produced a publishable account of an unprecedented atmospheric event.
Example 2. Ross River virus surveillance in northern NSW. NSW Health epidemiologists tracking Ross River virus outbreaks rely on a layered data architecture. Primary data comes from sentinel mosquito traps (CDC light traps run weekly at 15 north-coast sites) and from notifications by GPs of confirmed serological cases. Secondary data is drawn from BOM rainfall records (predicting mosquito hatches), MODIS satellite NDVI vegetation indices (proxy for mosquito habitat), and the NSW Notifiable Conditions Information Management System for historical case data. Each source has its own provenance, methodological limits and citation requirement under NSW Health data-sharing agreements. The combined evidence base supports targeted vector-control responses 3 to 4 weeks before clinical cases peak.
Try this
Q1. Distinguish primary, secondary and tertiary data sources, giving one Australian example of each. [3 marks]
- Cue. Primary: own measurements (e.g. school photosynthesis trial). Secondary: peer-reviewed papers or government datasets (CSIRO, BOM). Tertiary: textbooks, Wikipedia, summary review articles.
Q2. A student investigating sea-level rise downloads tide-gauge data from the Australian Baseline Sea Level Monitoring Project. State two checks they should make on the data before using it. [2 marks]
- Cue. Check the metadata (collection method, instrument, calibration), check for known gaps or station relocations that could introduce step changes.
Q3. A teacher assigns an inquiry on the link between water hardness and limescale in NSW kettles. (a) Outline one primary data source. (b) Outline one secondary data source. (c) State how the student should cite each in their report. [2+2+2 marks]
- Cue. (a) Titrate calcium ions in tap water samples from three suburbs. (b) Sydney Water annual drinking-water-quality report. (c) APA referencing for the report; tabulated raw data with method appendix for primary.
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2022 HSC, 4 marks. Distinguish between primary and secondary data, and explain why both are necessary in scientific investigation.
Worked answer.
A 4-mark answer needs definitions, examples, and a reason why both are required.
Primary data. Data collected directly by the investigator through observation, measurement or experiment. Example: a student measures the rate of photosynthesis using elodea pondweed in their classroom.
Secondary data. Data collected by other researchers and accessed through published sources such as journal articles, government reports, databases or textbooks. Example: a student uses Bureau of Meteorology temperature records from 1900 to 2024 to investigate climate trends.
Why both are necessary. Primary data is essential for testing a specific hypothesis under conditions the investigator controls, and for producing first-hand evidence.
Secondary data is essential for contextualising findings, drawing on data the investigator cannot generate (long time series, large samples, expensive instruments) and avoiding wasted duplication of effort.
Most real scientific investigations combine both. Climate science depends on primary measurements at weather stations combined with secondary historical archives. Medical research combines primary trials with secondary meta-analysis.
Markers reward both definitions, an example for each, and an explicit reason both are used together.
2024 HSC, 3 marks. List three criteria you would use to evaluate the reliability of a secondary data source.
Worked answer.
A 3-mark answer needs three distinct criteria with brief explanation.
Authority and expertise. Is the source from a qualified scientist, a peer-reviewed journal, a government agency such as CSIRO or the Bureau of Meteorology, or an established research institution? Wikipedia and blogs are not authoritative sources, so any claim taken from them must be cross-referenced against one that is.
Peer review and methodology. Was the work peer-reviewed before publication? Does the source describe the methodology in enough detail that another researcher could replicate it? Reliable sources show their working.
Currency and conflict of interest. Is the data current and relevant to the investigation? Is the source free from undisclosed funding bias (for example, tobacco industry-funded research on smoking, or fossil-fuel-funded climate research)? Disclosure of conflicts is essential.
Other valid criteria include corroboration with other independent sources, reputation and citation count of the publishing journal, and statistical robustness. Markers reward three distinct criteria with reasoning.
