ExamExplained

Inquiry Question 2: How do scientific claims become misinterpreted and how can scientific evidence be evaluated?

Evaluate the validity, reliability and accuracy of scientific evidence presented in claims, considering the hierarchy of evidence in medical research

A focused answer to the HSC Investigating Science Module 7 dot point on evaluating evidence. Covers the hierarchy of evidence, what each level contributes, how to identify weak claims, and worked practice questions in the style of HSC exams on medical and scientific reporting.

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

NESA wants you to evaluate scientific evidence according to the hierarchy of evidence, identify when a claim is overstated relative to the evidence, and apply the framework to media reporting of science. This is among the most heavily tested topics in Investigating Science.

The answer

Not all evidence is equal. The hierarchy of evidence ranks study designs by their ability to establish reliable claims about cause and effect.

The hierarchy of evidence

From strongest to weakest:

1. Systematic reviews and meta-analyses.

A systematic review identifies and appraises all available studies on a question, using explicit inclusion criteria and quality assessment; a meta-analysis statistically combines their results. Cochrane reviews are the best-known example, and NHMRC clinical guidelines are built on systematic reviews of this kind. A meta-analysis can detect small effects invisible in single studies and quantify how precise the combined estimate is.

2. Randomised controlled trials (RCTs).

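Participants are randomly assigned to treatment or control groups, ideally with double-blinding (neither participants nor researchers know who receives which). Randomisation balances confounders, known and unknown, across groups on average, so a difference in outcome larger than chance alone would produce can be attributed to the treatment. Modern RCTs are expected to be pre-registered (protocol and primary outcome locked before data collection).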

3. Cohort studies.

A group of people followed over time, with exposures recorded prospectively. Useful for studying long-term outcomes. Cannot establish causation on its own because confounders may be unequally distributed between exposure groups. Australian examples: the 45 and Up Study (more than 250,000 NSW participants aged 45 and over) and the Australian Longitudinal Study on Women's Health.

4. Case-control studies.

Compare people with a disease to similar people without it, retrospectively asking about exposures. Cheap and fast but vulnerable to recall bias and selection bias.

5. Cross-sectional studies and surveys.

Snapshot of a population at one moment. Useful for prevalence and association but cannot establish temporal sequence (which came first, the exposure or the outcome).

6. Case reports and case series.

A single patient or small group. Hypothesis-generating only. Useful for recognising novel diseases (the first 1981 reports of what was later named AIDS were case series).

7. Expert opinion and anecdote.

The lowest level. Important for context and clinical experience, but a single doctor's clinical impression is not evidence of an effect.
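Level 1's claim that a meta-analysis can detect effects single studies miss can be shown with a minimal sketch of fixed-effect, inverse-variance pooling, one standard pooling method (real reviews often also use random-effects models and formal quality assessment, which this omits). The four trial results are invented for illustration: each is a log relative risk with its standard error, and none is significant on its own.

```python
import math

# Hypothetical trial results: (log relative risk, standard error).
# Each trial alone has a 95% CI that includes RR = 1 (no effect).
trials = [
    (math.log(0.90), 0.10),
    (math.log(0.85), 0.12),
    (math.log(0.92), 0.09),
    (math.log(0.88), 0.11),
]

def ci95(log_rr, se):
    """95% confidence interval, converted back to the relative-risk scale."""
    return math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se)

def fixed_effect_pool(results):
    """Inverse-variance weighted average: more precise trials count more."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))  # pooled estimate is more precise
    return pooled, pooled_se

for est, se in trials:
    lo, hi = ci95(est, se)
    print(f"single trial RR {math.exp(est):.2f}  95% CI {lo:.2f}-{hi:.2f}")

pooled, pooled_se = fixed_effect_pool(trials)
lo, hi = ci95(pooled, pooled_se)
print(f"pooled       RR {math.exp(pooled):.2f}  95% CI {lo:.2f}-{hi:.2f}")
```

Every single-trial interval includes 1, but the pooled interval (about 0.81 to 0.99) does not: combining the trials shrinks the standard error enough to reveal a modest effect.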

What each level rules out

A claim about cause and effect requires study designs that can rule out:

  • Confounders. Variables associated with both exposure and outcome.
  • Reverse causation. The outcome causing the exposure, not vice versa.
  • Selection bias. Sampling that does not represent the target population.
  • Chance. Random variation producing apparent effects.
  • Publication bias. Tendency to publish positive results.

How well each design rules out three of these threats:

Study type              Confounders   Reverse causation   Chance
Meta-analysis of RCTs   Best          Best                Best
RCT                     Best          Best                Good
Cohort                  Limited       Good                Limited
Case-control            Poor          Limited             Limited
Cross-sectional         Limited       Poor                Limited
Case report             None          None                None
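The "Confounders" column can be seen directly in a small simulation. This Python sketch invents a population in which an exposure has no effect at all, but age (the confounder) raises disease risk and, in the observational version, also makes people more likely to choose the exposure. All probabilities are hypothetical.

```python
import random

def simulate(randomised, n=20_000):
    """Relative risk of disease (exposed vs unexposed) in a hypothetical population.

    The exposure has NO true effect. Age is the confounder: older people are
    more likely to get the disease and, when exposure is self-selected, more
    likely to be exposed.
    """
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        older = random.random() < 0.5
        if randomised:
            exposed = random.random() < 0.5  # coin flip, independent of age
        else:
            exposed = random.random() < (0.7 if older else 0.2)  # self-selected
        disease = random.random() < (0.20 if older else 0.05)  # depends on age only
        totals[exposed] += 1
        cases[exposed] += disease
    return (cases[True] / totals[True]) / (cases[False] / totals[False])

random.seed(1)  # fixed seed so the run is repeatable
print(f"observational (cohort-style) RR: {simulate(randomised=False):.2f}")
print(f"randomised (RCT-style)       RR: {simulate(randomised=True):.2f}")
```

In the observational version the exposed group shows roughly 1.8 times the risk, purely because it contains more older people. Coin-flip assignment removes that imbalance and the relative risk falls to about 1, which is why randomisation scores "Best" for confounders.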

Applying the hierarchy to a media claim

When a news headline claims "X causes Y":

  1. What study is cited? A single study? A meta-analysis?
  2. What study type? RCT? Cohort? Case report?
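  3. What is the sample size? Larger samples give more precise estimates (narrower confidence intervals); a study of a few dozen people rarely gives confidence.
  4. Is the effect large? Effect sizes (relative risk, odds ratio, absolute risk difference) matter as much as statistical significance.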
  5. Has it been replicated? Single-study claims are provisional.
  6. Who funded it? Conflict of interest can shape conclusions.
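Questions 3 and 4 interact: the same relative risk can be convincing or meaningless depending on sample size. A minimal Python sketch computes relative risk, odds ratio, absolute risk difference and an approximate 95% confidence interval from a 2x2 table. The counts are hypothetical, and the log-scale interval is a standard large-sample approximation that becomes rough when counts are small.

```python
import math

def effect_sizes(a, b, c, d):
    """Effect sizes from a 2x2 table of counts.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    risk_difference = risk_exposed - risk_unexposed
    # Large-sample 95% CI for the relative risk, computed on the log scale.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_rr = math.log(relative_risk)
    ci = (math.exp(log_rr - 1.96 * se_log_rr), math.exp(log_rr + 1.96 * se_log_rr))
    return relative_risk, odds_ratio, risk_difference, ci

# Hypothetical studies with the same 12% vs 8% risks but very different sizes.
studies = {
    "small study (100 people)": (6, 44, 4, 46),
    "large study (10,000 people)": (600, 4400, 400, 4600),
}
for label, counts in studies.items():
    rr, odds, rd, (lo, hi) = effect_sizes(*counts)
    print(f"{label}: RR {rr:.2f}, OR {odds:.2f}, risk difference {rd:.1%}, "
          f"95% CI for RR {lo:.2f} to {hi:.2f}")
```

Both hypothetical studies report RR 1.5 (a 4 percentage-point absolute difference), but only the large one has an interval (about 1.33 to 1.69) that excludes 1. The small study's interval runs from roughly 0.45 to 5, so chance cannot be ruled out.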

NHMRC and the Australian context

The National Health and Medical Research Council uses the hierarchy of evidence to set Australian clinical guidelines. NHMRC has published levels of evidence for studies of interventions:

  • Level I. Systematic review of RCTs.
  • Level II. Properly designed RCT.
  • Level III. Pseudo-randomised trials or non-randomised comparative studies such as cohort and case-control studies (subdivided into III-1, III-2 and III-3).
  • Level IV. Case series.

Each guideline cites the evidence level supporting it. Practitioners are expected to weigh recommendations accordingly.
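The simplified list above hides the subdivision of Level III. A minimal Python lookup, assuming the mapping from NHMRC's 2009 additional levels of evidence for intervention studies (check the version your course uses); the design labels are illustrative shorthand, not official NHMRC wording.

```python
# Simplified, assumed mapping of study designs to NHMRC intervention levels.
NHMRC_LEVEL = {
    "systematic review of RCTs": "I",
    "randomised controlled trial": "II",
    "pseudo-randomised controlled trial": "III-1",
    "non-randomised experimental trial": "III-2",
    "cohort study": "III-2",
    "case-control study": "III-2",
    "historical control study": "III-3",
    "case series": "IV",
}

def nhmrc_level(design):
    """Return the level for a design, or None if it is not graded here
    (e.g. expert opinion, which sits outside the Level I-IV scale)."""
    return NHMRC_LEVEL.get(design)

for design in ["cohort study", "case series", "expert opinion"]:
    print(design, "->", nhmrc_level(design))
```

Returning None rather than a "Level V" reflects that expert opinion is not graded on this scale at all.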

Common situations the hierarchy clarifies

A new "miracle drug" reported in the news
Usually a Phase II trial or even pre-clinical animal data. Promising but provisional. Routine use in patients requires Phase III RCTs and regulatory approval (TGA in Australia, FDA in the US).
Diet and cancer risk claims
Usually based on observational cohort studies. The associations may be real, but confounding is common. Strong dietary recommendations need RCTs (rare for diet because long-term compliance is hard to maintain) or very consistent observational evidence backed by a biological mechanism (e.g. processed meat and bowel cancer).
Vitamin and supplement claims
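Industry-funded short trials may show effects. Independent meta-analyses often show little or no clinical benefit; for example, the Cochrane review of vitamin C and the common cold found that regular supplementation did not reduce how often people in the general population caught colds, although it slightly shortened their duration.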

When evidence is uncertain

Even the highest levels of evidence can be uncertain. RCTs may be too small, too short or conducted on a non-representative population. Meta-analyses depend on the quality of included studies. Honest scientists report uncertainty alongside best estimates.

The right response to uncertain evidence is not to claim certainty in the opposite direction but to communicate the uncertainty honestly. This was a major lesson of COVID-19 public communication.

Examples in context

Example 1. Hydroxychloroquine and the Australian ASCOT trial. During the early COVID-19 pandemic, hydroxychloroquine (HCQ) was widely promoted on the basis of in vitro laboratory studies and a small, non-randomised French clinical study. Australia's ASCOT trial (Australasian COVID-19 Trial), coordinated by the Doherty Institute and running across multiple hospitals, was one of several RCTs that tested whether HCQ improved outcomes in hospitalised COVID-19 patients. These trials relied on randomisation and pre-registered primary outcomes (in the UK RECOVERY trial, 28-day mortality). Combined with RECOVERY, whose hydroxychloroquine comparison alone enrolled more than 4,700 patients, the evidence showed no benefit and possible harm. Systematic reviews (Level I evidence) now conclude that HCQ is not effective for treating COVID-19. The case shows how high-quality evidence rapidly overturned a claim based on low-quality evidence.

Example 2. Vitamin D supplementation and respiratory infection. The 2017 BMJ meta-analysis of 25 RCTs (over 11,000 participants) is a textbook example of Level I evidence. Individual trials varied in dose, duration and target population, with mixed results (some positive, some null). The meta-analysis combined the data, weighted by trial size and quality, and found a modest reduction in acute respiratory infections that was larger in people who were vitamin D deficient at baseline. A 2022 Cochrane review confirmed the finding, although results varied considerably between trials (high heterogeneity). The case illustrates that media reporting of any single trial ("vitamin D fights flu") can easily misrepresent the effect, while the systematic review gives a far more reliable estimate of the average effect in the population.
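"High heterogeneity" has a standard quantitative meaning: the I² statistic estimates what share of the spread between trial results reflects real differences rather than chance. A minimal Python sketch using Cochran's Q and I², with invented odds ratios (not data from the vitamin D trials) and an assumed common standard error of 0.10 per trial:

```python
import math

def pool_and_heterogeneity(log_effects, ses):
    """Fixed-effect pooled estimate, Cochran's Q and I^2 for a set of trials."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Q: weighted squared distance of each trial from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I^2: share of the spread beyond what chance (df) would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(pooled), q, i_squared

# Hypothetical odds ratios from trials that broadly agree...
consistent = [math.log(x) for x in (0.90, 0.88, 0.92)]
# ...and from trials that disagree (e.g. different doses or populations).
inconsistent = [math.log(x) for x in (0.60, 1.05, 0.75, 1.10, 0.70)]

for label, effects in [("consistent", consistent), ("inconsistent", inconsistent)]:
    pooled_or, q, i2 = pool_and_heterogeneity(effects, [0.10] * len(effects))
    print(f"{label}: pooled OR {pooled_or:.2f}, Q = {q:.1f}, I^2 = {i2:.0%}")
```

Values of I² above about 75% are conventionally read as high. When heterogeneity is high, reviewers usually look for reasons the trials differ (such as dose or baseline vitamin D status) rather than trusting a single pooled number.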

Try this

Q1. Rank these study designs from strongest to weakest evidence: case report, cohort study, randomised controlled trial, systematic review. [3 marks]

  • Cue. Systematic review > RCT > cohort > case report.
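The Q1 ranking can also be expressed as a lookup. In this minimal Python sketch the design labels and the `supports_causal_claim` helper are illustrative assumptions; the helper simplifies the rule that strong causal claims need an RCT or a systematic review (strictly, a systematic review of RCTs).

```python
# Study designs in the order of the hierarchy of evidence, strongest first.
HIERARCHY = [
    "systematic review",
    "randomised controlled trial",
    "cohort study",
    "case-control study",
    "cross-sectional study",
    "case report",
    "expert opinion",
]

def rank_by_strength(designs):
    """Sort study designs from strongest to weakest evidence."""
    return sorted(designs, key=HIERARCHY.index)

def supports_causal_claim(design):
    """Simplified rule: only the top two levels can support a strong causal claim."""
    return HIERARCHY.index(design) <= 1

print(rank_by_strength(["case report", "cohort study",
                        "randomised controlled trial", "systematic review"]))
# ['systematic review', 'randomised controlled trial', 'cohort study', 'case report']
print(supports_causal_claim("cohort study"))  # False
```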

Q2. A magazine reports that "eating chocolate prevents heart disease, according to a new study." Outline three questions to ask before accepting the claim. [3 marks]

  • Cue. Study design (observational or RCT?); sample size and population; conflict of interest in funding; independent replication; effect size and clinical relevance.

Q3. A pharmaceutical company submits a new statin to the PBS. (a) Identify one type of evidence the PBAC requires. (b) Identify one limitation of relying only on the manufacturer's submitted data. (c) Identify one safeguard the PBAC applies. [2+2+2 marks]

  • Cue. (a) Phase III RCT comparing the new drug to the existing standard of care. (b) Conflict of interest in trial design and selective reporting. (c) Independent evaluation of the submission rather than reliance on the sponsor's own analysis; mandatory disclosure; post-market surveillance of adverse events by the TGA.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2023 HSC · 5 marks. A media article claims that 'drinking coffee reduces the risk of heart disease' based on a single observational study. Evaluate this claim.

A 5-mark answer needs evaluation of the study type, sample, confounders and an explicit judgement.

Study type
A single observational study is at the lower end of the hierarchy of evidence. It can show correlations but cannot establish causation. Compare with randomised controlled trials (RCTs) or systematic reviews, which are stronger.
Sample
For a claim based on one study, ask: what was the sample size? Was it representative of the population the headline applies to? Were specific demographics over- or under-represented?
Confounders
Coffee drinkers tend to differ from non-drinkers in age, occupation, socioeconomic status, exercise and other dietary habits. Without statistical adjustment for these confounders, observed associations may reflect lifestyle differences rather than coffee itself.

Other limitations.

  • Recall bias. Participants self-reporting coffee consumption may misremember amounts.
  • Reverse causation. People with heart problems may drink less coffee on doctor's advice, creating an apparent protective effect.
  • Publication bias. A single study reporting positive results may have been selected from many studies showing no effect.
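The publication-bias point can be made concrete with a minimal simulation. In this Python sketch (all numbers hypothetical) coffee has no true effect, 500 small two-group studies are run, and only those with a positive, statistically significant result (z > 1.96, a normal approximation) are "published".

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

TRUE_EFFECT = 0.0  # hypothetical: coffee has no real effect
N_PER_GROUP = 30
N_STUDIES = 500

def run_study():
    """One small two-group study; returns (estimated effect, z statistic)."""
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / N_PER_GROUP
          + statistics.variance(control) / N_PER_GROUP) ** 0.5
    return diff, diff / se

results = [run_study() for _ in range(N_STUDIES)]
# Only "positive and significant" results reach the journal.
published = [d for d, z in results if z > 1.96]

print(f"all studies:       mean effect {statistics.mean(d for d, _ in results):+.2f}")
print(f"published studies: {len(published)} of {N_STUDIES}, "
      f"mean effect {statistics.mean(published):+.2f}")
```

Although the true effect is zero and the average over all studies is close to zero, the handful that clear the significance bar all report a sizeable positive effect. A reader who sees only the published studies is misled.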

Judgement. The headline is not warranted by the evidence. A more accurate framing would be: "Coffee consumption is associated with lower heart-disease risk in this cohort, but causal inference requires randomised trials." Multiple large cohorts and a meta-analysis would be required for a stronger claim.

Markers reward identification of study type, named confounders, biases and a clear judgement.

2022 HSC · 4 marks. Outline the hierarchy of evidence used in medical research and explain why it matters.

A 4-mark answer needs the hierarchy with examples, what each level contributes and the reasoning.

Hierarchy of evidence (strongest at top).

  1. Systematic reviews and meta-analyses. Statistically combine many studies. Example: Cochrane reviews. Strongest because they integrate all available evidence and quantify uncertainty.

  2. Randomised controlled trials (RCTs). Random assignment to treatment and control groups, ideally double-blinded. Randomisation balances known and unknown confounders between groups, so well-run RCTs can establish causation.

  3. Cohort studies. Track groups over time. Can detect associations but not establish causation.

  4. Case-control studies. Compare people with and without a condition. Cheap and fast but vulnerable to recall bias.

  5. Cross-sectional studies. Snapshot at one time point. Limited inference.

  6. Case reports and case series. Single patient or small group. Useful for generating hypotheses but cannot establish prevalence or causation.

  7. Expert opinion and anecdote. Lowest level. Important for context but not evidence of fact.

Why it matters. Strong claims about treatments, diet or risk factors require evidence at the level of RCTs or systematic reviews. A claim based only on case reports or expert opinion is provisional. The hierarchy lets clinicians, regulators and journalists weigh competing claims and reject those based on weak evidence.

Markers reward at least four levels, the reasoning behind ordering and explicit application.
