Year 12: Statistical Analysis

What is the difference between interpolation and extrapolation, and why is extrapolation less reliable?

Distinguish between interpolation and extrapolation when using a regression line, and assess the reliability of predictions

A focused answer to the HSC Maths Standard 2 dot point on interpolation vs extrapolation. It covers the reliability of predictions inside and outside the data range, examples of when extrapolation breaks down, and Australian-context worked examples.

Generated by Claude Opus. Reviewed by Better Tuition Academy. 6 min answer.

What this dot point is asking

NESA wants you to classify a prediction from a regression line as interpolation (inside the data range) or extrapolation (outside it), and to comment on the reliability of each. This is a standard exam question that appears in almost every paper.

The answer

Interpolation

Interpolation is making a prediction at an $x$ value inside the range of the observed data. If the data covers $x = 2$ to $x = 20$, then any prediction with $x$ between 2 and 20 is interpolation.

Interpolation is generally reliable, provided:

  • The scatterplot shows a clear linear pattern.
  • The correlation coefficient is moderately strong or stronger.
  • There are no extreme outliers driving the fit.
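The second condition can be checked numerically. The sketch below computes Pearson's correlation coefficient by hand; the data values are hypothetical and chosen only for illustration, and the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical dataset with x ranging from 2 to 20 (not from any exam paper)
xs = [2, 5, 8, 11, 14, 17, 20]
ys = [7.2, 11.1, 16.3, 20.0, 25.4, 29.1, 34.2]

r = pearson_r(xs, ys)
print(round(r, 3))
```

A value of r close to 1 or -1 indicates a strong linear relationship, which supports interpolated predictions. It says nothing about what happens outside the data range.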

Extrapolation

Extrapolation is making a prediction at an $x$ value outside the range of the observed data. If the data covers $x = 2$ to $x = 20$, then a prediction at $x = 25$ or $x = 0$ is extrapolation.

Extrapolation is generally less reliable because:

  • The pattern observed in the data may not continue beyond the data range.
  • New factors may dominate at extreme values (saturation, exhaustion of supply, regime change, physical limits).
  • The relationship may be non-linear at the extremes even if it looks linear in the middle.
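The classification rule itself is a single range check. Here is a minimal sketch, assuming that a value exactly on the boundary of the data range counts as interpolation; the function name `classify_prediction` is our own.

```python
def classify_prediction(x, x_min, x_max):
    """Classify a prediction at x against the observed data range [x_min, x_max].

    Assumption: the endpoints themselves are treated as inside the range.
    """
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"

# Data covers x = 2 to x = 20, as in the definitions above
for x in (10, 25, 0):
    print(x, classify_prediction(x, 2, 20))
```

The check only tells you which label applies. The reliability comment still has to come from the context of the question.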

How to comment on reliability

In the exam, always:

  1. State whether the prediction is interpolation or extrapolation.
  2. Comment on whether the relationship is likely to continue (give a context-specific reason).
  3. Mention the data range explicitly.

How far is too far?

Mild extrapolation (just beyond the data range) is sometimes acceptable. Substantial extrapolation (significantly beyond) is usually unreliable. The HSC will rarely test you on numerical thresholds; it tests whether you recognise extrapolation when you see it.

Worked examples of extrapolation breakdowns

  • Population growth. Linear extrapolation may overshoot because of housing or resource limits.
  • Athletic records. Linear improvement in running times cannot continue past human biology limits.
  • Compound investments. A linear model is a poor fit; the underlying process is exponential.
  • Children's height. A linear growth trend from age 0 to 10 cannot be extrapolated to age 50, because growth stops in adolescence.
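The compound investment case can be illustrated numerically. The sketch below fits a least-squares line to a hypothetical investment (a principal of 1000 growing at 5% per year, observed for years 0 to 10), then compares the line with the true value inside and far outside the data range. All figures are assumptions for illustration only.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def actual(t):
    """Hypothetical investment: 1000 compounding at 5% per year."""
    return 1000 * 1.05 ** t

# Observed data covers years 0 to 10 only
years = list(range(0, 11))
values = [actual(t) for t in years]
slope, intercept = least_squares(years, values)

def line(t):
    """Prediction from the fitted regression line."""
    return slope * t + intercept

# Year 5 is interpolation; year 40 is extrapolation
for t in (5, 40):
    print(t, round(line(t), 2), round(actual(t), 2))
```

Inside the data range the line and the true value stay close together. At year 40 the line falls far short, because the exponential growth it cannot represent has taken over.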

Past exam questions, worked

Real questions from past NESA papers on this dot point, with our answer explainer.

2022 HSC Q17, 3 marks. The least-squares regression line $y = 1.5x + 4$ is computed from a dataset where $x$ ranges from 2 to 20. Classify the following predictions as interpolation or extrapolation, and comment on reliability. (a) $x = 10$. (b) $x = 25$. (c) $x = -3$.

(a) $x = 10$ is inside the data range $[2, 20]$, so this is interpolation. Predictions inside the range are generally reliable, assuming the regression line is a good fit.

(b) $x = 25$ is outside the data range, so this is extrapolation. The prediction is less reliable because the linear relationship may not extend beyond the observed range.

(c) $x = -3$ is well outside the data range and may also be physically implausible depending on context. This is extrapolation and is the least reliable prediction.

Markers reward correct classification and a brief reliability comment for each.
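The question only asks for classification, but it can help to see the predicted values alongside the labels. This sketch substitutes each $x$ into $y = 1.5x + 4$ and checks it against the range 2 to 20; the names `predict`, `X_MIN` and `X_MAX` are our own.

```python
X_MIN, X_MAX = 2, 20  # data range given in the question

def predict(x):
    """Regression line from the question: y = 1.5x + 4."""
    return 1.5 * x + 4

results = []
for label, x in (("a", 10), ("b", 25), ("c", -3)):
    kind = "interpolation" if X_MIN <= x <= X_MAX else "extrapolation"
    results.append((label, x, predict(x), kind))
    print(f"({label}) x = {x}: y = {predict(x)}, {kind}")
```

The predictions are 19.0, 41.5 and -0.5. The negative value in part (c) is one way to see why a prediction so far outside the range may be implausible in context.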

2023 HSC Q22, 3 marks. A regression line for the population of a regional Australian town from 1990 to 2020 predicts population growth. Use the line to predict the population in 2080 and discuss whether the prediction is reasonable.

Substituting $x = 2080$ into the line gives a numerical prediction (the question's specific equation would supply the number).

This is an extreme extrapolation, 60 years beyond the latest data point. Possible reasons the prediction may be wrong: changes in regional employment, climate impact on agriculture, policy changes, immigration patterns, or saturation of available housing.

Linear growth over 60 years extrapolated from 30 years of data is rarely reliable. State the predicted value but caveat strongly that the model assumes the past linear trend continues indefinitely, which is unlikely.

Markers reward identification as extrapolation, the prediction, and at least two specific reasons it may be unreliable.
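The size of the extrapolation can be made explicit by comparing the distance beyond the data with the width of the data range. The sketch below does this for the years in the question. The helper name `distance_beyond_range` is our own, and the ratio is only an informal guide, not a NESA threshold.

```python
def distance_beyond_range(x, x_min, x_max):
    """How far x lies outside [x_min, x_max]; 0 if x is inside the range."""
    if x > x_max:
        return x - x_max
    if x < x_min:
        return x_min - x
    return 0

beyond = distance_beyond_range(2080, 1990, 2020)  # years past the last data point
span = 2020 - 1990                                # years of observed data
print(beyond, span, beyond / span)
```

Here the prediction sits 60 years beyond a 30-year data range, twice the width of the data itself, which supports calling it an extreme extrapolation.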
