What is the difference between interpolation and extrapolation, and why is extrapolation less reliable?
Distinguish between interpolation and extrapolation when using a regression line, and assess the reliability of predictions
A focused answer to the HSC Maths Standard 2 dot point on interpolation vs extrapolation. The reliability of predictions inside and outside the data range, examples of when extrapolation breaks down, and Australian-context worked examples.
What this dot point is asking
NESA wants you to classify a prediction from a regression line as interpolation (inside the data range) or extrapolation (outside it), and to comment on the reliability of each. This is a standard exam question that appears in almost every paper.
The answer
Interpolation
Interpolation is making a prediction at an x value inside the range of the observed data, that is, between the smallest and largest observed x values.
Interpolation is generally reliable, provided:
- The scatterplot shows a clear linear pattern.
- The correlation coefficient indicates a moderate or strong linear relationship.
- There are no extreme outliers driving the fit.
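The checks above can be sketched numerically. The example below is a minimal illustration in Python, assuming a small made-up dataset: the values in `hours` and `marks` are hypothetical and do not come from any exam paper. It computes Pearson's correlation coefficient and the least-squares line by hand, then makes a prediction at an x value inside the data range.

```python
from math import sqrt

# Hypothetical data (an assumption for illustration):
# hours studied (x) and test mark (y).
hours = [1, 2, 3, 4, 5, 6]
marks = [52, 55, 61, 64, 70, 73]


def least_squares(xs, ys):
    """Return (gradient, intercept, r) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson's correlation coefficient
    return gradient, intercept, r


gradient, intercept, r = least_squares(hours, marks)

# x = 3.5 lies between min(hours) and max(hours), so this is interpolation.
x_new = 3.5
prediction = gradient * x_new + intercept
print(f"r = {r:.3f}, predicted mark at x = {x_new}: {prediction:.1f}")
```

In the exam these values come from the calculator's statistics functions; the hand calculation here only shows what those functions are doing.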
Extrapolation
Extrapolation is making a prediction at an x value outside the range of the observed data, that is, below the smallest or above the largest observed x value.
Extrapolation is generally less reliable because:
- The pattern observed in the data may not continue beyond the data range.
- New factors may dominate at extreme values (saturation, exhaustion of supply, regime change, physical limits).
- The relationship may be non-linear at the extremes even if it looks linear in the middle.
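The two definitions reduce to a simple range check. The sketch below is one way to express it in Python; the function name `classify_prediction` and the values in `observed_x` are hypothetical, and treating a value exactly on the boundary as interpolation is an assumption made here, not a rule stated by NESA.

```python
def classify_prediction(x_new, x_values):
    """Label a prediction as interpolation or extrapolation.

    Assumption: a value exactly equal to the smallest or largest
    observed x is counted as interpolation.
    """
    lowest, highest = min(x_values), max(x_values)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical observed x values (for example, ages in years).
observed_x = [4, 6, 8, 10, 12]

for x_new in (7, 12, 15, 2):
    print(x_new, classify_prediction(x_new, observed_x))
```

The check only classifies the prediction. Whether an extrapolated value can be trusted still depends on a context-specific argument about whether the pattern continues.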
How to comment on reliability
In the exam, always:
- State whether the prediction is interpolation or extrapolation.
- Comment on whether the relationship is likely to continue (give a context-specific reason).
- Mention the data range explicitly.
How far is too far?
Mild extrapolation (just beyond the data range) is sometimes acceptable. Substantial extrapolation (significantly beyond) is usually unreliable. The HSC will rarely test you on numerical thresholds; it tests whether you recognise extrapolation when you see it.
Worked examples of extrapolation breakdowns
- Population growth. Linear extrapolation may overshoot because of housing or resource limits.
- Athletic records. Linear improvement in running times cannot continue past human biology limits.
- Compound investments. A linear model is a poor fit; the underlying process is exponential.
- Children's height. A linear trend fitted to heights measured during childhood cannot be extrapolated to adult ages, because growth stops in adolescence.
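The compound investment case can be checked numerically. The sketch below is an illustration only, assuming a hypothetical investment of $1000 at 5% p.a. compounded annually (these figures are not from the source). It fits a least-squares line to the first six yearly balances, then compares the line with the true balance inside and far outside that range.

```python
# Hypothetical investment (an assumption for illustration):
# $1000 at 5% p.a., compounded annually.
principal, rate = 1000, 0.05


def true_value(year):
    """Balance after `year` years of annual compounding."""
    return principal * (1 + rate) ** year


# Observed data: balances for years 0 to 5 only.
years = list(range(0, 6))
values = [true_value(t) for t in years]

# Least-squares line fitted to the observed years.
n = len(years)
mean_t = sum(years) / n
mean_v = sum(values) / n
sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
sxx = sum((t - mean_t) ** 2 for t in years)
gradient = sxy / sxx
intercept = mean_v - gradient * mean_t


def linear_prediction(year):
    """Prediction from the fitted line."""
    return gradient * year + intercept


# Year 3 is interpolation; years 10 and 30 are extrapolation.
for year in (3, 10, 30):
    gap = true_value(year) - linear_prediction(year)
    print(f"year {year}: line {linear_prediction(year):.0f}, "
          f"true {true_value(year):.0f}, gap {gap:.0f}")
```

Inside the data range the line tracks the balance closely, but the gap widens the further the prediction moves beyond year 5, because the true growth is exponential rather than linear.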
Past exam questions, worked
Real questions from past NESA papers on this dot point, with our answer explainer.
2022 HSC Q17 (3 marks)
The least-squares regression line is computed from a dataset where x ranges from to . Classify the following predictions as interpolation or extrapolation, and comment on reliability. (a) . (b) . (c) .
(a) The x value is inside the data range, so this is interpolation. Predictions inside the range are generally reliable, assuming the regression line is a good fit.
(b) The x value is outside the data range, so this is extrapolation. The prediction is less reliable because the linear relationship may not extend beyond the observed range.
(c) The x value is well outside the data range and may also be physically implausible depending on context. This is extrapolation and is the least reliable prediction.
Markers reward correct classification and a brief reliability comment for each.
2023 HSC Q22 (3 marks)
A regression line for the population of a regional Australian town from to predicts population growth. Use the line to predict the population in and discuss whether the prediction is reasonable.
Substituting the target year into the line gives a numerical prediction (the question's specific equation would supply the number).
This is an extreme extrapolation, well beyond the latest data point. Possible reasons the prediction may be wrong: changes in regional employment, climate impact on agriculture, policy changes, immigration patterns, or saturation of available housing.
Linear growth extrapolated this far beyond the available data is rarely reliable. State the predicted value, but caveat strongly that the model assumes the past linear trend continues indefinitely, which is unlikely.
Markers reward identification as extrapolation, the prediction, and at least two specific reasons it may be unreliable.
Related dot points
- Construct and interpret scatterplots to describe the relationship between two variables in bivariate data
A focused answer to the HSC Maths Standard 2 dot point on scatterplots. Reading form, direction and strength of association, identifying outliers, and worked Australian examples using ABS-style economic and demographic data.
- Calculate and interpret Pearson's correlation coefficient using statistical technology, including the sign and magnitude
A focused answer to the HSC Maths Standard 2 dot point on Pearson's correlation coefficient. What r measures, how to interpret its sign and magnitude, the limitations of r in non-linear relationships, and how to compute it using calculator statistics functions.
- Find and use the equation of the least-squares regression line to model a linear relationship between two variables
A focused answer to the HSC Maths Standard 2 dot point on the least-squares regression line. The equation of the line, finding the gradient and intercept using calculator statistics functions, interpreting the gradient in context, and worked Australian examples.