What is the difference between interpolation and extrapolation, and why is extrapolation less reliable?
Distinguish between interpolation and extrapolation when using a regression line, and assess the reliability of predictions
A focused answer to the HSC Maths Standard 2 dot point on interpolation vs extrapolation. The data range as the dividing line, why predictions inside are reliable and outside are suspect, a stage-by-stage diagram of the zones, how the questions are worded, and Australian-context worked examples.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to do two things. First, classify a prediction from a regression line (the line of best fit) as interpolation (inside the data range) or extrapolation (outside it). Second, comment on how reliable each prediction is. This task shows up in almost every paper, usually attached to a least-squares question. The marks come not from the arithmetic but from naming the zone and giving a context-specific reason.
The answer
The one idea: the data range decides everything
Everything in this dot point hinges on a single boundary: the range of x-values that were actually observed. Inside that range you have evidence for the trend, so a prediction interpolates and is generally trustworthy. Outside it you have no evidence, only an assumption that the line keeps going, so a prediction extrapolates and is shaky. The four-stage diagram below builds that picture up step by step: plot the data, fit the line, mark the zones, then contrast a safe prediction with a risky one.
Stage 1, plot the data and see its range. The scatter only covers a limited span of x-values. That span, between the smallest and largest observed x, is the observed data range, and it is the boundary that classifies every later prediction.
Stage 2, fit the line; inside is interpolation. The least-squares line is fitted to the data. Any prediction for an x inside the shaded data range is interpolation: it falls among the points the line was actually built from, so the line has evidence there.
Stage 3, beyond the range is extrapolation. Extending the line past the data (shown dashed) reaches x-values where nothing was ever observed. Predictions in those outer zones are extrapolation, and they rest entirely on the assumption that the straight-line trend continues, which the data cannot confirm.
Stage 4, reliable inside versus risky outside. A prediction read off inside the data range (solid point) is reliable: it is supported by nearby observations. A prediction read off well beyond the range (open point) is risky: no data backs it, and the real relationship may bend, flatten or break down out there.
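The classification rule from the four stages can be sketched in a few lines of Python. This is only an illustration: the function name and the data values are invented for the example, and treating the two endpoints of the range as "inside" is an assumption rather than something the dot point spells out.

```python
def classify_prediction(x_new, observed_x):
    """Classify a prediction at x_new against the observed x-values."""
    # The data range is set by the smallest and largest observed x.
    x_min, x_max = min(observed_x), max(observed_x)
    # Assumption: endpoints count as inside the range.
    if x_min <= x_new <= x_max:
        return "interpolation"
    return "extrapolation"


# Hypothetical x-values, invented purely for illustration.
observed_x = [10, 15, 22, 30, 41, 50]

print(classify_prediction(25, observed_x))  # inside 10 to 50
print(classify_prediction(60, observed_x))  # above the range
print(classify_prediction(5, observed_x))   # below the range
```

Note that the classification depends only on the x-values. The predicted y-value plays no part in deciding the zone.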
Interpolation
Interpolation is a prediction at an x inside the observed range. If the data runs from a smallest observed x to a largest observed x, then any prediction with x between those two values interpolates.
It is generally reliable, provided:
- The scatterplot shows a clear linear pattern.
- The correlation coefficient is moderately strong or stronger.
- No extreme outlier is distorting the fit.
Even so, interpolation gives the best linear estimate, not a guarantee: individual points scatter around the line, so a single prediction can still be off by the typical residual.
Extrapolation
Extrapolation is a prediction at an x outside the observed range. If the data runs from a smallest observed x to a largest observed x, a prediction at any x below the smallest or above the largest extrapolates.
It is generally less reliable because:
- The pattern in the data need not continue past the range.
- New factors can dominate at extreme values: saturation, exhaustion of supply, regime change, physical limits.
- The true relationship may be non-linear at the extremes even when it looks linear in the middle.
How far is too far?
Mild extrapolation, just past the data range, is sometimes defensible. You need a good reason to think the trend keeps going for a little longer. Substantial extrapolation, far past the range, is usually unreliable. The HSC rarely sets an exact cut-off number. It tests whether you spot the extrapolation and can argue, in context, why it is risky. The deeper point is simple: a prediction gets less reliable the further it sits from the data. The further out you push, the more it leans on an assumption you cannot check.
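One way to make "how far" concrete is to measure the distance past the data as a fraction of the data span. The sketch below assumes that measure and uses made-up numbers. It deliberately sets no cut-off between mild and substantial, because that call depends on context.

```python
def extrapolation_distance(x_new, x_min, x_max):
    """Distance of x_new outside [x_min, x_max] as a fraction of the span.

    Returns 0.0 when x_new is inside the data range.
    """
    span = x_max - x_min
    if x_new < x_min:
        return (x_min - x_new) / span
    if x_new > x_max:
        return (x_new - x_max) / span
    return 0.0


# Hypothetical data range of 0 to 20, invented for illustration.
print(extrapolation_distance(12, 0, 20))  # inside the range
print(extrapolation_distance(22, 0, 20))  # a little past the data
print(extrapolation_distance(40, 0, 20))  # a full data span past the data
```

A larger value means the prediction leans more heavily on the unchecked assumption that the trend continues.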
When extrapolation breaks down: concrete cases
- Population growth. A linear fit may overshoot once housing or resources saturate.
- Athletic records. Linear improvement in running times cannot continue past human physiological limits.
- Compound investments. A linear model is the wrong shape; the real process is exponential, so extrapolating a line understates the long-run value.
- Children's height. A line fitted to the heights of young children cannot be extended to adult ages, because growth stops in adolescence.
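The compound-investment case can be checked numerically. The sketch below uses invented figures ($1000 at 5% per year, observed for years 0 to 5). It fits a least-squares line to those early values and then compares the line with the true compound value inside the range and far beyond it. The helper `fit_line` is written for this example using the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical investment: $1000 at 5% p.a., compounded annually.
years = list(range(6))                      # observed years 0 to 5
values = [1000 * 1.05 ** t for t in years]  # true compound values

slope, intercept = fit_line(years, values)

# Interpolation: year 3 is inside the observed range.
line_3 = intercept + slope * 3
actual_3 = 1000 * 1.05 ** 3

# Extrapolation: year 30 is far beyond the observed range.
line_30 = intercept + slope * 30
actual_30 = 1000 * 1.05 ** 30

print("Year 3: ", round(line_3), "vs actual", round(actual_3))
print("Year 30:", round(line_30), "vs actual", round(actual_30))
```

Inside the range the line and the true value agree closely. At year 30 the line falls well short, because the real process curves upward while the line does not.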
How exam questions ask about interpolation and extrapolation
The wording is varied but the task is fixed: locate the x-value relative to the data range, classify, then justify. Translate the phrasings:
- "Classify this prediction as interpolation or extrapolation." State the data range, say whether the x-value is inside or outside, and name the type.
- "Comment on the reliability of the prediction." Pair the classification with a context-specific reason: interpolation is reliable if the fit is good; extrapolation is suspect because the trend may not continue (then give a concrete reason for that data).
- "Use the line to predict y at a given x, and discuss." Substitute the given x-value to get the number, then classify and caveat. The discussion is where the marks sit.
- "Why might this prediction be unreliable?" Give a specific mechanism (market saturation, biological limit, economic shift), not the bare phrase "the trend might change".
- "Is it reasonable to use this model here?" A reliability judgement: reasonable inside the range and for mild extrapolation; unreasonable far outside.
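The routine behind all of these phrasings (substitute, classify, then caveat) can be sketched as a single function. The line y = 5 + 2x and the data range of 10 to 50 are hypothetical values chosen for the example, not taken from any exam question.

```python
def predict_and_classify(slope, intercept, x_new, x_min, x_max):
    """Substitute x_new into the line, then classify against the data range."""
    y_hat = intercept + slope * x_new       # step 1: substitute
    inside = x_min <= x_new <= x_max        # step 2: locate x in the range
    kind = "interpolation" if inside else "extrapolation"
    return y_hat, kind


# Hypothetical line y = 5 + 2x fitted to data with x from 10 to 50.
for x_new in (30, 80):
    y_hat, kind = predict_and_classify(2, 5, x_new, 10, 50)
    print(f"x = {x_new}: predicted y = {y_hat} ({kind})")
```

The code only handles the arithmetic and the label. The third step, a context-specific reason why the trend might not continue, still has to be written by hand, and that is where the marks sit.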
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2022 HSC-style, 3 marks. The least-squares regression line is computed from a dataset where x ranges from to . Classify the following predictions as interpolation or extrapolation, and comment on reliability. (a) . (b) . (c) .
(a) is inside the data range , so this is interpolation. Predictions inside the range are generally reliable, assuming the regression line is a good fit.
(b) is outside the data range, so this is extrapolation. The prediction is less reliable because the linear relationship may not extend beyond the observed range.
(c) is well outside the data range and may also be physically implausible depending on context. This is extrapolation and is the least reliable prediction.
Markers reward correct classification and a brief reliability comment for each.
2023 HSC-style, 3 marks. A regression line for the population of a regional Australian town from to predicts population growth. Use the line to predict the population in and discuss whether the prediction is reasonable.
Substituting into the line gives a numerical prediction (the question's specific equation would supply the number).
This is an extreme extrapolation, years beyond the latest data point. Possible reasons the prediction may be wrong: changes in regional employment, climate impact on agriculture, policy changes, immigration patterns, or saturation of available housing.
Linear growth over years extrapolated from years of data is rarely reliable. State the predicted value but caveat strongly that the model assumes the past linear trend continues indefinitely, which is unlikely.
Markers reward identification as extrapolation, the prediction, and at least two specific reasons it may be unreliable.
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation, 1 mark. A least-squares regression line was computed from data where x ranges from to . Classify a prediction at as interpolation or extrapolation.
- State the data range
- The data covers to , so any prediction with inside interpolates and anything outside extrapolates.
- Locate the prediction
- Here , and , so this value lies inside the observed range.
- Answer
- is inside the data range, so the prediction is interpolation.
Foundation, 2 marks. A regression line was computed from data where x ranges from to . (a) Classify a prediction at . (b) Classify a prediction at .
State the data range. The data covers to . Inside is interpolation; outside is extrapolation.
Classify each value.
- : this is greater than , so it lies above the range. This is extrapolation.
- : this is less than , so it lies below the range. This is extrapolation.
Answer: both predictions are extrapolation, because and each lie outside the data range .
Foundation, 2 marks. A least-squares line is , computed from data where x ranges from to . (a) Use the line to predict when . (b) Classify this prediction as interpolation or extrapolation.
Substitute to get the prediction. Put into the line:
State the data range and classify. The data covers to , and , so is inside the range.
Answer: the prediction is , and because is inside the data range it is interpolation.
Foundation, 2 marks. A regression line is fitted from data on children aged to . (a) Classify a prediction for age . (b) Classify a prediction for age .
State the data range. The data covers ages to , so ages inside interpolate and ages outside extrapolate.
Classify each age.
- Age : since , this is inside the range, so interpolation.
- Age : since , this is outside the range, so extrapolation.
Answer: age is interpolation (inside the data range) and age is extrapolation (outside the data range).
Core, 3 marks. The population of a regional NSW town is modelled by , where is the number of years since . The data ran from to . (a) Predict the population in and classify the prediction. (b) Predict the population in and classify it.
Convert the years and note the data range. With measured from , the data covers to (the years to ). The year is and is .
Predict and classify . Substitute :
Since , this is interpolation.
Predict and classify . Substitute :
Since , this is extrapolation.
Answer: the prediction of is interpolation; the prediction of is extrapolation, years beyond the data.
Core, 3 marks. A cafe's monthly revenue is modelled by in dollars, where is the number of months since opening. The data covered months to . (a) Predict the revenue in month and classify the prediction. (b) Predict the revenue in month and comment on its reliability with a context-specific reason.
State the data range. The data covers months to , so a month inside interpolates and a month outside extrapolates.
Predict and classify month . Substitute :
Since , this is interpolation, so the predicted $ is reliable.
Predict month and assess reliability. Substitute :
Since , this is extrapolation. The line assumes revenue keeps climbing by $ every month forever, but the cafe will eventually saturate its local market and growth will flatten, so the predicted $ is likely to overstate revenue.
Answer: month gives $ (interpolation, reliable); month gives $ (extrapolation, unreliable because of market saturation).
Core, 3 marks. A child's height is modelled by , where is height in centimetres and is age in years. The data covered ages to . (a) Predict the height at age and classify it. (b) Predict the height at age and explain, with a context-specific reason, why the prediction is unreliable.
State the data range. The data covers ages to , so ages inside interpolate.
Predict and classify age . Substitute :
Since , this is interpolation, so cm is reliable.
Predict age and assess. Substitute :
Since , this is substantial extrapolation. The model keeps adding cm every year, but human growth stops in the late teens (a biological limit), so a predicted height of cm is impossible.
Answer: age gives cm (interpolation, reliable); age gives cm, an unreliable extrapolation because growth stops in adolescence.
Core, 4 marks. A farmer models crop yield as , where is yield in tonnes per hectare and is the growing-season rainfall in centimetres. The data covered rainfall from to cm. (a) Predict the yield at cm of rainfall and classify it. (b) Predict the yield at cm and classify it. (c) State which prediction is more reliable and give one context-specific reason.
State the data range. The rainfall data covers to cm, so a value inside interpolates.
Predict and classify cm. Substitute :
Since , this is interpolation.
Predict and classify cm. Substitute :
Since , this is extrapolation.
Compare reliability. The cm prediction is more reliable because it interpolates. The cm prediction extrapolates below the data, where a drought-level rainfall could stunt the crop far more than the line suggests, so tonnes per hectare may be too high.
Answer: cm gives t/ha (interpolation, more reliable); cm gives t/ha (extrapolation, less reliable because severe water shortage breaks the linear trend).
Exam, 4 marks. A regional town's population is modelled by , where is the number of years since . The data ran from to . (a) Predict the population in and classify the prediction. (b) Predict the population in . (c) State which prediction is more reliable, giving two context-specific reasons the figure may not hold.
Note the data range. With measured from , the data covers to . The year is and is .
Predict and classify . Substitute :
Since , this is interpolation.
Predict . Substitute :
Since , this is extrapolation, years past the last data point.
Compare reliability. The figure is more reliable because it interpolates. The figure of assumes the past linear trend simply continues; in reality the town could stall if a major employer closes, or stop growing once available housing land runs out.
Answer: gives (interpolation, reliable); gives (substantial extrapolation, unreliable because of possible employer loss and housing saturation).
Exam, 4 marks. A cafe's weekly revenue is modelled by in dollars, where is the number of weeks since opening. The data covered weeks to . (a) Predict the revenue in week and classify it. (b) Predict the revenue in week and classify it. (c) Explain why the week prediction is a more serious extrapolation than a prediction for week would be.
State the data range. The data covers weeks to , so a week inside interpolates.
Predict and classify week . Substitute :
Since , this is interpolation, so $ is reliable.
Predict and classify week . Substitute :
Since , this is extrapolation.
Compare mild and substantial extrapolation. Week is only weeks past the data, so mild extrapolation; it is defensible because the trend has little room to drift. Week is weeks past, doubling the time span, so the prediction relies far more heavily on the unverified assumption that linear growth continues, and the local market is likely to saturate first.
Answer: week gives $ (interpolation); week gives $ (substantial extrapolation, far less reliable than a mild extrapolation to week ).
Exam, 5 marks. A study of children models height as , where is height in centimetres and is age in years. The data covered ages to . (a) Predict the height at age and classify it. (b) Classify a prediction at age as mild or substantial extrapolation, and say whether it could be defensible. (c) Predict the height at age and explain, with a context-specific reason, why the model fails there.
State the data range. The data covers ages to , so ages inside interpolate.
Predict and classify age . Substitute :
Since , this is interpolation, so cm is reliable.
Classify age . Age is just year past the range, so mild extrapolation. It could be defensible, since a year old is still growing and the trend is unlikely to change sharply in one year. (For reference, the line gives cm.)
Predict age and assess. Substitute :
Since is far above , this is substantial extrapolation. Growth stops in the late teens (a biological limit), yet the line keeps adding cm per year, so the predicted cm is impossible.
Answer: age gives cm (interpolation); age is mild extrapolation, possibly defensible; age gives cm, an unreliable extrapolation because human growth ceases in adolescence.
Exam, 5 marks. An electricity provider models summer peak demand as , where is demand in megawatts and is the daily maximum temperature in degrees Celsius. The data covered maximum temperatures from to degrees. (a) Predict the demand at degrees and classify it. (b) Predict the demand at degrees and classify it. (c) Give one context-specific reason the degree prediction may be unreliable, and state whether it is mild or substantial extrapolation.
State the data range. The temperature data covers to degrees, so a value inside interpolates.
Predict and classify degrees. Substitute :
Since , this is interpolation, so MW is reliable.
Predict and classify degrees. Substitute :
Since , this is extrapolation, degrees beyond the data.
Assess the degree prediction. On extreme heat days demand can rise faster than the line predicts, because almost every air conditioner runs at once, so the linear rate of MW per degree may understate true demand. At only degrees beyond a degree data span this is mild extrapolation, so the figure is a rough guide rather than worthless.
Answer: degrees gives MW (interpolation, reliable); degrees gives MW (mild extrapolation, possibly an underestimate because demand surges non-linearly in extreme heat).
