ExamExplained
NSW · Maths Standard 2

What is the difference between interpolation and extrapolation, and why is extrapolation less reliable?

Distinguish between interpolation and extrapolation when using a regression line, and assess the reliability of predictions

A focused answer to the HSC Maths Standard 2 dot point on interpolation vs extrapolation. It covers the data range as the dividing line, why predictions inside it are generally reliable and those outside it are suspect, a stage-by-stage diagram of the zones, how the questions are worded, and Australian-context worked examples.

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

NESA wants you to do two things. First, classify a prediction from a regression line (the line of best fit) as interpolation (inside the data range) or extrapolation (outside it). Second, comment on how reliable each prediction is. This task shows up in almost every paper, usually attached to a least-squares question. The marks come not from the arithmetic but from naming the zone and giving a context-specific reason.

The answer

The one idea: the data range decides everything

Everything in this dot point hinges on a single boundary: the range of \(x\)-values that were actually observed. Inside that range you have evidence for the trend, so a prediction interpolates and is generally trustworthy. Outside it you have no evidence. You only have an assumption that the line keeps going, so a prediction extrapolates and is shaky. The four-stage diagram below builds that picture up step by step: plot the data, fit the line, mark the zones, then contrast a safe prediction with a risky one.

Stage 1, plot the data and see its range. The scatter only covers a limited span of \(x\)-values. That span, between the smallest and largest observed \(x\), is the observed data range, and it is the boundary that classifies every later prediction.

[Figure, stage 1: plot the data; note the x-values only span the observed range. Axes: x, y. Label: observed data range.]

Stage 2, fit the line; inside is interpolation. The least-squares line is fitted to the data. Any prediction for an \(x\) inside the shaded data range is interpolation: it falls among the points the line was actually built from, so the line has evidence there.

[Figure, stage 2: fit the regression line; inside the data range, predictions interpolate. Axes: x, y. Label: interpolation (inside data range).]

Stage 3, beyond the range is extrapolation. Extending the line past the data (shown dashed) reaches \(x\)-values where nothing was ever observed. Predictions in those outer zones are extrapolation, and they rest entirely on the assumption that the straight-line trend continues, which the data cannot confirm.

[Figure, stage 3: beyond the data range the line is dashed; predictions there extrapolate. Axes: x, y. Labels: interpolation (centre); extrapolation (less reliable) on either side.]

Stage 4, reliable inside versus risky outside. A prediction read off inside the data range (solid point) is reliable: it is supported by nearby observations. A prediction read off well beyond the range (open point) is risky: no data backs it, and the real relationship may bend, flatten or break down out there.

[Figure, stage 4: an interpolated prediction is reliable; an extrapolated one may not be. Axes: x, y. Labels: reliable (solid point); risky (open point).]

Interpolation

Interpolation is a prediction at an \(x\) inside the observed range. If the data covers \(x = 2\) to \(x = 20\), then any prediction with \(x\) between 2 and 20 interpolates.

It is generally reliable, provided:

  • The scatterplot shows a clear linear pattern.
  • The correlation coefficient is moderately strong or stronger.
  • No extreme outlier is distorting the fit.

Even so, interpolation gives the best linear estimate, not a guarantee: individual points scatter around the line, so a single prediction can still be off by the typical residual.
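To make "off by the typical residual" concrete, here is a minimal sketch. The seven data points are invented for illustration (they are not from any exam paper), and the root-mean-square residual is used as one simple way to summarise the typical scatter around the line.

```python
# Hypothetical data, invented purely for illustration.
xs = [2, 5, 8, 11, 14, 17, 20]
ys = [7.5, 11.0, 16.5, 20.0, 25.5, 29.0, 34.5]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares(xs, ys)

# Residual = observed y minus the y the line predicts at the same x.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Root-mean-square residual: one rough measure of the typical miss.
typical_residual = (sum(r * r for r in residuals) / len(residuals)) ** 0.5

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("typical residual:", round(typical_residual, 3))
```

Even for an \(x\) well inside the range, a real observation would be expected to sit roughly this far from the line, which is why an interpolated value is an estimate rather than a guarantee.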

Extrapolation

Extrapolation is a prediction at an \(x\) outside the observed range. With data from \(x = 2\) to \(x = 20\), a prediction at \(x = 25\) or \(x = 0\) extrapolates.

It is generally less reliable because:

  • The pattern in the data need not continue past the range.
  • New factors can dominate at extreme values: saturation, exhaustion of supply, regime change, physical limits.
  • The true relationship may be non-linear at the extremes even when it looks linear in the middle.
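The classification rule itself is mechanical, and can be sketched in a few lines. The function name `classify` is a label chosen here for illustration. Treating the endpoints of the range as interpolation is an assumption, made to match the \(\le\) comparisons used in the worked solutions.

```python
def classify(x, x_min, x_max):
    """Classify a prediction at x against the observed range [x_min, x_max]."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


# Data observed from x = 2 to x = 20, as in the definitions above.
for x in (10, 25, 0):
    print(x, classify(x, 2, 20))
```

The code only names the zone. The reliability comment still has to come from the context of the question.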

How far is too far?

Mild extrapolation, just past the data range, is sometimes defensible. You need a good reason to think the trend keeps going for a little longer. Substantial extrapolation, far past the range, is usually unreliable. The HSC rarely sets an exact cut-off number. It tests whether you spot the extrapolation and can argue, in context, why it is risky. The deeper point is simple: a prediction gets less reliable the further it sits from the data. The further out you push, the more it leans on an assumption you cannot check.
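One informal way to compare "just past" with "far past" is to measure how far the prediction sits beyond the nearest end of the data, as a fraction of the width of the data range. This ratio is an illustrative device assumed here, not a NESA rule, and it implies no cut-off value.

```python
def overshoot_ratio(x, x_min, x_max):
    """Distance of x beyond the observed range, as a fraction of the range width.

    Returns 0.0 for any x inside [x_min, x_max].
    """
    span = x_max - x_min
    if x < x_min:
        return (x_min - x) / span
    if x > x_max:
        return (x - x_max) / span
    return 0.0


# Data observed from x = 2 to x = 20 (a span of 18).
for x in (10, 22, 29, 60):
    print(x, round(overshoot_ratio(x, 2, 20), 2))
```

A larger ratio means the prediction leans more heavily on the unchecked assumption that the trend continues. Whether a given overshoot is acceptable still depends on the context.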

When extrapolation breaks down: concrete cases

  • Population growth. A linear fit may overshoot once housing or resources saturate.
  • Athletic records. Linear improvement in running times cannot continue past human physiological limits.
  • Compound investments. A linear model is the wrong shape; the real process is exponential, so extrapolating a line understates the long-run value.
  • Children's height. Linear growth fitted from ages 4 to 12 cannot reach age 50, because growth stops in adolescence.
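The compound-investment case can be checked numerically. The sketch below assumes a hypothetical $1000 invested at 5% per year, compounded annually and observed for years 0 to 5. For simplicity the line is drawn through the first and last observed values instead of being fitted by least squares.

```python
PRINCIPAL = 1000.0        # hypothetical starting amount, in dollars
RATE = 0.05               # hypothetical annual interest rate
LAST_OBSERVED_YEAR = 5    # the "data" covers years 0 to 5


def compound_value(year):
    """True value of the investment under annual compounding."""
    return PRINCIPAL * (1 + RATE) ** year


# Straight line through the year-0 and year-5 values (a simplification).
SLOPE = (compound_value(LAST_OBSERVED_YEAR) - compound_value(0)) / LAST_OBSERVED_YEAR


def line_value(year):
    """Value predicted by the straight-line model."""
    return PRINCIPAL + SLOPE * year


for year in (3, 5, 30):
    print(year, round(line_value(year), 2), round(compound_value(year), 2))
```

Inside the observed years the line and the curve stay close together. At year 30 the line falls well short of the compounded value, which is the understatement described in the list above.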

How exam questions ask about interpolation and extrapolation

The wording is varied but the task is fixed: locate the \(x\) relative to the data range, classify, then justify. Translate the phrasings:

  • "Classify this prediction as interpolation or extrapolation." State the data range, say whether the \(x\) is inside or outside, and name the type.
  • "Comment on the reliability of the prediction." Pair the classification with a context-specific reason: interpolation is reliable if the fit is good; extrapolation is suspect because the trend may not continue (then give a concrete reason for that data).
  • "Use the line to predict \(y\) when \(x = \ldots\), and discuss." Substitute to get the number, then classify and caveat. The discussion is where the marks sit.
  • "Why might this prediction be unreliable?" Give a specific mechanism (market saturation, biological limit, economic shift), not the bare phrase "the trend might change".
  • "Is it reasonable to use this model here?" This asks for a reliability judgement: reasonable inside the range and for mild extrapolation; unreasonable far outside it.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2022 HSC-style · 3 marks
The least-squares regression line \(y = 1.5x + 4\) is computed from a dataset where \(x\) ranges from 2 to 20. Classify the following predictions as interpolation or extrapolation, and comment on reliability. (a) \(x = 10\). (b) \(x = 25\). (c) \(x = -3\).

(a) \(x = 10\) is inside the data range \([2, 20]\), so this is interpolation. Predictions inside the range are generally reliable, assuming the regression line is a good fit.

(b) \(x = 25\) is outside the data range, so this is extrapolation. The prediction is less reliable because the linear relationship may not extend beyond the observed range.

(c) \(x = -3\) is well outside the data range and may also be physically implausible depending on context. This is extrapolation and is the least reliable prediction.

Markers reward correct classification and a brief reliability comment for each.

2023 HSC-style · 3 marks
A regression line for the population of a regional Australian town from 1990 to 2020 predicts population growth. Use the line to predict the population in 2080 and discuss whether the prediction is reasonable.

Substituting \(x = 2080\) into the line gives a numerical prediction (the question's specific equation would supply the number).

This is an extreme extrapolation, 60 years beyond the latest data point. Possible reasons the prediction may be wrong: changes in regional employment, climate impact on agriculture, policy changes, immigration patterns, or saturation of available housing.

Linear growth over 60 years extrapolated from 30 years of data is rarely reliable. State the predicted value but caveat strongly that the model assumes the past linear trend continues indefinitely, which is unlikely.

Markers reward identification as extrapolation, the prediction, and at least two specific reasons it may be unreliable.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

foundation · 1 mark
A least-squares regression line was computed from data where \(x\) ranges from 5 to 40. Classify a prediction at \(x = 22\) as interpolation or extrapolation.

State the data range. The data covers \(x = 5\) to \(x = 40\), so any prediction with \(x\) inside \([5, 40]\) interpolates and anything outside extrapolates.

Locate the prediction. Here \(x = 22\), and \(5 \le 22 \le 40\), so this value lies inside the observed range.

Answer: \(x = 22\) is inside the data range, so the prediction is interpolation.

foundation · 2 marks
A regression line was computed from data where \(x\) ranges from 5 to 40. (a) Classify a prediction at \(x = 55\). (b) Classify a prediction at \(x = 2\).

State the data range. The data covers \(x = 5\) to \(x = 40\). Inside \([5, 40]\) is interpolation; outside is extrapolation.

Classify each value.

  • \(x = 55\): this is greater than 40, so it lies above the range. This is extrapolation.
  • \(x = 2\): this is less than 5, so it lies below the range. This is extrapolation.

Answer: both predictions are extrapolation, because \(x = 55\) and \(x = 2\) each lie outside the data range \([5, 40]\).

foundation · 2 marks
A least-squares line is \(y = 3x + 10\), computed from data where \(x\) ranges from 4 to 16. (a) Use the line to predict \(y\) when \(x = 9\). (b) Classify this prediction as interpolation or extrapolation.

Substitute to get the prediction. Put \(x = 9\) into the line:

\[ y = 3 \times 9 + 10 = 27 + 10 = 37. \]

State the data range and classify. The data covers \(x = 4\) to \(x = 16\), and \(4 \le 9 \le 16\), so \(x = 9\) is inside the range.

Answer: the prediction is \(y = 37\), and because \(x = 9\) is inside the data range it is interpolation.

foundation · 2 marks
A regression line \(\text{height (cm)} = 7 \cdot \text{age (years)} + 72\) is fitted from data on children aged 2 to 10. (a) Classify a prediction for age 6. (b) Classify a prediction for age 14.

State the data range. The data covers ages 2 to 10, so ages inside \([2, 10]\) interpolate and ages outside extrapolate.

Classify each age.

  • Age 6: since \(2 \le 6 \le 10\), this is inside the range, so interpolation.
  • Age 14: since \(14 > 10\), this is outside the range, so extrapolation.

Answer: age 6 is interpolation (inside the data range) and age 14 is extrapolation (outside the data range).

core · 3 marks
The population of a regional NSW town is modelled by \(P = 120t + 4200\), where \(t\) is the number of years since 2000. The data ran from 2000 to 2015. (a) Predict the population in 2010 and classify the prediction. (b) Predict the population in 2035 and classify it.

Convert the years and note the data range. With \(t\) measured from 2000, the data covers \(t = 0\) to \(t = 15\) (the years 2000 to 2015). The year 2010 is \(t = 10\) and 2035 is \(t = 35\).

Predict and classify 2010. Substitute \(t = 10\):

\[ P = 120 \times 10 + 4200 = 1200 + 4200 = 5400. \]

Since \(0 \le 10 \le 15\), this is interpolation.

Predict and classify 2035. Substitute \(t = 35\):

\[ P = 120 \times 35 + 4200 = 4200 + 4200 = 8400. \]

Since \(35 > 15\), this is extrapolation.

Answer: the 2010 prediction of 5400 is interpolation; the 2035 prediction of 8400 is extrapolation, 20 years beyond the data.
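The year-to-\(t\) conversion in this solution is the step most easily skipped, so here is a minimal sketch of it. The function and constant names are chosen for illustration; the model and data years are those given in the question.

```python
BASE_YEAR = 2000              # t counts years since 2000
DATA_YEARS = (2000, 2015)     # first and last years with observed data


def predict_population(year):
    """Convert the calendar year to t, then apply P = 120t + 4200."""
    t = year - BASE_YEAR
    return 120 * t + 4200


def classify_year(year):
    """Classify a prediction by comparing the year with the data years."""
    first, last = DATA_YEARS
    return "interpolation" if first <= year <= last else "extrapolation"


for year in (2010, 2035):
    print(year, predict_population(year), classify_year(year))
```

Substituting the calendar year directly, without subtracting 2000, would give \(120 \times 2035 + 4200 = 248400\), which is clearly not a sensible population for this town.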

core · 3 marks
A cafe's monthly revenue is modelled by \(R = 800m + 5000\) in dollars, where \(m\) is the number of months since opening. The data covered months 1 to 18. (a) Predict the revenue in month 12 and classify the prediction. (b) Predict the revenue in month 48 and comment on its reliability with a context-specific reason.

State the data range. The data covers months 1 to 18, so a month inside \([1, 18]\) interpolates and a month outside extrapolates.

Predict and classify month 12. Substitute \(m = 12\):

\[ R = 800 \times 12 + 5000 = 9600 + 5000 = 14600. \]

Since \(1 \le 12 \le 18\), this is interpolation, so the predicted $14600 is reliable.

Predict month 48 and assess reliability. Substitute \(m = 48\):

\[ R = 800 \times 48 + 5000 = 38400 + 5000 = 43400. \]

Since \(48 > 18\), this is extrapolation. The line assumes revenue keeps climbing by $800 every month forever, but the cafe will eventually saturate its local market and growth will flatten, so the predicted $43400 is likely to overstate revenue.

Answer: month 12 gives $14600 (interpolation, reliable); month 48 gives $43400 (extrapolation, unreliable because of market saturation).

core · 3 marks
A child's height is modelled by \(H = 6a + 76\), where \(H\) is height in centimetres and \(a\) is age in years. The data covered ages 3 to 11. (a) Predict the height at age 7 and classify it. (b) Predict the height at age 30 and explain, with a context-specific reason, why the prediction is unreliable.

State the data range. The data covers ages 3 to 11, so ages inside \([3, 11]\) interpolate.

Predict and classify age 7. Substitute \(a = 7\):

\[ H = 6 \times 7 + 76 = 42 + 76 = 118. \]

Since \(3 \le 7 \le 11\), this is interpolation, so 118 cm is reliable.

Predict age 30 and assess. Substitute \(a = 30\):

\[ H = 6 \times 30 + 76 = 180 + 76 = 256. \]

Since \(30 > 11\), this is substantial extrapolation. The model keeps adding 6 cm every year, but human growth stops in the late teens (a biological limit), so a predicted height of 256 cm is impossible.

Answer: age 7 gives 118 cm (interpolation, reliable); age 30 gives 256 cm, an unreliable extrapolation because growth stops in adolescence.

core · 4 marks
A farmer models crop yield as \(Y = 0.9w + 2.5\), where \(Y\) is yield in tonnes per hectare and \(w\) is the growing-season rainfall in centimetres. The data covered rainfall from 20 to 60 cm. (a) Predict the yield at 45 cm of rainfall and classify it. (b) Predict the yield at 15 cm and classify it. (c) State which prediction is more reliable and give one context-specific reason.

State the data range. The rainfall data covers 20 to 60 cm, so a value inside \([20, 60]\) interpolates.

Predict and classify 45 cm. Substitute \(w = 45\):

\[ Y = 0.9 \times 45 + 2.5 = 40.5 + 2.5 = 43. \]

Since \(20 \le 45 \le 60\), this is interpolation.

Predict and classify 15 cm. Substitute \(w = 15\):

\[ Y = 0.9 \times 15 + 2.5 = 13.5 + 2.5 = 16. \]

Since \(15 < 20\), this is extrapolation.

Compare reliability. The 45 cm prediction is more reliable because it interpolates. The 15 cm prediction extrapolates below the data, where a drought-level rainfall could stunt the crop far more than the line suggests, so 16 tonnes per hectare may be too high.

Answer: 45 cm gives 43 t/ha (interpolation, more reliable); 15 cm gives 16 t/ha (extrapolation, less reliable because severe water shortage breaks the linear trend).
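A quick way to see why extrapolating below the data is suspect here is to push the line all the way to zero rainfall. The sketch below is an illustrative check, not part of the marked answer, and it assumes the crop is not irrigated.

```python
RAIN_RANGE_CM = (20, 60)   # observed growing-season rainfall, in cm


def predicted_yield(rain_cm):
    """Yield in tonnes per hectare from the line Y = 0.9w + 2.5."""
    return 0.9 * rain_cm + 2.5


for rain_cm in (45, 15, 0):
    inside = RAIN_RANGE_CM[0] <= rain_cm <= RAIN_RANGE_CM[1]
    zone = "interpolation" if inside else "extrapolation"
    print(rain_cm, round(predicted_yield(rain_cm), 1), zone)
```

At zero rainfall the line still predicts the intercept, 2.5 tonnes per hectare. A harvest with no rain at all is not believable for an unirrigated crop, which suggests the straight-line pattern must break down somewhere below the observed 20 cm.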

exam · 4 marks
A regional town's population is modelled by \(P = 250t + 8000\), where \(t\) is the number of years since 1995. The data ran from 1995 to 2020. (a) Predict the population in 2015 and classify the prediction. (b) Predict the population in 2045. (c) State which prediction is more reliable, giving two context-specific reasons the 2045 figure may not hold.

Note the data range. With \(t\) measured from 1995, the data covers \(t = 0\) to \(t = 25\). The year 2015 is \(t = 20\) and 2045 is \(t = 50\).

Predict and classify 2015. Substitute \(t = 20\):

\[ P = 250 \times 20 + 8000 = 5000 + 8000 = 13000. \]

Since \(0 \le 20 \le 25\), this is interpolation.

Predict 2045. Substitute \(t = 50\):

\[ P = 250 \times 50 + 8000 = 12500 + 8000 = 20500. \]

Since \(50 > 25\), this is extrapolation, 25 years past the last data point.

Compare reliability. The 2015 figure is more reliable because it interpolates. The 2045 figure of 20500 assumes the past linear trend simply continues; in reality the town could stall if a major employer closes, or stop growing once available housing land runs out.

Answer: 2015 gives 13000 (interpolation, reliable); 2045 gives 20500 (substantial extrapolation, unreliable because of possible employer loss and housing saturation).

exam · 4 marks
A cafe's weekly revenue is modelled by \(R = 350w + 1800\) in dollars, where \(w\) is the number of weeks since opening. The data covered weeks 2 to 26. (a) Predict the revenue in week 20 and classify it. (b) Predict the revenue in week 52 and classify it. (c) Explain why the week 52 prediction is a more serious extrapolation than a prediction for week 28 would be.

State the data range. The data covers weeks 2 to 26, so a week inside \([2, 26]\) interpolates.

Predict and classify week 20. Substitute \(w = 20\):

\[ R = 350 \times 20 + 1800 = 7000 + 1800 = 8800. \]

Since \(2 \le 20 \le 26\), this is interpolation, so $8800 is reliable.

Predict and classify week 52. Substitute \(w = 52\):

\[ R = 350 \times 52 + 1800 = 18200 + 1800 = 20000. \]

Since \(52 > 26\), this is extrapolation.

Compare mild and substantial extrapolation. Week 28 is only 2 weeks past the data, so mild extrapolation; it is defensible because the trend has little room to drift. Week 52 is 26 weeks past, doubling the time span, so the prediction relies far more heavily on the unverified assumption that linear growth continues, and the local market is likely to saturate first.

Answer: week 20 gives $8800 (interpolation); week 52 gives $20000 (substantial extrapolation, far less reliable than a mild extrapolation to week 28).

exam · 5 marks
A study of children models height as \(H = 5.5a + 82\), where \(H\) is height in centimetres and \(a\) is age in years. The data covered ages 4 to 12. (a) Predict the height at age 9.5 and classify it. (b) Classify a prediction at age 13 as mild or substantial extrapolation, and say whether it could be defensible. (c) Predict the height at age 40 and explain, with a context-specific reason, why the model fails there.

State the data range. The data covers ages 4 to 12, so ages inside \([4, 12]\) interpolate.

Predict and classify age 9.5. Substitute \(a = 9.5\):

\[ H = 5.5 \times 9.5 + 82 = 52.25 + 82 = 134.25. \]

Since \(4 \le 9.5 \le 12\), this is interpolation, so 134.25 cm is reliable.

Classify age 13. Age 13 is just 1 year past the range, so mild extrapolation. It could be defensible, since a 13 year old is still growing and the trend is unlikely to change sharply in one year. (For reference, the line gives \(5.5 \times 13 + 82 = 153.5\) cm.)

Predict age 40 and assess. Substitute \(a = 40\):

\[ H = 5.5 \times 40 + 82 = 220 + 82 = 302. \]

Since 40 is far above 12, this is substantial extrapolation. Growth stops in the late teens (a biological limit), yet the line keeps adding 5.5 cm per year, so the predicted 302 cm is impossible.

Answer: age 9.5 gives 134.25 cm (interpolation); age 13 is mild extrapolation, possibly defensible; age 40 gives 302 cm, an unreliable extrapolation because human growth ceases in adolescence.
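A plausibility check can flag an extrapolation that has clearly gone too far. The sketch below compares the line's output with an upper bound of 272 cm, roughly the tallest reliably recorded human height. That bound is an assumption brought in for illustration and is not part of the question.

```python
UPPER_BOUND_CM = 272   # assumed physical ceiling, for illustration only


def predicted_height(age):
    """Height in cm from the line H = 5.5a + 82."""
    return 5.5 * age + 82


def first_implausible_age(bound_cm, max_age=120):
    """Smallest whole-number age at which the line exceeds bound_cm, or None."""
    for age in range(max_age + 1):
        if predicted_height(age) > bound_cm:
            return age
    return None


for age in (9.5, 13, 40):
    print(age, predicted_height(age), predicted_height(age) > UPPER_BOUND_CM)

print("line exceeds the bound from age", first_implausible_age(UPPER_BOUND_CM))
```

Passing this check does not make a prediction reliable. The line overstates typical adult heights long before it crosses the ceiling, because growth has already stopped by then.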

exam · 5 marks
An electricity provider models summer peak demand as \(D = 1.4T + 18\), where \(D\) is demand in megawatts and \(T\) is the daily maximum temperature in degrees Celsius. The data covered maximum temperatures from 18 to 38 degrees. (a) Predict the demand at 30 degrees and classify it. (b) Predict the demand at 44 degrees and classify it. (c) Give one context-specific reason the 44 degree prediction may be unreliable, and state whether it is mild or substantial extrapolation.

State the data range. The temperature data covers 18 to 38 degrees, so a value inside \([18, 38]\) interpolates.

Predict and classify 30 degrees. Substitute \(T = 30\):

\[ D = 1.4 \times 30 + 18 = 42 + 18 = 60. \]

Since \(18 \le 30 \le 38\), this is interpolation, so 60 MW is reliable.

Predict and classify 44 degrees. Substitute \(T = 44\):

\[ D = 1.4 \times 44 + 18 = 61.6 + 18 = 79.6. \]

Since \(44 > 38\), this is extrapolation, 6 degrees beyond the data.

Assess the 44 degree prediction. On extreme heat days demand can rise faster than the line predicts, because almost every air conditioner runs at once, so the linear rate of 1.4 MW per degree may understate true demand. At only 6 degrees beyond a 20 degree data span this is mild extrapolation, so the figure is a rough guide rather than worthless.

Answer: 30 degrees gives 60 MW (interpolation, reliable); 44 degrees gives 79.6 MW (mild extrapolation, possibly an underestimate because demand surges non-linearly in extreme heat).
