Topic 1: Bivariate data analysis - how do we model a linear relationship and use it to make predictions?
Fit a least-squares regression line to bivariate data, interpret the slope and intercept in context, use the line to make predictions through interpolation and extrapolation, and assess the fit using a residual plot and the coefficient of determination
A focused answer to the QCE General Mathematics Unit 3 dot point on least-squares regression. Covers fitting the regression line with CAS, interpreting slope and intercept in context, interpolation versus extrapolation, residuals and residual plots, and using the coefficient of determination to judge fit, with worked CAS examples for IA2 and the external assessment.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
QCAA wants you to move from describing a linear association to modelling it with an equation. You fit a least-squares regression line, write it with the variable names in context, interpret what the slope and intercept actually mean, use the line to predict, and then judge how trustworthy the prediction is using residuals and the coefficient of determination. This dot point sits in Unit 3 Topic 1, directly after correlation, and is one of the highest-yielding areas in IA2 and the external assessment.
The answer
The least-squares regression line
The least-squares line is the straight line that minimises the sum of the squared vertical distances (residuals) between the data points and the line. In General Mathematics you read its coefficients off CAS after entering the paired data. The line is written
\[ \hat{y} = mx + c \]
where \(x\) is the explanatory variable, \(\hat{y}\) is the predicted response, \(m\) is the slope (gradient) and \(c\) is the \(y\)-intercept. The hat on \(\hat{y}\) signals that the line gives a predicted value, not an observed one.
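CAS does this calculation for you, but it can help to see what sits behind the button. The sketch below is a minimal Python illustration, not the CAS procedure itself: it applies the standard least-squares formulas (slope = sum of products of deviations divided by sum of squared deviations in the explanatory variable; intercept chosen so the line passes through the point of means) to the six-month sales data from the practice questions in this guide. The function name `least_squares` and the variable names are illustrative only.

```python
# Least-squares slope and intercept from the standard formulas.
# Data: the six-month sales table from the practice questions in this guide.
t = [1, 2, 3, 4, 5, 6]                 # explanatory variable (months)
n = [86, 180, 160, 226, 240, 335]      # response variable (number of sales)


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = least_squares(t, n)
print(f"n = {slope:.1f}t + {intercept:.1f}")   # n = 42.6t + 55.4
```

The printed equation matches the line quoted in the first worked answer, which is a useful check that the variables were entered the right way round.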
Interpreting slope and intercept in context
A full-mark interpretation always names the variables and units.
- Slope \(m\). For every increase of one unit in \(x\), the predicted value of \(y\) changes by \(m\) units. A positive \(m\) means \(\hat{y}\) rises; a negative \(m\) means \(\hat{y}\) falls.
- Intercept \(c\). The predicted value of \(y\) when \(x = 0\). This is only meaningful if \(x = 0\) is within or near the data range; otherwise it is a mathematical artefact.
Prediction: interpolation and extrapolation
Substituting an \(x\) value into the equation gives a prediction \(\hat{y}\).
- Interpolation predicts inside the range of the observed data. It is generally reliable.
- Extrapolation predicts outside the range of the observed data. It is unreliable because there is no evidence the linear pattern continues. QCAA reliably awards a mark for identifying a prediction as extrapolation and cautioning against it.
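The interpolation versus extrapolation check is simply a comparison of the input against the smallest and largest observed values of the explanatory variable. A minimal Python sketch, assuming the line n = 42.6t + 55.4 fitted to months 1 to 6 from the practice questions in this guide (the function `predict` and its parameter names are hypothetical, chosen for illustration):

```python
def predict(x_value, slope, intercept, x_min, x_max):
    """Return (prediction, kind), where kind names the type of prediction."""
    prediction = slope * x_value + intercept
    # Inside the observed range is interpolation; anything else is extrapolation
    kind = "interpolation" if x_min <= x_value <= x_max else "extrapolation"
    return prediction, kind


# Illustrative line and data range: n = 42.6t + 55.4, observed for t = 1 to 6
for month in (4, 21):
    value, kind = predict(month, 42.6, 55.4, x_min=1, x_max=6)
    print(f"t = {month}: predicted n = {value:.0f} ({kind})")
```

Month 4 sits inside the data and is labelled interpolation; month 21 is far outside it and is labelled extrapolation, which is exactly the comment an exam response should add in words.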
Residuals and the residual plot
A residual is the gap between an observed value and the value the line predicts:
\[ \text{residual} = y - \hat{y} \]
A residual plot graphs each residual against \(x\). The key reading rule:
- Random scatter about zero supports a linear model as appropriate.
- A clear curved pattern signals the underlying relationship is non-linear, so the straight line is the wrong model even if \(R^2\) looked high.
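To make the residual calculation concrete, the following Python sketch computes observed minus predicted for each point of the six-month sales data from the practice questions, using the fitted line n = 42.6t + 55.4. It is an illustration of the arithmetic only; in the assessment you would read the residuals from CAS and plot them. Variable names are illustrative.

```python
t = [1, 2, 3, 4, 5, 6]                 # months
n = [86, 180, 160, 226, 240, 335]      # observed sales
slope, intercept = 42.6, 55.4          # least-squares line for these data

# residual = observed value - predicted value
residuals = [obs - (slope * x + intercept) for x, obs in zip(t, n)]

for x, res in zip(t, residuals):
    print(f"t = {x}: residual = {res:+.1f}")

# For a least-squares line the residuals sum to zero (up to rounding)
print(f"sum of residuals is zero: {abs(sum(residuals)) < 1e-9}")
```

The residuals come out as -12.0, +39.4, -23.2, +0.2, -28.4 and +24.0. The signs alternate with no obvious curve, although with only six points any judgement about pattern should be cautious.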
Coefficient of determination as a fit measure
The coefficient of determination \(R^2\) is the proportion of the variation in the response variable explained by the linear model. A value close to \(1\) means the line explains most of the variation; a value near \(0\) means it explains little. Always quote \(R^2\) as a percentage in context alongside the residual plot, because the two together tell you whether the model is trustworthy.
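The "proportion of variation explained" can be computed directly as one minus the ratio of the squared residuals to the total squared variation of the response about its mean. The Python sketch below does this for the six-month sales data from the practice questions, again assuming the fitted line n = 42.6t + 55.4. CAS reports the same value without these steps; the variable names here are illustrative.

```python
t = [1, 2, 3, 4, 5, 6]                 # months
n = [86, 180, 160, 226, 240, 335]      # observed sales
slope, intercept = 42.6, 55.4          # least-squares line for these data

mean_n = sum(n) / len(n)
# Variation left unexplained by the line (sum of squared residuals)
ss_res = sum((obs - (slope * x + intercept)) ** 2 for x, obs in zip(t, n))
# Total variation of the response about its mean
ss_tot = sum((obs - mean_n) ** 2 for obs in n)

r_squared = 1 - ss_res / ss_tot
print(f"coefficient of determination = {r_squared:.3f}")
print(f"{r_squared:.1%} of the variation in sales is explained by time in months")
```

The result is about 0.898, so an in-context statement would read: about 89.8% of the variation in the number of sales can be explained by the variation in time.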
Exam-style practice questions
Practice questions written in the style of QCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2022 QCAA, 3 marks
The table shows the number of sales for a small business in their first six months of trading.
Time in months, t: 1, 2, 3, 4, 5, 6
Number of sales, n: 86, 180, 160, 226, 240, 335
a) Use your calculator to determine the equation of the least-squares line. [1 mark]
b) Use the equation from part a) to predict the number of sales in the 21st month. [2 marks]
Worked answer:
a) Fit the line (1 mark). Enter t as the explanatory variable and n as the response variable into your calculator's two-variable statistics, then read off the least-squares line: n = 42.6t + 55.4 (slope and intercept to one decimal place).
b) Predict (2 marks). Substitute t = 21 (1 mark): n = 42.6 × 21 + 55.4 = 894.6 + 55.4 = 950. The model predicts about 950 sales in the 21st month (1 mark).
Note this is extrapolation, since t = 21 is well outside the data range of months 1 to 6, so the prediction is less reliable. Always state the substitution and the final value with its meaning; a bare number does not earn the second mark.
2021 QCAA, 4 marks
The table shows the profit made each year (in thousands of dollars) by a small business.
Year: 2015, 2016, 2017, 2018, 2019, 2020
Profit ($'000s): 42.1, 36.9, 48.4, 52.3, 56.1, 59.8
a) Use a mathematical model to determine the equation of the least-squares line to fit this data. [2 marks]
b) Use the least-squares line to forecast the profit in 2021, to the nearest hundred dollars. [2 marks]
Worked answer:
a) Fit the line (2 marks). First define the variables (1 mark): let x = the number of years since 2014 (so 2015 is x = 1) and let y = annual profit in $'000s. Entering the data and running a least-squares regression gives y = 4.286x + 34.267 (1 mark).
b) Forecast (2 marks). For 2021, x = 7 (1 mark). Substitute: y = 4.286 × 7 + 34.267 = 30.002 + 34.267 = 64.269 (in $'000s). To the nearest hundred dollars, the forecast profit is $64 300 (1 mark).
The marks reward defining x clearly (the choice of base year fixes the intercept) and converting the $'000s answer back into dollars at the required rounding.
2022 QCAA, 4 marks
The least-squares line has been provided for a scatterplot showing the association between an employee's years of experience, n, and their hourly pay, p.
a) Given that the least-squares line passes directly through the points (2, 20) and (7, 40), determine its equation. [2 marks]
b) Use the equation from part a) to predict the hourly pay of an employee with 15 years experience. [2 marks]
Worked answer:
a) Equation (2 marks). Slope (1 mark): m = (40 - 20)/(7 - 2) = 20/5 = 4. Using point-slope through (2, 20): p - 20 = 4(n - 2), so p - 20 = 4n - 8, giving p = 4n + 12 (1 mark).
b) Predict (2 marks). Substitute n = 15 (1 mark): p = 4 × 15 + 12 = 60 + 12 = 72. An employee with 15 years' experience is predicted to earn about $72 per hour (1 mark).
This is extrapolation (15 years is beyond the plotted range), so flag the prediction as less reliable. The slope 4 means pay rises by about $4 per hour for each extra year of experience, and the intercept 12 is the modelled starting pay at zero years.
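The two-point method in part a) is ordinary straight-line algebra rather than a regression calculation, and it can be checked with a short Python sketch. The helper name `line_through` is hypothetical, introduced only for this illustration.

```python
def line_through(point_1, point_2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = point_1, point_2
    slope = (y2 - y1) / (x2 - x1)        # rise over run
    intercept = y1 - slope * x1          # rearranged from y1 = slope * x1 + intercept
    return slope, intercept


slope, intercept = line_through((2, 20), (7, 40))
print(f"p = {slope:g}n + {intercept:g}")                       # p = 4n + 12

predicted_pay = slope * 15 + intercept
print(f"predicted pay at n = 15: ${predicted_pay:g} per hour")  # $72 per hour
```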