QLD · General Mathematics · Syllabus dot point

Topic 1: Bivariate data analysis - how do we use residuals to judge whether a straight line is the right model?

Calculate residuals for a least-squares line, construct and interpret a residual plot, use the pattern in the residual plot to decide whether a linear model is appropriate, and identify when a transformation is needed because the residuals show curvature

A focused answer to the QCE General Mathematics Unit 3 dot point on residual analysis. Covers what a residual is, how to calculate residuals from a least-squares line, how to build and read a residual plot, and how a random scatter versus a clear pattern tells you whether a linear model fits, with arithmetic-verified worked examples for IA2 and the external assessment.

Generated by Claude Opus 4.8 · 6 min answer · Updated 2026-05-29

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

QCAA wants you to go one step past fitting a least-squares line and check whether a straight line was actually the right choice. You do this with residuals: the gaps between the real data and the line. By calculating each residual and plotting it, you produce a residual plot, and the shape of that plot tells you whether the linear model is appropriate or whether the data is secretly curved. This is a distinct Unit 3 Topic 1 skill that sits between fitting a regression line and transforming data, and it appears in IA1, IA2 and the external assessment.

The answer

What a residual is

A residual measures how far an actual data point sits above or below the fitted least-squares line at the same value of the explanatory variable. For each point,

\text{residual} = y_{\text{actual}} - y_{\text{predicted}}.

The predicted value comes from substituting the point's $x$ into the regression equation $y = a + bx$. A positive residual means the point lies above the line (the model underpredicted); a negative residual means it lies below the line (the model overpredicted). A residual of zero means the point sits exactly on the line.
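A minimal Python sketch of the definition, assuming the line is written as $y = a + bx$. The function names `predicted`, `residual` and `describe` are illustrative only, and the line $y = 1 + 2x$ and the two points in the usage lines are hypothetical.

```python
def predicted(a, b, x):
    """Predicted value from the regression equation y = a + bx."""
    return a + b * x


def residual(y_actual, a, b, x):
    """Residual = actual - predicted (never predicted - actual)."""
    return y_actual - predicted(a, b, x)


def describe(res):
    """Where the point sits relative to the fitted line."""
    if res > 0:
        return "above the line (model underpredicted)"
    if res < 0:
        return "below the line (model overpredicted)"
    return "on the line"


# Hypothetical line y = 1 + 2x and two hypothetical points at x = 3
for y_actual in (8, 6):
    res = residual(y_actual, 1, 2, 3)
    print(f"(3, {y_actual}): residual {res}, {describe(res)}")
```

At $x = 3$ the line predicts $7$, so the point $(3, 8)$ has residual $1$ (above the line) and $(3, 6)$ has residual $-1$ (below it).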

Why residuals matter

Pearson's $r$ and the coefficient of determination $r^2$ tell you how strong a linear association is, but a strong $r$ does not guarantee that a line is the best model. A gently curving relationship can still produce a high $r$. The residuals expose the curvature that $r$ hides: once the straight-line trend has been removed, any pattern that remains is structure the line has missed.
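To see a high $r$ hiding curvature, here is a Python sketch that fits a least-squares line to hypothetical, deliberately curved data ($y = x^2$ for $x = 1, \dots, 6$). It uses the standard formulas $b = S_{xy}/S_{xx}$, $a = \bar{y} - b\bar{x}$ and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$; the helper name `fit_least_squares` is illustrative.

```python
import math


def fit_least_squares(xs, ys):
    """Least-squares line y = a + bx and Pearson's r, from the usual sums of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return a, b, r


# Hypothetical, deliberately curved data: y = x^2
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b, r = fit_least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"r = {r:.3f}")                                  # r = 0.979
print("residuals:", [round(e, 2) for e in residuals])  # [3.33, -0.67, -2.67, -2.67, -0.67, 3.33]
```

The correlation is very strong, yet the residuals run positive, negative, negative, negative, negative, positive: a U-shape that no straight line can remove.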

Constructing a residual plot

A residual plot graphs each residual on the vertical axis against the explanatory variable on the horizontal axis, with one point per data pair. The horizontal axis is drawn at residual $= 0$ and stands for the fitted line itself: points above it lie above the line, and points below it lie below the line.
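If graphing software is not to hand, a rough text sketch can stand in for a residual plot. This Python version uses hypothetical helpers (`residual_pairs`, `text_residual_plot`), not a standard tool, and rotates the plot so that $x$ runs down the page and the residual runs across; the `|` column is the zero line. The usage lines assume the curved data $y = x^2$, $x = 1, \dots, 6$, whose least-squares line works out to $\hat{y} = -\tfrac{28}{3} + 7x$.

```python
def residual_pairs(xs, ys, a, b):
    """One (x, residual) point per data pair, for the line y = a + bx."""
    return [(x, y - (a + b * x)) for x, y in zip(xs, ys)]


def text_residual_plot(pairs, half_width=10):
    """Rotated text residual plot: x runs down the page, residual runs across.

    The '|' column is the zero line (the fitted line itself); '*' marks each
    residual, scaled so the largest |residual| reaches the edge of the row.
    """
    biggest = max(abs(r) for _, r in pairs) or 1.0  # avoid 0/0 when every residual is 0
    rows = []
    for x, r in pairs:
        cells = [" "] * (2 * half_width + 1)
        cells[half_width] = "|"                                    # zero line
        cells[half_width + round(r / biggest * half_width)] = "*"  # overwrites '|' when r = 0
        rows.append(f"x = {x:>4}  {''.join(cells)}  {r:+.2f}")
    return rows


# Curved data y = x^2 with its least-squares line y-hat = -28/3 + 7x
for row in text_residual_plot(residual_pairs([1, 2, 3, 4, 5, 6], [1, 4, 9, 16, 25, 36], -28 / 3, 7)):
    print(row)
```

The stars sit to the right of the zero line at both ends and to the left in the middle; turned back so $x$ runs across the page, that is the U-shape that signals curvature.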

Reading the residual plot

There are only two outcomes you need to describe.

Random scatter. If the residuals are scattered randomly above and below the zero line with no pattern, the linear model is appropriate. The straight line has captured the trend and only random noise remains.
A clear pattern. If the residuals form a curve (for example, a U-shape or an arch), the linear model is not appropriate. The pattern means there is structure the straight line failed to capture, and the relationship is non-linear. This is the signal that you should apply a transformation.

The size of the residuals also matters: large residuals indicate points poorly explained by the line, and a single very large residual flags a possible outlier.

Worked example

Calculating residuals and judging the fit

A least-squares line $\hat{y} = 2 + 3.2x$ has been fitted to four data points: $(1, 6)$, $(2, 7)$, $(3, 12)$ and $(4, 15)$.

Step 1: Predicted values. Substitute each $x$ into $\hat{y} = 2 + 3.2x$.

\hat{y}(1) = 2 + 3.2 = 5.2, \quad \hat{y}(2) = 2 + 6.4 = 8.4, \quad \hat{y}(3) = 2 + 9.6 = 11.6, \quad \hat{y}(4) = 2 + 12.8 = 14.8.

Step 2: Residuals are actual minus predicted.

6 - 5.2 = 0.8, \quad 7 - 8.4 = -1.4, \quad 12 - 11.6 = 0.4, \quad 15 - 14.8 = 0.2.

Verify the second one: $7 - 8.4 = -1.4$, so that point sits below the line. As a check, the residuals from a least-squares line always sum to zero: $0.8 - 1.4 + 0.4 + 0.2 = 0$.

Step 3: Read the pattern. The residuals are $0.8, -1.4, 0.4, 0.2$, with signs $+, -, +, +$. They do not trace a U-shape (positive at both ends, negative in the middle) or an arch (the reverse), so they look like random scatter. Four points is very few for a firm judgement, but there is no sign of systematic curvature, so a linear model is reasonable here.

Step 4: Interpretation. Because the residual plot shows no clear pattern, the straight line is an appropriate model for this data and no transformation is needed.
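A Python sketch for checking this worked example by machine. It fits the least-squares line to the four data points with the standard formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ (a CAS or spreadsheet regression should give the same line), then prints each predicted value and residual. The helper name `least_squares_line` is illustrative.

```python
def least_squares_line(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4]
ys = [6, 7, 12, 15]
a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y-hat = {a:.2f} + {b:.2f}x")
for x, y, e in zip(xs, ys, residuals):
    print(f"x = {x}: actual {y}, predicted {a + b * x:.2f}, residual {e:+.2f}")
# A least-squares line balances the residuals, so this total is 0 up to rounding error
print(f"sum of residuals = {sum(residuals):.6f}")
```

Because a least-squares line balances the points above and below it, a non-zero residual total in a hand calculation usually means a predicted value or residual has been mis-calculated.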

Common traps

Computing predicted minus actual: The definition is actual minus predicted. Reversing it flips every sign and inverts your reading of which points lie above the line.
Reading the residual plot like the scatterplot: A residual plot is not meant to show a trend; the trend has already been removed. You are looking only for leftover pattern versus random scatter, not for a slope.
Concluding a line fits because r is high: A high $r$ can still come with a curved residual plot. Always check the residuals before declaring a linear model appropriate.
Plotting residuals against the response variable: The horizontal axis of a residual plot is the explanatory variable (or the predicted value), not the actual response.
Calling one large residual a non-linear pattern: A single large residual is an outlier signal, not curvature. Non-linearity shows up as a smooth shape across most of the points.

Exam-style practice questions

Practice questions written in the style of QCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2021 QCAA · 6 marks

Researchers gathered data to determine if a model could reliably predict systolic blood pressure given a person's age. One candidate was 31 years old with a systolic blood pressure of 119. For this data the correlation coefficient r is 0.875, the standard deviation for age is 4 and for systolic blood pressure is 6. A residual plot was produced for the model (the residual for the 31-year-old reads -0.75, and the residual for the oldest person, aged 40, reads 1.4). Determine the actual systolic blood pressure, to the nearest whole number, for the oldest person in the sample (40 years old).

The key idea is residual = actual - predicted, so actual = predicted + residual once you have the least-squares line.

Step 1 - use the residual to find a predicted value (1 mark). The residual plot shows -0.75 for the 31-year-old, so 119 - predicted = -0.75, giving predicted = 119 + 0.75 = 119.75 at age 31.

Step 2 - slope (1 mark): b = r × (s_y / s_x) = 0.875 × (6 / 4) = 1.3125.

Step 3 - intercept (1 mark): the predicted point (31, 119.75) lies on the line y = bx + a, so 119.75 = 1.3125 × 31 + a, giving a = 119.75 - 40.6875 = 79.0625. The model is y = 1.3125x + 79.0625.

Step 4 - predicted value for the oldest person (1 mark): at x = 40, y = 1.3125 × 40 + 79.0625 = 131.5625.

Step 5 - apply the residual (1 mark): the residual for the 40-year-old is 1.4, so actual = predicted + residual = 131.5625 + 1.4 = 132.9625.

Step 6 - round and state (1 mark): the oldest person's actual systolic blood pressure is about 133.

The whole question hinges on reading residuals correctly off the residual plot and remembering that a positive residual means the actual value sits above the line.
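The six steps above can be sketched in Python as a check on the arithmetic. The variable names are illustrative; every number comes from the question.

```python
# Given in the question
r = 0.875            # correlation between age and systolic blood pressure
s_x = 4              # standard deviation of age
s_y = 6              # standard deviation of systolic blood pressure
residual_31 = -0.75  # read off the residual plot at age 31
residual_40 = 1.4    # read off the residual plot at age 40

# Step 1: residual = actual - predicted, so predicted = actual - residual
predicted_31 = 119 - residual_31          # 119.75

# Step 2: slope of the least-squares line
b = r * (s_y / s_x)                       # 1.3125

# Step 3: the point (31, predicted_31) lies on the line y = bx + a
a = predicted_31 - b * 31                 # 79.0625

# Step 4: predicted value at age 40
predicted_40 = b * 40 + a                 # 131.5625

# Step 5: actual = predicted + residual
actual_40 = predicted_40 + residual_40    # about 132.96

# Step 6: round to the nearest whole number
print(round(actual_40))                   # 133
```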