
How do we check whether a straight line is the right model for bivariate data?

Calculate residuals and use a residual plot to assess the appropriateness of a linear model.

Calculating residuals, constructing and reading residual plots, and judging whether a linear model fits bivariate data in TCE Mathematics Applications.

Generated by Claude Opus 4.8 · 7 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

A high correlation coefficient suggests a linear model might fit, but it does not prove the relationship is straight. Residual analysis is the proper check. It looks at what the line misses, point by point.

Calculating residuals

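For each data point, find the predicted value $\hat{y}$ by substituting its $x$ into the regression equation, then subtract it from the actual $y$: residual $= y - \hat{y}$. A point above the line has a positive residual, a point below the line has a negative residual, and a point exactly on the line has a residual of zero.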
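The calculation can be sketched in a few lines of Python. This is only an illustration: the function name `residuals`, the line $\hat{y} = 1 + 0.5x$ and the four data points are invented for the example and do not come from any exam question.

```python
def residuals(xs, ys, intercept, slope):
    """Return actual minus predicted for each (x, y) pair."""
    result = []
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x   # y-hat from the regression equation
        result.append(y - predicted)        # residual = actual - predicted
    return result


# Hypothetical data and line, chosen only to show the sign convention.
xs = [2, 4, 6, 8]
ys = [2.5, 3.0, 3.5, 5.5]
res = residuals(xs, ys, intercept=1, slope=0.5)
print(res)  # [0.5, 0.0, -0.5, 0.5]
```

The second point sits exactly on the line (residual zero), the third sits below it (negative residual), and the others sit above it (positive residuals).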

The residual plot

A residual plot graphs each residual (vertical axis) against the explanatory variable (horizontal axis). It magnifies the leftover pattern that the eye cannot see in the original scatterplot.

When the residual plot shows a smile-shaped or frown-shaped curve, the data bends, and a straight line systematically over-predicts in some regions and under-predicts in others. That is a signal to consider a transformation or a non-linear model instead.
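A small numerical sketch of the frown shape, assuming made-up data that follows $y = 10x - x^2$. The helper `least_squares_line` is a hypothetical name; it uses the standard least-squares formulas, which a calculator would normally apply for you.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical curved data: y = 10x - x^2 for x = 0, 1, 2, 3, 4.
xs = [0, 1, 2, 3, 4]
ys = [0, 9, 16, 21, 24]

intercept, slope = least_squares_line(xs, ys)
curve_residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(intercept, slope)   # 2.0 6.0
print(curve_residuals)    # [-2.0, 1.0, 2.0, 1.0, -2.0]
```

The residuals are negative at both ends and positive in the middle, so the line over-predicts at the extremes and under-predicts in between. Plotted against $x$, they trace a frown.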

Why correlation alone is not enough

Two datasets can share the same correlation coefficient while one is genuinely linear and the other is strongly curved. The number $r$ measures only the strength of the linear part, so a curved dataset can still produce a moderately high $r$. The residual plot is what exposes the curve, which is why it is the deciding test for whether the line is suitable.
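To see how a clearly curved dataset can still give a high $r$, here is a minimal sketch using invented data that follows $y = x^2$ exactly. The function name `pearson_r` is an assumption for the example; it applies the usual formula for the correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data lying exactly on the curve y = x^2.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.981

# Residuals from the least-squares line for the same data (y-hat = -7 + 6x).
line_residuals = [y - (-7 + 6 * x) for x, y in zip(xs, ys)]
print(line_residuals)  # [2, -1, -2, -1, 2]
```

An $r$ of about $0.98$ looks like strong support for a line, yet the residuals run positive, negative, negative, negative, positive: a smile shape that the correlation coefficient alone does not reveal.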

A funnel shape, where residuals spread out more at one end, signals that the spread of the response changes across the data. The line may still capture the average trend, but predictions become less reliable where the residuals fan out.
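One informal way to put a number on a funnel is to compare the typical size of the residuals at low $x$ with those at high $x$. The sketch below does this with invented residuals; the half-and-half split and the name `mean_abs` are illustrative choices, not a syllabus method, and reading the plot by eye is what the course expects.

```python
def mean_abs(values):
    """Average size of the values, ignoring sign."""
    return sum(abs(v) for v in values) / len(values)


# Hypothetical residuals, listed in order of increasing x.
funnel_residuals = [0.5, -0.4, 0.6, -0.5, 2.0, -2.5, 3.0, -3.5]

half = len(funnel_residuals) // 2
low_x_spread = mean_abs(funnel_residuals[:half])    # residuals at smaller x
high_x_spread = mean_abs(funnel_residuals[half:])   # residuals at larger x

print(low_x_spread, high_x_spread)
```

Here the residuals at larger $x$ are several times bigger on average than those at smaller $x$, which is the fanning-out pattern described above.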

Building a residual plot from a table

In a calculator-assisted question you are often given a column of $x$ values, the actual $y$ values, and a regression equation. Compute each $\hat{y}$, subtract to get each residual, then plot the residuals (vertical axis) against $x$ (horizontal axis). The signs alone tell a story: a run of positive residuals followed by a run of negative ones traces a curve, whereas an alternating mix of small positives and negatives signals random scatter.
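The sign-reading idea can be sketched as follows. Both residual lists are invented, and the helper names `sign_pattern` and `sign_changes` are hypothetical; counting sign changes is only a rough aid to reading the plot, not a formal test.

```python
def sign_pattern(residuals):
    """Return a string of '+', '-' or '0', one symbol per residual."""
    symbols = []
    for r in residuals:
        if r > 0:
            symbols.append("+")
        elif r < 0:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)


def sign_changes(residuals):
    """Count how often the sign flips, skipping residuals of exactly zero."""
    signs = [s for s in sign_pattern(residuals) if s != "0"]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical residuals, listed in order of increasing x.
curved = [1.2, 0.8, 0.3, -0.4, -0.9, -1.1]
scattered = [0.3, -0.2, 0.4, -0.3, 0.1, -0.4]

print(sign_pattern(curved), sign_changes(curved))        # +++--- 1
print(sign_pattern(scattered), sign_changes(scattered))  # +-+-+- 5
```

Long runs of one sign with very few flips point to a systematic pattern, while frequent flips among small residuals are consistent with random scatter.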

Tabulating residuals

We have three data points and a regression line. For each point we compute the predicted value from the line and then subtract to find the residual, which tells us how far the actual observation sits above or below the model.

The regression equation is given as $\hat{y} = 2 + 2x$, and the data points are $(1, 4)$, $(2, 5)$, and $(3, 9)$.

Step 1: Compute the predicted value and residual for $x = 1$

Substitute $x = 1$ into the regression equation to get the predicted $y$, then subtract from the actual value:

$$\hat{y} = 2 + 2(1) = 4 \qquad \text{residual} = 4 - 4 = 0$$

The point lies exactly on the line.

Step 2: Compute the predicted value and residual for $x = 2$

$$\hat{y} = 2 + 2(2) = 6 \qquad \text{residual} = 5 - 6 = -1$$

The actual value is 1 unit below the line, so the model over-predicts here.

Step 3: Compute the predicted value and residual for $x = 3$

$$\hat{y} = 2 + 2(3) = 8 \qquad \text{residual} = 9 - 8 = +1$$

The actual value is 1 unit above the line, so the model under-predicts here.

Step 4: Interpret the residual pattern

The residuals $0$, $-1$, $+1$ are small, include both signs, and show no clear curved trend. With only three points there is no strong evidence against the linear model, but more data would be needed to judge the pattern confidently.

Final answer: The residuals are $0$, $-1$, and $+1$ for $x = 1$, $2$, and $3$ respectively.
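The same table can be checked with a short Python sketch. The equation and points are the ones from this worked example; the function name `predict` and the column layout are just illustrative choices.

```python
# Data points and regression equation from the worked example.
points = [(1, 4), (2, 5), (3, 9)]


def predict(x):
    """Predicted value from y-hat = 2 + 2x."""
    return 2 + 2 * x


# Each row holds x, actual y, predicted y, and residual (actual - predicted).
rows = [(x, y, predict(x), y - predict(x)) for x, y in points]

print(f"{'x':>3} {'y':>3} {'y_hat':>6} {'residual':>9}")
for x, y, y_hat, residual in rows:
    print(f"{x:>3} {y:>3} {y_hat:>6} {residual:>+9}")

worked_residuals = [row[3] for row in rows]
print(worked_residuals)  # [0, -1, 1]
```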

A complete answer calculates residuals as actual minus predicted, describes the residual plot as showing either random scatter or a systematic pattern, and states clearly whether the linear model is appropriate based on that pattern.

Exam-style practice questions

Practice questions written in the style of TASC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

TCE 2024, 4 marks. Table 7 and Figure 5 show the population of India from 1950 to 2000, with a residual column and a regression line. a) Use your calculator to complete the Residuals column to two decimal places. b) Interpret the residual point (1970, -36.30). c) Is the linear model an appropriate choice? Explain.


a) (part of the question) Residual = actual value minus predicted value. Using the regression equation, subtract the predicted population from each actual population. For 1990 and 2000 the residuals come out large and positive, continuing the curved pattern in the residual plot.

b) (2 marks) The residual -36.30 means that in 1970 the actual population (558 million) was 36.30 million below the value predicted by the regression line. A negative residual means the model over-predicted for that year.

c) (2 marks) No, the linear model is not appropriate. The residual plot is not randomly scattered about zero; it shows a clear curved (U-shaped) pattern, with residuals going positive, then negative, then strongly positive again. This systematic pattern indicates the underlying relationship is non-linear (the population is growing roughly exponentially), so a straight line is a poor model.
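As a check on part b), the predicted value can be recovered by rearranging residual $= y - \hat{y}$. This assumes the actual 1970 population of 558 million quoted in the answer to part b):

$$\hat{y} = y - \text{residual} = 558 - (-36.30) = 594.30 \text{ million}$$

So the regression line predicted about 594.30 million for 1970, which is 36.30 million more than the actual figure.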

TCE 2019, 2 marks. A linear model has been fitted to athletic data and a plot of residuals to this model is presented. State the size of the largest residual and why this is relevant.


(2 marks) Read the residual furthest from the horizontal axis (zero line) on the plot, taking its size (magnitude) regardless of sign.

The largest residual identifies the data point for which the model's prediction is least accurate, that is, the actual value lies furthest above or below the regression line. It is relevant because a single large residual flags a possible outlier and, more generally, the spread of residuals tells you how well the line fits: small residuals throughout mean a good fit, while one large residual warns that the model predicts poorly for that case.
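If the residuals were available as a list rather than only as a plot, the largest one by size could be picked out as in the sketch below. The values are invented for illustration and are not the data from the exam question.

```python
# Hypothetical residuals read from a plot, one per data point.
athletic_residuals = [0.4, -1.3, 0.2, 2.1, -0.6, -2.8, 0.9]

# key=abs compares sizes, so a large negative residual can be the largest.
largest = max(athletic_residuals, key=abs)
position = athletic_residuals.index(largest)

print(largest, abs(largest), position)  # -2.8 2.8 5
```

Using `key=abs` matters here: a plain `max` would return $2.1$, but the residual furthest from the zero line is $-2.8$, so its size is $2.8$.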