How do we check whether a straight line is the right model for bivariate data?

Calculate residuals and use a residual plot to assess the appropriateness of a linear model.

Calculating residuals, constructing and reading residual plots, and judging whether a linear model fits bivariate data in TCE Mathematics Applications.

Generated by Claude Opus 4.7
7 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

A high correlation coefficient suggests a linear model might fit, but it does not prove the relationship is straight. Residual analysis is the proper check. It looks at what the line misses, point by point.

Calculating residuals

For each data point, find the predicted value $\hat{y}$ by substituting its $x$ into the regression equation, then subtract that from the actual $y$. A point exactly on the line has a residual of zero.
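The calculation can be sketched in a few lines of Python. The data values and the regression equation below are made up purely for illustration; only the rule (residual = actual minus predicted) comes from the text above.

```python
# Hypothetical data and regression line, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = 1.0, 2.0  # assumed regression equation: y_hat = 1.0 + 2.0 * x


def residuals(xs, ys, intercept, slope):
    """Return actual minus predicted for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


res = residuals(xs, ys, intercept, slope)
for x, y, e in zip(xs, ys, res):
    predicted = intercept + slope * x
    print(f"x={x}  actual={y}  predicted={predicted}  residual={e:+.2f}")
```

A positive residual means the point sits above the line, a negative residual means it sits below, and the last point here lands exactly on the line.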

The residual plot

A residual plot graphs each residual (vertical axis) against the explanatory variable (horizontal axis). It magnifies the leftover pattern that the eye cannot see in the original scatterplot.

When the residual plot shows a smile-shaped or frown-shaped curve, the data bends and a straight line systematically over-predicts in some regions and under-predicts in others. That is a signal to consider a transformation or a non-linear model instead.
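As a rough illustration of the smile shape, the sketch below fits a least-squares line to a small made-up curved dataset ($y = x^2$ for $x = 1$ to $7$) and lists the residuals in order. The dataset and the helper name `fit_line` are assumptions for this example, not part of the syllabus.

```python
# Made-up curved data: y = x^2, used only to show what a line leaves behind.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(xs, ys)
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Print the residuals left to right, as they would appear on a residual plot.
for x, e in zip(xs, res):
    print(f"x={x}  residual={e:+.1f}")
```

The residuals run positive, then negative, then positive again. That is the smile: the line under-predicts at both ends and over-predicts in the middle.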

Why correlation alone is not enough

Two datasets can share the same correlation coefficient while one is genuinely linear and the other is strongly curved. The number $r$ measures only the strength of the linear part, so a curved dataset can still produce a moderately high $r$. The residual plot is what exposes the curve, which is why it is the deciding test for whether the line is suitable.
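To see how high $r$ can be for data that is plainly curved, the sketch below computes Pearson's correlation for a made-up dataset lying exactly on $y = x^2$. The data and the function name `pearson_r` are assumptions for illustration.

```python
from math import sqrt

# Made-up data lying exactly on the curve y = x^2 (not a straight line).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # about 0.98, even though every point lies on a curve
```

A value this close to 1 would normally be described as a strong linear association, yet no point in the dataset follows a straight-line rule. Only the residuals reveal that.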

A funnel shape, where residuals spread out more at one end, signals that the spread of the response changes across the data. The line may still capture the average trend, but predictions become less reliable where the residuals fan out.
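One informal way to put a number on a funnel is to compare the typical size of the residuals at the low-$x$ end with the high-$x$ end. The residuals below are hypothetical, and splitting the data in half is just one simple choice for this sketch, not a formal test.

```python
# Hypothetical residuals, listed in order of increasing x (illustrative values only).
residuals = [0.2, -0.3, 0.5, -0.6, 1.1, -1.4, 2.0, -2.3]


def mean_abs(values):
    """Average size of the residuals, ignoring sign."""
    return sum(abs(v) for v in values) / len(values)


half = len(residuals) // 2
left_spread = mean_abs(residuals[:half])   # low-x half
right_spread = mean_abs(residuals[half:])  # high-x half

print(f"typical residual size, low x:  {left_spread:.2f}")
print(f"typical residual size, high x: {right_spread:.2f}")
```

Here the residuals at the high-$x$ end are several times larger than those at the low-$x$ end, which is what a funnel opening to the right looks like on the plot.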

A complete answer calculates residuals as actual minus predicted, describes the residual plot as showing either random scatter or a systematic pattern, and states clearly whether the linear model is appropriate based on that pattern.

Exam-style practice questions

Practice questions written in the style of TASC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2024 TASC General Mathematics (4 marks)
Table 7 and Figure 5 show the population of India from 1950 to 2000, with a residual column and a regression line.
a) Use your calculator to complete the Residuals column to two decimal places.
b) Interpret the residual point (1970, -36.30).
c) Is the linear model an appropriate choice? Explain.

a) (part of the question) Residual = actual value minus predicted value. Using the regression equation, subtract the predicted population from each actual population. For 1990 and 2000 the residuals come out large and positive, continuing the curved pattern in the residual plot.

b) (2 marks) The residual -36.30 means that in 1970 the actual population (558 million) was 36.30 million below the value predicted by the regression line. A negative residual means the model over-predicted for that year.
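As a check on the interpretation in part b), the predicted value can be recovered by rearranging the residual formula, taking the 558 million figure quoted above as the actual value:

```latex
\[
\text{residual} = y - \hat{y}
\quad\Longrightarrow\quad
\hat{y} = y - \text{residual} = 558 - (-36.30) = 594.30 \text{ million}
\]
```

The predicted value is larger than the actual value, which agrees with the statement that the model over-predicted for 1970.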

c) (2 marks) No, the linear model is not appropriate. The residual plot is not randomly scattered about zero; it shows a clear curved (U-shaped) pattern, with residuals going positive, then negative, then strongly positive again. This systematic pattern indicates that the underlying relationship is non-linear (the population is growing roughly exponentially), so a straight line is a poor model.

2019 TASC General Mathematics (2 marks)
A linear model has been fitted to athletic data and a plot of residuals to this model is presented. State the size of the largest residual and why this is relevant.

(2 marks) Read the residual furthest from the horizontal axis (zero line) on the plot, taking its size (magnitude) regardless of sign.

The largest residual identifies the data point for which the model's prediction is least accurate, that is, the actual value lies furthest above or below the regression line. It is relevant because a single large residual flags a possible outlier and, more generally, the spread of residuals tells you how well the line fits: small residuals throughout mean a good fit, while one large residual warns that the model predicts poorly for that case.
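If the residuals were available as a list rather than read off a plot, picking the largest one by size could be sketched as below. The values are hypothetical and are not taken from the exam paper.

```python
# Hypothetical residuals (illustrative values only, not from the exam plot).
residuals = [0.4, -1.2, 0.3, 2.1, -2.6, 0.5]

# "Largest" means furthest from zero, so compare absolute values.
largest = max(residuals, key=abs)
size = abs(largest)

print(f"largest residual: {largest}, size: {size}")
```

Note that the largest residual here is negative: the size is quoted without the sign, but the sign still tells you the model over-predicted for that case.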