Topic 1: Bivariate data analysis - how do we straighten a curved relationship so a least-squares line can be fitted?
Apply a square, logarithmic or reciprocal transformation to one variable to linearise a non-linear association, fit a least-squares line to the transformed data, use the transformed equation to predict, and choose the transformation that best straightens the scatter
A focused answer to the QCE General Mathematics Unit 3 dot point on data transformation. Covers when to transform, the square, log and reciprocal transformations, how to fit and use a least-squares line on transformed data, and how to predict by back-substituting, with arithmetic-verified worked examples for IA2 and the external assessment.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
QCAA wants you to handle bivariate data whose scatter is clearly curved, where fitting a straight line directly would be wrong. The fix is to transform one of the variables (square it, take its logarithm, or take its reciprocal) so the relationship straightens out, then fit a least-squares line to the transformed data and use that line to predict. You also have to choose which transformation does the best straightening. This is the natural follow-on from residual analysis in Unit 3 Topic 1 and is regularly tested in IA1, IA2 and the external assessment.
The answer
Why transform at all
Least-squares regression only describes a straight-line relationship. When a scatterplot or residual plot shows a smooth curve, a straight line fitted to the raw data predicts too high in some parts of the range and too low in others. Rather than abandon regression, you change the scale of one variable so the curve becomes a line. This is called linearising the data.
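The effect can be seen in a short sketch. The data below is hypothetical (it is simply $y = x^2$, not taken from any assessment), and the helper name `least_squares` is an assumption made for illustration. It fits a line to the raw data and lists the residuals.

```python
# Hypothetical curved data: y is exactly x squared (not from the source).
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]


def least_squares(x_vals, y_vals):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x_vals)
    x_bar = sum(x_vals) / n
    y_bar = sum(y_vals) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_vals, y_vals))
    s_xx = sum((x - x_bar) ** 2 for x in x_vals)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(xs, ys)

# Residual = actual y minus the y predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(x, round(r, 2))
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern is the signal that the relationship is curved and a transformation is worth trying.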
The three transformations
In General Mathematics you choose from three transformations, applied to either the explanatory variable $x$ or the response variable $y$.
- The squared transformation ($x^2$ or $y^2$) stretches the upper end of a variable. It straightens data that curves upward more and more steeply.
- The logarithmic transformation ($\log_{10} x$ or $\log_{10} y$) compresses the upper end. It straightens data that rises quickly then flattens, or data that grows by a roughly constant percentage.
- The reciprocal transformation ($\frac{1}{x}$ or $\frac{1}{y}$) strongly compresses large values and is used for data that drops steeply and then levels off towards an asymptote.
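To see what "stretches" and "compresses" mean in practice, here is a minimal sketch that applies each transformation to three values. The values are hypothetical and chosen only because they span a wide range, and a base-10 logarithm is assumed.

```python
import math

# Hypothetical values chosen to span a wide range (not from the source).
values = [1, 10, 100]

squared = [v ** 2 for v in values]        # stretches the upper end
logged = [math.log10(v) for v in values]  # compresses the upper end
reciprocal = [1 / v for v in values]      # squeezes large values towards zero

print(squared)
print(logged)
print(reciprocal)
```

In the original values the gaps are 9 and 90. Squaring widens the upper gap far more than the lower one, the logarithm makes the two gaps equal, and the reciprocal pushes the large values close to zero while reversing their order.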
Choosing the transformation
You pick the transformation that makes the transformed scatterplot look most like a straight line. In practice you compare residual plots or the value of the coefficient of determination $R^2$ for the candidate transformations and select the one with the most random residuals and the highest $R^2$. The transformation can be applied to either axis; sometimes squaring $x$ works, while other times taking $\log_{10} y$ works, so test rather than guess.
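The comparison can be sketched in a few lines. This is an illustration under stated assumptions, not a substitute for CAS. The data is hypothetical (it follows $y = 2 + 3x^2$ exactly), only transformations of the explanatory variable are tried, and the helper name `r_squared` is made up for this example.

```python
import math

# Hypothetical data following y = 2 + 3x^2 exactly (not from the source).
xs = [1, 2, 3, 4, 5, 6]
ys = [2 + 3 * x ** 2 for x in xs]


def r_squared(x_vals, y_vals):
    """Coefficient of determination for a least-squares line of y on x."""
    n = len(x_vals)
    x_bar = sum(x_vals) / n
    y_bar = sum(y_vals) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_vals, y_vals))
    s_xx = sum((x - x_bar) ** 2 for x in x_vals)
    s_yy = sum((y - y_bar) ** 2 for y in y_vals)
    return s_xy ** 2 / (s_xx * s_yy)


# Candidate transformations of the explanatory variable.
candidates = {
    "x (untransformed)": xs,
    "x squared": [x ** 2 for x in xs],
    "log10 x": [math.log10(x) for x in xs],
    "1/x": [1 / x for x in xs],
}

scores = {name: r_squared(t, ys) for name, t in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 4))
print("Best:", best)
```

Because this made-up data is exactly quadratic, squaring the explanatory variable comes out on top. With real data the values are usually closer together, so check the residual plot of the winning candidate as well rather than relying on the number alone.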
Fitting and predicting
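Once transformed, treat the new variable exactly like ordinary data: fit the least-squares line on CAS. The fitted equation is written in terms of the transformed variable, for example $\log_{10} y = a + bx$ when the response variable has been log-transformed, or $y = a + bx^2$ when the explanatory variable has been squared.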
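To predict, substitute the $x$ value into the transformed equation (transforming it first if the explanatory variable was the one transformed), then undo any transformation on the response variable. If you transformed $y$ to $\log_{10} y$, you must take the antilog ($10$ to the power of the result) at the end to return to the original units.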
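The fit-then-back-transform routine can be sketched as follows. The data is hypothetical (a quantity that doubles at each step), a base-10 logarithm is assumed, and `least_squares` is an illustrative helper standing in for the CAS regression command.

```python
import math

# Hypothetical data that doubles at each step (not from the source).
xs = [0, 1, 2, 3, 4]
ys = [5, 10, 20, 40, 80]

# Step 1: transform the response variable.
log_ys = [math.log10(y) for y in ys]


def least_squares(x_vals, y_vals):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x_vals)
    x_bar = sum(x_vals) / n
    y_bar = sum(y_vals) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_vals, y_vals))
    s_xx = sum((x - x_bar) ** 2 for x in x_vals)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Step 2: fit the line log10(y) = a + b*x to the transformed data.
a, b = least_squares(xs, log_ys)

# Step 3: substitute the x value to get a predicted log10(y).
x_new = 5
log_y_pred = a + b * x_new

# Step 4: take the antilog to return to the original units.
y_pred = 10 ** log_y_pred

print(round(a, 4), round(b, 4))
print(round(y_pred, 1))
```

Stopping at step 3 and quoting the predicted $\log_{10} y$ as the answer is the error this routine guards against. The value only becomes a prediction in the original units after step 4. Note that $x = 5$ lies just outside the sample data, so this prediction is an extrapolation.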