When a scatterplot is curved, how do the squared, log and reciprocal transformations straighten the data so a least-squares line can be fitted?
Recognise non-linear association from a scatterplot and residual plot, apply the squared, logarithmic or reciprocal transformation to the explanatory or response variable to linearise the data, fit a least-squares line to the transformed data, and use it to make predictions.
A focused answer to the VCE General Mathematics Unit 3 Data analysis key-knowledge point on data transformation. It covers spotting curvature, the circle-of-transformations idea, applying the squared, log and reciprocal transformations, fitting a line to transformed data, and predicting back on the original scale.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
VCAA wants you to handle bivariate data whose scatterplot is curved rather than linear. A least-squares line should only be fitted to a linear relationship, so when the residual plot shows a clear pattern you first transform one of the variables, using a squared, logarithmic or reciprocal transformation, to straighten the data. You then fit the least-squares line to the transformed data and use it to predict, remembering to undo the transformation at the end. This is the natural follow-on from correlation and regression.
Spotting that a transformation is needed
A curved scatterplot, or a residual plot with a clear arch or U-shape rather than random scatter, signals that a straight line is the wrong model. Rather than abandon regression, you re-express one variable so that the relationship becomes linear.
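To see what that residual pattern looks like, here is a minimal Python sketch using made-up data that follow an exact curve (y = x^2). The helper `fit_line` is hypothetical, written only for illustration: it fits a least-squares line by hand, and the code then prints the sign of each residual. A run of positive, then negative, then positive residuals is the U-shape that says a straight line is the wrong model.

```python
# Made-up data lying exactly on a curve (y = x^2), for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]


def fit_line(xs, ys):
    """Hypothetical helper: least-squares line y = a + b*x, returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(x, y)
# Residual = actual y minus the y predicted by the straight line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
signs = ["+" if r > 0 else "-" for r in residuals]
print(signs)  # prints ['+', '-', '-', '-', '-', '+']
```

Random scatter would mix the signs with no pattern; a systematic run like this one is the trigger for a transformation.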
Choosing the transformation
The direction the curve bends tells you which transformation to apply. Squaring a variable (x^2 or y^2) stretches the high end of its axis, while taking its log or reciprocal (log10 or 1/x, 1/y) compresses the high end; either shift moves points to straighten the bulge. In the exam you are usually told which transformation to apply, or you pick the one that gives the higher coefficient of determination, r^2, on the transformed data.
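The "pick the higher r^2" rule can be sketched in Python. The data below are invented so that y is exactly linear in log10(x), and only the explanatory variable is transformed; the helper `r_squared` is hypothetical. Each candidate transformation is applied to x, r^2 is computed on the transformed data, and the largest value wins.

```python
import math

# Invented data: y is exactly linear in log10(x), so it is curved against x.
x = [1, 2, 5, 10, 20, 50, 100]
y = [2 + 3 * math.log10(v) for v in x]


def r_squared(xs, ys):
    """Hypothetical helper: coefficient of determination for a least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy ** 2 / (sxx * syy)


# Candidate transformations of the explanatory variable.
transforms = {
    "x": lambda v: v,
    "x^2": lambda v: v ** 2,
    "log10(x)": math.log10,
    "1/x": lambda v: 1 / v,
}

scores = {name: r_squared([f(v) for v in x], y) for name, f in transforms.items()}
best = max(scores, key=scores.get)
print(best)  # prints log10(x)
```

On real data no transformation gives r^2 of exactly 1, and the residual plot of the winning model should still be checked for random scatter.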
Fitting and predicting with a transformed model
Once a variable is transformed, treat the transformed quantity as a new variable and fit the least-squares line as normal.
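As a sketch of that idea, assume a reciprocal transformation of the explanatory variable and invented data that satisfy y = 10 + 20/x exactly. The code builds the new variable 1/x, fits y on it by least squares, and predicts at x = 8 by substituting 1/8, not 8. The names `u` and `pred` are hypothetical.

```python
# Invented data that follow y = 10 + 20/x exactly.
x = [1, 2, 4, 5, 10]
y = [30, 20, 15, 14, 12]

# Step 1: treat the transformed quantity 1/x as a new explanatory variable.
u = [1 / v for v in x]

# Step 2: ordinary least-squares line y = a + b*u.
n = len(u)
mean_u = sum(u) / n
mean_y = sum(y) / n
b = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y)) / sum(
    (ui - mean_u) ** 2 for ui in u
)
a = mean_y - b * mean_u

# Step 3: predict at x = 8 by transforming 8 first (1/8), then substituting.
pred = a + b * (1 / 8)
print(round(a, 3), round(b, 3), round(pred, 3))  # prints 10.0 20.0 12.5
```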
Reading the transformed equation back
The fitted equation already contains the transformation, so prediction is just careful substitution. If the transformation was on the response variable, for example with y replaced by y^2, log10(y) or 1/y, you must undo it at the end: take the square root, raise 10 to the power of both sides, or take the reciprocal. Always check whether the transformation sits on x or on y before predicting.
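The back-transformation step can be written as a small lookup of inverse functions. This is an illustrative Python sketch: the model coefficients are hypothetical, and it assumes y > 0 (and a positive prediction on the transformed scale) so that each inverse is defined. The names `INVERSES` and `predict_y` are also hypothetical.

```python
import math

# Inverse of each response transformation (assumes y > 0).
INVERSES = {
    "y^2": math.sqrt,               # y^2 = t       ->  y = sqrt(t)
    "log10(y)": lambda t: 10 ** t,  # log10(y) = t  ->  y = 10^t
    "1/y": lambda t: 1 / t,         # 1/y = t       ->  y = 1/t
}


def predict_y(a, b, x, transform):
    """Predict y from a fitted line: transformed(y) = a + b*x."""
    t = a + b * x                   # prediction on the transformed scale
    return INVERSES[transform](t)   # undo the transformation


# Hypothetical fitted models, one per transformation.
y_log = predict_y(0.5, 0.1, 5, "log10(y)")  # log10(y) = 1.0
y_sq = predict_y(4, 3, 4, "y^2")            # y^2 = 16
y_rec = predict_y(0.1, 0.05, 2, "1/y")      # 1/y = 0.2
print(round(y_log, 3), round(y_sq, 3), round(y_rec, 3))
```

Forgetting the last step would report 1.0, 16 and 0.2, which are values on the transformed scale, not predictions of y.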
Why this matters for the exams
Transformation questions appear most years and reward students who keep track of which variable was transformed and who undo it correctly when predicting. They build directly on correlation and least-squares regression: the residual plot is the trigger, the transformation is the fix, and the prediction is the payoff. Show the transformed value explicitly in your working so a marker can follow each step.
Exam-style practice questions
Practice questions written in the style of VCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2023 VCAA (1 mark)
A scatterplot of tree height (m) against age (years) is linearised using a logarithm (base 10) transformation applied to the variable age. The equation of the least squares line is height = -3.8 + 12.6 x log10(age). Using this equation, the age, in years, of a tree with a height of 8.52 m is closest to
A. 7.9
B. 8.9
C. 9.1
D. 9.5
E. 9.9
Substitute height = 8.52 into the transformed equation and solve for age.
8.52 = -3.8 + 12.6 x log10(age).
12.6 x log10(age) = 8.52 + 3.8 = 12.32, so log10(age) = 12.32 / 12.6 = 0.97778.
age = 10^0.97778 = 9.50 years.
This is closest to 9.5, so the answer is D. Remember to undo the log by raising 10 to the power of both sides.
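The arithmetic in this worked answer can be checked in a few lines of Python. The coefficients come from the question; the code solves for log10(age), undoes the log by raising 10 to that power, and picks the nearest option. This is a cross-check only, not how the exam expects you to work.

```python
# Model from the question: height = -3.8 + 12.6 * log10(age)
intercept, slope = -3.8, 12.6
height = 8.52

log_age = (height - intercept) / slope  # solve for log10(age)
age = 10 ** log_age                     # undo the log10 transformation

options = {"A": 7.9, "B": 8.9, "C": 9.1, "D": 9.5, "E": 9.9}
closest = min(options, key=lambda k: abs(options[k] - age))
print(round(age, 2), closest)  # prints 9.5 D
```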
2025 VCAA (1 mark)
A squared transformation is applied to the variable doctors (number per 1000 people) when modelling life expectancy in years, life. The equation of the least squares line fitted to this transformed data is of the form life = a + b x (doctors)^2. Using this equation, the predicted life, in years, for a country with two doctors per 1000 people is closest to
A. 73.6
B. 74.0
C. 74.5
D. 74.9
Squaring each doctors value in the data table creates a new explanatory variable, (doctors)^2. Fitting a least squares line of life on (doctors)^2 with a calculator gives, to four significant figures, life = 63.12 + 2.842 x (doctors)^2.
To predict for two doctors per 1000 people, substitute doctors = 2, so (doctors)^2 = 4.
life = 63.12 + 2.842 x 4 = 63.12 + 11.37 = 74.49, which is about 74.5 years.
This is closest to 74.5, so the answer is C. The key step is squaring the value before multiplying by the slope.
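A matching Python check for this prediction starts from the rounded coefficients quoted in the worked answer rather than refitting, because the data table itself is not shown. It squares the doctors value before multiplying by the slope, then picks the nearest option.

```python
# Coefficients quoted in the worked answer (four significant figures).
a, b = 63.12, 2.842

doctors = 2
life = a + b * doctors ** 2  # square first, then multiply by the slope

options = {"A": 73.6, "B": 74.0, "C": 74.5, "D": 74.9}
closest = min(options, key=lambda k: abs(options[k] - life))
print(round(life, 2), closest)  # prints 74.49 C
```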