Year 12: Statistical Analysis

NSW Maths Standard 2 syllabus dot point

How is the least-squares regression line calculated, and how is it used to model a linear relationship between two variables?

Find and use the equation of the least-squares regression line to model a linear relationship between two variables

A focused answer to the HSC Maths Standard 2 dot point on the least-squares regression line. Covers the equation $y = mx + b$, finding the gradient and intercept using calculator statistics functions, interpreting the gradient in context, and worked Australian examples.

Generated by Claude Opus. Reviewed by Better Tuition Academy. 7 min answer.

What this dot point is asking

NESA wants you to find the equation of the least-squares regression line using calculator statistics functions, write it in $y = mx + b$ form, use it to predict $y$ from $x$, and interpret the gradient and intercept in the context of the worded problem.

The answer

[Figure: Scatterplot with least-squares regression line. A scatter of data points with an upward trend, overlaid with the best-fit straight line $y = mx + b$ that minimises the sum of squared vertical distances from each point to the line. The vertical distances to the line are the residuals. Axes: $x$ (horizontal), $y$ (vertical).]

The least-squares regression line

For bivariate data, the least-squares regression line is the straight line that minimises the sum of the squared vertical distances from the data points to the line. It is the standard "best-fit" line for linear association.

The equation has the form:

$y = mx + b$

where $m$ is the gradient and $b$ is the $y$-intercept.
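
To make "minimises the sum of the squared vertical distances" concrete, here is a minimal Python sketch. The five data points are hypothetical (invented for illustration), and `sum_squared_residuals` is a helper name introduced here, not a calculator function. For these points the least-squares line works out to $y = 0.6x + 2.2$, and nudging either the gradient or the intercept makes the total larger.

```python
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_squared_residuals(m, b, xs, ys):
    """Add up the squared vertical distances from each point to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

best = sum_squared_residuals(0.6, 2.2, xs, ys)     # least-squares line for this data
steeper = sum_squared_residuals(0.7, 2.2, xs, ys)  # same intercept, steeper gradient
higher = sum_squared_residuals(0.6, 2.5, xs, ys)   # same gradient, higher intercept

print(round(best, 2), round(steeper, 2), round(higher, 2))
```

The first total is the smallest of the three, which is what "best fit" means here.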

Finding the line on a calculator

You will not be asked to compute the gradient and intercept by hand. The procedure on a NESA-approved scientific calculator:

  1. Enter statistics mode (typically MODE STAT 2-VAR or similar).
  2. Enter the $(x, y)$ pairs into the statistical lists.
  3. Read off $m$ (sometimes labelled $a$ or $B$) and $b$ (sometimes labelled $A$ or $a$) from the regression-result menu.
  4. Read off $r$ (correlation coefficient) at the same time.

Different calculator models label these differently. Practise on the exact model you will use in the exam.
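
If you want to check a calculator result on a computer, the sketch below computes the same three outputs ($m$, $b$ and $r$) in Python from the standard least-squares formulas. This is not required for the exam. The data are hypothetical, `regression_summary` is a name made up for this example, and your calculator may round differently.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return (m, b, r) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx                # gradient
    b = y_bar - m * x_bar        # y-intercept
    r = sxy / sqrt(sxx * syy)    # correlation coefficient
    return m, b, r

# Hypothetical (x, y) pairs, invented for illustration only.
m, b, r = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(m, 2), round(b, 2), round(r, 2))
```

Entering the same five pairs into your calculator's statistics mode should give matching values, which is a quick way to confirm which label on your model is the gradient and which is the intercept.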

Predicting $y$ from $x$

Substitute the $x$ value into the line equation. This is the model's predicted $y$ at that $x$. The actual value may be slightly different; the line is the best linear fit, not a guarantee.

Interpreting the gradient

The gradient $m$ has units of ($y$ units) per ($x$ unit). For every unit increase in $x$, $y$ changes by $m$ units on average.

Always include the word "average" or "on average" and the units in your answer. Markers reward this explicitly.
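
The "per unit" reading of the gradient follows directly from the equation. As a short check, compare the model's prediction at $x + 1$ with its prediction at $x$:

```latex
\begin{aligned}
\big(m(x + 1) + b\big) - \big(mx + b\big) &= mx + m + b - mx - b \\
&= m
\end{aligned}
```

So a one-unit increase in $x$ changes the predicted $y$ by exactly $m$, whatever the starting value of $x$. Individual data points sit above or below the line, which is why the interpretation is stated "on average".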

Interpreting the $y$-intercept

The intercept $b$ is the predicted $y$ value when $x = 0$. In context this is sometimes meaningful (e.g. base salary at zero years of experience) and sometimes extrapolation (e.g. predicted food spending at zero income).

If $x = 0$ lies well outside the dataset, comment that the intercept is an extrapolation and may be unreliable.
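
The prediction and extrapolation checks can be sketched in a few lines of Python. The line, the data range and the function name `predict` are all hypothetical, chosen only for illustration. The idea is simply to flag any $x$ that falls outside the range of the data used to fit the line.

```python
def predict(m, b, x, x_min, x_max):
    """Return the predicted y and whether x lies outside the observed x range."""
    y_hat = m * x + b
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical line y = 3x + 10, fitted to data with x between 5 and 20.
inside = predict(3, 10, 12, x_min=5, x_max=20)   # x = 12 is within the data
at_zero = predict(3, 10, 0, x_min=5, x_max=20)   # x = 0 gives the intercept
print(inside, at_zero)
```

Here the prediction at $x = 12$ is an interpolation, while the intercept (the prediction at $x = 0$) is flagged as an extrapolation because the data only cover $x$ from 5 to 20.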

When to use the line

The least-squares line is appropriate when:

  • The scatterplot suggests an approximately linear relationship.
  • The size of the correlation coefficient, $|r|$, is moderately strong or stronger.
  • There are no extreme outliers distorting the fit.

If the scatterplot is clearly non-linear, the line will be a poor model even if $r$ is not zero.
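
A minimal Python sketch of why the scatterplot matters as well as $r$. The data are hypothetical, generated from the curve $y = x^2$, and `fit_line` is a helper written for this example. The correlation comes out strong, yet the residuals are positive at both ends and negative in the middle, a pattern that points to a curve rather than a straight line.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (m, b, r) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar, sxy / sqrt(sxx * syy)

# Hypothetical curved data: y = x squared, so the true relationship is not linear.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

m, b, r = fit_line(xs, ys)
# Residual = actual y minus the y predicted by the line.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]
print(round(r, 2), signs)
```

A high $r$ on its own would suggest a good linear fit, so always look at the scatterplot before trusting the line.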

Past exam questions, worked

Real questions from past NESA papers on this dot point, with our answer explainer.

2022 HSC Q21 (4 marks): For a dataset of $50$ pairs, calculator output gives gradient $m = 2.5$ and intercept $b = -8$ for the least-squares regression line. Write the equation, predict $y$ when $x = 12$, and interpret the gradient.

Equation: $y = 2.5x - 8$.

At $x = 12$: $y = 2.5 \times 12 - 8 = 30 - 8 = 22$.

Interpretation: for every increase of $1$ in $x$, $y$ increases by $2.5$ (on average, according to the model).

Markers reward the equation, the substitution, and an interpretation of the gradient that uses the word "average" or "on average" to acknowledge the model is a best fit, not exact.
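
The substitution can be checked in a couple of lines of Python, using the gradient and intercept given in the question (the variable names are just for this example):

```python
m = 2.5   # gradient from the calculator output
b = -8    # y-intercept from the calculator output

x = 12
y_predicted = m * x + b
print(y_predicted)  # 22.0
```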

2023 HSC Q21 (3 marks): A linear model of weekly food spending ($y$, \$) on weekly income ($x$, \$) for $80$ Sydney households gives the line $y = 0.18x + 95$. Interpret the gradient and the $y$-intercept in this context.

Gradient $0.18$: for every extra dollar of weekly income, the household spends, on average, an additional \$0.18 on food. In other words, about $18\%$ of each additional dollar of income goes to food.

$y$-intercept $95$: a household with zero income is predicted to spend \$95 per week on food. This is the model's baseline. In practice, \$0 income is well outside the dataset, so the intercept may not be reliable (extrapolation).

Markers reward the gradient interpretation in context with units, the intercept interpretation in context, and a brief caveat about extrapolation for the intercept.
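
To see the gradient interpretation numerically, the sketch below applies the model $y = 0.18x + 95$ to two hypothetical weekly incomes that differ by \$100. The incomes are invented for illustration and are not part of the question, and `predicted_food_spending` is a name made up for this example.

```python
def predicted_food_spending(weekly_income):
    """Model from the question: y = 0.18x + 95, in dollars per week."""
    return 0.18 * weekly_income + 95

low = predicted_food_spending(1000)    # hypothetical weekly income
high = predicted_food_spending(1100)   # hypothetical income, $100 more
print(round(low, 2), round(high, 2), round(high - low, 2))
```

The two predictions differ by about \$18, which is $0.18 \times 100$: the gradient scaled up to a \$100 change in income.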
