How do we describe and model the relationship between two numerical variables?
Analyse bivariate data using scatterplots, correlation, and least-squares regression lines.
Scatterplots, correlation coefficient, the coefficient of determination, and least-squares regression for prediction in TCE Mathematics Applications.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Bivariate data is paired data: each subject gives two numbers, such as a person's height and weight. The whole topic is about whether one variable helps predict the other, and how strongly.
Describing a scatterplot
When you read a scatterplot, comment on four features: form (linear or non-linear), direction (positive or negative), strength (how tightly the points cluster about a line), and any outliers. A positive direction means y tends to rise as x rises; a negative direction means y tends to fall as x rises.
Correlation coefficient
Pearson's correlation coefficient r measures the strength and direction of a linear relationship. It always lies between -1 and 1. Values near -1 or 1 mean a strong linear pattern; values near 0 mean little or no linear relationship.
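In the exam you would read r from your calculator, but it can help to see what the calculator is doing. The following is a minimal Python sketch of the standard formula for r. The function name pearson_r and the small data set are made up for illustration and do not come from any exam paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented purely for illustration
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 4))
```

Because the points in this invented data set climb almost in a straight line, the result is close to 1. Reversing the order of the score list would give a value close to -1.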
Coefficient of determination
The coefficient of determination is simply r². It gives the proportion of the variation in the response variable that is explained by the linear relationship with the explanatory variable. For example, if r = 0.8 then r² = 0.64, so about 64% of the variation in y is explained by x (and the remaining 36% is due to other factors).
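The conversion from r to a percentage is a one-line calculation. The sketch below uses r = 0.6049, the value quoted in the blood pressure practice question on this page. The function name percent_explained is an assumption made for this example.

```python
def percent_explained(r):
    """Percentage of variation in y explained by the linear model, from r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return 100 * r ** 2


r = 0.6049  # value from the blood pressure practice question on this page
explained = percent_explained(r)
print(f"{explained:.1f}% explained, {100 - explained:.1f}% due to other factors")
```

Squaring removes the sign, so r = -0.6049 would give the same percentage. The direction of the relationship has to be read from r itself or from the slope.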
Least-squares regression line
The least-squares line y = a + bx is the straight line that minimises the sum of the squared vertical distances from the points to the line. In a TCE exam you usually read a (intercept) and b (slope/gradient) from technology, then interpret them.
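For reference, the summation formulas used in the Olympic freestyle worked answer on this page can be written as follows, with a as the intercept and b as the slope. The hat on y is notation added here to mark a predicted value. Check the exact layout against your own Information Sheet.

```latex
\[
\hat{y} = a + bx, \qquad
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}, \qquad
a = \frac{\sum y - b\sum x}{n}
\]
```

Calculate b first, because the formula for a uses it.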
Interpolation and extrapolation
Predicting inside the range of the data (interpolation) is reasonably safe. Predicting outside the range (extrapolation) is risky because the linear pattern may not continue. State which you are doing whenever you make a prediction.
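The check itself is just a comparison against the smallest and largest x-values in the data. This is a minimal sketch. The function name prediction_type and the list of ages are hypothetical, chosen only to illustrate the idea.

```python
def prediction_type(x_new, xs):
    """Classify a prediction at x_new against the observed x-values xs."""
    lowest, highest = min(xs), max(xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical ages, invented purely for illustration
ages = [25, 31, 38, 44, 52, 60]
print(prediction_type(40, ages))  # inside 25 to 60
print(prediction_type(75, ages))  # outside 25 to 60
```

Here a prediction at age 40 is interpolation, while one at age 75 is extrapolation and should be flagged as less reliable.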
Reporting good practice: quote r to two decimal places, state the form/direction/strength of the relationship, give the equation of the line, and interpret slope and intercept in the words of the context.
Exam-style practice questions
Practice questions written in the style of TASC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2024 TASC General Mathematics (6 marks)
Table 5 shows a measure of blood pressure for a random selection of people of different ages.
a) Assuming Age to be the independent (x) variable, use the regression function on your calculator to find a linear equation representing this data. Use variables A (age) and B (blood pressure) and three decimal places.
b) Find r and r-squared to four decimal places and interpret the correlation coefficient (r).
c) Use your equation to predict the blood pressure of a 22-year-old.
Worked answer:
a) (2 marks) Enter the eight (Age, Blood Pressure) pairs into the calculator's linear regression and read off slope and intercept to 3 dp, giving a model of the form B = 0.755A + 102.439 (use the values your calculator returns from the data).
b) (3 marks) From the same regression, r is approximately 0.6049 and r-squared is approximately 0.3659 (4 dp). Interpret r: there is some (weak to moderate) positive linear correlation between age and blood pressure, that is, blood pressure tends to rise as age increases, but the association is not strong.
c) (1 mark) Substitute A = 22 into the equation: B = 0.755(22) + 102.439, which is about 119 mmHg. Markers reward including the units mmHg.
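The substitution in part c) can be checked with a short Python sketch. The slope and intercept are the values quoted in part a) above. The function name predict_bp is an assumption made for this example.

```python
def predict_bp(age, slope=0.755, intercept=102.439):
    """Predicted blood pressure (mmHg) from the model B = 0.755A + 102.439."""
    return slope * age + intercept


b22 = predict_bp(22)
print(round(b22), "mmHg")
```

If your calculator returns slightly different coefficients from the table data, pass those in as slope and intercept instead.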
2024 TASC General Mathematics (2 marks)
Table 6 gives the women's Olympic 400m freestyle winning time (in seconds) from 1924 to 1960, with columns for xy and x squared. Given Sigma x = 44, Sigma y = 2581, Sigma xy = 13658, Sigma x squared = 324 and n = 8, use the regression formula from your Information Sheet to calculate values for a and b (the y-intercept and slope of the regression line, respectively). Give values to two decimal places.
Worked answer:
(2 marks) Use the least-squares formulae from the Information Sheet.
Slope b = (n Sigma xy - Sigma x Sigma y) / (n Sigma x squared - (Sigma x) squared) = (8 × 13658 - 44 × 2581) / (8 × 324 - 44 × 44) = (109264 - 113564) / (2592 - 1936) = -4300 / 656 = -6.55 (2 dp).
Intercept a = (Sigma y - b Sigma x) / n = (2581 - (-6.55)(44)) / 8 = (2581 + 288.20) / 8 = 358.65 (2 dp). So the regression line is Time = -6.55x + 358.65 (small differences arise from rounding of b).
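The same calculation can be reproduced from the summary totals alone. The sketch below applies the two formulas used above to the sums given in the question. The function name least_squares_from_sums is an assumption made for this example.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the line y = a + bx from summary totals."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


# Totals given in the question
a, b = least_squares_from_sums(n=8, sum_x=44, sum_y=2581, sum_xy=13658, sum_x2=324)
print(round(b, 2), round(a, 2))
```

The code carries the unrounded slope into the intercept formula, so it gives an intercept of about 358.68 rather than 358.65. This is the rounding difference mentioned in the worked answer.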