Topic 1: Bivariate data analysis - how do we describe and measure the relationship between two numerical variables?
Construct a scatterplot, describe the association between two numerical variables in terms of direction, form and strength, calculate and interpret Pearson's correlation coefficient $r$ and the coefficient of determination $r^2$, and recognise that correlation does not imply causation
A focused answer to the QCE General Mathematics Unit 3 dot point on bivariate data. Covers scatterplots, describing association by direction, form and strength, Pearson's correlation coefficient $r$, the coefficient of determination $r^2$, and the difference between correlation and causation, with CAS-supported worked examples for IA2 and the external assessment.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
QCAA wants you to take a set of paired measurements on two numerical variables, plot them, describe the relationship in words, and then quantify that relationship with Pearson's correlation coefficient $r$ and the coefficient of determination $r^2$. You also need to interpret these numbers in context and resist the classic trap of reading correlation as causation. Bivariate analysis is the foundation of Unit 3 Topic 1 and feeds straight into least-squares regression. It appears in IA1, IA2 and the external assessment.
The answer
The explanatory and response variables
Bivariate data consists of pairs $(x, y)$ measured on the same individuals. The explanatory variable $x$ (also called the independent variable) is the one we believe may explain or predict the other, and it is plotted on the horizontal axis. The response variable $y$ (the dependent variable) is plotted on the vertical axis. Choosing these correctly matters because the regression line you fit later depends on which variable is which.
Constructing and reading a scatterplot
A scatterplot displays each pair as a single point. Once plotted, describe the association using three features.
- Direction. Positive (as $x$ increases, $y$ tends to increase) or negative (as $x$ increases, $y$ tends to decrease).
- Form. Linear (points cluster about a straight line) or non-linear (a curve). Pearson's $r$ is only valid for linear form.
- Strength. How tightly the points cluster about the underlying pattern: strong, moderate or weak.
Always also scan for outliers, which are points that sit far from the main cluster and can distort $r$.
Pearson's correlation coefficient
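For linear associations, Pearson's correlation coefficient $r$ measures the direction and strength of the linear relationship on a scale from $-1$ to $1$:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
Here $n$ is the number of data pairs, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the sample standard deviations of the two variables.
You are not expected to compute this sum by hand in General Mathematics; you read $r$ off CAS after entering the paired data. Interpret the value as follows.
- $r = 1$: perfect positive linear; $r = -1$: perfect negative linear; $r = 0$: no linear association.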
- The closer $|r|$ is to 1, the stronger the linear association; values are graded as strong, moderate, weak or very weak.
The sign of $r$ always matches the direction of the scatter, so a negative slope gives a negative $r$.
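As a rough illustration of what CAS is doing behind the scenes, the sketch below computes Pearson's correlation coefficient from the standardised-score form of the formula (sum of products of standardised scores, divided by n - 1). The function name `pearson_r` and the five data pairs are hypothetical, chosen only to show the mechanics; in an assessment you would still read the value off your calculator.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the standardised-score form of the formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardised scores, then divide by n - 1
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Points that lie exactly on a rising line give a value of 1 and points exactly on a falling line give -1, which is a quick sanity check on any implementation.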
The coefficient of determination
The coefficient of determination is simply $r^2$, the square of the correlation coefficient. It is usually quoted as a percentage and interpreted as the proportion of the variation in the response variable that is explained by the linear relationship with the explanatory variable.
For example, if $r = 0.9$ then $r^2 = 0.81$, so 81 percent of the variation in the response variable is explained by the variation in the explanatory variable, and the remaining 19 percent is due to other factors or random variation. A full-mark interpretation always names both variables in context.
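To see why the squared correlation can be read as a "proportion of variation explained", the sketch below fits a least-squares line to a small hypothetical data set and compares the squared correlation with 1 minus (unexplained variation / total variation). The function name `explained_variation` and the data are assumptions made for illustration; the two numbers agree for any least-squares line.

```python
def explained_variation(xs, ys):
    """Return (r squared, 1 - unexplained/total) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)  # total variation in the response
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line: sum of squared residuals
    unexplained = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return r_squared, 1 - unexplained / s_yy

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_squared, proportion_explained = explained_variation(x, y)
print(round(r_squared, 3), round(proportion_explained, 3))  # 0.6 0.6
```

For this made-up data, 60 percent of the variation in the response is explained by the linear relationship and 40 percent is left over.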
Correlation is not causation
A strong $r$ tells you two variables move together; it does not tell you that one causes the other. The apparent link may be coincidental, may run in the reverse direction, or may be driven by a hidden third variable (a confounding or lurking variable). QCAA examiners reliably award marks for explicitly stating that an observed correlation does not establish a causal mechanism and for naming a plausible confounder.
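The lurking-variable idea can be sketched with entirely made-up numbers. Below, two hypothetical variables (ice-cream sales and sunburn cases) are each built only from a third variable (daily temperature) plus a small fixed wiggle; neither appears in the other's formula, yet they are very strongly correlated. All names and values are invented for illustration and are not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical lurking variable: daily maximum temperature (degrees C)
temperature = list(range(20, 32))

# Fixed small wiggles standing in for "other factors" (made up)
wiggle_a = [1, -1, 2, -2, 0, 1, -1, 2, -2, 0, 1, -1]
wiggle_b = [-2, 1, 0, 2, -1, -2, 1, 0, 2, -1, -2, 1]

# Each variable depends on temperature only, never on the other
ice_cream_sales = [3 * t + a for t, a in zip(temperature, wiggle_a)]
sunburn_cases = [5 * t + b for t, b in zip(temperature, wiggle_b)]

r_xy = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r_xy, 2))  # 0.98
```

The correlation is close to 1 even though, by construction, ice-cream sales have no effect on sunburn; temperature is the plausible confounder you would name in a written response.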
Exam-style practice questions
Practice questions written in the style of QCAA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2022 QCAA, 5 marks
The maximum temperature and the number of pies sold each day at a bakery are provided in the table.
Maximum temperature (degrees C): 29, 20, 31, 27, 23, 25, 22, 33
Number of pies sold: 32, 39, 25, 33, 37, 35, 37, 30
a) Construct a scatterplot to display the data on the grid provided. [3 marks]
b) Describe the association between the maximum temperature and the number of pies sold in terms of direction and strength. [2 marks]
a) Scatterplot (3 marks). Put the explanatory variable, maximum temperature, on the horizontal axis and the response variable, pies sold, on the vertical axis (1 mark for choosing axes correctly). Scale each axis evenly to span the data, for example temperature 20 to 33 and pies 25 to 39, and label both axes with units (1 mark for scaling and labelling). Plot all eight points accurately, e.g. (29, 32), (20, 39), (31, 25) (1 mark).
b) Description (2 marks). As temperature rises the number of pies sold tends to fall, so the direction is negative (1 mark). The points lie fairly close to a straight line, so the strength is strong (1 mark). A full description reads: a strong, negative, linear association.
Markers want direction and strength as separate, explicitly named features. "Linear" describes form and is a useful third word, but the two marks here are for negative and strong.
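The question does not ask for a calculation, but as a check on the "strong, negative" description, the sketch below computes the correlation coefficient and coefficient of determination for the eight data pairs in the question. The helper name `pearson_r` is an assumption for illustration; a CAS two-variable statistics screen should report the same values.

```python
from math import sqrt

# Data from the question
temperature = [29, 20, 31, 27, 23, 25, 22, 33]  # explanatory, degrees C
pies_sold = [32, 39, 25, 33, 37, 35, 37, 30]    # response

def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(temperature, pies_sold)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))  # -0.905 0.819
```

A value of about -0.905 supports the description given above, and the squared value says that roughly 82 percent of the variation in pies sold is explained by the linear relationship with maximum temperature.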
2023 QCAA, 4 marks
Hiroki believes more fish are caught on warmer days. Jiro believes the number of fish caught is more dependent on the number of people fishing.
Temperature, t (degrees C): 32, 26, 20, 27, 23, 29
Number of fish caught, f: 530, 400, 320, 220, 180, 120
Number of people fishing, p: 46, 58, 38, 34, 30, 28
Calculate the correlation coefficient for each dataset and use the results to identify the explanatory variable for the stronger linear association. Use the least-squares line equation for the stronger association to predict the number of fish caught on a 25 degrees C day when 50 people are fishing.
Step 1 - two correlations (1 mark). With your calculator in two-variable mode, r for temperature versus fish is about 0.31, and r for people fishing versus fish is about 0.81.
Step 2 - identify the explanatory variable (1 mark). Because 0.81 is much closer to 1 than 0.31, the number of people fishing has the stronger linear association with fish caught, so people fishing (p) is the explanatory variable. Jiro is correct.
Step 3 - least-squares line for the stronger association (1 mark): fitting f on p gives f = 10.89p - 129.8 (your calculator reports the coefficients to a few decimals).
Step 4 - predict (1 mark). The 25 degrees C is a distractor, since temperature is not the chosen explanatory variable. Substitute p = 50: f = 10.89 x 50 - 129.8 = 414.7 (or 414.8 if you keep the calculator's unrounded coefficients), so about 415 fish.
The lesson is to let the correlation coefficient decide which variable to model, then ignore the irrelevant one.
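The four steps can be reproduced with a short script, sketched here as a check on the calculator work. The function names `summary_sums`, `pearson_r` and `least_squares` are assumptions for illustration; the data are the values from the question.

```python
from math import sqrt

# Data from the question
temperature = [32, 26, 20, 27, 23, 29]
fish = [530, 400, 320, 220, 180, 120]
people = [46, 58, 38, 34, 30, 28]

def summary_sums(xs, ys):
    """Means plus sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, s_xx, s_yy, s_xy

def pearson_r(xs, ys):
    _, _, s_xx, s_yy, s_xy = summary_sums(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)

def least_squares(xs, ys):
    """Slope and intercept of the least-squares line of ys on xs."""
    mean_x, mean_y, s_xx, _, s_xy = summary_sums(xs, ys)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

# Step 1: both correlations
r_temp = pearson_r(temperature, fish)
r_people = pearson_r(people, fish)

# Steps 2 and 3: people fishing has the stronger association, so fit f on p
slope, intercept = least_squares(people, fish)

# Step 4: predict for 50 people (the temperature is not used)
prediction = intercept + slope * 50

print(round(r_temp, 2), round(r_people, 2))  # 0.31 0.81
print(round(slope, 2), round(intercept, 1))  # 10.89 -129.8
print(round(prediction, 1))                  # 414.8
```

Because the script keeps the unrounded slope and intercept, its prediction is 414.8; either way the answer rounds to about 415 fish.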