What does Pearson's correlation coefficient measure, and how is it interpreted?
Calculate and interpret Pearson's correlation coefficient using statistical technology, including the sign and magnitude
A focused answer to the HSC Maths Standard 2 dot point on Pearson's correlation coefficient: what $r$ measures, how to read its sign and magnitude, the strength scale, the non-linear limitation, computing it on a calculator, and worked Australian examples.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
NESA wants you to do four things with Pearson's correlation coefficient $r$ for a bivariate dataset (a set of paired values, like height and weight for each person). Interpret what $r$ tells you. Tell the sign apart from the magnitude (the size). Know that $r$ does not work well for non-linear data. And compute $r$ from a dataset using the calculator's statistics functions. The number on its own earns no marks: the marks are for what it tells you about the relationship, said in the right words.
The answer
What $r$ measures
Pearson's correlation coefficient $r$ measures the strength and direction of the linear relationship between two variables. It is a single number that summarises a whole scatterplot, and it is bounded between $-1$ and $1$:
- $r = 1$: perfect positive linear relationship (every point exactly on a rising line).
- $r = -1$: perfect negative linear relationship (every point exactly on a falling line).
- $r = 0$: no linear relationship.
- The sign ($+$ or $-$) gives the direction; the magnitude (how close $|r|$ is to $1$) gives the strength.
The gallery above shows the same idea four times. The closer the points crowd around a single straight line, the closer $|r|$ is to $1$. The looser the cloud, the closer $r$ is to $0$. The direction of the slope sets the sign.
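To make the bounds concrete, here is a minimal Python sketch of the standard deviation-form formula for Pearson's $r$. The function name `pearson_r` and the data are illustrative assumptions, not part of the syllabus material; in the exam you read $r$ from the calculator instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means (standard formula)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point exactly on a rising line
falling = [10 - 3 * x for x in xs]  # every point exactly on a falling line

print(pearson_r(xs, rising))   # perfect positive fit
print(pearson_r(xs, falling))  # perfect negative fit
```

Points that sit exactly on a rising line give the upper bound, and points exactly on a falling line give the lower bound, whatever the steepness of the line.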
Strength descriptors
Standard 2 uses approximate verbal labels for $|r|$:
| $\lvert r \rvert$ range | Strength |
|---|---|
| - | Very weak |
| - | Weak |
| - | Moderate |
| - | Strong |
| - | Very strong |
These bands are rough, and markers accept a reasonable adjacent label near a boundary (for example calling a borderline value "moderate to strong"). Use the magnitude only, so a value of $r$ with a large enough magnitude is very strong even if it is negative.
Sign and magnitude are separate questions
This is the single most examined idea on this dot point, so it is worth seeing it isolated. The two plots below have exactly the same closeness of fit, so they are equally strong; only the sign differs, which flips the direction.
So when a question asks you to compare two coefficients, compare their magnitudes for strength and their signs for direction separately. A negative coefficient with the larger magnitude describes a stronger relationship than a positive coefficient with a smaller magnitude; its cloud hugs its line much more tightly.
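The two separate comparisons can be sketched in a few lines of Python. The helper names `direction` and `stronger`, and the two coefficient values, are hypothetical and chosen only for illustration.

```python
def direction(r):
    """Direction comes from the sign of r only."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

def stronger(r_a, r_b):
    """Strength comes from the magnitude |r| only; the sign is ignored."""
    if abs(r_a) > abs(r_b):
        return "A"
    if abs(r_b) > abs(r_a):
        return "B"
    return "equal"

# Hypothetical coefficients, not taken from any question in this guide.
r_a, r_b = -0.85, 0.60
print(stronger(r_a, r_b), direction(r_a), direction(r_b))
```

Here dataset A is reported as the stronger relationship because $0.85 > 0.60$, even though its coefficient is the negative one.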
The linear-only limitation
Pearson's $r$ only detects linear association, meaning a straight-line trend. A dataset that follows a curve perfectly can still give $r$ close to zero. This happens because the rises and falls of the curve cancel out when you measure the straight-line trend. The plot below is a perfect U-shape: a textbook strong relationship, yet $r = 0$.
This is why the scatterplot comes first. A small $|r|$ rules out a straight-line trend, but it does not rule out a curved one. Always look at the plot before trusting the number.
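The cancellation can be checked numerically. The sketch below assumes a symmetric U-shape, $y = x^2$ for $x = -3, \dots, 3$; these values and the function name `pearson_r` are illustrative assumptions, not data from this guide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means (standard formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical perfect U-shape: y is completely determined by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # the falling left half cancels the rising right half
```

Every $y$ value is determined exactly by $x$, yet the products of deviations on the left and right of the minimum sum to zero, so the coefficient reports no linear trend.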
Computing $r$ on a calculator
NESA-approved scientific calculators include statistics-mode (STAT) functions. The procedure is typically:
- Switch to statistics mode (e.g. MODE 2 STAT, then a 2-variable option).
- Enter the $(x, y)$ pairs into the two lists.
- Read $r$ from the regression-results menu.
You will not be asked to compute $r$ by hand. The marks come from entering the data correctly and interpreting the value, not from the arithmetic. Different calculator models label and reach the result in different ways, so practise on the exact calculator you will take into the exam, and clear old data before entering a new dataset.
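For readers who want to check a calculator result on a computer, the sketch below uses the equivalent "running sums" form of the formula, $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$. Whether a given calculator works this way internally is an assumption; the function name `r_from_sums` and the data are hypothetical.

```python
from math import sqrt

def r_from_sums(pairs):
    """Pearson's r from the five running sums of the (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Hypothetical pairs, entered in order just as you would on the calculator.
pairs = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

r = r_from_sums(pairs)
print(round(r, 2))  # quote r to two decimal places
```

Starting from a fresh list each time plays the same role as clearing old data on the calculator: leftover pairs would change every one of the sums.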
Reading the sign off a scatterplot
If you only have the plot (no number), you can still state the sign and a rough magnitude:
- Cloud rises to the right: $r > 0$.
- Cloud falls to the right: $r < 0$.
- Tight band: $|r|$ near $1$. Loose cloud: $|r|$ near $0$. Round, tiltless blob: $r \approx 0$.
This is the same reading you did for direction and strength on the scatterplot, now phrased as the sign and magnitude of $r$.
Correlation versus causation
A strong correlation does not prove causation. There are three ways to get a strong $r$:
- $x$ causes $y$.
- $y$ causes $x$ (reverse causation).
- a third variable $z$ causes both, so $x$ and $y$ move together as effects of a common cause.
The classic example: ice-cream sales and drownings are positively correlated. Hot weather drives both, but neither causes the other. In the exam, if a worded question invites a causal claim, state that $r$ shows association only and use cautious language.
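The common-cause case can be illustrated with invented numbers. Everything in the sketch below (the temperatures, sales and drowning counts, and the function name `pearson_r`) is hypothetical and made up for this illustration; it is not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means (standard formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical daily records, invented for illustration only.
temperature = [18, 21, 24, 27, 30, 33, 36]       # the common cause
ice_cream_sales = [40, 52, 61, 75, 83, 98, 107]  # rises with temperature
drownings = [1, 1, 2, 3, 3, 4, 5]                # also rises with temperature

r_effects = pearson_r(ice_cream_sales, drownings)
print(round(r_effects, 2))
```

The coefficient between the two effects comes out large and positive, but nothing in the calculation knows about temperature. The number alone cannot tell a common cause apart from a direct one.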
How exam questions ask about $r$
- "Interpret $r$." Give strength (from $|r|$), direction (from the sign) and the word "linear", in one sentence: "a strong, negative, linear relationship".
- "Compare the correlation in datasets A and B." Compare magnitudes for strength and signs for direction. The larger $|r|$ is the stronger relationship regardless of sign.
- "Explain why a low $|r|$ does not mean no relationship." Because $r$ measures only linear association; a curved (non-linear) pattern can give $r \approx 0$. Look at the scatterplot.
- "Calculate $r$ for this data." Enter the pairs in statistics mode and read $r$ off; quote it to two decimal places.
- "Does this prove $x$ causes $y$?" No: correlation is not causation; a third variable may be responsible.
Edge cases worth knowing
- $|r|$ near a band boundary. Quote the value and give the nearest sensible label; for a value that sits right on the strong/very strong line, "strong, almost very strong" is fine.
- $r = 0$ exactly on a clear curve. The relationship is deterministic but non-linear; report that $r$ misses it and point to the scatterplot.
- A high $|r|$ from a tiny sample. Two or three points can force $|r|$ near $1$ by accident. A large $|r|$ from very few pairs is not strong evidence.
- Restricted range. If the data only covers a narrow slice of $x$, $|r|$ can look weaker than the true relationship over the full range. Standard 2 will not test this directly, but it is why the data range matters.
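The tiny-sample edge case above can be checked directly: any two points with different $x$ and different $y$ values lie on one straight line, so the coefficient is forced to $1$ or $-1$. The points and the function name `pearson_r` below are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means (standard formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two arbitrary, hypothetical points each time: a line always fits exactly.
r_up = pearson_r([2, 7], [3, 9])    # second point is higher
r_down = pearson_r([1, 4], [5, 2])  # second point is lower

print(r_up, r_down)
```

A "perfect" coefficient from two pairs says nothing about the wider relationship, which is why the number of pairs matters when judging how much weight $r$ deserves.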
Exam-style practice questions
Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
2022 HSC-style, 3 marks. A dataset of pairs gives Pearson's correlation coefficient . Interpret this value.
The negative sign means the relationship is inverse: as $x$ increases, $y$ tends to decrease.
The magnitude $|r|$ is close to $1$, indicating a strong linear relationship.
Overall, this value of $r$ indicates a strong, negative, linear association between the two variables.
Markers reward identification of sign (direction), magnitude (strength) and the linear qualifier.
2023 HSC-style, 3 marks. Two datasets are presented. Dataset A has . Dataset B has . Describe the relationship in each, and explain why a low $|r|$ does not necessarily mean no relationship.
Dataset A: strong positive linear relationship. As $x$ increases, $y$ increases, with points closely clustered around the line.
Dataset B: very weak or no linear relationship. The points show essentially no straight-line pattern.
A low value of $|r|$ tells you only that the linear association is weak. A scatterplot may show a strong non-linear pattern (for example, parabolic or U-shaped), in which case Pearson's $r$ can be near zero despite a clear relationship. Always look at the scatterplot before relying on $r$.
Markers reward describing both datasets correctly with sign, strength and linear qualifier, and the caveat that low $|r|$ does not preclude non-linear patterns.
Practice questions
Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.
Foundation, 2 marks. A study of pairs of data gives Pearson's correlation coefficient . (a) State the direction of the relationship. (b) State its strength. (c) Write a one-sentence interpretation that names strength, direction and the word "linear".
- Read the sign for direction (a)
- The value is positive, so the relationship is positive: as one variable increases, the other tends to increase.
- Read the magnitude for strength (b)
- The size is , which falls in the to band, so the relationship is very strong.
- Write the full interpretation (c)
- Combine all three parts: indicates a very strong, positive, linear relationship between the two variables.
- Check
- The sentence names strength (very strong), direction (positive) and includes the word "linear", which is exactly what markers reward.
Foundation, 3 marks. A scatterplot of daily maximum temperature against the number of cold drinks sold shows points that rise steadily to the right and lie in a fairly tight band close to a straight line. (a) State whether $r$ is positive or negative. (b) Estimate whether $|r|$ is close to $1$ or close to $0$. (c) Give a single value of $r$ that would be consistent with this description.
- Read the direction from the slope (a)
- The cloud rises to the right (drink sales go up as temperature goes up), so the trend is positive and .
- Read the strength from the spread (b)
- The points lie in a fairly tight band close to a straight line, so the linear fit is strong and is close to , not close to .
- Choose a consistent value (c)
- A tight rising band matches a large positive value, for example . (Any value such as to is reasonable.)
- Check
- Positive sign matches the rising cloud and a magnitude near matches the tight band, so fits the description.
Foundation, 4 marks. A gardener waters five seedlings with different amounts of water and records the height after two weeks. The water (in litres) and height (in centimetres) are: , , , , . (a) Use the statistics mode of your calculator to find $r$, correct to two decimal places. (b) Describe the relationship in words.
- Enter the data into statistics mode (a)
- Clear any old data, switch to -variable statistics mode, and enter the five pairs into the and lists in order: and .
- Read the coefficient (a)
- From the regression-results menu the calculator returns (the unrounded value is ).
- Describe the relationship (b)
- The sign is positive and is in the to band, so this is a very strong, positive, linear relationship: more water is associated with greater height.
- Check
- Both and increase together and the points lie almost exactly on a line, so a value very close to is expected, agreeing with .
Foundation, 3 marks. Two studies are reported. Study P has and study Q has . (a) Which study shows the stronger linear relationship? (b) Which shows a negative relationship? (c) Explain why a negative $r$ can still describe a stronger relationship than a positive one.
- Compare magnitudes for strength (a)
- Strength depends only on the size . Here and , and , so study P shows the stronger linear relationship.
- Compare signs for direction (b)
- The sign gives the direction. Study P has a negative value, so study P shows the negative relationship.
- Explain the separation (c)
- Sign and magnitude answer different questions: the sign is only the direction, while the magnitude is the strength. A value of sits closer to than sits to , so its points hug the line more tightly and the relationship is stronger, even though it slopes downward.
- Check
- Study P is both stronger (larger magnitude) and negative, which is consistent: strength and direction are read separately.
Core, 4 marks. A student records the hours of revision and the test mark (out of ) for six classmates: , , , , , . (a) Find $r$ using your calculator, correct to two decimal places. (b) Interpret the value in context.
- Enter the pairs into statistics mode (a)
- Clear old data, then enter and as six pairs.
- Read the coefficient (a)
- The calculator gives (the unrounded value is , rounded to two decimal places).
- Interpret in context (b)
- The sign is positive and lies in the to band, so there is a very strong, positive, linear relationship: more revision hours are associated with higher test marks.
- Check
- The marks generally climb as revision hours rise, with only small dips, so a large positive value near $1$ is sensible.
Core, 4 marks. The daily recreational screen time (in hours) and the nightly sleep (in hours) for six teenagers are: , , , , , . (a) Find $r$ to two decimal places. (b) Describe the direction and strength. (c) State what the negative sign means in plain words.
- Enter the data (a)
- In -variable statistics mode enter and .
- Read the coefficient (a)
- The calculator returns (unrounded ).
- Describe direction and strength (b)
- The sign is negative, so the direction is negative; is in the to band, so the relationship is very strong. Overall this is a very strong, negative, linear relationship.
- Explain the sign (c)
- The negative sign means the variables move in opposite directions: as screen time increases, nightly sleep tends to decrease.
- Check
- Sleep hours fall as screen time rises, so a strong negative value near is expected, matching .
Core, 4 marks. A health survey records hours spent outdoors per week and a vitamin D score for eight people: , , , , , , , . (a) Find $r$ to two decimal places. (b) Give the strength band. (c) Explain why $r$ is not exactly $1$ even though the trend is clearly upward.
- Enter the eight pairs (a)
- In statistics mode enter and .
- Read the coefficient (a)
- The calculator gives (unrounded ).
- Give the strength band (b)
- The magnitude lies in the to band, so the linear relationship is very strong and positive.
- Explain why it is not exactly (c)
- A coefficient of exactly needs every point to sit on one straight line. Here the points rise overall but zig-zag slightly (for example dips from down to , and from down to ), so the fit is very strong but not perfect, giving rather than .
- Check
- The upward trend matches the positive sign, and the small wobbles explain why the value is high but below $1$.
Core, 3 marks. A teacher wonders whether shoe size is linked to a spelling-test mark. For seven students the shoe size and mark are: , , , , , , . (a) Find $r$ to two decimal places. (b) Interpret the result. (c) Does this mean shoe size has no effect on spelling skill?
- Enter the data (a)
- In statistics mode enter and .
- Read the coefficient (a)
- The calculator returns (unrounded value ).
- Interpret the result (b)
- With the value sits in the to band, so there is essentially no linear relationship between shoe size and spelling mark.
- Answer the effect question (c)
- A value near $0$ tells us there is no straight-line link in this sample; it does not "prove" anything about cause. We would not expect shoe size to affect spelling, and the near-zero $r$ is consistent with two unrelated variables.
- Check
- The marks bounce up and down with no upward or downward drift as shoe size grows, so a coefficient close to $0$ is exactly what we expect.
Exam, 5 marks. A used-car dealer records the age (in years) and the price (in thousands of dollars) of seven cars of the same model: , , , , , , . (a) Find $r$ to two decimal places. (b) Interpret the value fully. (c) A salesperson says "this proves that getting older causes a car to lose value". Comment on this claim.
- Enter the data into statistics mode (a)
- Clear old data, then enter and as seven pairs.
- Read the coefficient (a)
- The calculator gives (unrounded ).
- Interpret fully (b)
- The sign is negative, so the direction is negative; is in the to band, so the relationship is very strong. In context: there is a very strong, negative, linear relationship, so older cars of this model tend to be cheaper.
- Comment on the causal claim (c)
- A strong $|r|$ shows association, not proof of cause. While age plausibly contributes here, $r$ alone cannot establish causation: other factors (kilometres driven, condition, demand) move with age and also affect price. State the link as an association and use cautious language.
- Check
- Price falls steadily as age rises, so a value very close to is expected, agreeing with .
Exam, 5 marks. A school investigates whether class attendance is linked to the final exam mark. For eight classes the average attendance (in days per term) and average mark (out of ) are: , , , , , , , . (a) Find $r$ to two decimal places. (b) Describe the relationship. (c) Explain what a student should look at before trusting this value.
- Enter the eight pairs (a)
- In statistics mode enter and .
- Read the coefficient (a)
- The calculator gives (unrounded ).
- Describe the relationship (b)
- The sign is positive and is in the to band, so there is a very strong, positive, linear relationship: higher attendance is associated with higher exam marks.
- State the check before trusting it (c)
- Always look at the scatterplot first. A high $|r|$ measures only linear association, so a curved pattern or a single outlier could distort the picture. Confirm the cloud genuinely follows a straight-line trend before relying on $r$.
- Check
- Marks climb steadily as attendance rises with only small wobbles, so a value very close to $1$ is sensible.
Exam, 6 marks. Two data sets are collected. Data set A pairs an advertising spend with weekly sales : , , , , , . Data set B pairs the hours of machine downtime with units produced : , , , , , . (a) Find $r$ for each set, correct to two decimal places. (b) State which relationship is stronger and justify your answer. (c) Describe each relationship in one sentence.
- Find for data set A (a)
- Enter and in statistics mode; the calculator gives (unrounded ).
- Find for data set B (a)
- Clear the lists, then enter and ; the calculator gives (unrounded ).
- Compare magnitudes (b)
- Strength is set by the size only: and . Since , data set A has the stronger linear relationship, even though B is negative.
- Describe each set (c)
- Data set A: a very strong, positive, linear relationship (more advertising is associated with more sales). Data set B: a very strong, negative, linear relationship (more downtime is associated with fewer units produced).
- Check
- A rises together so its sign is positive, B falls so its sign is negative, and both clouds hug their lines tightly, so two large magnitudes are expected with A slightly larger.
Exam, 5 marks. A science class measures a quantity $y$ at seven settings of a control $x$ and records: , , , , , , . (a) Find $r$ to two decimal places. (b) The class concludes "there is no relationship between $x$ and $y$". Explain why this conclusion is wrong. (c) State the lesson about using $r$.
- Enter the data (a)
- In statistics mode enter and .
- Read the coefficient (a)
- The calculator returns (unrounded ).
- Explain why the conclusion is wrong (b)
- A value of means there is no linear relationship, not no relationship at all. Plotting the points shows a clear U-shape (a strong non-linear pattern): falls to a minimum at and then rises symmetrically. The fall on the left cancels the rise on the right, forcing to even though the pattern is obvious.
- State the lesson (c)
- Pearson's $r$ measures linear association only, so always look at the scatterplot before trusting it. A small $|r|$ rules out a straight-line trend but not a curved one.
- Check
- The values are symmetric about , so by symmetry the linear trend must cancel to give , confirming the calculator value.
