ExamExplained
NSW · Maths Standard 2

How do scatterplots reveal the form, direction and strength of the relationship between two variables?

Construct and interpret scatterplots to describe the relationship between two variables in bivariate data

A focused answer to the HSC Maths Standard 2 dot point on scatterplots. It covers reading the form, direction and strength of an association, identifying outliers, correlation versus causation, a stage-by-stage construction, and worked Australian examples using study, climate and household data.

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

NESA wants you to do three things. First, build a scatterplot from a table of bivariate data (data where you measure two things about each item). Second, describe the form, direction and strength of the relationship between the two variables. Third, identify any outliers and suggest what might have caused them. Almost every Statistical Analysis paper opens with a scatterplot, because it is the picture the rest of the module rests on. The correlation coefficient and the least-squares line simply quantify what the scatterplot already shows you. Read the plot well and the calculator work that follows is almost automatic.

The answer

[Figure: scatterplot of about eighteen points clustered along an upward-sloping line, showing a strong positive linear association between x and y.]

What a scatterplot is

A scatterplot is a graph of one variable against another with each data point plotted as a single dot. By convention:

  • x-axis: the independent (or explanatory) variable, the one you think drives the other.
  • y-axis: the dependent (or response) variable, the one you think responds.

For each (x, y) pair in the data, plot one dot at that location. Do not connect the dots: a scatterplot shows a cloud of separate observations, not a single quantity changing over time. Joining them turns it into a line graph and tells the marker you have misread the question.

Plot the data before you calculate. Your eyes catch things a single number hides, such as a curve, two separate clusters, or one wild point. The correlation coefficient and the regression line both assume the pattern is a straight line. The scatterplot is your check that this assumption is reasonable in the first place.

Describing the relationship: form, direction, strength

Three descriptors, always all three:

  • Form. Linear (points cluster around a straight line), non-linear (they follow a curve), or no clear pattern at all.
  • Direction. Positive (as x increases, y tends to increase, so the cloud rises to the right), negative (as x increases, y tends to decrease, so the cloud falls to the right), or no association.
  • Strength. Strong (points cluster tightly around the pattern), moderate, weak, or no association. Strength is about scatter: the less the points spread away from the trend, the stronger the relationship.

State all three when describing a scatterplot. Markers expect the explicit words "linear", "positive" or "negative", and "strong", "moderate" or "weak", with a one-line justification drawn from the plot ("the points rise steadily with little scatter").

Direction and strength are independent. A cloud can be strongly negative (tight and falling) or weakly positive (loose and rising). Do not let a steep-looking slope trick you into calling something strong: strength is set by the tightness of the scatter, not by how steep the trend is.
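
One way to see that steepness and strength are separate is to compute the correlation coefficient for two small data sets. The sketch below is only an illustration: both sets of y values are hypothetical, invented for this example, and `pearson_r` is a helper written here rather than anything named in the syllabus.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
shallow_tight = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # gentle slope, no scatter
steep_loose = [10, 35, 20, 55, 40, 70]          # steep slope, lots of scatter

r_tight = pearson_r(x, shallow_tight)
r_loose = pearson_r(x, steep_loose)
print(round(r_tight, 2), round(r_loose, 2))
```

The gently sloping set has the larger correlation coefficient because its points sit exactly on a line, while the steeper set is weaker because its points wander. That matches the rule above: judge strength by tightness, not by slope.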

Outliers

An outlier is a point that lies far from the bulk of the data, well away from the pattern the other points form. Outliers matter because they can drag the correlation coefficient and the least-squares line a long way. A single odd point can distort the whole analysis.

Causes:

  • Data error. A wrong digit entered, a wrong unit, a transcription slip.
  • Genuine atypical case. A real but rare observation, for example a household with unusual circumstances.
  • Subgroup effect. Two distinct populations plotted together, where one group sits apart from the other.

Decide whether to remove an outlier based on its cause. Errors should be corrected or removed. Genuine atypical cases should usually be kept in and reported with a comment. Never delete a point just because it is inconvenient. In the exam, say what the outlier is, suggest a plausible reason, and state your decision with that reason.

Correlation does not imply causation

A strong positive correlation between x and y does not prove that x causes y. A third variable (a hidden factor that affects both) may drive them, or the link may be coincidence. Standard 2 expects you to flag this whenever a worded question invites a causal claim. Use cautious language ("is associated with", "tends to") unless the question hands you an actual mechanism. Here is a favourite example: ice-cream sales and shark attacks both rise together, but neither causes the other. Hot weather drives both.
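
The ice-cream and shark-attack example can be sketched numerically. Everything in the block below is an assumption made for illustration: the weekly figures are invented so that both series rise with temperature, and `pearson_r` is a helper written for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical figures for six summer weeks (not real data).
temperature_c = [18, 22, 25, 28, 31, 35]
ice_cream_sales = [220, 255, 305, 330, 380, 415]
shark_attacks = [1, 2, 2, 3, 4, 4]

# The two "effects" correlate strongly with each other...
r_sales_attacks = pearson_r(ice_cream_sales, shark_attacks)
# ...because each one tracks the third variable, temperature.
r_temp_sales = pearson_r(temperature_c, ice_cream_sales)
r_temp_attacks = pearson_r(temperature_c, shark_attacks)
print(round(r_sales_attacks, 2), round(r_temp_sales, 2), round(r_temp_attacks, 2))
```

A high correlation between sales and attacks appears even though neither list was built from the other, which is why the safe wording is "is associated with".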

Reading scatterplots quickly

A pattern checklist for the opening glance:

  • Straight band going up to the right: positive linear.
  • Straight band going down to the right: negative linear.
  • A curve (U-shape, hump, or steep-then-flat): non-linear. A straight-line model and the correlation coefficient will both mislead here.
  • A shapeless cloud with no tilt: no association.
  • Two distinct clouds: a subgroup effect, often best modelled as two separate relationships rather than one.

Construct a scatterplot, stage by stage

Turning a table into a described scatterplot is a short, repeatable routine. Below it is built up one stage at a time, using weekly study hours (x) against an HSC trial mark out of 100 (y) for a class of 14 students.

Stage 1, set up and scale the axes, then plot the first points. Put the independent variable (study hours) on the horizontal axis and the response (mark) on the vertical axis. Choose scales that spread the data across the whole plot. Hours run from 0 to about 24 and marks from 30 to 100, so the mark axis need not start at zero. Then plot the first few pairs, here (2, 41), (4, 52) and (5, 49), as single dots.

[Figure, Step 1: axes labelled study hours per week (ticks 0 to 24) against mark out of 100 (ticks 40 to 100), with the first three pairs (2, 41), (4, 52) and (5, 49) plotted as dots. Hours on the x-axis, mark on the y-axis.]
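
Choosing a scale can be sketched as rounding the data range outward to whole tick steps. The helper `axis_limits` below is hypothetical, written for this illustration, and it uses only the three pairs quoted in Stage 1 because the full class table is not reproduced here. The tick steps of 4 hours and 10 marks are taken from the figure's axes.

```python
import math

def axis_limits(values, tick):
    """Round the data range outward to whole tick steps."""
    low = math.floor(min(values) / tick) * tick
    high = math.ceil(max(values) / tick) * tick
    return low, high

# The first three (hours, mark) pairs from Stage 1.
hours = [2, 4, 5]
marks = [41, 52, 49]

print(axis_limits(hours, 4))   # (0, 8)
print(axis_limits(marks, 10))  # (40, 60)
```

With only these three points the mark axis would start at 40 rather than 0, which is the sense in which an axis need not start at zero. The limits widen as more of the table is included.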

Stage 2, plot every data pair. Continue until all 14 points are on the plot. Each pair is one dot, placed at its (x, y) position and left unconnected. The shape now emerges: a cloud running from the lower left to the upper right.

[Figure, Step 2: the full scatterplot of study hours per week against mark out of 100, with all 14 pairs plotted as separate dots rising from lower left to upper right. Never join them with lines.]

Stage 3, describe the form, direction and strength. Look at the cloud as a whole. The points sit close to a single straight band that rises to the right, so the form is linear, the direction is positive (more study hours, higher marks) and the strength is strong (little scatter away from the band). A faint guide line through the middle of the cloud makes the trend easy to see.

[Figure, Step 3: the scatterplot with a faint straight guide line through the cloud, annotated "linear, positive, strong". The points hug a rising straight line.]

Stage 4, check for outliers. Scan for any point sitting well away from the band. Suppose one student studied 19 hours but scored only 47. That point sits far below the trend (you would expect a mark near 85 at 19 hours), so it is an outlier. Flag it, give a plausible reason (illness on exam day, perhaps), and state whether you keep it. Because it looks like a genuine atypical case rather than a typing error, you would normally keep it and comment.

[Figure, Step 4: the scatterplot with one extra point at (19, 47), well below the rising trend line, marked as the outlier with a cross, an encircling ring and a text label. Flag it and explain it; decide whether to keep or remove it.]

How exam questions ask about scatterplots

The wording varies but the task is almost always one of these:

  • "Describe the relationship shown in the scatterplot." Give all three descriptors: form, direction, strength, each with a one-line justification from the plot.
  • "Construct a scatterplot from the table." Set up and scale the axes (independent variable on x), plot each pair, do not join the dots, label both axes with their quantity and units.
  • "Identify any outliers and suggest a cause." Name the outlying point by its coordinates, give a plausible reason, and say whether you would keep or remove it and why.
  • "Does the scatterplot show that x causes y?" No: a scatterplot shows association, not cause. Mention the chance of a third variable and answer in cautious language.
  • "Is a linear model appropriate?" Only if the form is linear. If the cloud curves, say a straight-line model (and the correlation coefficient) would be misleading.

The verbs map straight to the method: "construct" or "plot" means draw it; "describe" means form, direction, strength; "comment on" or "discuss" usually means address outliers or causation as well.

Edge cases worth knowing

  • A subgroup effect masquerading as one relationship. If a plot shows two separate clouds (say, weekday and weekend data), a single trend through both can be meaningless. Describe each group, or model them separately.
  • A curve that a quick glance reads as a line. Spending versus income often bends (it flattens at high income). If the cloud curves, the form is non-linear and a straight-line summary misrepresents the relationship.
  • No association. A round, tiltless cloud has no direction and no strength. Say so plainly rather than forcing a trend onto it.
  • One outlier hiding a trend, or creating one. A single extreme point can either inflate or deflate an apparent relationship, depending on where it sits, which is why you flag it before moving on to the correlation coefficient.

Exam-style practice questions

Practice questions written in the style of NESA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

2022 HSC-style (3 marks)
A scatterplot of x (years of schooling) and y (weekly income, in $) shows the points clustered closely around an upward-sloping line. Describe the form, direction and strength of the relationship between x and y.
Worked answer:

Form: linear (points cluster around a straight line).

Direction: positive (as years of schooling increase, weekly income tends to increase).

Strength: strong (points cluster closely; little scatter around the line).

Markers reward all three descriptors and the use of the words "linear", "positive", "strong" with brief justification from the plot.

2023 HSC-style (4 marks)
A scatterplot of household income (x, in $000 per year) against hours of recreational TV viewing per week (y) for 50 Australian households shows a moderately strong downward trend. There is one obvious outlier at $30000 per year and 5 hours per week. Describe the relationship and discuss the outlier.
Worked answer:

Form: roughly linear (the trend is steady, not curved).

Direction: negative (higher income associated with less TV).

Strength: moderate (not all points sit tightly on a line, but a clear trend).

Outlier: at (30, 5) on these axes, i.e. a household earning $30000 that watches only 5 hours a week, well below the trend for low incomes. Possible explanations include working multiple jobs, having young children, or another lifestyle factor. If the point turns out to be a data error it should be corrected or removed; if it is a genuine atypical household it should stay in the analysis and be reported with a comment.

Markers reward all three descriptors, identification of the outlier, and a brief sensible interpretation.

Practice questions

Original practice questions graded from foundation to exam level, each with a full worked solution. Try them before revealing the solution.

foundation (1 mark)
A researcher studies whether the number of hours of sunlight a tomato plant receives each day affects the number of tomatoes it produces. (a) State the independent variable. (b) State the dependent variable. (c) State which variable belongs on the horizontal axis of a scatterplot.
Worked solution:
Decide which variable drives the other
The study asks whether sunlight affects the harvest, so sunlight is the variable doing the driving and the tomato count is the variable responding.
Part (a) independent variable
The independent (explanatory) variable is the hours of sunlight per day.
Part (b) dependent variable
The dependent (response) variable is the number of tomatoes produced.
Part (c) horizontal axis
By convention the independent variable goes on the horizontal axis, so hours of sunlight goes on the x-axis (and tomato count on the y-axis). Check: the wording "whether sunlight affects the harvest" names sunlight as the cause, which matches putting it on the x-axis.

foundation (2 marks)
At six cafes, the daytime temperature (x, in degrees Celsius) and the number of iced coffees sold (y) are recorded. On the scatterplot the points rise steadily from the lower left to the upper right and lie close to a straight line. (a) State the direction of the association. (b) State whether higher temperatures are linked to more or fewer iced coffees sold.
Worked solution:
Read the tilt of the cloud
Points that rise from the lower left to the upper right mean that as x increases, y also increases.
Part (a) direction
An upward trend is a positive association.
Part (b) plain-language reading
A positive association here means hotter days are linked to more iced coffees sold.
Answer
The association is positive, so higher temperatures are linked to more iced coffees. Check: a rising cloud and "more sold when it is hotter" are two ways of saying the same positive trend, so they agree.

foundation (2 marks)
A scatterplot of a car's age (x, in years) against its resale value (y, in thousands of dollars) shows the points falling from the upper left to the lower right and clustered tightly around a straight line. Describe the relationship by giving its form, direction and strength.
Worked solution:
Form
The points cluster around a single straight line, so the form is linear.
Direction
The cloud falls from the upper left to the lower right, so as age increases value decreases: the direction is negative.
Strength
The points lie tightly around the line with little scatter, so the strength is strong.
Answer
The scatterplot shows a strong negative linear relationship between a car's age and its resale value. Check: older cars being worth less is the expected real-world pattern, so a strong negative trend is sensible.

foundation (3 marks)
The table records, for five students, the number of days absent (x) and the test mark out of 10 (y): (2, 9), (3, 7), (5, 6), (6, 4), (8, 2). (a) State the range of the number of days absent. (b) As days absent increase, do the marks rise or fall? (c) Name the direction of the association.
Worked solution:

Part (a) range of x. The days absent run from a smallest of 2 to a largest of 8, so the range is

$$8 - 2 = 6 \text{ days}$$

Part (b) trend in the marks
Reading the marks in order gives 9, 7, 6, 4, 2, which decrease at every step, so as days absent increase the marks fall.
Part (c) direction
A trend where one variable rises while the other falls is a negative association.
Answer
The range of days absent is 6, marks fall as absences rise, and the association is negative. Check: each extra day absent comes with a lower mark in the table, which matches a negative trend.
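
The same reasoning can be checked with a few lines of code. This is a minimal sketch using the five pairs from the question; the variable names are chosen here for illustration.

```python
# The five (days absent, mark) pairs from the question, in order of x.
days_absent = [2, 3, 5, 6, 8]
marks = [9, 7, 6, 4, 2]

# Part (a): range = largest value minus smallest value.
absence_range = max(days_absent) - min(days_absent)

# Parts (b) and (c): change in mark from each student to the next.
steps = [later - earlier for earlier, later in zip(marks, marks[1:])]
direction = "negative" if all(step < 0 for step in steps) else "not clearly negative"

print(absence_range, steps, direction)
```

Every step in the marks is a decrease while the days absent increase, which is the pattern the worked solution calls a negative association.
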

core (4 marks)
A small business records monthly advertising spend (x, in hundreds of dollars) and units sold (y) for six months: (10, 15), (20, 26), (30, 34), (40, 47), (50, 52), (60, 66). (a) State the independent variable. (b) Find the mean advertising spend. (c) Find the mean number of units sold. (d) Describe the form and direction of the association.
Worked solution:

Part (a) independent variable. Advertising spend is chosen by the business and is expected to drive sales, so x (advertising spend) is the independent variable.

Part (b) mean of x. Add the spends and divide by 6:

$$\bar{x} = \frac{10 + 20 + 30 + 40 + 50 + 60}{6} = \frac{210}{6} = 35$$

so the mean spend is 35 hundred dollars.

Part (c) mean of y. Add the units sold and divide by 6:

$$\bar{y} = \frac{15 + 26 + 34 + 47 + 52 + 66}{6} = \frac{240}{6} = 40$$

so the mean is 40 units.

Part (d) form and direction. The values of y rise steadily as x rises and lie close to a straight line, so the form is linear and the direction is positive. Check: both means fall inside the data ranges (x from 10 to 60, y from 15 to 66), which is what a mean must do.
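
The two means and the final check can be reproduced with the standard library. This is a small sketch of the arithmetic above, using the six pairs from the question; the variable names are illustrative.

```python
from statistics import mean

# Monthly advertising spend (hundreds of dollars) and units sold.
ad_spend = [10, 20, 30, 40, 50, 60]
units_sold = [15, 26, 34, 47, 52, 66]

mean_spend = mean(ad_spend)    # part (b)
mean_units = mean(units_sold)  # part (c)

# Check from part (d): each mean must lie inside its own data range.
means_in_range = (min(ad_spend) <= mean_spend <= max(ad_spend)
                  and min(units_sold) <= mean_units <= max(units_sold))

# Direction: units sold go up at every step as spend goes up.
steps = [later - earlier for earlier, later in zip(units_sold, units_sold[1:])]

print(mean_spend, mean_units, means_in_range, steps)
```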

core (3 marks)
Seven swimmers record training sessions per week (x) and laps completed in a time trial (y): (1, 20), (2, 24), (3, 28), (4, 32), (5, 12), (6, 40), (7, 44). (a) Identify the outlier. (b) Estimate the laps you would expect at 5 sessions if it followed the trend. (c) Suggest a plausible cause for the outlier.
Worked solution:
Spot the broken step
Apart from one point the laps climb by 4 each time: 20, 24, 28, 32, ..., 40, 44. The point (5, 12) sits far below this rising band.
Part (a) outlier
The outlier is (5, 12): 5 sessions but only 12 laps.
Part (b) expected value
Following the steady increase of 4 laps per session, after (4, 32) the next value should be

$$32 + 4 = 36 \text{ laps}$$

so about 36 laps would be expected at 5 sessions, far above the recorded 12.

Part (c) cause. The point could be a genuine atypical case, such as the swimmer being unwell or injured during that trial, or it could be a recording error. Check: 36 sits neatly between the neighbours 32 and 40, confirming 12 is well off the trend.
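
To see how far one point can drag the correlation coefficient, the sketch below computes it for the swimmers with and without (5, 12). The data are from the question; `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

points = [(1, 20), (2, 24), (3, 28), (4, 32), (5, 12), (6, 40), (7, 44)]
outlier = (5, 12)

# Correlation with every point included.
xs, ys = zip(*points)
r_with = pearson_r(xs, ys)

# Correlation after setting the outlier aside.
kept = [p for p in points if p != outlier]
kx, ky = zip(*kept)
r_without = pearson_r(kx, ky)

# Expected laps at 5 sessions: one more step of 4 after (4, 32).
expected_laps = 32 + 4

print(round(r_with, 2), round(r_without, 2), expected_laps)
```

Without the outlier the six remaining points lie exactly on a line, so the coefficient is essentially 1; with it, the coefficient drops sharply. That is the distortion the Outliers section warns about.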

core (3 marks)
A scatterplot of height (x) against shoe size (y) is drawn for a class that contains both Year 7 students and Year 12 students. The plot shows two separate clouds of points: one cluster at shorter heights and smaller shoe sizes, and another at taller heights and larger shoe sizes, with a gap between them. (a) What feature of the data does the gap suggest? (b) Explain why fitting a single straight line to all the points could be misleading.
Worked solution:
Read the shape
Two distinct clouds with a gap between them is the signature of two different groups plotted together, not one smooth relationship.
Part (a) feature
The gap suggests a subgroup effect: the class is really two populations, the Year 7 students and the Year 12 students, each forming its own cluster.
Part (b) why one line misleads
A single straight line drawn through both clouds would pass through the empty gap where no student actually sits, so it would describe nobody well and could overstate how strongly height and shoe size are related within either year group. It is usually better to describe each group separately or model them apart. Check: the trend you would report for thirteen-year-olds need not match the trend for eighteen-year-olds, which is exactly why combining them can mislead.
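
A numerical sketch of the subgroup effect follows. The heights and shoe sizes are hypothetical, invented so that there is no trend inside either year group, and `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (height in cm, shoe size) data for two year groups.
year7_height = [150, 152, 154, 156, 158]
year7_shoe = [5, 4, 6, 4, 5]
year12_height = [170, 172, 174, 176, 178]
year12_shoe = [10, 9, 11, 9, 10]

# Within each group there is no linear trend at all.
r_year7 = pearson_r(year7_height, year7_shoe)
r_year12 = pearson_r(year12_height, year12_shoe)

# Pooling the two clouds manufactures a strong-looking correlation.
r_combined = pearson_r(year7_height + year12_height, year7_shoe + year12_shoe)

print(round(r_year7, 2), round(r_year12, 2), round(r_combined, 2))
```

In this made-up case the pooled coefficient is high even though neither group shows any trend on its own, which is why describing each cluster separately is usually the safer answer.
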

core (3 marks)
Across many Australian beach towns, a survey finds a strong positive association between the number of ice creams sold on a day (x) and the number of people treated for sunburn that day (y). (a) Describe the strength and direction of the association. (b) Does this prove that buying ice creams causes sunburn? (c) Suggest a third variable that could explain the link.
Worked solution:
Part (a) strength and direction
A strong positive association means that on days with high ice cream sales the sunburn count also tends to be high, with the points clustering tightly around a rising trend.
Part (b) causation
No. A scatterplot shows association, not cause. A strong correlation between x and y does not prove that one causes the other.
Part (c) third variable
Hot, sunny weather is a plausible third variable: it independently drives both more ice cream sales and more time in the sun, hence more sunburn. Check: replacing the false story "ice cream burns skin" with "sun does both" explains the link without either variable causing the other, so cautious language ("is associated with") is the correct answer style.

exam (5 marks)
A study of eight teenagers records daily phone use (x, in hours) and a sleep-quality score out of 100 (y): (1, 94), (2, 88), (3, 82), (4, 76), (5, 70), (6, 64), (7, 58), (8, 52). (a) State the dependent variable. (b) Find the mean sleep-quality score. (c) Find the range of the sleep-quality scores. (d) Describe the form, direction and strength of the association. (e) Is it correct to conclude that more phone use causes poorer sleep? Justify your answer.
Worked solution:

Part (a) dependent variable. Sleep quality is expected to respond to phone use, so y (the sleep-quality score) is the dependent variable.

Part (b) mean of y. Add the eight scores and divide by 8:

$$\bar{y} = \frac{94 + 88 + 82 + 76 + 70 + 64 + 58 + 52}{8} = \frac{584}{8} = 73$$

so the mean sleep-quality score is 73.

Part (c) range of y. The scores run from 52 to 94, so

$$94 - 52 = 42$$

Part (d) form, direction, strength. The scores fall by 6 at every step, so the points lie exactly on a straight falling line: the form is linear, the direction is negative, and the strength is strong (no scatter at all).

Part (e) causation. Not necessarily. The data show a strong negative association, but a scatterplot cannot prove cause; a third variable (for example, less time in bed, or a stimulating bedtime routine) could drive both. The safe wording is that more phone use "is associated with" poorer sleep. Check: the mean 73 lies inside the range 52 to 94, as a mean must, so the calculations are consistent.
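
Parts (b) to (d) are pure arithmetic, so they can be checked quickly in code. This is a minimal sketch using the eight pairs from the question; the variable names are illustrative.

```python
from statistics import mean

# Daily phone use (hours) and sleep-quality score out of 100.
phone_hours = [1, 2, 3, 4, 5, 6, 7, 8]
sleep_scores = [94, 88, 82, 76, 70, 64, 58, 52]

mean_score = mean(sleep_scores)                       # part (b)
score_range = max(sleep_scores) - min(sleep_scores)   # part (c)

# Part (d): change in score for each extra hour of phone use.
steps = [later - earlier for earlier, later in zip(sleep_scores, sleep_scores[1:])]
constant_fall = len(set(steps)) == 1 and steps[0] < 0

print(mean_score, score_range, steps, constant_fall)
```

A constant, negative step for equal increases in x is what "exactly on a straight falling line" means. The code says nothing about cause, which is the point of part (e).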

exam (5 marks)
A market stall records daily rainfall (x, in mm) and umbrellas sold (y) on six days: (10, 8), (25, 15), (40, 23), (55, 30), (70, 38), (85, 9). (a) For the first five days, describe the form and direction of the association. (b) Identify the outlier and estimate the sales you would expect there from the trend. (c) Suggest a plausible cause for the outlier. (d) State whether you would keep or remove the outlier and why.
Worked solution:

Part (a) the trend in the first five days. Sales rise from 8 to 38 as rainfall rises from 10 mm to 70 mm, climbing by roughly 7 to 8 each step, so the form is linear and the direction is positive.

Part (b) the outlier and its expected value. The point (85, 9) breaks the rising pattern: at the highest rainfall the sales are almost the lowest. Continuing the trend of about 7.5 extra sales per step past (70, 38) gives

$$38 + 7.5 \approx 45 \text{ to } 46 \text{ umbrellas}$$

so around 45 would be expected at 85 mm, far above the recorded 9.

Part (c) cause. A plausible cause is either a genuine atypical event, such as the stall running out of stock or closing early, or a data-entry slip in recording the figure.

Part (d) keep or remove. If it is a recording error it should be corrected or removed; if it is a real event (sold out) it should be kept and reported with a note, since it is genuine. Check: 45 sits just above the previous value 38 on the steady rising trend, confirming 9 is a true outlier.
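
The estimate in part (b) can be reproduced step by step. The sketch below uses the six pairs from the question and extends the average step of the first five days by one more step; the variable names are illustrative.

```python
# Daily rainfall (mm) and umbrellas sold on six days.
rainfall = [10, 25, 40, 55, 70, 85]
umbrellas = [8, 15, 23, 30, 38, 9]

# Rainfall goes up by the same amount each day, so sales steps are comparable.
rain_steps = [later - earlier for earlier, later in zip(rainfall, rainfall[1:])]

# Use only the first five days, which follow the trend.
trend_sales = umbrellas[:5]
steps = [later - earlier for earlier, later in zip(trend_sales, trend_sales[1:])]
average_step = sum(steps) / len(steps)

# Extend the trend by one more step to estimate sales at 85 mm.
expected_at_85 = trend_sales[-1] + average_step
shortfall = expected_at_85 - umbrellas[-1]

print(steps, average_step, expected_at_85, shortfall)
```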

exam (4 marks)
A scatterplot of weekly revision time (x, in hours) against exam mark (y) gives the points (2, 40), (4, 55), (6, 66), (8, 74), (10, 79), (12, 81), (14, 80). (a) Find the increase in mark from each point to the next. (b) Use these increases to explain why the form is non-linear rather than linear. (c) Explain why fitting a straight line and quoting a correlation coefficient could be misleading here.
Worked solution:

Part (a) step-by-step increases. Subtracting each mark from the next gives

$$15, \quad 11, \quad 8, \quad 5, \quad 2, \quad -1$$

Part (b) why the form is non-linear. For a linear (straight-line) relationship the increases would be roughly constant. Here they shrink steadily from 15 down to 2 and then turn negative, so the marks rise quickly at first, then level off and dip. A pattern that flattens like this is non-linear (a curve), not a straight line.

Part (c) why a straight line misleads. The correlation coefficient and the least-squares line both assume a linear pattern. Fitting them to a curve would understate the strong early gains and the later flattening, and the correlation coefficient would not capture the true curved shape. Check: equal 2-hour steps in x producing ever-smaller gains in y is the hallmark of a curve, confirming a straight-line summary is the wrong tool.
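
The sketch below reproduces the increases from part (a) and also computes the correlation coefficient for these seven points, to illustrate why a number alone cannot certify a straight line. The data are from the question; `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

revision_hours = [2, 4, 6, 8, 10, 12, 14]
exam_marks = [40, 55, 66, 74, 79, 81, 80]

# Part (a): increase in mark from each point to the next.
increases = [later - earlier for earlier, later in zip(exam_marks, exam_marks[1:])]

# Part (b): a straight line would give roughly equal increases; these shrink.
shrinking = all(b < a for a, b in zip(increases, increases[1:]))

# Part (c): the coefficient is still large despite the curve.
r = pearson_r(revision_hours, exam_marks)

print(increases, shrinking, round(r, 2))
```

The coefficient comes out above 0.9 even though the increases shrink at every step, so a high value on its own does not show that the form is linear. The scatterplot has to be read first.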
