Year 12: Statistical Analysis
Quick questions on Bivariate data: scatter plots, Pearson correlation and least-squares regression for HSC Maths Advanced
14 short Q&A pairs drawn directly from our worked dot-point answer. For full context and worked exam questions, read the parent dot-point page.
What is a scatter plot?
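A scatter plot displays paired data $(x_i, y_i)$ as points in the plane. Read it for direction (positive or negative), form (linear or curved), strength (how tightly the points cluster) and any outliers.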
What is Pearson's correlation coefficient?
The Pearson correlation coefficient $r$ measures the strength and direction of the linear relationship between two variables. It is defined by
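One standard way to write the definition, assuming $n$ data pairs with sample means $\bar{x}$ and $\bar{y}$ (this notation is an assumption; equivalent forms exist):

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1$$

Values near $1$ or $-1$ indicate a strong linear relationship, and values near $0$ indicate a weak one.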
What is the least-squares regression line?
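The least-squares regression line of $y$ on $x$ is the line $y = a + b x$ that minimises the sum of squared vertical residuals $\sum (y_i - (a + b x_i))^2$. The solution is $b = r \cdot \frac{s_y}{s_x}$ and $a = \bar{y} - b \bar{x}$, where $s_x$ and $s_y$ are the standard deviations of $x$ and $y$.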
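A minimal Python sketch of the fit, using the standard closed form $b = r \cdot \frac{s_y}{s_x}$ and $a = \bar{y} - b \bar{x}$. The function name `fit_line` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)   # Pearson correlation
    b = sxy / sxx               # slope, equal to r * s_y / s_x
    a = y_bar - b * x_bar       # intercept: the line passes through (x_bar, y_bar)
    return a, b, r

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, r = {r:.3f}")
```

In an exam you would normally read $a$, $b$ and $r$ from a calculator's statistics mode; the sketch only shows what that calculation does.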
What are prediction, interpolation and extrapolation?
Once you have $y = a + b x$, substitute any $x$ to predict $y$. Prediction inside the observed range of $x$ is called interpolation and is usually safe. Prediction outside the observed range is extrapolation and is risky: the linear pattern may not continue.
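A short sketch of that distinction in Python. The line $y = 8 + 2.4x$ and the observed range $2 \le x \le 9$ are hypothetical values chosen for illustration, and `predict` is an illustrative helper name.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y from y = a + b*x and label the kind of prediction."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind

# Hypothetical line y = 8 + 2.4x fitted to data with x between 2 and 9
y_in, kind_in = predict(8, 2.4, 5, 2, 9)     # x = 5 lies inside the range
y_out, kind_out = predict(8, 2.4, 20, 2, 9)  # x = 20 lies outside the range
print(y_in, kind_in)
print(y_out, kind_out)
```

The arithmetic is identical in both cases; only the reliability of the answer differs.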
What does "correlation is not causation" mean?
A strong $r$ tells you the two variables move together. It does not establish that one causes the other. Lurking variables, reverse causation, and pure coincidence can all produce strong correlations.
How do you read a scatter plot?
A plot of height (cm) against shoe size has points rising from lower left to upper right, tightly clustered. Direction positive, form linear, strength strong, no obvious outliers. Estimate $r \approx 0.9$.
How do you compute $r$ and the regression line?
Suppose a small dataset gives $\bar{x} = 5$, $\bar{y} = 20$, $s_x = 2$, $s_y = 6$ and $r = 0.8$.
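A sketch of how these summary statistics give the regression line, assuming the standard formulas $b = r \cdot \frac{s_y}{s_x}$ and $a = \bar{y} - b \bar{x}$:

$$b = r \cdot \frac{s_y}{s_x} = 0.8 \times \frac{6}{2} = 2.4, \qquad a = \bar{y} - b \bar{x} = 20 - 2.4 \times 5 = 8$$

Under those formulas the line is $y = 8 + 2.4x$. It passes through $(\bar{x}, \bar{y}) = (5, 20)$, which is a useful check.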
How do you interpret the slope and intercept?
For a regression of exam mark $y$ on hours studied $x$ with $y = 44 + 2 x$:
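A minimal sketch of how the two numbers can be read, assuming the usual interpretation: the intercept is the predicted mark at $x = 0$ hours, and the slope is the change in predicted mark for each extra hour studied. The function name `predicted_mark` is illustrative.

```python
def predicted_mark(hours):
    """Predicted exam mark from the regression line y = 44 + 2x."""
    return 44 + 2 * hours

# Intercept: the predicted mark for a student who studies 0 hours
intercept = predicted_mark(0)

# Slope: the change in predicted mark for one extra hour of study
slope = predicted_mark(6) - predicted_mark(5)

print(intercept, slope)
```

The slope describes an average association in the data, not a guaranteed gain for any individual student.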
How do you spot an outlier?
A scatter plot of weight against height has one point well above the line. That point pulls the regression line upward and has a large residual. Refitting without it will usually increase $|r|$ and shift the slope. Outliers should be checked for data entry errors before any decision to remove them.
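The effect of a single outlier can be sketched in Python. The data are made up: five points lying exactly on $y = 2x$ plus one point well above that line. The helper name `slope_and_r` is illustrative.

```python
from math import sqrt
from statistics import mean

def slope_and_r(xs, ys):
    """Return (b, r): the least-squares slope and the Pearson correlation."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Made-up data: the last point (6, 20) sits well above the line y = 2x
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 20]

b_with, r_with = slope_and_r(xs, ys)                  # all six points
b_without, r_without = slope_and_r(xs[:-1], ys[:-1])  # outlier dropped
print(f"with outlier:    b = {b_with:.2f}, r = {r_with:.3f}")
print(f"without outlier: b = {b_without:.2f}, r = {r_without:.3f}")
```

For this particular dataset, dropping the outlier raises $r$ and lowers the slope back to $2$. Other datasets can behave differently, which is why the point should be investigated and not removed automatically.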
Why is confusing strong with steep a mistake?
A nearly horizontal line through tightly clustered points still has $|r|$ close to $1$. Strength is about closeness to the line, not slope size.
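A quick check of this point in Python, using made-up data: one set of points lies exactly on a nearly flat line and another exactly on a steep line. The helper name `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
flat_ys = [10 + 0.01 * x for x in xs]  # slope 0.01, points exactly on the line
steep_ys = [10 + 5 * x for x in xs]    # slope 5, points exactly on the line

r_flat = pearson_r(xs, flat_ys)
r_steep = pearson_r(xs, steep_ys)
print(r_flat, r_steep)
```

Both values come out at $1$ up to rounding, even though the slopes differ by a factor of $500$.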
Why is treating a low $r$ as no relationship a mistake?
$r$ measures only linear association. A clear curved pattern can give $r \approx 0$.
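A small Python sketch of such a case, using made-up data that lie exactly on the parabola $y = x^2$ with $x$ values symmetric about $0$. The helper name `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x, but not linearly

r = pearson_r(xs, ys)
print(r)
```

Here $y$ depends perfectly on $x$, yet the positive and negative products cancel and $r$ is $0$. Always look at the scatter plot before interpreting $r$.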
Why is extrapolating without warning a mistake?
Predicting $y$ for $x$ values far outside the data range can give nonsense (negative weights, marks above $100$). Always check that the $x$ value sits inside the data range, or flag the caveat.
Why is swapping the slope formula a mistake?
It is $b = r \cdot \frac{s_y}{s_x}$, not $r \cdot \frac{s_x}{s_y}$. The units must work out: rise over run.
Why is claiming causation a mistake?
"Correlation does not imply causation" is a standard one-mark response to any question that asks what $r$ tells you about cause and effect.