What you need to know
Scatter graphs are a tool for displaying data with two variables. For example, you might collect data on how each person in your class performed in their Maths and English tests. To plot a scatter graph from this data, you would first draw a pair of axes with Maths grades on the x-axis and English grades on the y-axis. Then, each student’s pair of grades forms a pair of coordinates that we can plot on our Maths/English graph. The result of plotting all the students’ grades is a scatter graph. Let’s see an example.
Example: Below is a table of 11 students’ scores out of 100 on their Maths and English tests. Plot a scatter graph from this data.
As stated, we put the Maths mark on the x-axis and the English mark on the y-axis. It doesn’t matter which way round these things go, as long as you draw the graph correctly.
Then, we plot each individual student’s Maths mark against their English mark in the way that we normally plot coordinates. The resulting graph is on the right.
The aim of drawing a scatter graph is to determine whether there is a link or relationship between the two variables that have been plotted. If there is, we say there is correlation.
There are two types of correlation:
– Positive correlation – as one variable increases, the other one also increases.
– Negative correlation – as one variable increases, the other decreases.
Looking at this graph, we can see that, in general, as people’s grades in Maths increase, their grades in English tend to decrease. So, there is negative correlation between Maths and English grades.
We can also comment on how strong the correlation is. If all the points are very closely aligned (either in negative or positive correlation), then we say that there is strong correlation. If there is correlation but the points are quite spread out and not clearly in line, we say that there is weak correlation. If the reality is somewhere in between, then there is moderate correlation.
In the example above, we might say there is strong negative correlation. By contrast, the picture to the left is an example of weak positive correlation.
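If the data is on a computer, the type and strength of correlation can also be estimated with a number rather than by eye. The sketch below uses Pearson's correlation coefficient r, which runs from -1 (perfect negative straight-line correlation) to 1 (perfect positive). The marks are made-up illustrative data, not the table from the example, and the cut-offs of 0.2, 0.5 and 0.8 for "weak", "moderate" and "strong" are assumptions chosen for this sketch rather than fixed rules.

```python
from math import sqrt

# Hypothetical marks out of 100 for 11 students (illustrative only,
# not the table from the example).
maths = [10, 20, 25, 30, 40, 50, 60, 65, 70, 80, 90]
english = [92, 80, 78, 75, 62, 55, 48, 45, 40, 28, 20]


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient, a number from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Turn r into words. The cut-offs 0.2, 0.5 and 0.8 are assumptions."""
    size = abs(r)
    if size < 0.2:
        return "no correlation"
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    kind = "positive" if r > 0 else "negative"
    return f"{strength} {kind} correlation"


r = pearson_r(maths, english)
print(round(r, 2), describe(r))
```

A value of r close to -1 or 1 matches the "points closely aligned" description of strong correlation, while a value near 0 matches a graph with no clear pattern.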
To show the correlation more clearly, we can draw a line of best fit: a single straight line that follows the trend of the points as closely as possible. It helps a lot to have a clear ruler when doing this, as it makes it much easier to make sure your line is right where you want it. Then, once you’ve drawn the line, check how many points fall on either side of it. If the number is roughly the same on both sides, then that’s a good sign.
The line of best fit for the scatter graph above has been drawn in green onto the picture on the left. Counting the points, we can see that there are 6 points underneath the line and 5 above it, so that’s all good.
Now, what is the point of this? Well, a line of best fit is supposed to represent the correlation of the data. In other words, the line of best fit gives us a clear outline of the relationship between the two variables, and it gives us a tool to make predictions about future data points. Note: a positive correlation will give a line of best fit with a positive gradient, and a negative correlation will give one with a negative gradient.
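In an exam you draw the line of best fit by eye, but a computer can calculate one. A common way to do this is the least-squares method, which is a different technique from the ruler approach described here. The sketch below applies it to made-up illustrative marks (not the table from the example) and then counts the points on either side of the line, in the same way as the by-eye check.

```python
# Hypothetical marks out of 100 for 11 students (illustrative only,
# not the table from the example).
maths = [10, 20, 25, 30, 40, 50, 60, 65, 70, 80, 90]
english = [92, 80, 78, 75, 62, 55, 48, 45, 40, 28, 20]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = least_squares_line(maths, english)

# Count how many points lie above and below the line.
above = sum(1 for x, y in zip(maths, english) if y > gradient * x + intercept)
below = sum(1 for x, y in zip(maths, english) if y < gradient * x + intercept)

print(f"gradient = {gradient:.2f}, intercept = {intercept:.1f}")
print(f"{above} points above the line, {below} below")
```

Because this made-up data has negative correlation, the calculated gradient comes out negative, in line with the note about gradients.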
On this note, the question has asked us to predict the English mark of someone who managed a mark of 60 in Maths. To do this, we draw a straight, vertical line (orange) up from 60 on the Maths axis until we hit the line of best fit. Then, we draw a horizontal line (also orange) across from that point to the English axis. It touches that axis at 50, so 50 is the predicted English grade.
The point (60, 50) on the graph is right in the middle of the data collected. When we make predictions within the range of our data like this, it is known as interpolation and is generally quite reliable. When we make predictions outside of the range of our data, that is known as extrapolation and is risky, because you don’t know how the pattern of the data might change outside that range. For example, if you were to use this line of best fit to predict the English mark of someone who got 2 on their Maths exam, you would predict them to get over 100 on their English exam, which is impossible. Extrapolation can be useful, but it is important to be wary of the risk.
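The difference between interpolation and extrapolation can be sketched in a few lines of Python. The gradient of -0.9 and intercept of 104 below are hypothetical values, chosen only so that the line gives an English mark of 50 for a Maths mark of 60 and goes over 100 for a Maths mark of 2. The assumed data range of 10 to 90 for Maths marks is also made up for illustration.

```python
# Hypothetical line of best fit: English = GRADIENT * Maths + INTERCEPT.
# These values are assumptions for illustration, not read from the graph.
GRADIENT = -0.9
INTERCEPT = 104

# Assumed smallest and largest Maths marks in the collected data.
MATHS_MIN = 10
MATHS_MAX = 90


def predict_english(maths_mark):
    """Return (predicted English mark, kind of prediction)."""
    prediction = GRADIENT * maths_mark + INTERCEPT
    if MATHS_MIN <= maths_mark <= MATHS_MAX:
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return prediction, kind


for mark in (60, 2):
    prediction, kind = predict_english(mark)
    warning = " (impossible mark!)" if not 0 <= prediction <= 100 else ""
    print(f"Maths {mark}: predicted English {prediction:.1f}, {kind}{warning}")
```

The prediction for a Maths mark of 60 sits inside the data range, so it is an interpolation. The prediction for a Maths mark of 2 is flagged as an extrapolation and comes out above 100, which shows why predictions outside the data range need to be treated with caution.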
1) For each of the scatter graphs below, state whether or not there is correlation and if so, state the strength and type of that correlation.
A) One variable is increasing with the other, and all the points are close together in a straight line, so this graph has strong positive correlation.
B) There is no clear pattern here, so this graph has no correlation.
C) One variable decreases as the other increases, and the points are neither very close together nor very spread out, so this graph has moderate negative correlation.
2) Rey recorded the heights and weights of her students in the table below.
a) Draw a scatter graph of this data and state the type and strength of correlation.
b) Draw a line of best fit and use it to predict how much someone who is 190cm would weigh. Explain why this might not be an accurate prediction.
a) The results of plotting the points on a graph should look like the picture below.
b) The line of best fit should look like the green line below.
To predict the weight of someone who’s 190cm tall, draw a line up from 190cm to the line of best fit, and then across from that point to the weight axis. The prediction, according to this line of best fit, is 95kg. Your line of best fit may be slightly different, in which case any answer between 93kg and 97kg is acceptable.
This might not be accurate because we are extrapolating, and the pattern outside the range of the data might change.