Scatter Graphs Worksheets | Questions and Revision | MME

# Scatter Graphs Worksheets, Questions and Revision

Level 1 Level 2 Level 3

## Scatter Graphs

Scatter graphs are a tool that we use to display data with two variables

Having a good knowledge of gradients of lines will help you to understand correlation which forms a large part of the scatter graph topic.

## Correlation

The aim of drawing a scatter graph is to determine if there is a link or relationship between the two variables that have been plotted. If yes, then we say there is correlation.

There are two types of correlation:

Positive correlation – as one variable increases, the other one also increases.

Negative correlation – as one variable increases, the other decreases.

We can also comment on how strong the correlation is. If all the points are very closely aligned (either in negative or positive correlation), then we say that there is strong correlation. If there is correlation but the points are quite spread out, we say that there is weak correlation

## Drawing the Line of Best Fit

A line of best fit is supposed to represent the correlation of the data. In other words, the line of best fit gives us a clear outline of the relationship between the two variables, and it gives us a tool to make predictions about future data points.

It helps a lot to have a clear ruler and sharp pencil when drawing a line of best fit. You should draw the the line of best fit so that it goes through the middle of all the scatter points on a graph, and usually has an equal number of points on either side of the line.

## Example 1: Plotting Scatter Graphs

Below is a table of 11 student’s scores out of 100 on their Maths and English tests. Plot a scatter graph from this data.

As stated, we put the Maths mark on the $x$-axis and the English mark on the $y$-axis. It doesn’t matter which way round these things go, as long as you draw the graph correctly.

Then, we plot each individual student’s Maths mark against their English mark in the way that we normally plot coordinates. The resulting graph is shown below

Looking at this graph, we can see that, in general, as people’s grade in Maths increase, their grades in English tend to decrease. So, there is negative correlation between Maths and English grades. You could even say there is strong negative correlation as the gradient s steep.

## Example 2: Using the Line of Best Fit

Use the following scatter graph to predict the English mark of someone who managed a mark of 60 in Maths.

To do this, we draw a straight, vertical line (orange) up from 60 on the Maths axis until we hit the line of best fit. Then, we draw a horizontal line (also orange) across from that point to the English axis. It touches that axis at 50, so 50 is the predicted English grade.

The point (50, 60) on the graph is right in the middle of the data collected. When we make predictions within the range of our data like this, it is known an interpolation and is generally quite reliable. When we make predictions outside of the range of our data, that is known as extrapolation and is less reliable as we don’t know if or when a correlation will change.

### Example Questions

a) In general, we can see that as the $x$ variable increases, the $y$ variable also increases. This indicates that there is a positive correlation. Since all the points are close together in a straight line (if you draw in a line of best fit, then the points would not be far from this line), this graph has strong positive correlation.

b) There is no clear pattern here (the points seem to be scattered at random), so this graph has no correlation.

c) As the $x$ variable increases, the $y$ variable decreases, so there is a negative correlation. Since all the points are reasonably close to the line of best fit, this graph has moderate negative correlation.

a) The results of plotting the ten points on a graph should look like the below:

As far as the scale is concerned, since the heights and weights do not start from 0, it would be a good idea to jump to an appropriate starting point, such as 150 cm for the height and 60 kg for the weight.

b) The line of best fit will cut through what you believe to be the middle of all the points and should be similar to the green line below:

To predict the weight of someone with a height of 190cm, locate 190 on the horizontal $x$-axis and draw a vertical line up to your line of best fit. Then draw across from this point to the corresponding value on the $y$-axis. The prediction, according to this line of best fit, is 95kg. This is shown with the red lines below:

(Your line of best fit may be slightly different, in which case any answers between 93kg and 97kg are acceptable.)

a) Since time is the first row of the data table, time should be on the $x$-axis. As far as your scale is concerned, on the $x$-axis, it would make sense to go up in increments of one or two minutes. On the $y$-axis, you can use can go up in increments of 10 or 20 degrees. (Since the temperature starts at $45\degree$ and goes up to $95\degree$, you could start your temperature at $40\degree$ instead of $0\degree$ if you prefer.)

Your completed scatter graph should look like the below:

b) In order to work out what type of correlation there is, we need to draw in a line of best fit. Draw a line that cuts through the middle of as many of the dots as possible. As the $x$ variable increases, the $y$ variable decreases, so there is a negative correlation. Since all the points are very close to the line of best fit, this graph has strong negative correlation.

c) Since the $y$ variable decreases as the $x$ variable increases, this tells us that the temperature of the cup of tea is reducing over time (the cup of tea is getting colder over time).

d) In order to find an estimate for the temperature of the cup of tea after 6 minutes, we need to locate 6 minutes on the $x$-axis and draw a vertical line from 6 until it touches your line of best fit, then draw a horizontal line to the left to find the corresponding temperature value. The estimated temperature is $66\degree$. (A couple of degrees above or below 66 will be acceptable since drawing the line of best fit is never that precise.)

e)  It would be inappropriate to find an estimate for the temperature after 45 minutes as 45 minutes is beyond the range of the data.

(The temperature of the cup of tea will not continue to drop forever. After a certain amount of time, the cup of tea will reach room temperature and will stay at this temperature. This could be before 45 minutes have elapsed.)

a) Since time is the first row of the data table, time should be on the $x$-axis.

As far as your scale is concerned, on the $x$-axis, it would make sense to use one box to represent 10 kilograms in weight. Since there are no values between 0 and 50, I would start the scale from 50. On the $y$-axis, you can use one large box, or even two large boxes for every 10 minutes.

Your completed scatter graph should look like the below:

b) As the $x$ variable increases, the $y$ variable increases, so there is a positive correlation. Since all the points are very close to the line of best fit, this graph has strong positive correlation.

c) Since the $y$ variable increases as the $x$ variable increases, this tells us that the time taken to run 5 kilometres is greater for a heavier runner (the heavier the runner, the slower the time / the lighter the runner, the quicker the time).

d) It would be inappropriate to find an estimate for the time taken for a runner of 40 kilograms since 40 kilograms is beyond the range of the data.

Although the data suggests that the lighter you are, the quicker you are, but there has to be a limit. Taking this to a ridiculous extreme, a person could practically starve themselves to death, but this would not likely make them a fast runner!

a) Since the top row of the data is the number of hours of revising, this should be plotted on the $x$-axis and the exam score should be plotted on the $y$-axis. As far as the scale is concerned, the hours range from 0 to 25, so it would be sensible to use either a large box (or half a large box) for every 5 hours. As far as the exam results are concerned, they range from 20 to 85, so it would be sensible to start at 0 and finish at 90 or 100, with each large box representing a change of 10 marks.

Your completed scatter graph should look like the below:

b) In order to work out the type of correlation shown, you will need to draw in a line of best fit. This line should go through the middle of as many of the points as possible. Since the line of best fit goes up and, generally, the points are close to this line, there is a positive correlation.

c) Since the $y$ values (the exam scores) increase as the $x$ values (time spent revising) increase, we can say that the more revision you do, the better your exam score is likely to be.

d) An outlier is any point which is a long way from the line of best fit. From the completed scatter graph above, we can see that there is one point which is much further from the line of best fit than any of the other points, and this is the point that corresponds to the student who scored 85 with only 6 hours of revision.

(You don’t even need the scatter graph to see that this point is the outlier as it is quite clear in the table too; the student achieved the best result of the group with barely any revision.)

e) For this question, we need to locate 22 hours on the $x$-axis, draw a line vertically up until it touches the line of best fit, and then draw a line across to the corresponding exam score on the $y$-axis. As you can see below, 22 hours of revision corresponds to an exam score of approximately 73 marks. (This is an approximation, so a score that is a couple marks above or below this is still acceptable.)

f) It would be inappropriate to find an estimate for an exam score for a student doing 85 hours of revision as this is beyond the range of the data.

(Presumably the exam has a maximum score, which might be achievable with, say, 40 hours of revision. Doing 85 hours of revision will not allow you to score greater than the maximum possible score.)

Level 1-3

Level 1-3

### Learning resources you may be interested in

We have a range of learning resources to compliment our website content perfectly. Check them out below.