Correlation and Regression

A Level: AQA, Edexcel, OCR, Edexcel 2022

Correlation and regression both pertain to data measured in pairs, called bivariate data. Correlation describes how closely the two variables are linked; on its own it does not show that one variable causes the other to change. Regression finds the line of best fit through the data.

Correlation and Scatter Graphs

When we have bivariate data, one variable will be the independent (or explanatory) variable, and the other will be the dependent (or response) variable. The independent variable is the one that you can control, and it goes on the x axis. The dependent variable is the one that is being affected, and it goes on the y axis.

There won’t always be a clear independent and dependent variable, and in that case it is not as important which way round they go on a scatter graph.

Correlation comes in three flavours: positive, negative and no correlation. In positive correlation, as one variable goes up so does the other. In negative correlation, as one variable goes up the other goes down. In no correlation, there is no clear link between the variables.

Correlation can also be strong or weak. In strong correlation, the data is very close to forming a line. In weak correlation, the data is not close to forming a line.
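
The direction and strength described above can also be checked numerically. The sketch below is one possible illustration, assuming the product moment correlation coefficient r is the measure used (the text above does not name one). The function name and the data are made up for the example. A value of r above 0 indicates positive correlation, a value below 0 indicates negative correlation, and the closer r is to 1 or -1, the closer the points are to a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired data (hypothetical helper)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to go up as x goes up
falling = [9, 7, 6, 4, 1]   # tends to go down as x goes up

print(pearson_r(x_values, rising))   # a positive value
print(pearson_r(x_values, falling))  # a negative value close to -1
```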

Outliers are usually obvious on a scatter graph. They can be ignored in subsequent calculations, but if you plan to ignore an outlier, make sure you clearly mark it as such on the graph.

You should also be aware of clusters. This is where the data forms several separate groups on the graph. We can talk about overall correlation and correlation within clusters. For example, a graph can show negative correlation overall but positive correlation within each cluster.

Regression

The regression line of y on x is the line of best fit. It is always written in the same form:

y=a+bx

a is the y intercept

b is the gradient
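
As a rough illustration of where a and b come from, the sketch below assumes the line of best fit is the usual least-squares line, so that b = Sxy/Sxx and a = (mean of y) - b × (mean of x). The function name and the data are invented for this example and are not taken from any exam board material.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x (hypothetical helper)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # y intercept
    return a, b

# Made-up data chosen to lie exactly on y = 20 + 40x
times = [0, 0.5, 1, 1.5, 2]
temps = [20, 40, 60, 80, 100]

a, b = regression_line(times, temps)
print(a, b)
```

Because the made-up points lie exactly on a line, the function recovers that line's intercept and gradient. With real data the points scatter around the line instead.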

The regression line can be used to predict values of the dependent variable. This comes in two flavours:

  • Interpolation – If the value of x being used in the prediction falls inside the range of the values of x in the data. The predicted value should be reliable.
  • Extrapolation – If the value of x being used in the prediction falls outside of the range of the values of x in the data. The predicted value might be unreliable.
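
The distinction between the two kinds of prediction can be sketched in code. This is a minimal example, assuming the line y=20+40x (the kettle line used in Example 2 further down) and a made-up set of recorded x values. The function name is hypothetical, and treating the endpoints of the range as "inside" is an assumption.

```python
def predict(a, b, x, data_xs):
    """Predict y from y = a + b*x and say which kind of prediction it is."""
    inside_range = min(data_xs) <= x <= max(data_xs)
    kind = "interpolation" if inside_range else "extrapolation"
    return a + b * x, kind

# Made-up x values (minutes) at which data was recorded
recorded_times = [0, 0.5, 1, 1.5, 2]

print(predict(20, 40, 1.5, recorded_times))  # x is inside the data range
print(predict(20, 40, 5, recorded_times))    # x is outside the data range
```

The second call predicts 220°C for water in a kettle, which shows why an extrapolated value should be treated with caution.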
A Level Edexcel

Regression with Coded Data

Regression can also be done on coded data. Rearrange each coding to make x and y the subject, substitute these into the regression line, then rearrange to get it back into straight line form.

Example: y=4+10x, with coding s=2y, t=x-3 becomes:

\begin{aligned}\dfrac{s}{2}&=4+10(t+3)\\[1.2em]&=4+10t+30\\[1.2em]&=34+10t\end{aligned}

s=68+20t
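
The same substitution can be written once for any linear coding. The sketch below assumes codings of the form s = p*y + q and t = r*x + u, where the names p, q, r and u are introduced here for illustration only. Rearranging gives s = (q + p*a - p*b*u/r) + (p*b/r)t, and the function simply evaluates those two coefficients using exact fractions.

```python
from fractions import Fraction

def code_regression_line(a, b, p, q, r, u):
    """Convert y = a + b*x into s = c + d*t, where s = p*y + q and t = r*x + u."""
    d = Fraction(p) * b / r          # gradient of the coded line
    c = q + Fraction(p) * a - d * u  # intercept of the coded line
    return c, d

# The example above: y = 4 + 10x with s = 2y and t = x - 3
c, d = code_regression_line(4, 10, p=2, q=0, r=1, u=-3)
print(c, d)
```

For the example above this gives an intercept of 68 and a gradient of 20, matching s=68+20t.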

 

Also, we can form regression lines from non-linear data in some cases.

Example: y=ax^{n} becomes log(y)=log(a)+nlog(x) which is regression in log(y) and log(x).
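
To see this in practice, the sketch below regresses log(y) on log(x) and then converts the intercept back to a. It assumes base 10 logarithms, positive data values and a least-squares fit. The function name and the data are made up, with the data generated from y=3x^{2} so the expected answer is known.

```python
from math import log10

def fit_power_law(xs, ys):
    """Estimate a and n in y = a * x**n by regressing log(y) on log(x)."""
    log_x = [log10(x) for x in xs]
    log_y = [log10(y) for y in ys]
    count = len(xs)
    mean_lx = sum(log_x) / count
    mean_ly = sum(log_y) / count
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    n = sxy / sxx                    # gradient of the log-log line
    log_a = mean_ly - n * mean_lx    # intercept of the log-log line is log(a)
    return 10 ** log_a, n

# Made-up data generated from y = 3x^2
xs = [1, 2, 4, 8]
ys = [3 * x ** 2 for x in xs]

a_est, n_est = fit_power_law(xs, ys)
print(a_est, n_est)  # approximately 3 and 2
```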

Example 1: Correlation

Plot a scatter graph then describe the correlation of this data:

[4 marks]

Plot the points on the graph:

This graph shows positive correlation.

Example 2: Regression

Data is collected about the temperature of the water in a kettle in °C over time in minutes. The regression line is:

y=20+40x

What are the gradient and y-intercept, and what do they mean in the context of the experiment? How long does the kettle take to boil?

[5 marks]

Gradient is 40°C per minute

y-intercept is 20°C

In context, this means that the water starts at a temperature of 20°C and rises by 40°C every minute.

The kettle finishes boiling at 100°C. Substitute y=100 into the regression line:

100=20+40x

40x=80

x=2 minutes

Example Questions

Plot the points on the graph:

 

 

This graph shows positive correlation.

Plot the points on the graph.

 

 

This graph shows negative correlation.

i) a is the y-intercept

b is the gradient

 

ii) Negative correlation means b<0

 

iii) y=a because a is the y-intercept

 

iv) b is the gradient, so it is the change in migration distance for each unit increase in a bird's weight

a is the migration distance of a bird with zero weight. Since a bird cannot have zero weight, this value has no sensible interpretation.

y=8x+3

 

s=\dfrac{y}{10}+4

Rearrange to find y

s-4=\dfrac{y}{10}

y=10(s-4)

 

t=4(x+3)

Rearrange to find x

\dfrac{t}{4}=x+3

x=\dfrac{t}{4}-3

 

Substitute the expressions for x and y into y=8x+3

10(s-4)=8\left(\dfrac{t}{4}-3\right)+3

10s-40=2t-24+3

10s-40=2t-21

10s=2t+19

s=\dfrac{1}{5}t+\dfrac{19}{10}

This graph shows correlation in clusters.

There is negative correlation within the clusters.

There is no correlation overall.
