## What you need to know

A boxplot (sometimes also called a ‘box and whisker diagram’) is one of the many ways we can display a set of data that we’ve collected, such as the shoe sizes of a group of people. Here’s an example of what that might look like.

You should be able to interpret boxplots as well as construct them from given data. We’ll go through both of these things. First up, understanding what all the bits mean.

As you can see in the diagram, a boxplot has 5 important components: the smallest value, the lower quartile (Q_1), the median, the upper quartile (Q_3) and the largest value. From these, we can learn a lot about a data set. The smallest and largest values are fairly self-explanatory. They sit at the ends of the ‘whiskers’ and show the smallest and largest shoe sizes among the people surveyed. From this boxplot we learn that the smallest shoe size was 1.5 and the largest was 13, and from this we can calculate the **range** of this data set:

\text{Range } = \text{largest value } - \text{ smallest value } = 13 - 1.5 = 11.5

Range is one way of measuring the **spread** of the data. For more information, see the mean, median, mode and range revision page.

There is another measure of spread we can find from a boxplot, known as the **interquartile range** (or **IQR**). To find it, we need the quartiles, so called because they split the data into quarters. One quarter of the data lies between the smallest value and the lower quartile Q_1, another quarter lies between Q_1 and the median Q_2, another between Q_2 and the upper quartile Q_3, and the last quarter between Q_3 and the largest value. To find the interquartile range, we subtract the lower quartile from the upper quartile. So, reading from the boxplot, we get:

\text{Interquartile range } = Q_3 - Q_1 = 10 - 4 = 6
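Both measures of spread are single subtractions of values you can read straight off the plot. Here is a minimal Python sketch, assuming the five values have already been read from the diagram. The function name `spread_from_boxplot` is our own, chosen for illustration.

```python
def spread_from_boxplot(smallest, q1, median, q3, largest):
    """Return (range, IQR) from the five values read off a boxplot.

    The median is accepted for completeness but neither measure uses it.
    """
    data_range = largest - smallest  # range = largest value - smallest value
    iqr = q3 - q1                    # IQR = upper quartile - lower quartile
    return data_range, iqr


# Shoe-size boxplot: smallest 1.5, Q1 4, median 8, Q3 10, largest 13
print(spread_from_boxplot(1.5, 4, 8, 10, 13))  # (11.5, 6)
```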

The range is actually not a very good measure of how spread out the data is, but fortunately the interquartile range is much better. This is mainly because it is largely unaffected by outliers (data points that sit far away from all the others). The IQR only uses the middle half of the data, whereas the range depends entirely on the two most extreme values. Remember this, as exam questions often ask why the IQR is the preferred measure of spread.
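To see the effect of an outlier, here is a small Python sketch using the 11-value data set from the example below. It swaps the largest value, 16, for an extreme 60. The helper `range_and_iqr` is hypothetical. It locates the quartiles with the (n+1)/4 and 3(n+1)/4 position rules explained later in this section. To stay short, it assumes n + 1 is a multiple of 4, so that both positions are whole numbers.

```python
def range_and_iqr(values):
    """Return (range, IQR) using the (n+1)/4 and 3(n+1)/4 position rules.

    Simplifying assumption for this sketch: n + 1 is a multiple of 4, so
    both quartile positions are whole numbers (true when n = 11).
    """
    data = sorted(values)
    n = len(data)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch needs n + 1 to be a multiple of 4")
    q1 = data[(n + 1) // 4 - 1]       # (n+1)/4 th term (Python lists count from 0)
    q3 = data[3 * (n + 1) // 4 - 1]   # 3(n+1)/4 th term
    return data[-1] - data[0], q3 - q1


original = [3, 5, 8, 8, 9, 11, 12, 12, 13, 13, 16]
with_outlier = [3, 5, 8, 8, 9, 11, 12, 12, 13, 13, 60]  # 16 swapped for 60

print(range_and_iqr(original))      # (13, 5)
print(range_and_iqr(with_outlier))  # (57, 5)
```

The range jumps from 13 to 57, while the IQR stays at 5 because neither quartile moves when only the largest value changes.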

Finally, there is the median (sometimes referred to as Q_2, though not very often). The median is one way of finding the average of a set of data (see the mean, median, mode and range revision page for more on the median), and it is the value right in the middle of the data. The median is the line inside the box, so we simply read it off as 8.

That is how we read a boxplot. Now we’re going to try making one.

**Example:** Construct a boxplot for the following data set.

3, 5, 8, 8, 9, 11, 12, 12, 13, 13, 16

Finding the smallest and largest values is easy: they are 3 and 16 respectively. Finding the lower quartile, median and upper quartile takes a bit more effort. First, make sure the data are in order from smallest to largest (these already are). To find the median, you may recall that if we add 1 to the total number of values and then divide by 2, we get the position of the median. In general, we say the median is the \frac{n + 1}{2}th term, where n is the total number of values we’re working with.

We divide by 2 because the median is halfway through the data. Since the lower quartile is a quarter of the way through and the upper quartile is three quarters of the way through, we get:

- The lower quartile is the \dfrac{n + 1}{4}th term,

- The upper quartile is the \dfrac{3(n + 1)}{4}th term.
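The three position rules can be collected into one small function. This is a minimal sketch, and the name `quartile_positions` is our own. It only gives positions, so the data still have to be sorted before you count along to those terms.

```python
def quartile_positions(n):
    """Return the 1-based positions of Q1, the median and Q3
    for a data set of n values that has been put in order."""
    lower_quartile = (n + 1) / 4
    median = (n + 1) / 2
    upper_quartile = 3 * (n + 1) / 4
    return lower_quartile, median, upper_quartile


print(quartile_positions(11))  # (3.0, 6.0, 9.0)
print(quartile_positions(7))   # (2.0, 4.0, 6.0)
```

Positions that are not whole numbers (for example, n = 10 puts the median at position 5.5) need an extra convention, such as taking the value halfway between the 5th and 6th terms. This sketch leaves that step out.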

This set of data contains 11 numbers, so we get the following.

The median is the \dfrac{11 + 1}{2} = 6\text{th} term, so \text{median } = 11.

The lower quartile is the \dfrac{11 + 1}{4} = 3\text{rd} term, so Q_1 = 8.

The upper quartile is the \dfrac{3(11+1)}{4} = 9\text{th} term, so Q_3 = 13.
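The whole calculation (sort, find each position, read off the term) can be sketched in Python as below. The helper names `term_at` and `five_number_summary` are hypothetical. For positions that are not whole numbers, the sketch assumes you go that fraction of the way between the neighbouring terms, which the example above never needs.

```python
import statistics


def term_at(sorted_values, position):
    """Return the term at a 1-based position in an ordered list.

    Assumption: if the position is not a whole number, go that fraction
    of the way between the neighbouring terms (so position 5.5 is halfway
    between the 5th and 6th terms).
    """
    if not 1 <= position <= len(sorted_values):
        raise ValueError("position lies outside the data set")
    whole = int(position)
    fraction = position - whole
    value = sorted_values[whole - 1]
    if fraction:
        value += fraction * (sorted_values[whole] - value)
    return value


def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest) using the (n+1) position rules.

    Needs at least 3 values so that every position lies inside the data.
    """
    data = sorted(values)  # the position rules only work on ordered data
    n = len(data)
    return (
        data[0],
        term_at(data, (n + 1) / 4),
        term_at(data, (n + 1) / 2),
        term_at(data, 3 * (n + 1) / 4),
        data[-1],
    )


example = [3, 5, 8, 8, 9, 11, 12, 12, 13, 13, 16]
print(five_number_summary(example))         # (3, 8, 11, 13, 16)
# Cross-check with the standard library (Python 3.8+): its default
# method="exclusive" also places the quartiles using n + 1.
print(statistics.quantiles(example, n=4))   # [8.0, 11.0, 13.0]
```

Running `five_number_summary` on the 30-year-olds’ reaction times from question 3 gives (220, 252, 312, 332, 400), which matches the worked answer.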

Now we have all the information we need to draw a boxplot. Recall where everything goes from the picture above, and the result looks like this.
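On paper the boxplot is drawn against a scale. The box runs from Q_1 to Q_3 with a line at the median, and the whiskers run out to the smallest and largest values. As a rough text-only stand-in, here is a hypothetical Python helper, `text_boxplot`. It places the five values along a line of characters, assuming they all lie between the chosen ends of the scale, `lo` and `hi`. The characters used are our own choice.

```python
def text_boxplot(smallest, q1, median, q3, largest, lo, hi, width=41):
    """Draw a one-line boxplot on a scale running from lo to hi.

    Whiskers are '-', the inside of the box is '=', and '|' marks the
    smallest value, Q1, the median, Q3 and the largest value.
    """
    def column(value):
        # Position of a data value along a line of `width` characters.
        return round((value - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for c in range(column(smallest), column(largest) + 1):
        row[c] = "-"                  # whiskers (the box is drawn over them)
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                  # the box from Q1 to Q3
    for value in (smallest, q1, median, q3, largest):
        row[column(value)] = "|"      # ends, box edges and median line
    return "".join(row).rstrip()


# Worked example: smallest 3, Q1 8, median 11, Q3 13, largest 16
print(text_boxplot(3, 8, 11, 13, 16, lo=0, hi=20))
```

For the worked example, the box spans 8 to 13 with the median line at 11. The left whisker is longer than the right one because there is a bigger gap between 3 and 8 than between 13 and 16.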

One more thing you may be asked to do is compare two boxplots of the same measurement taken for two different groups, e.g. the heights of men compared with the heights of women (question 3 below is another example). To do this, state which group has the greater spread (by comparing the IQRs and/or the ranges) and which has the higher average (by comparing the medians), and say what each difference means in context. This might seem easy (because it is, which is nice), but it is only easy because boxplots put exactly these values side by side, ready to be read off.
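The two comparisons can be written as a short Python sketch. The function `compare_boxplots` is hypothetical, and the two summaries below are made-up numbers purely for illustration. Each summary is given in the order (smallest, Q1, median, Q3, largest).

```python
def compare_boxplots(name_a, summary_a, name_b, summary_b):
    """Compare two five-number summaries (smallest, Q1, median, Q3, largest).

    Returns one sentence about the average (median) and one about the
    spread (IQR).
    """
    _, q1_a, median_a, q3_a, _ = summary_a
    _, q1_b, median_b, q3_b, _ = summary_b
    iqr_a, iqr_b = q3_a - q1_a, q3_b - q1_b

    if median_a == median_b:
        average = f"{name_a} and {name_b} have the same median."
    else:
        high = name_a if median_a > median_b else name_b
        average = f"{high} has the higher median, so its values are higher on average."

    if iqr_a == iqr_b:
        spread = f"{name_a} and {name_b} have the same IQR."
    else:
        wide = name_a if iqr_a > iqr_b else name_b
        spread = f"{wide} has the larger IQR, so its values are more spread out."

    return average, spread


# Made-up summaries, purely for illustration.
group_a = (10, 20, 30, 40, 50)
group_b = (15, 30, 35, 40, 45)
for sentence in compare_boxplots("Group A", group_a, "Group B", group_b):
    print(sentence)
```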

### Example Questions

1) The boxplot below was constructed from a collection of times taken to run a 100m sprint. Using the boxplot, determine the range and interquartile range.

For the range, we need to subtract the smallest value from the largest. From the graph, we can see that the smallest value is 10 and the largest is 15.8, so:

\text{Range } = 15.8 - 10 = 5.8\text{ seconds}.

For the interquartile range, we need to subtract the lower quartile from the upper quartile. From the graph, we can see that the lower quartile is 10.5 and the upper quartile is 12.4, so:

\text{Interquartile range } = 12.4 - 10.5 = 1.9\text{ seconds}.

2) A class of students sat an exam, and their total marks (out of a possible 100) were recorded. The following facts are known about the students’ scores. Use this information to construct a complete boxplot.

- \text{Largest value }= 92

- \text{Lower quartile } = 73

- \text{Median } = 81

- \text{Range } = 21

- \text{IQR } = 11

We have lots of information, but two values we need are missing: the upper quartile and the smallest value.

The range is the smallest value subtracted from the largest value, so if we subtract the range from the largest value we will get the smallest value.

\text{Smallest value } = 92 - 21 = 71

The IQR is the lower quartile subtracted from the upper quartile, so if we add the IQR to the lower quartile, we get the upper quartile.

\text{Upper quartile } = 73 + 11 = 84
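Working backwards like this is just the two definitions rearranged. Here is a minimal Python sketch with a hypothetical function name, `complete_summary`. It also adds a sanity check, which is our own addition: the five values must come out in increasing order, or the given facts contradict each other.

```python
def complete_summary(largest, q1, median, data_range, iqr):
    """Work backwards to the full five-number summary.

    smallest = largest - range   (because range = largest - smallest)
    Q3       = Q1 + IQR          (because IQR = Q3 - Q1)
    """
    smallest = largest - data_range
    q3 = q1 + iqr
    # Sanity check: the five values must be in increasing order.
    if not smallest <= q1 <= median <= q3 <= largest:
        raise ValueError("these facts can't all be true of one data set")
    return smallest, q1, median, q3, largest


print(complete_summary(largest=92, q1=73, median=81, data_range=21, iqr=11))
# (71, 73, 81, 84, 92)
```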

Now we have all the information we need to plot the boxplot. Plot the box? Box the plot? You know what I mean. Your result should look a little like this.

3) The reaction times (in milliseconds) of a group of 20-year-olds and a group of 30-year-olds were tested. The reaction times for the 20-year-olds have been plotted below. The reaction times for the 30-year-olds are as follows.

220, 252, 256, 312, 332, 332, 400

Construct a boxplot for this set of data and note two differences between the two groups.

To construct the boxplot, we need the smallest value, largest value, median, and the lower and upper quartiles. We already have the first two (220 and 400), and we’ll have to work out the others.

The median is the \dfrac{7+1}{2} = 4\text{th} term, which is 312.

The lower quartile is the \dfrac{7+1}{4} = 2\text{nd} term, which is 252.

The upper quartile is the \dfrac{3(7+1)}{4} = 6\text{th} term, which is 332.

Now that we have all the necessary details, the resulting boxplot should look like this.

Comparing the two box plots, we can see that the second one has a higher median, meaning that the 30-year-olds were on average slower at reacting than the 20-year-olds.

Additionally, we can see that the IQR is greater for the 30-year-olds than it is for the 20-year-olds (because they’re on the same scale, looking at one on top of the other, we can see this without even calculating it), which means that the reaction times for 30-year-olds are more spread out than those for 20-year-olds.
