## What you need to know

Stratified sampling is a method we can use to make our sample more representative. By separating the population into groups (age groups, genders, etc.) called strata, we can then ensure that the number of people who will be sampled from each group is proportional to the number of people in that group overall. We choose to use a stratified sample when there are significantly different numbers of people/things in each group.

For example, if you were considering hair colour, and you knew that 30% of people have brown hair, then in a stratified sample, you make sure that 30% of your sample is people with brown hair. You can calculate the number of people needed from each group using the following formula:

\text{number to be sampled from group }=\dfrac{\text{number of people in group}}{\text{size of population}}\times \text{sample size}

In this context, population means the full collection of people/things you are taking a sample from.
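The formula can be written as a small Python function. This is a minimal sketch: the name `stratum_sample_size` is hypothetical and chosen only for illustration. It returns an exact fraction, so rounding to whole people can be done as a separate step.

```python
from fractions import Fraction


def stratum_sample_size(group_size: int, population_size: int, sample_size: int) -> Fraction:
    """Exact (unrounded) number to sample from one group.

    Implements: number of people in group / size of population * sample size.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return Fraction(group_size, population_size) * sample_size


# Hair colour example: 30% of a population of 1,000 have brown hair, sample size 50.
brown = stratum_sample_size(300, 1000, 50)
print(brown)  # 15
```

With a sample of 50, the 30% brown-haired group contributes 15 people, which is 30% of the sample.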

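Example: The breakdown of ages of all visitors to the convention is given in the table below.

| Age group | Number of visitors |
|---|---|
| 5 – 15 | 132 |
| 16 – 25 | 678 |
| 26 – 40 | 543 |
| 41 – 60 | 289 |
| 61+ | 108 |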

Fabine wants to take a stratified sample of the visitors at the convention. She chooses a sample size of 80. Calculate how many people she will need to sample from each age group.

First, we need to establish how many people were at the convention. Adding up the numbers, we get

\text{total population }=132+678+543+289+108=1,750

Now we must apply the formula shown above. The number she should sample from the 5 – 15 group is

\dfrac{132}{1,750}\times 80=6.034...

Then, for the 16 – 25 group:

\dfrac{678}{1,750}\times 80=30.994...

The 26 – 40 group:

\dfrac{543}{1,750}\times 80 = 24.822...

The 41 – 60 group:

\dfrac{289}{1,750}\times 80=13.211...

Lastly, the 61+ group:

\dfrac{108}{1,750}\times 80=4.937...

We can’t sample a fraction of a person, so we round each of these values to the nearest whole number. This gives the number of people to be sampled from each group (in order) as

6,\,\,31,\,\,25,\,\,13,\,\,\text{ and }\,\,5

As a check, 6 + 31 + 25 + 13 + 5 = 80, which matches the sample size.
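Below is a minimal Python sketch of the whole calculation for the convention, using the numbers from this example. Each value stays an exact `Fraction` until the rounding step. The dictionary keys are simply labels chosen here.

```python
from fractions import Fraction

# Number of visitors in each age group at the convention.
visitors = {"5-15": 132, "16-25": 678, "26-40": 543, "41-60": 289, "61+": 108}
sample_size = 80

population = sum(visitors.values())  # 1750

allocation = {}
for group, count in visitors.items():
    exact = Fraction(count, population) * sample_size
    # round() on a Fraction gives the nearest int (an exact .5 would go to the even number)
    allocation[group] = round(exact)
    print(f"{group}: {float(exact):.3f} -> {allocation[group]}")

print("total sampled:", sum(allocation.values()))  # total sampled: 80
```

The last line is a quick check that the rounded values still add up to the sample size. They do in this example.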

We can also solve some stratified sampling problems using ratios. The rule is: the ratio between the groups in the sample must be the same as the ratio between the groups in the population. The next example shows how this idea can be useful.

Example: Odette has taken a stratified sample of people who work at her company based on gender. There are 500 people at her company. The table below gives some information about the sizes of the groups. Complete the table.

We know there are 500 people in the company, but not how many are in the sample. So, instead of using the formula, we’re going to consider the fact stated just above:

“the ratio of the groups in the sample must equal the ratio of the groups in the population”

This means that every value in the sample is the corresponding population value divided by the same number (you may call this a ‘scale factor’). Given that there are 255 females at the company and 51 in the sample, we get

\text{scale factor } = 255\div 51=5

So every value in the “number at company” row must be 5 times the corresponding value in the “number in sample” row. For the male category, this gives

\text{Number at company: male category } = 47\times 5 = 235\text{ people}

Similarly, since each sample value is the population value divided by 5, we get

\text{Number in sample: other category }=10 \div 5= 2\text{ people}
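The steps for Odette’s table can be checked in Python. This is a sketch that assumes, as the total of 500 implies, that the only categories are female, male and other. It fills in the two missing cells using the scale factor. It then checks the ratio rule by dividing each row by its highest common factor and comparing the results.

```python
from functools import reduce
from math import gcd

# Known cells from Odette's table (categories: female, male, other).
female_company, female_sample = 255, 51
male_sample = 47
other_company = 10

# Scale factor between population and sample, taken from the female column.
scale_factor = female_company // female_sample  # 5
assert female_company == scale_factor * female_sample, "factor must be exact"

male_company = male_sample * scale_factor     # 235
other_sample = other_company // scale_factor  # 2

company = [female_company, male_company, other_company]
sample = [female_sample, male_sample, other_sample]


def simplest_ratio(values: list[int]) -> list[int]:
    """Divide every value by their highest common factor."""
    hcf = reduce(gcd, values)
    return [v // hcf for v in values]


print(sum(company), sum(sample))                        # 500 100
print(simplest_ratio(company), simplest_ratio(sample))  # [51, 47, 2] [51, 47, 2]
```

Both rows reduce to the same ratio, 51 : 47 : 2, which is exactly what the ratio rule requires.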

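Therefore, the completed table is:

| | Female | Male | Other |
|---|---|---|---|
| Number at company | 255 | 235 | 10 |
| Number in sample | 51 | 47 | 2 |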

### Example Questions

1) Ana wants to conduct a survey on the eating habits of people in her village. She is planning on taking a stratified sample of 200 people based on their annual income.

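The table below shows the population of her village, grouped by annual income.

| Annual income | Number of people |
|---|---|
| 0 – 14,999 | 1,354 |
| 15,000 – 24,999 | 3,480 |
| 25,000 – 34,999 | 3,776 |
| 35,000 – 49,999 | 1,865 |
| 50,000+ | 430 |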

Calculate the number of people she should sample from each income range.

Firstly, we need to work out the total population.

\text{Total population }=1,354+3,480+3,776+1,865+430=10,905

Now we can use the formula to work out how many people to sample from each group, rounding to the nearest whole number.

\text{0 - 14,999 group: }\dfrac{1,354}{10,905}\times 200=24.832...\approx 25 \text{ people}

\text{15,000 - 24,999 group: }\dfrac{3,480}{10,905}\times 200=63.823...\approx 64 \text{ people}

\text{25,000 - 34,999 group: }\dfrac{3,776}{10,905}\times 200=69.252...\approx 69 \text{ people}

\text{35,000 - 49,999 group: }\dfrac{1,865}{10,905}\times 200=34.204...\approx 34 \text{ people}

\text{50,000+ group: }\dfrac{430}{10,905}\times 200=7.886...\approx 8 \text{ people}

So, the number of people to sample from each group, in order, is

25,\,\,64,\,\,69,\,\,34,\,\,\text{ and }\,\,8
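The same steps can be wrapped in a reusable Python function and applied to Ana’s income bands. This is a sketch. The function name `allocate` and the band labels are illustrative choices, not part of the question. Rounding uses Python’s `round()`, which sends an exact .5 to the even number. None of the values here land exactly on .5.

```python
from fractions import Fraction


def allocate(groups: dict[str, int], sample_size: int) -> dict[str, int]:
    """Proportional stratified sample sizes, rounded to whole people."""
    population = sum(groups.values())
    return {
        name: round(Fraction(size, population) * sample_size)
        for name, size in groups.items()
    }


incomes = {
    "0 - 14,999": 1354,
    "15,000 - 24,999": 3480,
    "25,000 - 34,999": 3776,
    "35,000 - 49,999": 1865,
    "50,000+": 430,
}
result = allocate(incomes, 200)
print(result)
print(sum(result.values()))  # 200
```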

Recall: the ratio of the groups in the sample must equal the ratio of the groups in the population. This means every sample value is the corresponding population value divided by the same scale factor. Since we know both the population and sample values for football, we get

\text{scale factor }=196\div 28=7

This means each population value must be 7 times the corresponding sample value, so

\text{number in sample for rugby }=91\div 7=13\text{ people}

\text{number in population for basketball }=19\times 7=133\text{ people}
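Here is a general sketch of the scale-factor method in Python. Missing cells are marked as `None`. The function name `fill_by_scale_factor` is hypothetical. It assumes that the scale factor is a whole number, as it is here, and that each group has at least one of its two values known. Only the three sports mentioned in the worked answer are included. The real table may contain other rows.

```python
from typing import Optional


def fill_by_scale_factor(
    population: dict[str, Optional[int]],
    sample: dict[str, Optional[int]],
) -> tuple[dict[str, Optional[int]], dict[str, Optional[int]]]:
    """Fill missing (None) cells using the scale factor population / sample.

    Assumes a whole-number scale factor and that each group has at least
    one of its two values known.
    """
    # Use the first group where both values are known to find the scale factor.
    for group in population:
        if population[group] is not None and sample[group] is not None:
            factor, remainder = divmod(population[group], sample[group])
            if remainder != 0:
                raise ValueError("scale factor is not a whole number")
            break
    else:
        raise ValueError("need at least one group with both values known")

    filled_population, filled_sample = dict(population), dict(sample)
    for group in filled_population:
        if filled_population[group] is None:
            # Population value = sample value x scale factor
            filled_population[group] = filled_sample[group] * factor
        elif filled_sample[group] is None:
            # Sample value = population value / scale factor
            filled_sample[group] = filled_population[group] // factor
    return filled_population, filled_sample


population = {"football": 196, "rugby": 91, "basketball": None}
sample = {"football": 28, "rugby": None, "basketball": 19}

pop, samp = fill_by_scale_factor(population, sample)
print(pop)   # {'football': 196, 'rugby': 91, 'basketball': 133}
print(samp)  # {'football': 28, 'rugby': 13, 'basketball': 19}
```

Football is the only sport with both values known, so it fixes the scale factor of 7. Every other missing cell is then a single multiplication or division.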

Therefore, the completed table looks like