What you need to know
Types of Data
We can classify data in a few different ways. First of all, we can classify data into different types by looking at what form it takes:
Continuous data can take any value. We usually obtain continuous data by measuring. Examples:
- height
- time
Discrete data can only take certain values. We usually obtain discrete data by counting. Examples:
- number of children
- shoe size
Categorical data is anything that isn’t a number. We usually obtain categorical data by conducting a survey.
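As a rough illustration of the three types, the Python sketch below guesses the type of a single value from the way it is stored. The function name `data_type` and the rule "whole numbers are counts, decimals are measurements" are assumptions made for this example only. The rule is not foolproof: a shoe size such as 5.5 is discrete but would be labelled continuous here, so you should always think about how the data was obtained.

```python
def data_type(value):
    """Guess the type of one piece of data from how it is stored."""
    # Anything that is not a number (e.g. "brown") is categorical.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "categorical"
    # Whole numbers are assumed to be counts (e.g. 3 children).
    if isinstance(value, int):
        return "discrete"
    # Decimals are assumed to be measurements (e.g. a height of 1.62 m).
    return "continuous"

print(data_type("brown"))  # categorical
print(data_type(3))        # discrete
print(data_type(1.62))     # continuous
```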
A second way that we can classify data concerns how it is collected.
• Primary data is data that you collect first-hand.
– Examples: surveying members of the public; measuring the heights of your classmates.
– Advantages:
o You know how the data has been collected and can choose to collect it in the most appropriate way.
o You can ensure your sample is representative and/or accurate.
– Disadvantages:
o It can be very time-consuming.
o Getting the resources (e.g. printing off questionnaires), and possibly paying people to take part in your sample, can be expensive.
• Secondary data is data that has already been collected by someone else.
– Examples: the results of a questionnaire that have been posted on the internet; statistics published in a newspaper.
– Advantages:
o It takes much less time than collecting data yourself.
o Secondary data is usually either free or much cheaper than collecting the data yourself.
– Disadvantages:
o The data available might not be suitable for your purposes.
o You don’t know how the data was collected, meaning you can’t be sure that it is representative or fair.
It’s important to understand that we can combine the two classifications we’ve seen. In other words, continuous, discrete, and categorical data can each be either primary or secondary.
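To see how the two classifications combine, the short Python sketch below pairs every form of data with every way of collecting it. The variable names are chosen for this example only.

```python
from itertools import product

forms = ["continuous", "discrete", "categorical"]   # what form the data takes
sources = ["primary", "secondary"]                  # how the data was collected

# Every form can be paired with every source.
combinations = list(product(forms, sources))
for form, source in combinations:
    print(form, "and", source)

print(len(combinations))  # 6
```

Each of the six pairs is a valid description of a data set, for example "discrete and secondary".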
Example 1: Types of Data
Janet wants to learn some information about Jason. She learns three pieces of new information about him. For each one, state whether the data is discrete, continuous, or categorical.
a) His hair colour.
b) How many siblings he has.
c) How fast he can solve a Rubik’s cube.
a) His hair colour might be brown, blonde, or ginger, but in any case, it’s not a number. Therefore, this is categorical data.
b) The number of siblings Jason has can only be a whole number, so it cannot take just any value. Therefore, this is discrete data.
c) How fast he can solve a Rubik’s cube is a matter of time – the time has to be measured, and the result could be any value. Therefore, it is continuous data.
Note: you may be thinking that when measuring time, you are limited to only certain values because maybe your timer only shows to the nearest hundredth of a second. This is true, but this does not change the fact that time is continuous – it’s just a limitation of the timer. In theory, the time could be as specific as we want if our timer is good enough, so time could take any value.
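The point about the timer can be sketched in Python. The solve time below is a made-up value used purely for illustration: the same underlying time is shown by a timer that displays hundredths of a second and by a more precise one.

```python
true_time = 12.3456789  # hypothetical exact solve time in seconds

# A timer that shows hundredths of a second rounds the true time.
displayed = round(true_time, 2)

# A better timer shows more decimal places of the same underlying time.
better = round(true_time, 5)

print(displayed)  # 12.35
print(better)     # 12.34568
```

The time itself has not changed; only the precision of the display has, which is why time is still treated as continuous.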
Example 2: Types of Data
Chidi wants to gather some data on people’s favourite type of food. He decides to use a survey he found online that was conducted 10 years earlier where 200 people were asked if their favourite food was Italian, Chinese, Indian, Thai, or French.
a) State which two of the following words describe the data Chidi is using:
primary, secondary, categorical, discrete, continuous
b) State one advantage and one disadvantage of Chidi choosing to use this type of data.
a) The data is about people’s favourite type of food. Since this is not numerical, it must be categorical. Secondly, he is using data that was collected by someone else, so it is secondary.
b) One advantage of Chidi using this data is that he saves a lot of time that he would have had to use collecting it himself.
One disadvantage is that Chidi doesn’t know how the data was collected, so it might not be representative of what he wants to know.
Additionally, in this case the data is 10 years old and people’s preferences might have changed a lot in that time. Furthermore, the survey he found only gave people 5 choices – it might be excluding a particular cuisine which is currently very popular such as Mexican.
1) State whether the data for the following is categorical, discrete or continuous:
a) The heights of 12 dogs.
b) The lengths of 15 snakes.
c) The eye colours of students in a class.
d) The number of goals scored by members of the school’s football team.
a) Since the heights of dogs can be of any value (including numbers that are not whole numbers), this data is continuous.
b) Since the lengths of snakes can be of any value (including numbers that are not whole numbers), this data is continuous.
c) Since the data collected will be in the form of words (blue, green, brown etc.), this data is categorical.
d) Since goals can only be counted in whole numbers, this data is discrete.
2) Eleanor is measuring the hair length of everyone in her class.
a) State whether this data is primary or secondary.
b) State whether this data is categorical, discrete, or continuous.
a) Since Eleanor is measuring the data herself, the data is primary.
b) She is measuring hair length, which can have any value, including values that are not whole numbers, so the data is continuous.
3) Tahani says:
“People’s shoe sizes are based on the length of their feet, and since length is continuous, shoe size must also be continuous.”
Explain why Tahani is wrong.
Tahani is wrong because although a shoe size is based on foot length, the length of a person’s foot can be of any value, whereas shoe sizes have limited values (5, 5 and a half, 6, 6 and a half etc.). Since shoe size can only take certain values, it is discrete.
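The difference between foot length and shoe size can be sketched in Python. The conversion formula below is invented for this example and is not a real sizing standard; the only point is that many different lengths are snapped to the same half size.

```python
def shoe_size(foot_length_cm):
    """Made-up sizing rule: convert a foot length to the nearest half size."""
    raw = 1.2 * foot_length_cm - 24   # invented formula, not a real standard
    return round(raw * 2) / 2         # snap to the nearest 0.5

# Different foot lengths (continuous) can share one shoe size (discrete).
for length in [24.9, 25.0, 25.1, 25.4]:
    print(length, "->", shoe_size(length))
```

Under this made-up rule, the first three lengths all give size 6.0 and the last gives size 6.5: the input can be any value, but the output can only take certain values.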
4) Michael wants to collect information from families in his town about the number of children they have. He chooses to question people directly to obtain this data.
a) State whether his data will be primary or secondary.
b) Give two advantages of choosing to use this type of data.
a) Since Michael is collecting the data himself, it is primary data.
b) By collecting the data himself, he can ensure that the numbers are all accurately recorded.
A second advantage is that he can make efforts to make sure his sample is representative (he can ask people of different genders, races, ages, etc.). If he was using secondary data, he would have no control over who was being asked.
Note: other correct advantages are acceptable.
5) Steve wants to obtain data from his 30 classmates about the performance of the Tottenham striker, Harry Kane, in a recent match.
Half of them are allowed to choose from the following six options:
- “The worst performance I have ever witnessed from any player ever!”
- “He had a nightmare!”
- “A below average performance.”
- “Not his fault the team lost.”
- “I wish he could play like that every week!”
- “No player in the world could have performed better than that!”
The other half of the class are asked to give him a rating out of 10.
a) Is the data that Steve obtains from the first half of the class categorical, discrete quantitative, or continuous quantitative?
b) Is the data that Steve obtains from the second half of the class categorical, discrete quantitative, or continuous quantitative?
c) State two disadvantages of collecting data qualitatively (as worded responses) in this example.
a) Since the data that Steve collects from the first half of the class is worded data, this is categorical data.
b) Since the data that Steve collects from the second half of the class is a number, this is quantitative data. Since the data can only take certain values (whole numbers from 1 to 10), the data is discrete quantitative data.
c) The first disadvantage of collecting data in this way is that it is harder to analyse. It is much easier to analyse numerical data than worded data.
The second disadvantage is that there are only 6 options for the worded responses, whereas there are ten options for numbered responses from 1 to 10, so the worded responses describe opinions less precisely.
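The difference in how easily the two kinds of data can be analysed can be sketched in Python. The responses and ratings below are hypothetical values invented for illustration, not real survey results.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, invented for illustration only.
worded = ["He had a nightmare!", "A below average performance.",
          "He had a nightmare!", "Not his fault the team lost."]
ratings = [3, 4, 2, 6]

# Worded data can be tallied, but it has no average.
tally = Counter(worded)
print(tally.most_common(1))  # [('He had a nightmare!', 2)]

# Numerical data can be averaged directly.
print(mean(ratings))  # 3.75
```

With worded responses, the most we can do is count how often each option was chosen, whereas ratings can be summarised with a single average.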