What you need to know
We can classify data in a few different ways. Firstly, we can classify data into different types by looking at what form it takes.
• Continuous data can take any value. We usually obtain continuous data by measuring.
o Examples: height, weight, time.
• Discrete data can only take certain values. We usually obtain discrete data by counting.
o Examples: number of children, shoe size.
• Categorical data is described by words or labels rather than numbers. We often obtain categorical data by asking questions, for example in a survey.
o Examples: favourite colour, make of car, favourite sport.
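As a rough illustration of the three types, the sketch below guesses a type from how the values are stored in Python. The function name rough_data_type is hypothetical and the rule is only an assumption made for illustration: it would wrongly label shoe sizes such as 5.5 as continuous, so the definitions above always take priority over it.

```python
def rough_data_type(values):
    """Guess a data type from how the values are stored (heuristic only)."""
    # Words or labels: treated as categorical.
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # Whole numbers, as you would get by counting: treated as discrete.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    # Other numbers, as you would get by measuring: treated as continuous.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "unknown"


print(rough_data_type(["red", "blue", "green"]))  # favourite colour
print(rough_data_type([0, 2, 3]))                 # number of children
print(rough_data_type([1.62, 1.75, 1.81]))        # height in metres
```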
Example: Janet wants to learn some information about Jason. She learns three pieces of new information about him. For each one, state whether the data is discrete, continuous, or categorical.
a) His hair colour.
b) How many siblings he has.
c) How fast he can solve a Rubik’s cube.
a) His hair colour might be brown, blonde, ginger, but in any case, it’s not a number. Therefore, this is categorical data.
b) The number of siblings Jason has can only be a whole number, so it cannot take just any value. Therefore, this is discrete data.
c) How fast he can solve a Rubik’s cube is a measurement of time: someone will have had to time how long he takes to solve the puzzle, and the result could be any value. Therefore, it is continuous data.
Note: you may be thinking that when measuring time, you are limited to only certain values because maybe your timer only shows to the nearest hundredth of a second. This is true, but this does not change the fact that time is continuous – it’s just a limitation of the timer. In theory, the time could be as specific as we want if our timer is good enough, so time could take any value.
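The point in this note can be sketched in a few lines of Python. The solve time of 23.4567 seconds and the function name displayed_time are made up for illustration; the sketch simply assumes a timer shows the true time rounded to a fixed number of decimal places.

```python
def displayed_time(true_time, decimal_places):
    """Return what a timer with the given precision would show."""
    return round(true_time, decimal_places)


true_time = 23.4567  # hypothetical solve time in seconds

# A timer showing hundredths of a second hides the later digits...
print(displayed_time(true_time, 2))
# ...while a better timer reveals more of the same underlying value.
print(displayed_time(true_time, 3))
```

The underlying time is the same in both cases; only the precision of the display changes, which is why time still counts as continuous.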
A second way that we can classify data is regarding how it is collected.
• Primary data is data that you collect first-hand.
o Examples: surveying members of the public; measuring the heights of your classmates.
• Secondary data is data that has already been collected by someone else.
o Examples: the results of a questionnaire that have been posted on the internet; statistics published in a newspaper.
Both of these types of data come with their own advantages and disadvantages.
• Advantages of primary data:
o You know how the data has been collected and can choose to collect it in the most appropriate way.
o You can ensure your sample is representative and/or accurate.
• Disadvantages of primary data:
o It can be very time-consuming.
o Getting the resources (e.g. printing off questionnaires), and possibly paying people to take part in your sample, can be expensive.
• Advantages of secondary data:
o It takes much less time than collecting data yourself.
o Secondary data is either free, or at least much cheaper than collecting the data yourself.
• Disadvantages of secondary data:
o The data available might not be suitable for your purposes.
o You don’t know how the data was collected, meaning you can’t be sure that it is representative or fair.
It’s important to understand that we can combine the two classifications we’ve seen, because one describes the form of the data and the other describes how it was collected. For example, data can be continuous and secondary, or categorical and primary.
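As a minimal sketch of how the two classifications combine, the Python below pairs every form of data with every way of collecting it. The variable names are chosen here for illustration only.

```python
from itertools import product

forms = ["continuous", "discrete", "categorical"]  # what form the data takes
sources = ["primary", "secondary"]                 # how the data was collected

# Every form can be paired with every source.
combinations = list(product(forms, sources))

for form, source in combinations:
    print(f"{form} and {source}")

print(len(combinations), "possible combinations")
```

Because the two classifications are independent, there are 3 × 2 = 6 possible descriptions of a data set.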
Example: Chidi wants to gather some data on people’s favourite cuisine. He decides to use a survey he found online that was conducted 10 years earlier and asked 200 people which cuisine of Italian, Chinese, Indian, Thai, and French was their favourite.
a) State which two of the following words describe the data Chidi is using:
primary, secondary, categorical, discrete, continuous
b) State one advantage and one disadvantage of Chidi choosing to use this type of data.
a) The data is regarding people’s favourite type of food – this is not numerical, so must be categorical. Secondly, he is using data that was collected by someone else – this is secondary.
b) One advantage of Chidi using this data is that he saves a lot of time that he would’ve had to use collecting it himself.
One disadvantage is that Chidi doesn’t know how the data was collected, so it might not be representative of what he wants to know. Additionally, in this case the data is 10 years old and people’s preferences might have changed a lot in that time. Furthermore, the survey he found only gave people 5 choices – it might be excluding a particular cuisine which is currently very popular such as Mexican.
1) Eleanor is measuring the hair length of everyone in her class. State
a) Whether this data is primary or secondary, and
b) Whether this data is categorical, discrete, or continuous.
a) Eleanor is measuring it herself, so the data is primary.
b) She is measuring the length, and technically length can take any value, so the data is continuous.
2) Tahani says
“people’s shoe sizes are based on the length of their feet, and since the length is continuous, shoe size must also be continuous”
Explain why Tahani is wrong.
Whilst it’s true that shoe size is dependent on how long your feet are, shoe size cannot take any value. It is limited to whole numbers and sometimes halves (5, 5 and a half, 6, 6 and a half, etc). Therefore, it must be discrete, so Tahani is wrong.
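A short Python sketch can make this concrete. It does not use a real shoe-sizing formula: the "raw" values below are made-up continuous numbers standing in for whatever a foot measurement would give, and the hypothetical function nearest_half snaps each one onto the allowed whole and half sizes.

```python
def nearest_half(x):
    """Snap a continuous value onto the nearest whole or half number."""
    return round(x * 2) / 2


# Hypothetical continuous values derived from foot length.
raw_values = [5.2, 5.3, 5.6, 6.74]

# Each one is forced onto one of the allowed sizes.
sizes = [nearest_half(v) for v in raw_values]
print(sizes)
```

Different continuous inputs (5.3 and 5.6) end up with the same size, and no size between the allowed values can ever appear, which is exactly what makes shoe size discrete.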
3) Michael wants to collect information about how many children the people in his town have. He chooses to question people directly about the number of children they have.
a) State whether his data will be primary or secondary.
b) Give two advantages of choosing to use this type of data.
a) He is collecting the data himself, so it is primary.
b) One advantage of this is that he is collecting the data himself, so he can make sure the numbers are all accurately recorded.
A second advantage is that he can make efforts to make sure his sample is representative, i.e. he can ask people of different genders, races, ages, etc, whereas he wouldn’t be able to select for that if the data were secondary.
Note: other correct advantages are acceptable.