Statistics notes - types of data
This is more an overview for my own reference than material for teaching or exercises.
One possible typology
Ratio data (ordered, meaningful zero point, linear scale, (often) continuous)
Properties:
- ordered/monotonic (larger value means larger represented thing)
- linearly comparable (e.g. twice the distance between two numbers means twice the difference in the represented thing)
- meaningful zero point
- the combination of the above implies the numbers are proportional: twice the number directly means twice the amount of the represented thing
Examples:
- weight
- length
- time amount - reaction time, hours of study, time required to run a marathon
- age
- temperature in Kelvin
- number of responses (note: overlap with discrete numeric)
- many physical measurements in general
- though not all - this can depend on the units and their implied zeroing. For example, Fahrenheit and Celsius are not zeroed according to energy
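As a small sketch of why the zero point matters for "twice as much" statements: a ratio of weights survives a change of unit, while a ratio of Celsius temperatures does not survive conversion to Kelvin. The helper name and the sample values are made up for illustration.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin (illustrative helper)."""
    return c + 273.15

# Weight is ratio data: the ratio is the same in any unit.
KG_TO_LB = 2.20462  # approximate conversion factor
ratio_kg = 10.0 / 5.0
ratio_lb = (10.0 * KG_TO_LB) / (5.0 * KG_TO_LB)

# Celsius is not zeroed on energy, so "20 is twice 10" is an artifact of the unit.
ratio_celsius = 20.0 / 10.0
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print("weight ratio in kg and lb:", ratio_kg, ratio_lb)
print("temperature ratio in Celsius and Kelvin:", ratio_celsius, round(ratio_kelvin, 3))
```

The weight ratios agree, while the temperature ratio drops from 2 to only slightly above 1 once the zero point is physically meaningful.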
Interval data (ordered, no meaningful zero point, linear scale, (often) continuous)
Properties:
- ordered/monotonic (larger value means larger represented thing)
- differences are linearly comparable in the textbook definition (equal steps in the numbers mean equal steps in the represented thing), though for any given case it is worth checking that the scale really behaves that way
- no particularly meaningful zero point
Interval data is quantitative data in a numbering system in which there is no sensible zero point.
This means the assumption of proportional relationships ('twice the number means twice the thing') may be incorrect (often the most important difference compared to ratio data)
- ...because of the zero point
- ...because the scale is arbitrary
- ...and/or because of other reasons
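For a case where the scale is linear but the zero point is arbitrary, such as Celsius versus Fahrenheit, the following sketch shows what does and does not survive a change of unit. The temperatures are invented sample values and the helper name is hypothetical.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit (illustrative helper)."""
    return c * 9.0 / 5.0 + 32.0

temps_c = [10.0, 20.0, 40.0]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]

# Comparing differences: "the second step is twice the first" holds in both units.
diff_ratio_c = (temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0])
diff_ratio_f = (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0])

# Comparing values: "20 is twice 10" depends entirely on where the zero sits.
value_ratio_c = temps_c[1] / temps_c[0]
value_ratio_f = temps_f[1] / temps_f[0]

print("ratio of differences (C, F):", diff_ratio_c, diff_ratio_f)
print("ratio of values (C, F):", value_ratio_c, value_ratio_f)
```

Statements about differences agree between the two units; statements about ratios of the values themselves do not.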
Discrete numeric data (ordered, linear scale, discrete)
Ordinal data (ordered, but no obvious numbering so not linear; discrete)
Properties:
- ordered/monotonic (larger value means larger represented thing)
- not linearly comparable
- not necessarily a meaningful zero point
- not proportional
Examples:
- highest level of education
- age groups ('age up to 20', 'age 20-29', 'age 30-39') (note: in this case based on ratio data)
- socioeconomic status (sort of)
- almost any ranking
- questionnaire items of the 'strongly disagree to strongly agree in five steps' sort
Ordinal data is discrete and ordered, but has no directly obvious values to put to each item,
so there is not necessarily any linearity or a zero point.
Put another way, these are cases that you can clearly rank, but cannot unambiguously put a value on without introducing some subjective assumptions (such as that the scale is linear, or that the steps are equally spaced or valued).
In the qualitative/quantitative distinction, this sits somewhere in between.
You should generally err on the side of treating this as qualitative data,
and you should assume that applying quantitative analyses would be a bad idea.
If you can make a good argument why and how this can be interpreted as quantitative data, it can be sensible enough for certain analyses.
A classical case where people frequently do this is "rate on a few-point scale" in questionnaires.
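To illustrate how much the equal-spacing assumption can matter, here is a minimal sketch with two made-up groups of questionnaire responses. Both codings respect the same ordering; only the spacing differs. The group data, the 'stretched' coding, and the function name are invented for the example.

```python
from statistics import mean, median

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two numeric codings that both preserve the order of the levels.
equal_coding = {level: i + 1 for i, level in enumerate(LEVELS)}  # 1, 2, 3, 4, 5
stretched_coding = dict(zip(LEVELS, [1, 2, 3, 4, 10]))           # same order, uneven steps

group_a = ["agree"] * 5
group_b = ["neutral"] * 3 + ["strongly agree"] * 2

def summarize(responses, coding):
    """Return (mean of the coded values, median expressed as a level name)."""
    values = [coding[r] for r in responses]
    decode = {value: level for level, value in coding.items()}
    # Odd-length input, so the median is always one of the coded values.
    return mean(values), decode[median(values)]

for name, coding in [("equal", equal_coding), ("stretched", stretched_coding)]:
    mean_a, median_a = summarize(group_a, coding)
    mean_b, median_b = summarize(group_b, coding)
    print(name, "| mean A > mean B:", mean_a > mean_b,
          "| medians:", median_a, "/", median_b)
```

Which group has the higher mean flips between the two codings, while the median level of each group stays the same. Order-based summaries only rely on the ranking, which is the part of ordinal data that is actually given.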
Nominal/categorical data (unordered; qualitative; discrete)
Nominal roughly means 'of names' (in other contexts as well). Nominal values are those that are
- discrete,
- with no obvious ordering or other direct comparability,
- and therefore qualitative and often categorical,
- usually also finite.
Examples:
- labels, such as {T,A,G,C}
- choosing from multiple choice options
- distinguishers such as {green,blue} or {true,false}, binary gender, blood type
- left/right-handedness
- political affiliation
- brand names
Ideally, qualitative categories are exclusive (non-overlapping concepts).
If they are not, extra care needs to be taken during analysis.
Note that some of the examples push that boundary, intentionally, to make you think critically about it.
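A short sketch of what 'no ordering or direct comparability' means in practice: counts and the most frequent category are well defined for nominal data, while any arithmetic on numeric codes depends on an arbitrary labeling. The sample of blood types and both codings are invented for illustration.

```python
from collections import Counter
from statistics import mean, mode

blood_types = ["A", "O", "O", "B", "A", "O", "AB"]

# Counting and the most frequent category do not need any ordering.
counts = Counter(blood_types)
most_common = mode(blood_types)

# Two equally arbitrary ways of assigning numbers to the same labels.
coding_1 = {"A": 0, "B": 1, "AB": 2, "O": 3}
coding_2 = {"O": 0, "A": 1, "B": 2, "AB": 3}
mean_1 = mean([coding_1[b] for b in blood_types])
mean_2 = mean([coding_2[b] for b in blood_types])

print("counts:", dict(counts))
print("most common:", most_common)
print("mean of codes under two labelings:", mean_1, mean_2)
```

The two means differ even though the underlying data is identical, so the mean of the codes describes the labeling rather than the data.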
More words
More complex cases
Implications
Further terms that matter
Continuous data refers to numeric values that are not restricted to being discrete/integer.
- so typically ratio or interval data in the above list.
Quantitative data - basically anything not categorical, so everything in the above list except nominal/categorical (with ordinal sitting in between, as noted above).