Statistics notes - types of data
This is more for my own overview than for teaching or exercise.
✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
One possible typology
Ratio data (ordered, meaningful zero point, linear scale, (often) continuous)
Properties:
- ordered/monotonic (larger value means larger represented thing)
- linearly comparable (e.g. twice the distance between these numbers means twice the amount of difference in the represented thing)
- meaningful zero point
- the combination of the above implies the numbers are proportional: twice the number directly means twice the amount of the represented thing
Examples:
- weight
- length
- amount of time, e.g. reaction time, hours of study, time required to run a marathon
- age
- temperature in Kelvin
- number of responses (note: overlap with discrete numeric)
- many physical measurements in general
- though not all: it can depend on the units and their implied zero point. For example, Fahrenheit and Celsius are not zeroed according to energy
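As a small illustration of the proportionality point, the following sketch uses two made-up temperatures to show that 'twice the number' only means 'twice the amount' on a scale with a meaningful zero. The helper name and the values are just for illustration.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin."""
    return c + 273.15

# Two made-up temperatures, in Celsius
cold_c, warm_c = 10.0, 20.0

# A naive ratio of the Celsius numbers suggests 'twice as hot'
ratio_celsius = warm_c / cold_c

# The same two temperatures on the Kelvin scale (zeroed at absolute zero)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.035
```

Only the Kelvin ratio says something about the represented thing; the Celsius ratio is an artifact of where that scale puts its zero.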
Interval data (ordered, no meaningful zero point, possible linear scale, (often) continuous)
Properties:
- ordered/monotonic (larger value means larger represented thing)
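- differences between values are comparable in the textbook definition (equal steps in the numbers mean equal steps in the represented thing), though in practice this linearity may not hold for any given case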
- no particularly meaningful zero point
Interval data is quantitative data on a numeric scale that has no sensible zero point.
This means the assumption of proportional (and sometimes even linear) relationships may be incorrect (often the most important difference compared to ratio data)
- ...because of the zero point
- ...because the scale is arbitrary
- ...and/or because of other reasons
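To illustrate the zero-point issue for one common case, the sketch below converts a few made-up Celsius readings to Fahrenheit. These two scales are linearly related to each other but zeroed differently, which is an assumption that will not hold for every interval-like scale. Ratios of values change with the choice of scale, while ratios of differences do not.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9.0 / 5.0 + 32.0

# Made-up readings in Celsius, and the same readings in Fahrenheit
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratios of values depend on where the scale puts its zero
print(b_c / a_c)  # 2.0
print(b_f / a_f)  # 1.36

# Ratios of differences are the same on both scales
print((c_c - b_c) / (b_c - a_c))  # 2.0
print((c_f - b_f) / (b_f - a_f))  # 2.0
```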
Discrete numeric data (ordered, linear scale, discrete)
Ordinal data (ordered, but no obvious numbering so not linear; discrete)
Examples:
- highest level of education
- questionnaire items of the 'strongly disagree to strongly agree in five steps' sort
- age groups ('age up to 20', 'age 20-29', 'age 30-39') (note: in this case based on ratio data)
- socioeconomic status (sort of)
- almost any ranking
Asking people to rate something on a scale of a few points is often seen as ordinal, though there is overlap with continuous interval data, and such ratings are often analysed that way.
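The age-group example can be sketched as follows: ratio data (age in years) is binned into ordered groups, after which only the order is left. The boundaries, the extra 'age 40 and up' label, and the choice to read 'up to 20' as 'below 20' are assumptions made for illustration.

```python
import bisect

# Hypothetical lower edges of the second, third and fourth group
BOUNDARIES = [20, 30, 40]
LABELS = ['age up to 20', 'age 20-29', 'age 30-39', 'age 40 and up']

def age_group(age):
    """Return the rank (index into LABELS) of the group an age falls into."""
    return bisect.bisect_right(BOUNDARIES, age)

ages = [17, 20, 25, 34, 38, 52]  # made-up ratio data
ranks = [age_group(a) for a in ages]
print(ranks)  # [0, 1, 1, 2, 2, 3]
print([LABELS[r] for r in ranks])

# Order survives the binning, so comparisons still make sense...
assert age_group(25) < age_group(34)

# ...but distances do not: 20 and 29 share a group, 29 and 30 do not
print(age_group(20) == age_group(29))  # True
print(age_group(29) == age_group(30))  # False
```

The ranks 0 to 3 are just labels for the order; treating them as numbers on a linear scale would claim more than the binned data supports.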
Nominal/categorical data (unordered; qualitative; discrete)
Examples:
- labels, such as {T,A,G,C}
- choosing from multiple choice options
- distinguishers such as {green,blue} or {true,false}, discrete genders, blood type
- brand names
- left/right-handedness
- political affiliation
More words
More complex cases
Implications
Further terms that matter
Continuous data refers to numeric values that are not restricted to being discrete/integer.
- so ratio or interval data in the above list.
Quantitative data - basically anything not categorical, so everything in the above list other than nominal/categorical.