Statistics notes - types of data

From Helpful

This is more an overview for my own use than material for teaching or exercise.



This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


One possible typology

Ratio data (ordered, meaningful zero point, linear scale, (often) continuous)

Properties:

  • ordered/monotonic (larger value means larger represented thing)
  • linearly comparable (e.g. twice the distance between these numbers means twice the difference in the represented thing)
  • meaningful zero point
  • the combination of the above implies the numbers are proportional: twice the number directly means twice the amount of the represented thing


Examples:

  • weight
  • length
  • time amount - reaction time, hours of study, time required to run a marathon
  • age
  • temperature in Kelvin
  • number of responses (note: overlap with discrete numeric)
  • many physical measurements in general
though not all - it can depend on the units and their implied zero. For example, Fahrenheit and Celsius are not zeroed according to energy, so 0 in either does not mean 'no heat'
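
As a small illustration of the proportionality property, the following Python sketch checks that the ratio between two measurements survives a change of units when the scale has a meaningful zero. The helper names and sample weights are made up for this note, and the conversion factor is approximate.

```python
# Ratio data: "twice the number" means "twice the thing",
# and that statement does not depend on the unit chosen.

KG_TO_LB = 2.20462  # approximate conversion factor


def kg_to_lb(kg):
    """Convert kilograms to pounds (a pure rescaling, zero stays zero)."""
    return kg * KG_TO_LB


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


light_kg, heavy_kg = 35.0, 70.0  # illustrative values

ratio_in_kg = ratio(heavy_kg, light_kg)
ratio_in_lb = ratio(kg_to_lb(heavy_kg), kg_to_lb(light_kg))

print(ratio_in_kg)  # 2.0
print(ratio_in_lb)  # also 2, up to floating point rounding
```

Because the conversion only multiplies by a constant, the factor cancels in the ratio. That stops being true as soon as a conversion also shifts the zero.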


Interval data (ordered, no meaningful zero point, possible linear scale, (often) continuous)

Properties:

  • ordered/monotonic (larger value means larger represented thing)
  • differences are comparable when the scale is linear (as in the textbook definition, e.g. Celsius), though for any given case it may not be
  • no particularly meaningful zero point


Interval data is quantitative data in a numbering system in which there is no sensible zero point.

This means the assumption of proportionality is incorrect, and the assumption of linear relationships may be as well (often the most important difference compared to ratio data)

  • ...because of the zero point (20 is not 'twice' 10 when the zero is arbitrary)
  • ...because the scale is arbitrary
  • ...and/or because of other reasons
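
To make the zero-point issue concrete, here is a minimal Python sketch using temperature. The sample temperatures and function names are illustrative assumptions, and it assumes the textbook case where the scale itself is linear: ratios of values change with the unit, while ratios of differences do not.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: a rescaling plus a shift of the zero point."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Celsius to Kelvin: only the zero point shifts."""
    return celsius + 273.15


cold_c, mild_c, warm_c = 10.0, 20.0, 30.0  # illustrative values

# Ratios of values depend on where zero happens to sit.
print(mild_c / cold_c)                  # 2.0
print(c_to_f(mild_c) / c_to_f(cold_c))  # 1.36
print(c_to_k(mild_c) / c_to_k(cold_c))  # roughly 1.035

# Ratios of differences come out the same in either unit.
diff_ratio_c = (warm_c - mild_c) / (mild_c - cold_c)
diff_ratio_f = (c_to_f(warm_c) - c_to_f(mild_c)) / (c_to_f(mild_c) - c_to_f(cold_c))
print(diff_ratio_c, diff_ratio_f)
```

So "20 degrees is twice as warm as 10 degrees" is a statement about the unit, not about the weather, whereas "the second step was as large as the first" holds in both units.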


Discrete numeric data (ordered, linear scale, discrete)

Ordinal data (ordered, but no obvious numbering so not linear; discrete)

Properties:

  • ordered/monotonic (larger value means larger represented thing)
  • not linearly comparable
  • not necessarily a meaningful zero point
  • not proportional


Examples:

  • highest level of education
  • age groups ('age up to 20', 'age 20-29', 'age 30-39') (note: in this case based on ratio data)
  • socioeconomic status (sort of)
  • most any ranking
  • questionnaire items of the 'strongly disagree to strongly agree in five steps' sort
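
The age-group example above can be sketched in Python as follows. The cut-offs follow that example; the function name, the sample ages, and the extra last group are assumptions added for illustration. Binning keeps the order but throws away the distances, which is what makes the result ordinal.

```python
from bisect import bisect_right

# Lower bounds of each group after the first.
BOUNDARIES = [20, 30, 40]
# The last label is an addition for this sketch, so that every age has a group.
LABELS = ["under 20", "20-29", "30-39", "40 and over"]


def age_group(age):
    """Map an age in years (ratio data) to the index of its group (ordinal)."""
    return bisect_right(BOUNDARIES, age)


ages = [18, 20, 29, 35, 52]  # illustrative values, already in ascending order
groups = [age_group(a) for a in ages]
print([LABELS[g] for g in groups])
# ['under 20', '20-29', '20-29', '30-39', '40 and over']

# Order survives binning: an older person never lands in a lower group...
assert groups == sorted(groups)
# ...but distance does not: 20 and 29 share a group, while 29 and 35 do not.
```

The group index is only a rank. Subtracting or averaging indices would quietly assume the groups are equally wide and equally spaced.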


Ordinal data is discrete and ordered, but has no directly obvious values to put to each item, so there is not necessarily any linearity or a zero point.

Put another way, these are cases in which you can clearly rank items, but cannot unambiguously put a value on them without introducing subjective assumptions (such as that the steps are linear or equally spaced).

In the qualitative/quantitative distinction, this sits somewhere in between.


You should generally err on the side of treating this as qualitative data, and you should assume that applying quantitative analyses would be a bad idea.

If you can make a good argument why and how this can be interpreted as quantitative data, it can be sensible enough for certain analyses.

A classical case where people frequently do this is "rate on a few-point scale" in questionnaires.
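
A small Python sketch of why the numbering matters for such questionnaire items. The two groups of responses and the 'stretched' coding are invented for illustration. Both codings respect the order of the answers, yet they disagree about which group has the higher mean, while the median category is the same under either.

```python
from statistics import mean, median_low

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two codings that both respect the order of the levels.
equal_steps = {level: i + 1 for i, level in enumerate(LEVELS)}  # 1, 2, 3, 4, 5
stretched = dict(zip(LEVELS, [1, 2, 3, 4, 10]))  # assumes the last step is much larger

group_a = ["disagree", "disagree", "disagree", "strongly agree"]
group_b = ["neutral", "neutral", "neutral", "neutral"]


def summarize(responses, coding):
    """Return (mean of the coded values, median category) for one group."""
    values = [coding[r] for r in responses]
    by_value = {v: level for level, v in coding.items()}
    # median_low always returns a value that occurs in the data,
    # so it can be mapped back to a category.
    return mean(values), by_value[median_low(values)]


for name, coding in [("equal steps", equal_steps), ("stretched", stretched)]:
    mean_a, median_a = summarize(group_a, coding)
    mean_b, median_b = summarize(group_b, coding)
    print(name, "| means:", mean_a, mean_b, "| medians:", median_a, median_b)
```

With equal steps group B has the higher mean (3 against 2.75), with the stretched coding group A does (4 against 3). The median stays 'disagree' for A and 'neutral' for B either way, which is the sort of argument for preferring rank-based summaries unless the numbering can be defended.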


Nominal/categorical data (unordered; qualitative; discrete)

Nominal roughly means 'of names' (in other contexts as well). Nominal values are those that are

  • discrete,
  • with no obvious ordering or other direct comparability,
  • which makes them qualitative and often categorical,
  • usually also finite.

Examples:

  • labels, such as {T,A,G,C}
  • choosing from multiple choice options
  • distinguishers such as {green,blue} or {true,false}, binary gender, blood type
  • left/right-handedness
  • political affiliation
  • brand names


Ideally, qualitative categories are exclusive (non-overlapping concepts).

If they are not, extra care needs to be taken during analysis.

Note that some of the examples push that boundary, intentionally, to make you critical of it.
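
A brief Python sketch of what working with nominal data tends to look like: counting, finding the most common category, and (if numbers are needed at all) indicator columns rather than an arbitrary numbering. The sample observations and the helper name one_hot are made up for illustration, and the sketch assumes exclusive categories.

```python
from collections import Counter

observations = ["A", "O", "B", "O", "AB", "A", "O"]  # e.g. blood types

counts = Counter(observations)
print(counts.most_common(1))  # [('O', 3)]

# Sorting here is only for a stable column order, not a claim that
# the categories themselves are ordered.
categories = sorted(counts)  # ['A', 'AB', 'B', 'O']


def one_hot(value, categories):
    """Indicator coding: one 0/1 column per category, exactly one of them set."""
    return [1 if value == c else 0 for c in categories]


print(one_hot("B", categories))  # [0, 0, 1, 0]
```

Coding the categories as 1, 2, 3, 4 instead would invite analyses that treat 'O' as four times 'A', which means nothing. With overlapping categories more than one indicator could be set per observation, which is one form the extra care mentioned above can take.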


More words

More complex cases

Implications

Further terms that matter

Continuous data refers to numeric values that are not restricted to being discrete/integer,

so (often) ratio or interval data in the above list.


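Quantitative data - basically anything numeric rather than categorical, so ratio, interval, and discrete numeric data in the above list. Qualitative data is the counterpart, referring to nominal/categorical (with ordinal sitting somewhere in between).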

Variables, dimensions, and measurement, and experiments