Statistics notes - types of data

From Helpful

Latest revision as of 16:55, 17 August 2022

This is more of an overview for myself than something for teaching or exercise.




This article/section is a stub: probably a pile of half-sorted notes, not well-checked, so it may have incorrect bits. (Feel free to ignore, or tell me.)


One possible typology

Ratio data (ordered, meaningful zero point, linear scale, (often) continuous)


Properties:

  • ordered/monotonic (a larger value means a larger amount of the represented thing)
  • linearly comparable (e.g. twice the distance between these numbers means twice the difference in the represented thing)
  • meaningful zero point
  • the combination of the above implies the numbers are proportional: twice the number directly means twice the amount of the represented thing


Examples:

  • weight
  • length
  • amounts of time, e.g. reaction time, hours of study, time required to run a marathon
  • age
  • temperature in Kelvin
  • number of responses (note: overlap with discrete numeric)
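  • many physical measurements in general, though not all: this depends on the units and where they put their zero. For example, Fahrenheit and Celsius do not put their zero at zero thermal energy (absolute zero), which is why temperature in Kelvin is ratio data while temperature in °C or °F is not.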
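To make the zero-point issue concrete, here is a minimal Python sketch. The helper names are hypothetical, and it assumes the standard conversions K = °C + 273.15 and °R = K × 1.8. It computes the same "how many times warmer" ratio on Celsius numbers and on Kelvin numbers, then checks that a purely multiplicative unit change (Kelvin to Rankine) leaves a ratio-scale ratio intact.

```python
import math

def celsius_to_kelvin(c: float) -> float:
    """Shift Celsius so that 0 lands on absolute zero (assumed offset 273.15)."""
    return c + 273.15

def kelvin_to_rankine(k: float) -> float:
    """Rescale only (no shift): Kelvin and Rankine share the same zero point."""
    return k * 1.8

cool_c, warm_c = 10.0, 20.0

# Ratio taken on Celsius numbers: looks like "twice as warm", but it depends
# on where Celsius happens to put its zero.
ratio_celsius = warm_c / cool_c

# Ratio taken on Kelvin numbers: zero means zero thermal energy, so this ratio
# says something about the represented quantity.
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)
ratio_kelvin = warm_k / cool_k

# A unit change that only multiplies keeps ratio-scale ratios the same.
ratio_rankine = kelvin_to_rankine(warm_k) / kelvin_to_rankine(cool_k)

print(f"ratio on Celsius numbers: {ratio_celsius:.3f}")   # 2.000
print(f"ratio on Kelvin numbers:  {ratio_kelvin:.3f}")    # 1.035
print("Kelvin and Rankine ratios agree:", math.isclose(ratio_kelvin, ratio_rankine))
```

So the 20 °C day is only about 3.5% "warmer" than the 10 °C day in the energy sense, not twice as warm.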


Interval data (ordered, no meaningful zero point, possible linear scale, (often) continuous)

Properties:

  • ordered/monotonic (a larger value means a larger amount of the represented thing)
  • comparability is often not linear, though it may be in any given case
  • no particularly meaningful zero point


Interval data is quantitative data in a numbering system in which there is no sensible zero point.

This means that assuming linear relationships may be incorrect (often the most important difference from ratio data):

  • ...because of the zero point
  • ...because the scale is arbitrary
  • ...and/or because of other reasons
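A minimal Python sketch of what "no meaningful zero point" does in practice, assuming the standard conversion °F = °C × 9/5 + 32 (the helper names are hypothetical): converting between two interval scales keeps the order of the readings, but the ratio between two readings changes, so such ratios say little about the represented thing.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Affine conversion: rescales AND moves the zero point."""
    return c * 9 / 5 + 32

def rank_order(values):
    """Indices of the values, from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

readings_c = [20.0, 5.0, 10.0]
readings_f = [celsius_to_fahrenheit(c) for c in readings_c]   # [68.0, 41.0, 50.0]

# Ordering is the same on both scales ...
print("same order:", rank_order(readings_c) == rank_order(readings_f))   # True

# ... but "how many times larger" depends on where the zero was put.
print("20C / 10C =", readings_c[0] / readings_c[2])   # 2.0
print("68F / 50F =", readings_f[0] / readings_f[2])   # 1.36
```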


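Discrete numeric data (ordered, linear scale, discrete)

Think integers. Includes things like counts.

Examples:

  • many counts, e.g. the number of attendees at a daily event
  • the profit we make per week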

Ordinal data (ordered, but no obvious numbering so not linear; discrete)

Examples:

  • highest level of education
  • questionnaire items of the 'strongly disagree to strongly agree in five steps' sort
  • age groups ('age up to 20', 'age 20-29', 'age 30-39') (note: in this case based on ratio data)
  • socioeconomic status (sort of)
  • almost any ranking

Ratings on a few-point scale are often treated as ordinal, though there is some overlap with continuous interval data.
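For ordinal data, any relabeling that keeps the order is as valid as the original coding. A minimal Python sketch, with made-up answers and a hypothetical recoding of a five-step agree/disagree item, shows that the median category survives such a relabeling while the mean does not. It uses the standard library's `statistics.median_low` so the median is always an actual answer rather than an average of two.

```python
from statistics import mean, median_low

# Made-up answers to a five-step item, coded 1 (strongly disagree) .. 5 (strongly agree).
answers = [1, 2, 2, 4, 5, 5, 5]

# Hypothetical recoding that is just as order-preserving: only the order of codes matters.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 100}
recoded = [recode[a] for a in answers]

# The median answer is the same category (4) under both codings.
print(median_low(answers), median_low(recoded))   # 4 10

# The mean lands between different categories depending on the coding.
print(round(mean(answers), 2))   # 3.43, between categories 3 and 4
print(mean(recoded))             # 45, between codes 10 and 100, i.e. categories 4 and 5
```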


Nominal/categorical data (unordered; qualitative; discrete)

Examples:

  • labels, such as {T,A,G,C}
  • choosing from multiple choice options
  • distinguishers such as {green,blue} or {true,false}, (discrete) gender, blood type
  • brand names
  • left/right-handedness
  • political affiliation
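For nominal data, about the only things that make sense are checking labels for equality and counting them. A minimal Python sketch with made-up blood-type observations, using the standard library's `collections.Counter`, gets the frequencies and the mode (most common label). There is no order, so a median or mean of the labels would not mean anything.

```python
from collections import Counter

# Made-up observations of a nominal variable.
blood_types = ["O", "A", "O", "B", "AB", "O", "A"]

counts = Counter(blood_types)                  # frequency per label
mode, mode_count = counts.most_common(1)[0]    # most frequent label

print(dict(counts))        # {'O': 3, 'A': 2, 'B': 1, 'AB': 1}
print(mode, mode_count)    # O 3

# Proportions are also fine, since they only rely on counting.
proportions = {label: n / len(blood_types) for label, n in counts.items()}
```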


More words

More complex cases

Implications

Further terms that matter

Continuous data refers to values that are not restricted to be discrete/integer.

In the list above, that typically means ratio or interval data.


Quantitative data: basically anything that is not nominal/categorical.
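One way to make the two terms above concrete is to restate the typology as data. This is a minimal Python sketch; the `DataType` class and its fields are hypothetical, and ratio and interval data are marked continuous only because they often are. "Continuous" then falls out as "not discrete", and "quantitative" as "not nominal/categorical".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataType:
    name: str
    ordered: bool
    discrete: bool      # ratio/interval are only *often* continuous; marked continuous here
    categorical: bool

# Properties taken from the section headings above.
TYPOLOGY = (
    DataType("ratio", ordered=True, discrete=False, categorical=False),
    DataType("interval", ordered=True, discrete=False, categorical=False),
    DataType("discrete numeric", ordered=True, discrete=True, categorical=False),
    DataType("ordinal", ordered=True, discrete=True, categorical=False),
    DataType("nominal/categorical", ordered=False, discrete=True, categorical=True),
)

def continuous_types():
    """Continuous: not restricted to discrete values."""
    return [t.name for t in TYPOLOGY if not t.discrete]

def quantitative_types():
    """Quantitative: basically anything that is not nominal/categorical."""
    return [t.name for t in TYPOLOGY if not t.categorical]

print(continuous_types())     # ['ratio', 'interval']
print(quantitative_types())   # ['ratio', 'interval', 'discrete numeric', 'ordinal']
```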

Variables, dimensions, and measurement, and experiments