Data modeling, restructuring, analysis, fuzzy cases, learning

From Helpful
Revision as of 12:10, 27 July 2020 by Helpful (Talk | contribs)

Jump to: navigation, search
This is more for overview of my own than for teaching or exercise.

Overview of the areas

Arithmetic · 'elementary mathematics' and similar concepts
Set theory, Category theory
Geometry and its relatives · Topology
Elementary algebra - Linear algebra - Abstract algebra
Calculus and analysis
 : Information theory · Number theory · Decision theory, game theory · Recreational mathematics · Dynamical systems · Unsorted or hard to sort

Math on data:

  • Statistics as a field
some introduction · areas of statistics
types of data · on random variables, distributions
Virtues and shortcomings of...
on sampling · probability
glossary · references, unsorted
Footnotes on various analyses

Other data analysis, data summarization, learning

  • Data modeling, restructuring, analysis, fuzzy cases, learning
Statistical modeling · Classification, clustering, decisions · dimensionality reduction · Optimization theory, control theory
Connectionism, neural nets · Evolutionary computing

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Concepts & glossary

The curse of dimensionality is, roughly, the idea that when you add a dimension to your model you need proportionally more data for decent training of that model. Similarly, since the volume increases so fast, you probably have a sparsity problem. It's a fairly exponential problem, so

Stochastic processes, deterministic processes, random fields

A deterministic process deals with possible determined cases, no unknowns or random variables.

A stochastic process (a.k.a. random process) allows indeterminacy, typically by working with probability distributions.

A lot of data is stochastically modeled, because you only need partial data and can generally only get partial data.

(Models mixing deterministic and stochastic processes are often called hybrid models)

A random field basically describes the generalization that happens when the parameter (dependent variable) is not necessarily time, or one-dimensional, or real-valued.

See also:

Types of problems

Tasks are often one of:

  • Clustering points out regions or groups of (mutual) similarity, and dissimilarity from other groups.
clustering may not deal well with future data of the same sort, unless some care has been taken, so may not be the best choice of a learning/predicting system
  • Vector quantization: Discretely dividing continuous space into various areas/shapes
which itself can be used for decision problems, labeling, etc.
  • Dimensionality reduction: projecting attributes into lower-dimensional data
where the resulting data is (hopefully) comparably predictive/correlative (compared to the original)
The reason is often to eliminate attributes/data that may be irrelevant or too informationally sparse
see also #Ordination.2C_Dimensionality_reduction.2C_Factor_Analysis.2C_Multivariate_analysis
  • Feature extraction: discovering (a few nice) attributes from (many) old ones, or just from data in general.
Often has a good amount of overlap with dimensionality reduction
  • others...

Markov property

the Markov property is essentially that there is no memory, only direct response: that response of a process is determined entirely by its current state (and input, if you don't already define that as part of the state).

More formally, "The environment's response (s,r) at time t+1 depends only on the Markov state s and action a at time t" [1]

There are many general concepts that you can make stateless, and thereby Markovian:

  • A Markov chain refers to a Markov process with finite, countable states [3]
  • A Markov random field [4]
  • A Markov logic network [5]
  • A Markov Decision Process (MDP) is a decision process that satisfies the Markov property
  • ..etc.

Observability and controllability

Underfitting and overfitting (learners)

Underfitting is when a model is too simple to be good at describing all the patterns in the data.

Underfitted models and learners may still generalize very well, and that can be intentional, e.g. to describe just the most major patterns.

It may be hard to quantify how crude is too crude, though.

Overfitting often means the model is allowed to be so complex that a part of it describes all the patterns there are, meaning the rest ends up describing just noise, or insignificant variance or random errors in the training set.

A little overfitting is not disruptive, but a lot of it often is, distorting or drowning out the parts that are actually modeling the major relationships.

Put another way, overfitting it is the (mistaken) assumption that convergence in the training data means convergence in all data.

There are a few useful tests to evaluate overfitting and underfitting.

supervised versus unsupervised systems (learners)

Supervised usually means the training process is suggested or somehow (dis)approved. Usually it refers to having annotated trainign data, sometimes to altering it. Example: Classification of documents, basic neural network back-propagation, least-squares fitting, operator cloning

Unsupervised refers to processes that work without intervention. For example, self-organizing maps, or clustering documents based on similarity needs no annotation.

Semi/lightly supervised usually means there is an iterative process which needs only minimal human intervention, be it to deal with a few unclear cases, for information that may be useful, or such.

Inductive versus deductive systems (learners)

Inductive refers to training purely from data.

Deductive refers to also having a theory about the domain domain.

Inductive learning can be approached as a search in a hypothesis space.

Model-based versus model-free systems (learners)

Statistical modeling

Data fusion means merging data from various sources, e.g. various sensors. This often implies some sort of modelling.

Regression analysis

Regression glossary:

  • linear regression models a linear predictor function
typically a least squares estimator
  • simple regression indicates a single independent/explanatory variable

Linear regression

Logistic regression

Mean Squared Error (metric)

State observers

See Optimization_theory,_control_theory#State_observers_.2F_state_estimation.3B_filters


Combining classifiers