Difference between revisions of "Data modeling, restructuring, analysis, fuzzy cases, learning"


Revision as of 21:33, 21 June 2021

This is more for overview of my own than for teaching or exercise.




This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Concepts & glossary

Types of problems

Tasks/problems are often broadly sorted into a few distinct types, which says little about how they are solved.


  • Clustering aims to point out regions or groups of (mutual) similarity, and dissimilarity from other groups.
Clustering may not deal well with future data of the same sort unless some care has been taken, so it may not be the best choice for a learning/predicting system
  • Vector quantization: Discretely dividing continuous space into various areas/shapes
which itself can be used for decision problems, labeling, etc.
  • Dimensionality reduction: projecting attributes into lower-dimensional data
where the resulting data is (hopefully) comparably predictive/correlative (compared to the original)
The reason is often to eliminate attributes/data that may be irrelevant or too informationally sparse
see also #Ordination.2C_Dimensionality_reduction.2C_Factor_Analysis.2C_Multivariate_analysis
  • Feature extraction: discovering (a few nice) attributes from (many) old ones, or just from data in general.
Often has a good amount of overlap with dimensionality reduction


  • Control optimization problem - finding the best action in every (visited) state of a system


  • Structured prediction refers to prediction problems that want multiple labels (the number potentially varying with the input), where the prediction of each label relies on the intermediate workings of the others
This addresses implementation a little more than most of the others, in that running independent classifiers will give an answer, but not a structured prediction


  • others...
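
As a rough illustration of the vector quantization item above, here is a minimal sketch that maps continuous 2D points to the index of the nearest entry in a small codebook. The codebook values and the name `quantize` are made up for this example; in practice a codebook is usually learned from data (for instance with k-means) rather than written by hand.

```python
import math

# Hypothetical codebook: each entry is the representative point of one region.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def quantize(point, codebook=CODEBOOK):
    """Return the index of the codebook entry nearest to point (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))

# Each continuous point is replaced by a discrete label (its region index).
labels = [quantize(p) for p in [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]]
print(labels)
```

The resulting indices can be used directly as labels, which is the sense in which quantization can feed decision or labeling problems.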

Related fields

Descriptors of learning systems

supervised versus unsupervised (learners)

Supervised usually means the training process is guided, or its results somehow (dis)approved. Usually it refers to having annotated training data, sometimes to altering it. Examples: classification of documents, basic neural network back-propagation, least-squares fitting, operator cloning


Unsupervised refers to processes that work without intervention. For example, self-organizing maps, or clustering documents based on similarity needs no annotation.


Semi/lightly supervised usually means there is an iterative process which needs only minimal human intervention, be it to deal with a few unclear cases, for information that may be useful, or such.


There is also an argument for a split between supervised, unsupervised, and reinforcement learning, where reinforcement learning lies somewhere in between - not having labeled data, but knowing what sort of outcome you want and somehow rewarding that, and steps towards it.



Model-based versus model-free systems

Inductive versus deductive systems (learners)

Transfer learning

State observers

State observers / state estimation will estimate the internal state of a real system, measuring what they easily can, and estimating what they need to. Optimization_theory,_control_theory#State_observers_.2F_state_estimation_.28and_filters_in_this_sense.29

In control theory this is typically a single state without much in it, but it can also be useful as a way to present data to a learning system.


Related are the observability criterion and controllability criterion, as introduced by Kalman.

Observability is the idea that the states of a system can be reasonably inferred from knowledge of its outputs. Roughly, that a good set of sensors lets you build a model that covers all its (important) variables.(verify)

Controllability is roughly the idea that you can actually guide a system through all states.(verify)
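
To make the two criteria a little more concrete, here is a minimal sketch of the Kalman rank tests for the smallest interesting case: a linear system with two states, one input and one output. The example system (position and velocity with a unit time step) and all function names are assumptions for illustration. For two states the rank test reduces to checking that a 2x2 determinant is nonzero.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_observable(A, C):
    """Two states, one output: observable iff the stacked matrix [C; CA] has rank 2."""
    CA = matmul(C, A)
    return det2([C[0], CA[0]]) != 0

def is_controllable(A, B):
    """Two states, one input: controllable iff the matrix [B, AB] has rank 2."""
    AB = matmul(A, B)
    return det2([[B[0][0], AB[0][0]], [B[1][0], AB[1][0]]]) != 0

# Hypothetical system: state = (position, velocity), x_next = A x + B u, y = C x.
A = [[1, 1], [0, 1]]
B = [[0], [1]]          # the input pushes on velocity
C_position = [[1, 0]]   # sensor measures position only
C_velocity = [[0, 1]]   # sensor measures velocity only

print(is_observable(A, C_position))   # position over time reveals velocity
print(is_observable(A, C_velocity))   # velocity alone never reveals position
print(is_controllable(A, B))
```

The example uses integers so that the comparison with zero is exact; with floating-point matrices a tolerance (or a proper rank computation) would be needed.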


Both are related to building something that is likely to be stable in a knowable way.

Again, concepts more related to control theory, but more widely applicable.

Behaviours of learners

Underfitting and overfitting (learners)

Underfitting is when a model is too simple to be good at describing all the patterns in the data.

Underfitted models and learners may still generalize very well, and that can be intentional, e.g. to describe just the most major patterns.

It may be hard to quantify how crude is too crude, though.


Overfitting often means the model is allowed to be so complex that a part of it describes all the patterns there are, meaning the rest ends up describing just noise, or insignificant variance or random errors in the training set.

A little overfitting is not disruptive, but a lot of it often is, distorting or drowning out the parts that are actually modeling the major relationships.

Put another way, overfitting is the (mistaken) assumption that convergence in the training data means convergence in all data.


There are a few useful tests to evaluate overfitting and underfitting.
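
One such test is to hold out data that the model never trained on, and compare the error on both sets. The sketch below is only an illustration under assumed conditions: the data is synthetic (noisy samples of sin(x)), and model complexity is controlled by the k of a simple nearest-neighbour average, where k=1 memorizes the training set and k equal to the training set size predicts the global mean.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Sample n noisy points from y = sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error on data of the k-NN model built from train."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(60, rng)
held_out = make_data(60, rng)

errors = {}
for k in (1, 5, len(train)):  # very flexible, moderate, very crude
    errors[k] = (mse(train, train, k), mse(train, held_out, k))
    print(f"k={k:2d}  train MSE={errors[k][0]:.3f}  held-out MSE={errors[k][1]:.3f}")
```

With k=1 the training error is zero while the held-out error is not, which is the overfitting signature. With k=60 both errors are high, which is the underfitting signature.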




On the math side

Stochastic processes, deterministic processes, random fields

A deterministic process deals only with fully determined cases, with no unknowns or random variables.


A stochastic process (a.k.a. random process) allows indeterminacy, typically by working with probability distributions.

A lot of data is stochastically modeled, often because you only need partial data (and often only have partial data to start with).


Hybrid models in this context mix deterministic and stochastic processes.
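
A small sketch of the difference, using a made-up decay process. The update rule, the parameter values and the function names are assumptions for illustration only. The second function is also a simple hybrid in the sense above: a deterministic update plus a random term.

```python
import random

def deterministic(x0, steps, a=0.9):
    """x[t+1] = a * x[t]: the same start always gives the same trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs

def stochastic(x0, steps, rng, a=0.9, sigma=0.1):
    """x[t+1] = a * x[t] + Gaussian noise: each run is one sample of many possible trajectories."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, sigma))
    return xs

print(deterministic(1.0, 5))
print(stochastic(1.0, 5, random.Random(1)))
print(stochastic(1.0, 5, random.Random(2)))  # different seed, different trajectory
```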


A random field basically describes the generalization of such modelling to cases where the parameters (the independent variables that index the process) are not necessarily time, or one-dimensional, or real-valued. (verify)



Markov property

The Markov property is essentially that there is no memory, only direct response: that the response of a process is determined entirely by its current state and current input (if you don't already define that as part of the state).

More formally, "The environment's response (s,r) at time t+1 depends only on the Markov state s and action a at time t" [1]


There are many general concepts that you can make memoryless, and thereby Markovian:

  • A Markov chain refers to a Markov process with a finite or countable set of states [3]
  • A Markov random field [4]
  • A Markov logic network [5]
  • A Markov Decision Process (MDP) is a decision process that satisfies the Markov property
  • ..etc.
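
As a minimal sketch of the Markov chain case, assuming a made-up two-state weather model: the next state is drawn from a table that is looked up using only the current state, so nothing about the earlier path matters. The transition probabilities and the names `step` and `propagate` are hypothetical.

```python
import random

# Hypothetical chain: P[s][t] is the probability of moving from state s to state t.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Draw the next state using only the current state (the Markov property)."""
    targets = list(P[state])
    weights = [P[state][t] for t in targets]
    return rng.choices(targets, weights=weights)[0]

def propagate(dist):
    """Push a distribution over states one step forward: new[t] = sum_s dist[s] * P[s][t]."""
    return {t: sum(dist[s] * P[s][t] for s in P) for t in P}

# Sample one path through the chain.
rng = random.Random(0)
state, path = "sunny", ["sunny"]
for _ in range(10):
    state = step(state, rng)
    path.append(state)
print(path)

# Track the distribution instead of a single sample.
dist = {"sunny": 1.0, "rainy": 0.0}
for _ in range(50):
    dist = propagate(dist)
print(dist)  # approaches the stationary distribution (5/6 sunny, 1/6 rainy for this table)
```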


Curse of dimensionality

The curse of dimensionality is, roughly, the idea that each dimension you add to your model makes life harder.


Intuitively, this is largely because a lot of these dimensions may be noninformative, but will still contribute to everything you do.


A common example is distance calculations. As you add dimensions, the contribution of any one dimension decays, meaning the bulk of less-expressive dimensions is going to overpower the few good ones. This is very likely to drown any good signal in a lot more noise.

And things that build on those distances, like clustering, are going to have a harder time.
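
A related effect is easy to see numerically: with purely random data, the nearest and farthest points end up at almost the same distance once there are many dimensions, so distance stops discriminating. The sketch below assumes uniformly random points in the unit cube, and the name `relative_contrast` is made up for this example.

```python
import math
import random

def relative_contrast(dim, n_points, rng):
    """(farthest - nearest) / nearest distance from a random query to n random points."""
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)
for dim in (2, 10, 100, 1000):
    # The contrast shrinks as dim grows: all points look roughly equally far away.
    print(dim, round(relative_contrast(dim, 200, rng), 2))
```

Real data usually has more structure than uniform noise, so this should be read as a worst-case intuition rather than a measurement of any particular dataset.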


Semi-sorted

Structured prediction

Mistake Bound learning

PAC theory

Kernel method

Classification,_clustering,_and_decisions#Kernel_method

Unsorted