Difference between revisions of "Data modeling, restructuring, analysis, fuzzy cases, learning"
Revision as of 21:33, 21 June 2021
This is more for overview of my own than for teaching or exercise.
Other data analysis, data summarization, learning

This article/section is a stub, probably a pile of half-sorted notes, and is not well-checked, so it may have incorrect bits. (Feel free to ignore, fix, or tell me)
Concepts & glossary
Types of problems
The types of tasks/problems are often broadly sorted into some distinct types. ...which says little about how they are solved.
 Clustering aims to point out regions or groups of (mutual) similarity, and dissimilarity from other groups.
 clustering may not deal well with future data of the same sort, unless some care has been taken, so may not be the best choice of a learning/predicting system
 Vector quantization: Discretely dividing continuous space into various areas/shapes
 which itself can be used for decision problems, labeling, etc.
 Dimensionality reduction: projecting the attributes into a lower-dimensional space
 where the resulting data is (hopefully) comparably predictive/correlative (compared to the original)
 The reason is often to eliminate attributes/data that may be irrelevant or too informationally sparse
 see also #Ordination.2C_Dimensionality_reduction.2C_Factor_Analysis.2C_Multivariate_analysis
 Feature extraction: discovering (a few nice) attributes from (many) old ones, or just from data in general.
 Often has a good amount of overlap with dimensionality reduction
 Control optimization problem: finding the best action in every (visited) state of a system
 Structured prediction refers to prediction problems that want multiple labels (their number potentially varying with the input), where the prediction of each relies on intermediate workings of the others
 which actually says a little more about implementation than most of the others do, in that running independent classifiers will give an answer but not a structured prediction
 others...
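As a small illustration of the vector quantization item above, here is a minimal sketch that maps points in a continuous 2D space to the index of the nearest codebook vector. The codebook and points are made-up values, and using Euclidean distance is an assumption; real systems would usually learn the codebook (e.g. via clustering) rather than hardcode it.

```python
import math

def quantize(point, codebook):
    """Return the index of the codebook vector nearest to `point` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))

# Hypothetical codebook: three representative points dividing 2D space into areas.
codebook = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(0.4, -0.2), (4.1, 5.3), (1.0, 4.0), (2.4, 2.4)]

# Each continuous point becomes a discrete label, usable for decisions/labeling.
labels = [quantize(p, codebook) for p in points]
print(labels)
```

The resulting labels are what you would hand on to a decision or labeling step.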
Related fields
Descriptors of learning systems
supervised versus unsupervised (learners)
Supervised usually means the training process is guided or somehow (dis)approved. Usually it refers to having annotated training data, sometimes to altering it. Examples: classification of documents, basic neural network backpropagation, least-squares fitting, operator cloning
Unsupervised refers to processes that work without intervention. For example, self-organizing maps, or clustering documents based on similarity, need no annotation.
Semi/lightly supervised usually means there is an iterative process which needs only minimal human intervention, be it to deal with a few unclear cases, for information that may be useful, or such.
There is also an argument for a split between supervised, unsupervised, and reinforcement learning,
where reinforcement learning lies somewhere in between: not having labeled data, but knowing what sort of outcome you want, and somehow rewarding that and the steps towards it.
Model-based versus model-free systems
Inductive versus deductive systems (learners)
Transfer learning
State observers
State observers / state estimation will estimate the internal state of a real system, measuring what they easily can, and estimating what they need to. See also Optimization_theory,_control_theory#State_observers_.2F_state_estimation_.28and_filters_in_this_sense.29
This is typically a singular state in control theory, without much state, but can also be useful to present data to a learning system.
Related are the observability criterion and controllability criterion, as introduced by Kalman.
Observability is the idea that the states of a system can be reasonably inferred from knowledge of its outputs. Roughly, that a good set of sensors lets you build a model that covers all its (important) variables.(verify)
Controllability is roughly the idea that you can actually guide a system through all states.(verify)
Both are related to building something that is likely to be stable in a knowable way.
Again, concepts more related to control theory, but more widely applicable.
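For linear systems, observability has a concrete test, the Kalman rank condition: a discrete-time system x[t+1] = A x[t], y[t] = C x[t] with n states is observable when the stacked matrix [C; CA; ...; CA^(n-1)] has rank n. The following is a minimal sketch of that check in plain Python. The position/velocity system and all function names are made up for illustration, and the rank routine is a simple Gaussian elimination, not something numerically robust.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def rank(M, eps=1e-9):
    """Rank via Gaussian elimination with partial pivoting (illustrative only)."""
    M = [[float(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < eps:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def observability_matrix(A, C):
    """Stack C, CA, CA^2, ..., CA^(n-1) row-wise."""
    stacked, block = [], C
    for _ in range(len(A)):
        stacked.extend(block)
        block = matmul(block, A)
    return stacked

def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)

# Hypothetical system: state = (position, velocity), time step of 1.
A = [[1, 1],
     [0, 1]]
print(is_observable(A, [[1, 0]]))  # sensor measures position only
print(is_observable(A, [[0, 1]]))  # sensor measures velocity only
```

Measuring only position lets you infer velocity from successive readings, while measuring only velocity never pins down the absolute position, which is what the rank test reflects. Controllability has the analogous test on [B, AB, ..., A^(n-1)B].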
Behaviours of learners
Underfitting and overfitting (learners)
Underfitting is when a model is too simple to be good at describing all the patterns in the data.
Underfitted models and learners may still generalize very well, and that can be intentional, e.g. to describe just the most major patterns.
It may be hard to quantify how crude is too crude, though.
Overfitting often means the model is allowed to be so complex that a part of it describes all the patterns there are, meaning the rest ends up describing just noise, or insignificant variance or random errors in the training set.
A little overfitting is not disruptive, but a lot of it often is, distorting or drowning out the parts that are actually modeling the major relationships.
Put another way, overfitting is the (mistaken) assumption that convergence in the training data means convergence in all data.
There are a few useful tests to evaluate overfitting and underfitting.
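One simple such test is comparing error on the training data with error on held-out data. The following is a rough sketch under made-up assumptions: the data comes from an assumed line y = 2x + 1 with Gaussian noise, and the three models (a constant, a least-squares line, and a memorising nearest-neighbour lookup) are just stand-ins for too simple, about right, and too complex.

```python
import random

random.seed(0)

def make_data(n):
    # Assumed ground truth for this sketch: y = 2x + 1 plus Gaussian noise.
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + random.gauss(0, 1)) for x in xs]

train, test = make_data(30), make_data(30)

def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))

def constant(x):
    """Underfit: always predict the training mean, ignoring x."""
    return mean_y

def line(x):
    """Ordinary least-squares line fitted to the training set."""
    return mean_y + slope * (x - mean_x)

def memorise(x):
    """Overfit: return the y of the nearest training x (1-nearest-neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

for name, model in [("constant", constant), ("line", line), ("memorise", memorise)]:
    print(f"{name:9s} train MSE {mse(model, train):8.3f}   test MSE {mse(model, test):8.3f}")
```

The memorising model has zero training error by construction, since it also reproduces the noise, so the gap between its training and held-out error is the thing to look at. The constant model is poor on both, which is the underfitting signature.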
On the math side
Stochastic processes, deterministic processes, random fields
A deterministic process deals only with fully determined cases, with no unknowns or random variables.
A stochastic process (a.k.a. random process) allows indeterminacy, typically by working with probability distributions.
A lot of data is stochastically modeled, often because you only need partial data (and often only have partial data to start with).
Hybrid models in this context means mixing deterministic and stochastic processes
A random field basically describes the generalization of this modelling to cases where the parameters (the independent variables the process is indexed by) are not necessarily time, or one-dimensional, or real-valued. (verify)
See also:
 http://en.wikipedia.org/wiki/Stochastic_process
 http://en.wikipedia.org/wiki/Deterministic_system
 http://en.wikipedia.org/wiki/Random_field
 http://en.wikipedia.org/wiki/Conditional_random_field
Markov property
The Markov property is essentially that there is no memory, only direct response: the response of a process is determined entirely by its current state and current input (if you don't already define that as part of the state).
More formally, "The environment's response (s,r) at time t+1 depends only on the Markov state s and action a at time t" [1]
There are many general concepts that you can make stateless, and thereby Markovian:
 A Markov model is a stochastic model with the Markov property
 A Markov process is a stochastic process with the Markov property [2]
 A Markov chain refers to a Markov process with a discrete (finite or countable) set of states [3]
 A Markov random field [4]
 A Markov logic network [5]
 A Markov Decision Process (MDP) is a decision process that satisfies the Markov property
 ..etc.
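To make the "no memory" idea concrete, here is a minimal sketch of a two-state Markov chain. The weather states and transition probabilities are made up for illustration. The point is that `step` looks only at the current state, never at the path taken to get there.

```python
import random

# Hypothetical chain: each inner dict is P(next state | current state).
transitions = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Sample the next state using only the current state (the Markov property)."""
    options = transitions[state]
    return rng.choices(list(options), weights=list(options.values()))[0]

def simulate(start, n_steps, seed=0):
    """Generate one sample path of the chain."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

def propagate(dist):
    """Push a probability distribution over states one step forward."""
    return {s: sum(dist[prev] * transitions[prev][s] for prev in transitions)
            for s in transitions}

print(simulate("sunny", 10))

# Starting from "certainly rainy", the distribution settles to a fixed point.
dist = {"sunny": 0.0, "rainy": 1.0}
for _ in range(50):
    dist = propagate(dist)
print(dist)
```

For these particular numbers the distribution converges towards 5/6 sunny and 1/6 rainy, regardless of the starting state.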
Curse of dimensionality
The curse of dimensionality is, roughly, the idea that each dimension you add to your model makes life harder.
Intuitively, this is largely because a lot of these dimensions may be non-informative, but will still contribute to everything you do.
A common example is distance calculations. As you add dimensions, the contribution of any one dimension decays, meaning the bulk of less-expressive dimensions is going to overpower the few good ones.
It's very likely to drown any good signal in a lot more noise.
And things that build on those distances, like clustering, are going to have a harder time.
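The effect on distances can be seen in a small experiment. The sketch below assumes uniformly random points in a unit hypercube (so every dimension is pure noise) and measures how much farther the farthest point is than the nearest one, relative to the nearest. The function name and parameters are made up for illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(f"{dim:5d} dimensions: relative contrast {relative_contrast(dim):.3f}")
```

In low dimensions the nearest point is much nearer than the farthest; as dimensions are added, all distances end up looking roughly alike, which is why nearest-neighbour style reasoning and clustering get less meaningful.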
Semi-sorted
Structured prediction
Mistake Bound learning
PAC theory
Kernel method
Classification,_clustering,_and_decisions#Kernel_method