Machine learning goals, problems, and glossary
This is more of an overview for my own use than for teaching or exercise.
Broadly, and with some snark
Embedded machine learning
Overall concepts, Glossary
Some broad problem types (mostly around ML)
Descriptors, architectures, combinations, and other meta
"How much interaction does it need" descriptors
Model-based versus model-free systems
Supervised, unsupervised, and more
More on reinforcement learning
Inductive versus deductive systems
Transfer learning
Feature learning, representation learning
State observers
Ensemble learning
Random (decision) forest
Bootstrap aggregation
Bucket of models
Bayesian model combination
Bayesian model averaging
Boosting
Gradients
Gradient boosting
Structured prediction
https://www.youtube.com/watch?v=e5sPNlgbZAE
https://pystruct.github.io/intro.html
https://en.wikipedia.org/wiki/Structured_prediction
"Deep learning"
Properties and behaviours of learners
Underfitting and overfitting
Underfitting is when a model is too simple to encode all the interesting things in the data.
Underfitted models and learners may still generalize well, but they may not be covering/describing all the patterns in the data.
Sometimes that is intentional, e.g. to describe just the most major patterns, but usually capturing more patterns is better, so underfitting is a term used to point out that you may want to allow more complexity into your model.
It may be hard to quantify how crude is too crude, though.
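For intuition, a minimal sketch of underfitting (assuming numpy is available; the data and polynomial degrees here are made up for illustration): fitting a straight line to clearly quadratic data leaves a large error even on the data it was fit on, because the line simply cannot encode the curvature.

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(-3, 3, 200)
  y = x**2 + rng.normal(scale=0.3, size=x.shape)   # quadratic pattern plus a little noise

  # degree-1 fit: too simple for this data (underfits)
  line = np.polyfit(x, y, deg=1)
  # degree-2 fit: matches the underlying pattern
  quad = np.polyfit(x, y, deg=2)

  def mse(pred):
      return np.mean((y - pred) ** 2)

  print("MSE, degree 1:", mse(np.polyval(line, x)))   # stays large: misses the curvature
  print("MSE, degree 2:", mse(np.polyval(quad, x)))   # drops to roughly the noise level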
Overfitting goes the other way, where a model is too awkwardly specific.
The exact shape that overfitting takes depends on the sort of learner (and the data you have), but often it describes cases where the model has done most of its useful generalizing and starts encoding specific/local data, insignificant variance, noise, or specific mistakes in the training set.
Overfitting can happen
- when your training data isn't large or diverse or balanced enough to actually suggest much generalization,
- when there are entire groups of real-world examples that aren't represented, and/or
- when the model is allowed to contain so much complexity that it can essentially just memorize all of the training data's cases, meaning it is no longer forced to generalize at all.
From another angle, especially in that last case:
Overfitting is the (mistaken) assumption that convergence on the training data means convergence on all data.
The more it encodes things that aren't generally true, the more it may keep looking better on your training data, but only because it's encoding peculiarities of the training data - and for that reason it will at some point start getting worse on unseen/real-world data, at generalization but also at other tasks.
A little localized overfitting is not disruptive, in part because as long as there are still real patterns to be learned, that signal drowns out the noise (there's rarely a clean cutoff between "we have learned all useful things" and "we have now moved on to nonsense" - if there were, this would not be an issue).
But more than a little overfitting tends to start drowning out the generalization with messiness.
There are a few useful tests to evaluate overfitting and underfitting.
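One common test, sketched below (again assuming numpy, with made-up synthetic data): fit on one split, and compare the error on that training split against the error on a held-out split as model complexity increases. Both errors high suggests underfitting; training error still dropping while held-out error rises suggests overfitting.

  import numpy as np

  rng = np.random.default_rng(1)
  x = rng.uniform(-3, 3, size=60)
  y = np.sin(x) + rng.normal(scale=0.2, size=x.shape)

  # hold out a third of the points for validation
  x_train, y_train = x[:40], y[:40]
  x_val, y_val = x[40:], y[40:]

  for degree in (1, 3, 15):
      coeffs = np.polyfit(x_train, y_train, deg=degree)
      train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
      val_mse = np.mean((y_val - np.polyval(coeffs, x_val)) ** 2)
      print(f"degree {degree:2d}: train MSE {train_mse:.3f}, validation MSE {val_mse:.3f}")

  # Typical pattern: degree 1 underfits (both errors high), degree 3 does well on both,
  # degree 15 overfits (lower training error, but noticeably higher validation error).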
Observability
Interpretability
Explainable AI
Other terms
Loss function
Energy function
On the theory / math side
Stochastic processes, deterministic processes, random fields
Markov property
Log-Linear and Maximum Entropy
Bayesian learning
Bayesian classifier
Bayes Optimal Classifier
Naive Bayes Classifier
Multinomial Naive Bayes
Bayesian (Belief) Network
See below
Curse of dimensionality
Kernel method
https://en.wikipedia.org/wiki/Kernel_method