Machine learning goals, problems, and glossary
These are notes mostly for my own overview, rather than for teaching or exercises.
Broadly, and with some snark
Embedded machine learning
Overall concepts, Glossary
Some broad problem types (mostly around ML)
Descriptors, architectures, combinations, and other meta
"How much interaction does it need" descriptors
Model-based versus model-free systems
Supervised, unsupervised, and more
More on reinforcement learning
Inductive versus deductive systems
Transfer learning
Feature learning, representation learning
State observers
Ensemble learning
Random (decision) forest
Bootstrap aggregation
Bucket of models
Bayesian model combination
Bayesian model averaging
Boosting
Gradients
Gradient boosting
Structured prediction
https://www.youtube.com/watch?v=e5sPNlgbZAE
https://pystruct.github.io/intro.html
https://en.wikipedia.org/wiki/Structured_prediction
"Deep learning"
Properties and behaviours of learners
Underfitting and overfitting
Underfitting is when a model is too simple to encode all the interesting things in the data.
Underfitted models and learners may still generalize very well, but they may not be covering/describing all the patterns in the data.
Sometimes that can be done intentionally, e.g. to describe only the major patterns.
But usually people want the model to go as far as it can, so underfitting is a term used to point out that you may want to allow more complexity in your model.
It may be hard to quantify how crude is too crude, though.
Overfitting goes the other way: the model becomes too awkwardly specific.
The exact implication of overfitting depends on the sort of learner it is (and the data you have), but it often describes cases where the model still generalizes fairly well, yet starts encoding so much specific/local detail, insignificant variance, noise, or outright mistakes in the training set that this drowns out the more general patterns.
Overfitting can happen
- when your training data isn't large, diverse, or balanced enough to actually suggest much generalization,
- when entire groups of real-world examples aren't represented, and/or
- when the model is allowed so much complexity that it can essentially memorize all the training cases, meaning it is no longer forced to generalize at all, and will probably perform worse on unseen data.
It might still continue to score better on your data - but only on your data.
From another angle, overfitting is the (mistaken) assumption that convergence on the training data means convergence on all data.
A little localized overfitting is not disruptive, in part because as long as there are still real patterns to be learned, that signal drowns out the noise (there is rarely a clean cutoff between "we have learned all useful things" and "we have moved on to nonsense" - if there were, this would not be an issue).
But more than a little overfitting tends to start drowning out the generalization with messiness.
There are a few useful tests to evaluate overfitting and underfitting.
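One common such test is comparing error on the training data against error on held-out data: underfitted models do poorly on both, while overfitted models do much better on the training set than on the held-out set. A quick numpy sketch of that idea, using toy data and polynomial models of varying complexity (all names and numbers here are my own illustration, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying quadratic; half for training, half held out.
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.2, 60)
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def fit_and_score(degree):
    """Fit a polynomial of the given degree to the training data;
    return (train MSE, held-out MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    return mse(x_train, y_train), mse(x_test, y_test)

for degree in (1, 2, 15):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 underfits (it cannot express the curvature, so both errors stay high), degree 2 matches the underlying process, and degree 15 shows the overfitting signature: training error keeps dropping while held-out error does not follow.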
Observability
Interpretability
Explainable AI
Other terms
Loss function
Energy function
On the theory / math side
Stochastic processes, deterministic processes, random fields
Markov property
Log-Linear and Maximum Entropy
Bayesian learning
Bayesian classifier
Bayes Optimal Classifier
Naive Bayes Classifier
Multinomial Naive Bayes
Bayesian (Belief) Network
See below
Curse of dimensionality
Kernel method
https://en.wikipedia.org/wiki/Kernel_method