Machine learning goals, problems, and glossary

This is more of an overview for my own use than a teaching or exercise resource.

Overview of areas of mathematics

Arithmetic · 'elementary mathematics' and similar concepts
Set theory, Category theory
Geometry and its relatives · Topology
Elementary algebra - Linear algebra - Abstract algebra
Calculus and analysis
Logic
Semi-sorted
: Information theory · Number theory · Decision theory, game theory · Recreational mathematics · Dynamical systems · Unsorted or hard to sort


Math on data:

  • Statistics as a field
some introduction · areas of statistics
types of data · on random variables, distributions
Virtues and shortcomings of...
on sampling · probability
glossary · references, unsorted
Footnotes on various analyses


  • Other data analysis, data summarization, learning
Machine learning goals, problems, and glossary
Data modeling, restructuring, and massaging
Statistical modeling · Classification, clustering, decisions, and fuzzy coding ·
dimensionality reduction ·
Optimization theory, control theory · State observers, state estimation
Connectionism, neural nets · Evolutionary computing
  • More applied:
Formal grammars - regular expressions, CFGs, formal language
Signal analysis, modeling, processing
Image processing notes
Varied text processing



Applied goals - and related fields

Broadly, and with some snark

Embedded machine learning

Overall concepts, Glossary

Some broad problem types (mostly around ML)

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Descriptors, architectures, combinations, and other meta

"How much interaction does it need" descriptors

Model-based versus model-free systems
Supervised, unsupervised, and more
More on reinforcement learning

Inductive versus deductive systems

Transfer learning

Feature learning, representation learning

State observers

Ensemble learning

Random (decision) forest
Bootstrap aggregation
Bucket of models
Bayesian model combination
Bayesian model averaging

Boosting

Gradients

Gradient boosting

Structured prediction


https://www.youtube.com/watch?v=e5sPNlgbZAE

https://pystruct.github.io/intro.html

https://en.wikipedia.org/wiki/Structured_prediction

"Deep learning"

Deep learning

Properties and behaviours of learners

Underfitting and overfitting

Underfitting is when a model is too simple to encode all the interesting things in the data.

Underfitted models and learners may still generalize very well, but they may not be covering/describing all the patterns in the data.

Sometimes this is intentional, e.g. to describe only the most major patterns, but usually capturing more patterns is better, so "underfitting" is a term used to point out that you may want to allow more complexity into your model.

It may be hard to quantify how crude is too crude, though.


Overfitting goes the other way: the model becomes awkwardly specific to the data it was trained on.

The exact shape that overfitting takes depends on the sort of learner it is (and the data you have), but often it describes cases where the model has the broad generalizations down and starts encoding specific/local details: insignificant variance, noise, or outright mistakes in the training set.

Overfitting can happen

when your training data isn't large, diverse, or balanced enough to actually suggest much generalization,
when entire groups of real-world examples aren't represented, and/or
when the model is allowed so much complexity that it can essentially memorize all of the training cases, meaning it is no longer forced to generalize at all.


From another angle, especially in that last case: overfitting is the (mistaken) assumption that convergence on the training data means convergence on all data.

The more the model encodes things that aren't generally true, the better it may keep looking on the training data - but only because it is encoding peculiarities of that data, and for the same reason it will at some point start getting worse on unseen/real-world data, at generalization but also at other tasks.


A little localized overfitting is not disruptive, in part because while there are still real patterns to be learned, that signal drowns out the noise (there is rarely a clean cutoff between "we have learned all the useful things" and "we are now modeling nonsense" - if there were, this would not be an issue).

But more than a little overfitting tends to drown out the generalization in messiness.


There are a few useful tests for overfitting and underfitting - most basically, comparing error on the training data with error on held-out data.
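The train-versus-held-out comparison can be sketched in a few lines. This is a minimal illustration on hypothetical toy data (numpy only, nothing from the sections above): noisy samples of a quadratic, fit with polynomials of varying degree. A degree-1 fit underfits (both errors stay high); a degree-15 fit has enough freedom to nearly memorize the 20 training points, so its training error keeps dropping while its held-out error does not.

```python
import numpy as np

# Hypothetical toy data: noisy samples of an underlying quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, size=x.shape)

# Hold out a third of the points; a model that merely memorizes the
# training points will look good on train and bad on test.
train, test = np.arange(20), np.arange(20, 30)

mse = {}
for degree in (1, 2, 15):
    # Least-squares polynomial fit on the training points only.
    coeffs = np.polyfit(x[train], y[train], degree)
    mse[degree] = tuple(
        np.mean((np.polyval(coeffs, x[s]) - y[s]) ** 2) for s in (train, test)
    )
    print(f"degree {degree:2d}: train MSE {mse[degree][0]:.4f}, "
          f"test MSE {mse[degree][1]:.4f}")
```

Training error can only go down as the model class grows, so on its own it says little; the gap between training and held-out error is what suggests over- or underfitting. Cross-validation is the same idea repeated over several splits.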



Observability

Interpretability

Explainable AI

Other terms

Loss function

Energy function

More concepts, some shared tools

On the theory / math side

Stochastic processes, deterministic processes, random fields

Markov property

Log-Linear and Maximum Entropy



Bayesian learning



Bayesian classifier
Bayes Optimal Classifier
Naive Bayes Classifier
Multinomial Naive Bayes
Bayesian (Belief) Network

See below

Curse of dimensionality

Kernel method

https://en.wikipedia.org/wiki/Kernel_method

Statistical modeling

Regression analysis

On how regression relates to classification
Linear regression
Logistic regression

Graph based modeling

Directed network modeling
Bayesian belief network
Undirected network modeling
Markov Random Fields
Conditional Random Field

Feature selection

On the statistical side

Sampling

Gibbs sampling