Optimization theory, control theory
This is more for overview of my own than for teaching or exercise.

Contents
State observers / state estimation; filters
Bayes estimator, Bayes filter
Alpha-beta filter (a.k.a. f-g filter, g-h filter)
Kalman filter
Multi hypothesis tracking
Particle filter
Data fusion
Sensor fusion
Optimization theory, control theory
Glossary
Some controllers
In terms of the (near) future:
 greedy control doesn't really look ahead.
 PID can be tuned for some basic tendencies
 MPC tries to minimize mistakes in predicted future
 For example, take an HVAC system that actively heats but passively cools. This effectively means you should be very careful about overshooting: you would make the system sluggish, which also reduces performance because it lengthens the time of effects and settling
Nonlinear:
 HVAC
Greedy controllers
Doesn't look ahead, just minimizes for the current step.
For example basic proportional adjustment.
Tends not to be stable.
Can be stable enough for certain cases, in particular very slow systems where slow control is fine, and accuracy not so important.
For example, water boilers have such a large volume that even a bang-bang controller (turn the heater element fully on or off according to a temperature threshold) will keep the water within a few degrees of that threshold, simply because the water's heat capacity is large in relation to the heating element you'd probably use.
But in a wider sense, e.g. that same boiler with a small volume, or a powerful heater, such control causes unproductive feedback, e.g. oscillations when actuation runs about as fast as, or faster than, measurement.
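A minimal sketch of the boiler point above, with assumed (illustrative) numbers: a bang-bang controller on a simple lumped thermal model. Because the water's heat capacity is large relative to the heater's power, the temperature stays in a narrow band around the threshold even with this crude control.

```python
def simulate_bangbang(threshold=60.0, heater_w=2000.0, mass_kg=50.0,
                      ambient=20.0, loss_w_per_k=10.0, dt=1.0, steps=20000):
    """Bang-bang control of a lumped-mass water heater (illustrative numbers)."""
    c_water = 4186.0          # J/(kg*K), specific heat of water
    temp = ambient
    history = []
    for _ in range(steps):
        heating = temp < threshold              # bang-bang decision
        power = heater_w if heating else 0.0
        loss = loss_w_per_k * (temp - ambient)  # passive loss to ambient
        temp += (power - loss) * dt / (mass_kg * c_water)
        history.append(temp)
    return history

temps = simulate_bangbang()
settled = temps[-2000:]   # after warm-up, it hovers near the threshold
print(min(settled), max(settled))
```

With these numbers each step changes the temperature by less than a hundredth of a degree, so the band around the threshold is tiny; shrink `mass_kg` or raise `heater_w` and the band (and the tendency to oscillate) grows.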
Hysteresis (behaviour)
Hysteresis control (type)
Map-based controller (type)
PID controller
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
PID is a fairly generic control-loop system, still widely used in industrial control systems.
It is useful in systems that have to deal with delays between and/or in actuation and sensing. There it can typically be tuned to work better than greedy controllers (and also be tuned to work worse), because unlike greedy control, you can try to tune out overshoots as well as oscillations.
PID is computationally very cheap (a few adds and multiplies per step), compared to some other cleverer methods.
Yet:
 There are no simple guarantees of optimality or stability,
 you have to tune them,
 and learn how to tune them.
 tuning is complex in that it depends on
 how fast the actuation works
 how fast you sample
 how fast the system changes/settles
 doesn't deal well with long time delays
 derivative component is sensitive to noise, so filtering may be a good idea
 has trouble controlling complex systems
 more complex systems should probably look to MPC or similar.
 linear at heart (assumes measurement and actuation are relatively linear)
 so doesn't perform so well in nonlinear systems
 symmetric at heart, so not necessarily well-suited to non-symmetric actuation
 consider e.g. an HVAC system, which would oscillate around its target by alternately heating and cooling.
 It is much more power-efficient to do one passively, e.g. active heating and passive cooling (if it's cold outside), or active cooling and passive heating (if it's warmer outside)
 this means it's easier to overshoot, and more likely to stick off the setpoint on the passive side, so on average sit on one side
 You could make the system sluggish; in this case that reduces the speed at which it reaches the setpoint, but that is probably acceptable to you.
 in other words: sluggish system and/or a bias to one side
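The asymmetric-actuation point can be sketched with a toy model (all numbers and the plant model are illustrative assumptions, not from any real HVAC system): the controller output is clamped so it can only heat, while cooling happens passively toward ambient. The system cools sluggishly, and a plain proportional controller settles slightly below the setpoint.

```python
def step_plant(temp, heat_w, ambient=10.0, loss_per_k=0.05, dt=1.0):
    # passive cooling toward ambient, plus whatever heating we command
    return temp + (heat_w - loss_per_k * (temp - ambient)) * dt * 0.01

setpoint, temp = 21.0, 25.0   # start above setpoint: can only cool passively
for _ in range(2000):
    u = 2.0 * (setpoint - temp)   # simple proportional control
    u = max(0.0, u)               # heating only; no active cooling
    temp = step_plant(temp, u)
print(temp)
```

Two effects show up: while above the setpoint the controller can do nothing but wait for slow passive cooling, and once below it the P-only controller balances against the heat loss a little under the setpoint, i.e. the steady state sits on one side.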
Some definition
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
The idea is to adjust the control based on some function of the error, and a Proportional–Integral–Derivative (PID) controller combines the three components it names, each tweaked with their own weight (gain).
The very short version is that
 P adjusts according to the proportional error
 I adjusts according to the integrated error
 D adjusts according to the derivative of the error
It can be summarized as:
 output(t) = P·e(t) + I·∫₀ᵗ e(τ) dτ + D·(d/dt)e(t)
where
 e(t) is the error
 P, I, and D are scalar weights controlling how much effect each component has
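The three terms above can be sketched as a discrete-time controller (gains, plant, and time step are illustrative assumptions): the error is accumulated for the I term and differenced for the D term each step.

```python
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                  # I: accumulate error
        derivative = 0.0 if self.prev_error is None else \
            (error - self.prev_error) / self.dt           # D: rate of change
        self.prev_error = error
        return (self.kp * error +                         # P: proportional
                self.ki * self.integral +
                self.kd * derivative)

# Drive a trivial first-order plant toward a setpoint of 1.0
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
x = 0.0
for _ in range(500):
    u = pid.update(1.0, x)
    x += (u - x) * 0.1   # toy plant: state relaxes toward the control input
print(round(x, 3))
```

Note the I term is what removes the steady-state offset here; with `ki=0` the proportional controller alone would settle short of the setpoint, which connects back to the tuning remarks above.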
Some intuition
So how do you tune it?
MPC (Model Predictive Control)
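The "minimize mistakes in the predicted future" idea can be sketched very minimally (plant model, horizon, and the brute-force search over a few candidate inputs are all illustrative assumptions; real MPC solves a proper optimization problem): at each step, simulate the model forward over a short horizon for each candidate control sequence, score the predicted errors, and apply only the first move of the best sequence (receding horizon).

```python
import itertools

def plant(x, u):
    return 0.9 * x + 0.1 * u          # assumed linear plant model

def mpc_step(x, setpoint, horizon=3, candidates=(-1.0, 0.0, 1.0)):
    best_u, best_cost = 0.0, float("inf")
    for seq in itertools.product(candidates, repeat=horizon):
        xp, cost = x, 0.0
        for u in seq:                  # roll the model forward in time
            xp = plant(xp, u)
            cost += (setpoint - xp) ** 2
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u                      # receding horizon: apply first move only

x = 0.0
for _ in range(100):
    x = plant(x, mpc_step(x, setpoint=0.5))
```

Because it predicts ahead, this kind of controller can anticipate overshoot instead of reacting to it, which is exactly what greedy control cannot do; the price is needing a model and much more computation per step than PID.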
FLC (Fuzzy Logic Control)
Notes on
Gradient-based learning
See also
Log-Linear and Maximum Entropy
Reinforcement learning (RL)
See also:
 LP Kaelbing, ML Littman, AW Moore. (1996) Reinforcement learning: A survey.
 RS Sutton, AG Barto (1998) Reinforcement Learning: An Introduction