Probabilities, odds, and logits


Probabilities and log probabilities

https://en.wikipedia.org/wiki/Log_probability
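The linked article covers the details. The following minimal Python sketch shows the usual motivation; the 200 events and their probability of 0.001 are made-up values for illustration. Multiplying many small probabilities underflows to 0.0 in 64-bit floating point, while summing their logarithms stays comfortably representable:

```python
import math

# Hypothetical example: 200 independent events, each with probability 0.001.
probs = [0.001] * 200

# Direct product: the true value is 1e-600, far below the smallest
# positive float64 (about 4.9e-324), so the result underflows to 0.0.
direct = math.prod(probs)

# Log space: sum the log probabilities instead of multiplying.
log_total = sum(math.log(p) for p in probs)

print(direct)     # 0.0
print(log_total)  # about -1381.55, i.e. log(1e-600)
```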

Odds and log odds

Logits

In math, a logit function is a function that maps probabilities in (0,1) to real numbers in (−∞, ∞), i.e. all of ℝ.


In statistics and probability, it tends to be more specifically defined as[1]:

logit(p) = log(p / (1 − p))


This is useful because applying it to every probability involved lets you work in log-odds, an unbounded space, rather than in probabilities.

In fact, in statistics and probability it is often simply called the log-odds: p/(1−p) is the odds of an event with probability p, so log(p/(1−p)) is the logarithm of those odds.
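As a small illustration of the definition above, here is a plain Python sketch. The helper names `odds` and `logit` are chosen here for illustration and are not a specific library's API:

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p, i.e. p / (1 - p)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """logit(p) = log(p / (1 - p)); only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(odds(p))

print(odds(0.75))   # 3.0, i.e. "3 to 1"
print(logit(0.5))   # 0.0: even odds sit at zero
print(logit(0.75))  # log(3), about 1.0986
print(logit(0.25))  # -log(3): the logit is antisymmetric around p = 0.5
```

Probabilities below 0.5 map to negative values, probabilities above 0.5 to positive ones, and the values grow without bound as p approaches 0 or 1.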



In machine learning (especially ANNs), 'logit' is used more broadly and vaguely - but still carries a similar idea.

  • logit function: take unbounded values and use a sigmoid function (such as the logistic function) to put them in 0..1
  • logit values: if a model is said to output logits, that communicates that it gives unbounded real numbers that have not been converted to probabilities, and may carry the idea of "the values at this layer, before we do something else with them"

Again, logits are useful for letting parts of a model work in an unbounded space, and those values often behave somewhat like log-odds even when they were not defined or designed that way.
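In the simplest case, logistic regression, the unbounded output is the log-odds by construction. The model computes a score z = w·x + b and turns it into a probability with the logistic sigmoid. The following Python sketch uses made-up weights; `w`, `b` and the feature values are illustrative assumptions. It shows the log-odds behaviour: increasing one feature by 1 adds its weight to z, which multiplies the odds by e raised to that weight.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, maps any real z into (0, 1). Naive form, fine for moderate z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model parameters, chosen only for illustration.
w = [0.8, -1.5]
b = 0.2

def score(x: list[float]) -> float:
    """Unbounded linear score: the 'logit' this model outputs."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x1 = [1.0, 2.0]
x2 = [2.0, 2.0]  # same input, first feature increased by 1

z1, z2 = score(x1), score(x2)
p1, p2 = sigmoid(z1), sigmoid(z2)

# The score difference equals the first weight (up to rounding)...
print(z2 - z1)
# ...and the odds ratio equals exp(0.8), about 2.2255.
print((p2 / (1 - p2)) / (p1 / (1 - p1)))
```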



You also see statements like "the logit is the inverse function of the logistic sigmoid function". For the statistical logit above this is exactly true, but in machine learning it is less a literal definition and more an abstract statement about how logit values and probability values often relate to each other, going the other way:

if a model gives you logits, you are reading out the unbounded numbers, and you can map those to probabilities via e.g. a sigmoid function (or softmax, or something else) (sigmoid maps ℝ → (0,1))
...so if you have the probability, then "the inverse function of the sigmoid" gives you the logit back (logit maps (0,1) → ℝ). For the sigmoid this is exact up to floating-point rounding; for softmax, taking the log of the probabilities recovers the logits only up to an additive constant shared by the whole vector.
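A minimal Python sketch of the sigmoid/logit pair going both ways. The two-branch sigmoid is one common way to avoid overflow in `math.exp` for large negative inputs; it is a choice made here, not something the text prescribes:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, R -> (0, 1), written so exp() cannot overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) is in (0, 1)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Inverse of the sigmoid, (0, 1) -> R."""
    return math.log(p / (1.0 - p))

# Round trip: logit(sigmoid(z)) gives back z, up to floating-point rounding.
for z in [-4.0, -0.5, 0.0, 2.0, 6.0]:
    p = sigmoid(z)
    print(z, p, logit(p))
```

The round trip loses precision as |z| grows. Beyond roughly z = 37, the float64 sigmoid rounds to exactly 1.0 and the logit can no longer be recovered at all. This is a practical reason to keep logits around rather than only the probabilities.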


Around neural networks (and in particular in TensorFlow?)

  • 'logit' is also used to refer to a layer / tensor that is a raw prediction output (probably from a much denser input)
...before passing it to a normalization-style function such as softmax.
  • 'logits' is sometimes used to refer to the numbers that a NN layer spits out (verify)
...often the last layer. For a classification model, this typically means a vector of unnormalized values (the logits) that has not yet been passed through a normalization function such as softmax.
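To make the classification case concrete, here is a small Python sketch. The three-class logits are made-up values, and `softmax` is written out by hand rather than taken from a framework:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Turn unnormalized scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a 3-class classifier's last layer.
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)
print(probs)  # about [0.705, 0.259, 0.035], sums to 1

# Adding the same constant to every logit leaves the probabilities unchanged,
# so log(probs) recovers the logits only up to that constant.
shifted = [x + 10.0 for x in logits]
print(softmax(shifted))              # same as probs, up to rounding
print([math.log(q) for q in probs])  # logits minus log(sum(exp(logits)))
```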


Without knowing the detailed history of why the name was adopted for this use, some of these last usages arguably look like a confusing abuse of the original term.


https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow/52111173#52111173

https://en.wikipedia.org/wiki/Logit