Various transfer functions
Latest revision as of 23:13, 21 April 2024
Quick and dirty
While coding I regularly want a quick weighting or distortion of data that is roughly the right shape - without spending hours on what it really should be in theory.
On fractions in (0,1)
Roots and powers - starting from a straight line, we squeeze 0..1 towards one side or the other
Weighting low/high values (hard)
Weighting center values (hard)
Weighting center values (softer)
- sqrt(1-abs(1-2x))
- sin(pi*x)
- sin(pi*x) raised to a power (lower or higher than 1), e.g. sin(pi*x)**3 or sqrt( sin(pi*x) )
- many others
Smoothstep (~= sigmoids in 0..1)
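The (0,1) shapers listed above can be sketched as plain functions - a minimal sketch in Python; the function names are my own, not established terms:

```python
import math

def weight_high(x, p=2.0):
    # powers > 1 bend the straight line toward 0, weighting high values hard
    return x ** p

def weight_low(x, p=0.5):
    # powers < 1 (roots) bend the line toward 1, weighting low values hard
    return x ** p

def weight_center_hard(x):
    # sqrt(1 - abs(1 - 2x)): peaks sharply at x = 0.5, zero at both edges
    return math.sqrt(1 - abs(1 - 2 * x))

def weight_center_soft(x):
    # sin(pi*x): a softer bump, also peaking at x = 0.5
    return math.sin(math.pi * x)

def smoothstep(x):
    # the classic 3x^2 - 2x^3 smoothstep, roughly a sigmoid confined to 0..1
    return x * x * (3 - 2 * x)
```

All of these take x in (0,1) and return values in [0,1]; raising the center bumps to a power (as in the list above) sharpens or softens them further.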
On things in (0..large number)
(these are somewhat )
- sqrt(x) (...or sqrt(x/turning_point))
- turning_point: scale the input down so that this value maps to 1.0
- where the turning point is an interesting threshold (if one makes sense on the given scale, perhaps based on an average if the scale is somewhat arbitrary)
- logs (vaguely like sqrt, but larger values have less effect):
- since log(0) = -inf and log is negative up to x=1, we want to add something to that input:
- log(base+x) (e.g. log_e(e+x), log_10(10+x))
- at x=0, output is 1, with a slow, logarithmic curve above that (the steepest initial part is hidden at x<0; you could play with offsets to get more of it)
- log(base+x)-1
- same curve as the previous one, but at x=0, output is 0
- you could rescale it
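The (0..large) squashers above, as a minimal sketch; turning_point and the function names are my own, and natural log is the default base:

```python
import math

def sqrt_scaled(x, turning_point=1.0):
    # sqrt of the input, scaled so that x == turning_point maps to 1.0
    return math.sqrt(x / turning_point)

def log_squash(x, base=math.e):
    # log(base + x): outputs 1.0 at x = 0, then climbs slowly/logarithmically
    return math.log(base + x, base)

def log_squash_zero(x, base=math.e):
    # log(base + x) - 1: the same curve, shifted so the output is 0 at x = 0
    return math.log(base + x, base) - 1
```

Compared to sqrt, the log variants flatten out faster, so large inputs matter progressively less.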
On things around 0 (mostly but not necessarily in -1..1)
Sigmoids are a wide family, anything s-shaped
Logistic function
tanh (the hyperbolic tangent)
Some notes
Logistic function
The logistic function, a.k.a. logistic curve, is defined by:
f(x) = L / ( 1 + e^(-k*(x-x0)) )
where
- x0 is the x position of the midpoint
- L = the magnitude (often 1)
- k = logistic growth rate - the steepness of the curve
(in specific applications these may have more intuitively meaningful names)
Note that in various uses it may simplify, e.g. in neural net use often to 1/(1+e^-x),
and is then often called a sigmoid, apparently both to point at the shape and to signal that it's not quite the full logistic function.
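The general form and its common simplification can be written directly - a minimal sketch, with parameter names following the formula above:

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    # general logistic curve: L / (1 + e^(-k*(x - x0)))
    # L = magnitude, k = growth rate (steepness), x0 = x position of the midpoint
    return L / (1 + math.exp(-k * (x - x0)))

def sigmoid(x):
    # the common neural-net simplification: L = 1, k = 1, x0 = 0
    return 1 / (1 + math.exp(-x))
```

At x = x0 the output is exactly L/2, and the curve flattens toward 0 and L at the extremes.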
Logit
In math, logit is a function that maps probabilities in 0..1 to real numbers in -inf..inf.
Around statistics and probability it is sometimes called log-odds, because it is equivalent to log(p/(1-p)), where p/(1-p) is the odds in probability.
It is a transfer function largely useful in statistics and machine learning.
In machine learning, the term is used:
- sometimes just as a general idea,
- sometimes to refer to the inverse function of a (any) logistic sigmoid function(verify)
Around neural networks (and in particular in tensorflow)
- 'logit' is also used to refer to a layer / tensor that is a raw prediction output (probably from a much denser input) before passing it to a normalization-style function like softmax.
- 'logits' is sometimes used to refer to the numbers that that layer spits out(verify)
Without the detailed history of why this name was adopted for this specific use, this is actually a confusing abuse of the original term.
https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow/52111173#52111173
https://en.wikipedia.org/wiki/Logit
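In its mathematical sense, logit is just the log-odds, and it is the inverse of the standard logistic sigmoid - a minimal sketch:

```python
import math

def logit(p):
    # log-odds: maps a probability in (0,1) to a real number in (-inf, inf)
    return math.log(p / (1 - p))

def sigmoid(x):
    # standard logistic sigmoid, the inverse of logit
    return 1 / (1 + math.exp(-x))
```

Note the symmetry around p = 0.5: logit(0.5) is 0, and logit(1-p) = -logit(p).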
On sigmoid functions
Sigmoid refers to a class of curves that are S-shaped.
In various contexts (e.g. activation functions in modeled neurons, population growth modeling) it often refers to the logistic function,
or sometimes tanh, but can be various others.