Softmax

softmax (sometimes called softargmax, normalized exponential function, and other things)
- takes a vector of numbers
  - (any scale)
- returns a same-length vector of probabilities
  - all in 0 .. 1
  - that sum to 1.0
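For a bit of concreteness, here is a minimal sketch of the usual definition (exponentiate each entry, then divide by the sum of those exponentials), in plain Python. The max-subtraction is only a standard numerical-stability trick; shifting every input by the same constant does not change the output.

 import math
 
 def softmax(xs):
     # shift by the max so exp() never sees huge inputs (same result, avoids overflow)
     m = max(xs)
     exps = [math.exp(x - m) for x in xs]
     total = sum(exps)
     return [e / total for e in exps]
 
 print(softmax([1.0, 0.5, 0.1]))   # ~[0.50, 0.30, 0.20]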
Note that it is not just normalization, nor is it just a way to bring out the strongest answer.
The exponent in its internals, plus the "will sum to 1.0" part, means things shift around in a non-linear way, so even relative probabilities already in 0..1 and summing to 1.0 will change, e.g.
- softmax([1.0, 0.5, 0.1]) ~= 0.50, 0.30, 0.20
- softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
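To see the "not just normalization" point directly (reusing the softmax() sketch above): plain divide-by-the-sum leaves a vector that already sums to 1.0 essentially untouched, while softmax still moves the values around.

 xs = [0.5, 0.3, 0.2]
 
 plain = [x / sum(xs) for x in xs]          # plain normalization: divide by the sum
 print([round(v, 2) for v in plain])        # [0.5, 0.3, 0.2] -- unchanged, it already sums to 1.0
 print([round(v, 2) for v in softmax(xs)])  # [0.39, 0.32, 0.29] -- values pulled closer together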
The name might suggest to you that it is a numerically smoothed maximum. It is not.

It is much closer to argmax: a smooth approximation to the arg max function, the function whose value is the index of a vector's largest element (more precisely, softmax approximates a one-hot encoding of that index). In fact, the term "softmax" is also used for the closely related LogSumExp function, which is a smooth maximum. For this reason, some prefer the more accurate term "softargmax", but "softmax" is conventional in machine learning.
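To illustrate that distinction (again reusing the softmax() sketch above, with 10 as an arbitrary example scaling factor): scaling the inputs up sharpens softmax toward a one-hot indicator of where the largest element is, while LogSumExp smoothly approximates the largest value itself.

 import math
 
 def logsumexp(xs):
     # the "smooth maximum" that the name "softmax" is sometimes used for
     m = max(xs)
     return m + math.log(sum(math.exp(x - m) for x in xs))
 
 xs = [1.0, 0.5, 0.1]
 
 # sharpening: softmax of scaled-up inputs approaches a one-hot argmax indicator
 print([round(v, 3) for v in softmax([10 * x for x in xs])])   # [0.993, 0.007, 0.0]
 
 # smooth maximum: logsumexp is close to (and always above) the true maximum
 print(round(logsumexp(xs), 2))                                 # 1.7, versus max(xs) == 1.0

https://en.wikipedia.org/wiki/Softmax_function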