Softmax

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

softmax (sometimes called softargmax, normalized exponential function, and other things)

  • takes a vector of numbers (any scale)
  • returns a same-length vector of probabilities, all in 0..1, that sum to 1.0 (the defining formula is written out just below)
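For concreteness (the notes above gesture at it but never write it out), the standard defining formula, for an input vector x with n components, is:

 \[ \operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}, \qquad i = 1, \dots, n \]

The exponentiation and the shared denominator are the "exponent in its internals" and the "will sum to 1.0" part mentioned below.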


Note that it is not just normalization, nor is it just a way to bring out the strongest answer.

The exponentiation in its internals, combined with the "will sum to 1.0" part, means that things shift around in a non-linear way, so even relative probabilities already in 0..1 and summing to 1.0 will change, e.g.

softmax([1.0, 0.5, 0.1]) ~= 0.5, 0.3, 0.2
softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
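A minimal Python sketch that reproduces the two examples above (numpy is just a convenience here, not something these notes prescribe):

 import numpy as np

 def softmax(x):
     # Subtract the max before exponentiating, for numerical stability;
     # this shifts every component equally and does not change the output.
     e = np.exp(np.asarray(x, dtype=float) - np.max(x))
     return e / e.sum()

 print(np.round(softmax([1.0, 0.5, 0.1]), 2))   # roughly 0.5, 0.3, 0.2
 print(np.round(softmax([0.5, 0.3, 0.2]), 2))   # roughly 0.39, 0.32, 0.29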




The name might suggest to you it is a numerically smoothed maximum. It is not.

It is much closer to argmax: softmax is a smooth approximation of the argmax function, the function whose value is the index of a vector's largest element (with that index represented as a one-hot vector). The term "softmax" is also used for the closely related LogSumExp function, which is a smooth maximum; for this reason some prefer the more accurate name "softargmax", but "softmax" is the conventional term in machine learning.
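The contrast can be made concrete with a small illustrative sketch (again Python with numpy; softmax is redefined so the snippet stands alone): LogSumExp stays near the maximum value, while softmax moves toward a one-hot vector at the argmax index as the inputs are scaled up.

 import numpy as np

 def softmax(x):
     e = np.exp(np.asarray(x, dtype=float) - np.max(x))
     return e / e.sum()

 def logsumexp(x):
     x = np.asarray(x, dtype=float)
     m = np.max(x)
     return m + np.log(np.exp(x - m).sum())

 x = np.array([1.0, 0.5, 0.1])

 # LogSumExp acts like a smooth maximum: it is near max(x) = 1.0,
 # and scaling the inputs up pulls it closer to the true maximum.
 print(logsumexp(x))            # ~1.70
 print(logsumexp(10 * x) / 10)  # ~1.001

 # Softmax acts like a smooth argmax: scaling the inputs up pushes the output
 # toward a one-hot vector at the index of the largest element (index 0 here).
 print(np.round(softmax(x), 3))       # roughly 0.497, 0.301, 0.202
 print(np.round(softmax(10 * x), 3))  # roughly 0.993, 0.007, 0.000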




https://en.wikipedia.org/wiki/Softmax_function