Softmax

softmax (sometimes called softargmax, normalized exponential function, and other things)
- takes a vector of numbers
  - (any scale)
- returns a same-length vector of probabilities
  - all in 0 .. 1
  - that sum to 1.0
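For a bit of concreteness, here is a minimal sketch of the usual definition (exponentiate each entry, then divide by the sum of those exponentials), in plain Python. The max-subtraction is only a standard numerical-stability trick; shifting every input by the same constant does not change the output.

 import math
 
 def softmax(xs):
     # shift by the max so exp() never sees huge inputs (same result, avoids overflow)
     m = max(xs)
     exps = [math.exp(x - m) for x in xs]
     total = sum(exps)
     return [e / total for e in exps]
 
 print(softmax([1.0, 0.5, 0.1]))   # ~[0.50, 0.30, 0.20]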
Note that it is not just normalization, nor is it just a way to bring out the strongest answer.
The exponent in its internals, plus the "will sum to 1.0" part, means things shift around in a non-linear way, so even relative probabilities already in 0..1 and summing to 1.0 will change, e.g.
- softmax([1.0, 0.5, 0.1]) ~= 0.50, 0.30, 0.20
- softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
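To see the "not just normalization" point directly (reusing the softmax() sketch above): plain divide-by-the-sum leaves a vector that already sums to 1.0 essentially untouched, while softmax still moves the values around.

 xs = [0.5, 0.3, 0.2]
 
 plain = [x / sum(xs) for x in xs]          # plain normalization: divide by the sum
 print([round(v, 2) for v in plain])        # [0.5, 0.3, 0.2] -- unchanged, it already sums to 1.0
 print([round(v, 2) for v in softmax(xs)])  # [0.39, 0.32, 0.29] -- values pulled closer together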
The name might suggest to you that it is a numerically smoothed maximum. It is not.

It is much closer to argmax: a smooth approximation to the arg max function, the function whose value is the index of a vector's largest element (more precisely, softmax approximates a one-hot encoding of that index). In fact, the term "softmax" is also used for the closely related LogSumExp function, which is a smooth maximum. For this reason, some prefer the more accurate term "softargmax", but "softmax" is conventional in machine learning.
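To illustrate that distinction (again reusing the softmax() sketch above, with 10 as an arbitrary example scaling factor): scaling the inputs up sharpens softmax toward a one-hot indicator of where the largest element is, while LogSumExp smoothly approximates the largest value itself.

 import math
 
 def logsumexp(xs):
     # the "smooth maximum" that the name "softmax" is sometimes used for
     m = max(xs)
     return m + math.log(sum(math.exp(x - m) for x in xs))
 
 xs = [1.0, 0.5, 0.1]
 
 # sharpening: softmax of scaled-up inputs approaches a one-hot argmax indicator
 print([round(v, 3) for v in softmax([10 * x for x in xs])])   # [0.993, 0.007, 0.0]
 
 # smooth maximum: logsumexp is close to (and always above) the true maximum
 print(round(logsumexp(xs), 2))                                 # 1.7, versus max(xs) == 1.0

https://en.wikipedia.org/wiki/Softmax_function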