Softmax: Difference between revisions

Revision as of 14:48, 13 February 2024

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

softmax (sometimes called softargmax, normalized exponential function, and other things)

takes a vector of numbers

returns a vector of probabilities

all in 0 .. 1

that sum to 1.0
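In formula form this is usually written softmax(z)_i = exp(z_i) / sum_j exp(z_j). Below is a minimal Python sketch that assumes the input is a plain list of floats. The name `softmax` is only for illustration and is not any particular library's API. It subtracts the largest value before exponentiating. This common trick avoids overflow without changing the result, because the output only depends on differences between inputs.

```python
import math

def softmax(values):
    """Map a list of floats to probabilities in 0..1 that sum to 1.0."""
    if not values:
        raise ValueError("softmax needs at least one value")
    # Subtracting the largest value does not change the result
    # (the common factor exp(-largest) cancels out in the division),
    # but it keeps math.exp() from overflowing on large inputs.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -3.0])
print(probs, sum(probs))          # sum is 1.0, up to float rounding
print(softmax([1000.0, 999.0]))   # large inputs are fine
```

The max-subtraction is what makes the last line work: a naive math.exp(1000.0) raises OverflowError.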

Note that it is not just normalization (dividing each value by the total).

Nor is it only a way to bring out the strongest answer.
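One way to see this is that softmax only approaches a hard "pick the largest" (one-hot) answer when the differences between inputs are large. The Python sketch below uses only the standard library. It multiplies the same input by several factors, which are arbitrary illustration values and do not come from the text above. Small factors give a nearly uniform output, and large ones give a nearly one-hot output. In between, the weaker entries keep a real share.

```python
import math

def softmax(values):
    # Subtract the max for numerical stability; the result is unchanged.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 0.5, 0.1]

# Multiplying the inputs stretches the differences between them,
# which is what decides how "peaked" the output is.
for scale in (0.1, 1.0, 10.0, 100.0):
    probs = softmax([scale * s for s in scores])
    print(scale, [round(p, 3) for p in probs])
```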

The exponentiation in its internals, plus the "will sum to 1.0" part, means values shift around in a non-linear way. So even a vector that already looks like probabilities (all in 0..1 and summing to 1.0) will change, e.g.

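softmax([1.0, 0.5, 0.1]) ~= [0.50, 0.30, 0.20]

softmax([0.5, 0.3, 0.2]) ~= [0.39, 0.32, 0.29]

The second input is (rounded) the output of the first. Applying softmax again still moves it, in this case toward uniform.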
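The two examples can be reproduced with a short Python sketch that uses only the standard library. For contrast, it also applies plain normalization through a hypothetical helper `normalize` that divides by the sum. Plain normalization leaves [0.5, 0.3, 0.2] exactly as it is, while softmax changes it. The reason is that softmax output ratios depend on the differences between inputs, p_i / p_j = exp(z_i - z_j), and not on their ratios.

```python
import math

def softmax(values):
    # Subtract the max for numerical stability; the result is unchanged.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(values):
    # Plain normalization: scale so the values sum to 1, ratios preserved.
    total = sum(values)
    return [v / total for v in values]

def rounded(values, digits=2):
    return [round(v, digits) for v in values]

print(rounded(softmax([1.0, 0.5, 0.1])))    # [0.5, 0.3, 0.2]
print(rounded(softmax([0.5, 0.3, 0.2])))    # [0.39, 0.32, 0.29]
print(rounded(normalize([0.5, 0.3, 0.2])))  # unchanged: [0.5, 0.3, 0.2]
```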

https://en.wikipedia.org/wiki/Softmax_function
