Softmax
 
{{stub}}


softmax (a.k.a. softargmax, normalized exponential function)


* takes a vector of numbers
* provides a vector of probabilities
:: all in 0..1
:: that sum to 1.0
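
Spelled out: for an input vector <math>z</math> with <math>K</math> components, each output is that component's exponentiated share of the total:

:<math>\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}</math>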




Many references you will find ''now'' are about its use in neural nets,
where it takes activations of any sort and puts them onto a 0..1 scale sensibly,
as a normalization step that is often used at least in the final layer, and sometimes at the end of smaller building blocks as well.
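
A minimal sketch of that step (plain NumPy; subtracting the max is the usual trick to keep exp() from overflowing on large activations, and is not part of the definition):

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    """Map a vector of arbitrary real numbers to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())    # shifting by the max leaves the result unchanged
    return e / e.sum()

print(softmax([2.0, 1.0, -1.0]))   # ~ [0.71 0.26 0.04]
</syntaxhighlight>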




When using nets as multiclass classifiers, you would need ''something'' like softmax to be able to respond on all the labels,
and in a way that looks like probabilities (see the illustration below).
In part it's just a choice of what you want to show (you could output classification margin scores instead),
in part it's a choice that


While the exponent makes it look like some choices of sigmoid functions,
it isn't directly comparable to transfer functions, and you can't get an easy graph of it, exactly ''because'' it takes multiple inputs.
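
A small illustration of both points: a classifier's final layer produces one score per label, and softmax is a view over that whole score vector rather than a per-unit transfer function (the labels and scores here are made up):

<syntaxhighlight lang="python">
import numpy as np

logits = {"cat": 3.1, "dog": 1.2, "bird": -0.5}   # raw per-label scores from a final layer

z = np.array(list(logits.values()))
probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # same softmax as above

for label, score, p in zip(logits, z, probs):
    print(f"{label}: score={score:+.1f}  prob={p:.2f}")
# cat: 0.85, dog: 0.13, bird: 0.02  (same ranking, but now comparable as probabilities over all labels)
</syntaxhighlight>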


Note that it is ''not'' just normalization.
Nor is it just a way to bring out the strongest answer.
Both its exponent internals and the "will sum to 1.0" part mean things shift around in a non-linear way,
so even relative probabilities that are already in 0..1 and sum to 1.0 will change, e.g.
: softmax([1.0, 0.5, 0.1]) ~= 0.5, 0.3, 0.2
: softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
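
A quick check of those numbers (same softmax() as in the sketch above, repeated so this snippet runs on its own):

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

print(np.round(softmax([1.0, 0.5, 0.1]), 2))   # -> [0.5 0.3 0.2]
print(np.round(softmax([0.5, 0.3, 0.2]), 2))   # -> [0.39 0.32 0.29]
</syntaxhighlight>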






But also, it's a more general mathematical tool, even if it's mostly seen in machine learning.




