Softmax
{{stub}}

'''softmax''' (sometimes called softargmax, normalized exponential function, and other things)
* takes a vector of numbers
* returns a vector of probabilities
:: all in 0 .. 1
:: that sum to 1.0
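
A minimal sketch of that computation in Python (numpy; the names are mine, not any particular library's). Subtracting the max first is the usual trick to keep exp() from overflowing; it doesn't change the result, because softmax is shift-invariant:

 import numpy as np
 
 def softmax(x):
     x = np.asarray(x, dtype=float)
     e = np.exp(x - np.max(x))   # shift by the max: exp() stays small, output unchanged
     return e / e.sum()          # normalize so the outputs sum to 1.0
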
Note that it is ''not'' just normalization.
Nor is it only a way to bring out the strongest answer.
The exponent in its internals, plus the "will sum to 1.0" part, means things shift around in a non-linear way,
so even relative probabilities already in 0..1 and summing to 1.0 will change, e.g.
: softmax([1.0, 0.5, 0.1]) ~= 0.5, 0.3, 0.2
: softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
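
Reproducing those numbers with the sketch above:

 >>> softmax([1.0, 0.5, 0.1]).round(2)
 array([0.5, 0.3, 0.2])
 >>> softmax([0.5, 0.3, 0.2]).round(2)
 array([0.39, 0.32, 0.29])
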
<!--
If you squint, it is ''something'' like the [[sigmoid]] function,
but it is not directly comparable to transfer functions,
and you can't get an easy plot of it,
exactly ''because'' it takes multiple inputs.
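(One concrete link: for just two inputs it reduces to the logistic sigmoid, since softmax([x, 0]) = [1/(1+e^-x), 1/(1+e^x)].)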
It is a more generic mathematical tool,
historically seen a bunch in machine learning,
and these days many references are to its use in neural nets.
In that context it will
take activations of any sort,
and put them on a 0..1 scale sensibly,
mostly as a normalization step that is often used at least in the final layer,
and sometimes at the end of smaller building blocks as well.
When using nets as multiclass classifiers, you would need ''something'' like softmax to be able to respond over all the labels,
and in a way that looks like probabilities.
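
A toy sketch of that in numpy (made-up shapes and names, reusing the softmax() above):

 rng = np.random.default_rng(0)
 hidden = rng.normal(size=8)                   # stand-in activations from earlier layers
 W, b = rng.normal(size=(8, 3)), np.zeros(3)   # a final linear layer over 3 labels
 
 logits = hidden @ W + b        # raw, unbounded scores per label
 probs = softmax(logits)        # in 0..1, summing to 1.0 - readable as probabilities
 label = int(np.argmax(probs))  # or collapse to just the strongest label
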
In part it's just a choice of what you want to show (you could output classification margin scores instead),
in part it's a choice that
-->
https://en.wikipedia.org/wiki/Softmax_function |