Softmax
 
{{stub}}


softmax (a.k.a. softargmax, normalized exponential function)


* takes a vector of numbers
* provides a vector of probabilities
:: all in 0..1
:: that sum to 1.0
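
Spelled out: for an input vector <math>z</math> with <math>K</math> components, each output is that component's exponentiated share of the total:

:<math>\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}</math>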




Many references you will find ''now'' are about its use in neural nets,
where it takes activations of any sort and puts them onto a 0..1 scale sensibly,
as a normalization step that is often used at least in the final layer, and sometimes at the end of smaller building blocks as well.
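
A minimal sketch of that step (plain NumPy; subtracting the max is the usual trick to keep exp() from overflowing on large activations, and is not part of the definition):

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    """Map a vector of arbitrary real numbers to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())    # shifting by the max leaves the result unchanged
    return e / e.sum()

print(softmax([2.0, 1.0, -1.0]))   # ~ [0.71 0.26 0.04]
</syntaxhighlight>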




When using nets as multiclass classifiers, you would need ''something'' like softmax to be able to respond on all the labels,
and in a way that looks like probabilities (see the illustration below).
In part it's just a choice of what you want to show (you could output classification margin scores instead),
in part it's a choice that


While the exponent makes it look like some choices of sigmoid functions,
it isn't directly comparable to transfer functions, and you can't get an easy graph of it, exactly ''because'' it takes multiple inputs.
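
A small illustration of both points: a classifier's final layer produces one score per label, and softmax is a view over that whole score vector rather than a per-unit transfer function (the labels and scores here are made up):

<syntaxhighlight lang="python">
import numpy as np

logits = {"cat": 3.1, "dog": 1.2, "bird": -0.5}   # raw per-label scores from a final layer

z = np.array(list(logits.values()))
probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # same softmax as above

for label, score, p in zip(logits, z, probs):
    print(f"{label}: score={score:+.1f}  prob={p:.2f}")
# cat: 0.85, dog: 0.13, bird: 0.02  (same ranking, but now comparable as probabilities over all labels)
</syntaxhighlight>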


Note that it is ''not'' just normalization.
Nor is it just a way to bring out the strongest answer.
Both its exponent internals and the "will sum to 1.0" part mean things shift around in a non-linear way,
so even relative probabilities that are already in 0..1 and sum to 1.0 will change, e.g.
: softmax([1.0, 0.5, 0.1]) ~= 0.5, 0.3, 0.2
: softmax([0.5, 0.3, 0.2]) ~= 0.39, 0.32, 0.29
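
A quick check of those numbers (same softmax() as in the sketch above, repeated so this snippet runs on its own):

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

print(np.round(softmax([1.0, 0.5, 0.1]), 2))   # -> [0.5 0.3 0.2]
print(np.round(softmax([0.5, 0.3, 0.2]), 2))   # -> [0.39 0.32 0.29]
</syntaxhighlight>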






But also, it's a more general mathematical tool, even if it's mostly seen in machine learning.




