{{#addbodyclass:tag_tech}}
{{#addbodyclass:tag_prog}}
{{#addbodyclass:tag_math}}
{{math notes}}
==Semi-sorted==

===Mixture models===
{{stub}}
A '''mixture model''' (sometimes '''mixture distribution''') is a density model consisting of a mixture (weighted sum) of independent component distributions.
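In symbols {{comment|(standard notation, not spelled out in the original text)}}, a finite mixture density is a convex combination of component densities:

<math>p(x) = \sum_{k=1}^{K} w_k\, p_k(x), \qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1</math>

where the <math>p_k</math> are the component densities (e.g. Gaussians) and the <math>w_k</math> are the mixing weights.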
This is a pretty abstract concept, and it sees varying types of real-world use.
<!--
Note that mixture models do not necessarily refer to systems that are real-numbered throughout.

Nor are they necessarily finite, though in practice the number of components is usually finite, and fairly small.
GMMs {{comment|(Gaussian Mixture Models)}} seem to be one of the more common concrete applications of MMs, in which an arbitrary distribution is modelled/approximated using a collection of Gaussian distributions.
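As a rough sketch of what that looks like in practice {{comment|(assuming scikit-learn's <code>GaussianMixture</code> and some toy data; neither is from the original text)}}:

<syntaxhighlight lang="python">
# Sketch: fitting a two-component GMM to 1D samples with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy two-lobed data: samples drawn from two different normal distributions
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal( 3.0, 1.0, 700)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.weights_)                       # mixing weights, roughly [0.3, 0.7]
print(gmm.means_.ravel())                 # component means, roughly [-2, 3]
print(np.sqrt(gmm.covariances_).ravel())  # component standard deviations
</syntaxhighlight>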
There are numerous variations of mixture model types. For example, there is the observation that using Student's t (heavier-tailed) distribution can lead to more robustness when using mixture models for clustering.
You can also model things in more dimensions, model using different number spaces, and such, when this is useful for some reason. Arguably, the optimization/fitting algorithms matter a little more there.
====Uses====
Mixture models can be useful for various things, and for various reasons.
They see frequent use in statistical analysis, machine learning, data mining, and sometimes compression, often by approximating and clustering, using the fitted models as fuzzy predictors.
For example, they are handy for approximating signals that show simple lobes, particularly when you are trying to describe that information loosely but parametrically.
The result of such an approximation in a GMM is a set of weighted (mean, stdev) pairs that can be seen as a model of the signal, and which sum up to the approximation.

Just a few such pairs can loosely approximate a simple signal; a few dozen can do so with basic accuracy.
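Continuing the earlier sketch {{comment|(same assumed <code>gmm</code> and toy data)}}, recovering that approximation from the fitted parameters might look like:

<syntaxhighlight lang="python">
# Evaluate the fitted mixture density on a grid; score_samples returns
# log-density, so exponentiate to get the approximating curve.
xs = np.linspace(-5.0, 7.0, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(xs))
</syntaxhighlight>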
Consider for example the case of creating a compound country-wide distribution (of some variable) by mixing per-demographic information that you have -- or attempting to statistically approximate such per-demographic information from (sparse) information you have.
Mixture models (in general) are also a type of clustering, and/or useful to it, as they can help identify the number, positions, and sizes of clusters.
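In the running sketch, that clustering use would amount to {{comment|(again assuming the fitted <code>gmm</code> from above)}}:

<syntaxhighlight lang="python">
# Hard cluster assignments and soft memberships from the fitted mixture.
labels = gmm.predict(data)        # most likely component per sample
resp   = gmm.predict_proba(data)  # per-component membership probabilities
</syntaxhighlight>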
When the distributional nature is realistic for the data, (finite) mixture models are interesting for one or more of various properties and their implications, including:
* ease of dealing with a parametrized model of data
* real-time adaptability of such a model
* its approximating nature
** can act as a kind of dimensionality reduction
** can lead to feature discovery
* robustness to sparse data (as in various other learning models, and under the assumption of smoothness of the given space)
Mixture models find applications in:
* classification tasks
* approximating non-trivial samples/populations/distributions
* finding groups or patterns within samples/populations, and estimating parameters for them
====Problems====
Perhaps the main reason that mixture models are not very common is that there are a number of potential problems in the approximation.
=====Number of elements=====
The choice of how many elements to model with is often not very obvious.
It doesn't help that solutions for the same data with different numbers of components are not necessarily comparable (parameter-wise).
For example, fitting data that shows a major and a minor lobe (particularly with rough-resolution discrete information, say, a ten-element histogram) may give a decent solution with two Gaussians, but with three may easily model one lobe as a mix of two, which is not parametrically comparable to the two-element solution.
A fixed choice may not be ideal, so it can be useful to estimate the number of components from the data as well.
Methods like maximum likelihood continuously favour a more complex model, and so lead to impractical models and overfitting.
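One common response {{comment|(a sketch, again assuming scikit-learn and the toy data above; penalized criteria like BIC are not mentioned in the original text)}} is to fit several candidate sizes and keep the one with the best penalized score:

<syntaxhighlight lang="python">
# Choose the number of components with BIC, which penalizes model
# complexity and so counteracts maximum likelihood's preference for
# ever-larger models.
fits = [GaussianMixture(n_components=k, random_state=0).fit(data)
        for k in range(1, 8)]
best = min(fits, key=lambda g: g.bic(data))
print(best.n_components)
</syntaxhighlight>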
=====Intractability of exact inference=====
(hence the optimization algorithms)
=====Common approximation methods=====
For these reasons and more, optimization algorithms such as EM/GEM, Markov Chain Monte Carlo (MCMC), and Bayesian fitting seem more practical than attempting to find exact solutions, although they sometimes imply a little less robustness.
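To make EM concrete, here is a minimal sketch for a 1D two-component GMM {{comment|(an illustrative toy, not the article's own algorithm; it reuses the assumed <code>data</code> array from above)}}:

<syntaxhighlight lang="python">
# Minimal EM for a two-component 1D Gaussian mixture.
x  = data.ravel()
w  = np.array([0.5, 0.5])          # mixing weights
mu = np.array([x.min(), x.max()])  # crude initial means
sd = np.array([x.std(), x.std()])  # initial standard deviations

for _ in range(100):
    # E step: responsibilities -- how much each component "explains" each point
    pdf = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = w * pdf
    r /= r.sum(axis=1, keepdims=True)
    # M step: re-estimate weights, means, and stdevs from the weighted data
    n  = r.sum(axis=0)
    w  = n / n.sum()
    mu = (r * x[:, None]).sum(axis=0) / n
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
</syntaxhighlight>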
Problems include:
* lack of robustness to outliers. You have to know the data is simple, smooth it, or even assign an element per outlier.
-->
===See also===
* http://en.wikipedia.org/wiki/Gaussian_mixture_model
* http://www.csse.monash.edu.au/~dld/mixturemodel.html
<!--
Evolution as a designer

Mutation (variation) and reward
-->