Statistics notes - on random variables, distributions, probability
Dependent versus independent
In experiments
- the independent variable is the thing you actively control and change,
- ...hoping to see a change in one or more dependent variables that also reveals some pattern.
In mathematics you get to define all the parts, so this more often works out as independent=input and dependent=output.
In statistics this isn't always so cleanly separated.
In science the dependent variable is the thing that you will have to painstakingly measure.
In exploratory methods you may initially not care which is which -- you just start by looking at whether there is any covariance at all, and worry about the how/why of relations later, or assume it outright before you even get started.
In part because of those variations, different fields have additional names that may be a little better suited, and more intuitive, for what they are doing:
- e.g. 'control variable' for the thing you control, 'explanatory variable' for the thing you hope explains the change;
- e.g. 'response variable' for the thing that varies in response, 'measured value' if you still had to measure it too.
There are in fact many such synonyms, most of them associated with specific fields:
Independent variables in specific contexts may also be known as:
- explanatory variable (regression analysis)
- manipulated variable
- predictor variable
- control(led) variable (econometrics)
- regressor (regression analysis)
- exposure variable (reliability theory)
- feature (machine learning, pattern recognition)
- condition (in behaviour experiment design, when you present things based on a condition) (verify)
- risk factor (medical statistics)
- input variable
- covariate (hypothesized predictor, in statistics)
Dependent variables are also known as
- response variable
- predicted variable
- measured variable
- explained variable
- regressand (regression analysis)
- experimental variable
- responding variable
- outcome variable
- output variable
https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics_synonyms
In statistics, there is the concept of a covariate.
This is also yet another near-synonym, because intuitively, a covariate is often meant as a hypothesized independent variable - though it may turn out to be better described as a confounding variable or, y'know, as not particularly related at all.
Confounding variable
There are many things in the real world that aren't one-way predictors of other things.
This leads to many variables being what you could call interacting variables, to point out that they interact and/or correlate with many things.
You can also call them confounding variables, or confounders, to point out that such correlations really mess with analysis that wants to conclude one thing causes another.
See also omitted-variable bias (leaving out an important factor).
https://en.wikipedia.org/wiki/Confounding
https://en.wikipedia.org/wiki/Covariate
Random variables, distributions
Probability function(/distribution)
Probability mass function
discrete
A discrete probability distribution, a.k.a. a probability mass function (pmf), is discrete in that it is essentially a list of possible values and their according probabilities.
Note that a histogram is a very similar idea.
Pedantically, the pmf is ideally the actual underlying distribution,
while the histogram is more typically a somewhat imprecise, empirically sampled description of that pmf.
(Also, a histogram can describe continuous data - approximating a pdf - by bucketing)
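To make the pmf-versus-histogram distinction concrete, here is a minimal sketch in Python (assuming numpy is available; the values, probabilities, and seed are just illustrative): sample from a known pmf and compare the empirical histogram frequencies to the true probabilities.

  import numpy as np

  # a known pmf: possible values and their probabilities (made-up example)
  values = np.array([0, 1, 2, 3])
  pmf = np.array([0.1, 0.4, 0.3, 0.2])

  # sample from it, then build the empirical (normalized) histogram
  rng = np.random.default_rng(0)
  samples = rng.choice(values, size=10_000, p=pmf)
  empirical = np.bincount(samples, minlength=len(values)) / len(samples)

  print("true pmf: ", pmf)
  print("empirical:", empirical)  # close to the pmf, but not exactly equal

The empirical frequencies wobble around the true pmf, which is exactly that 'imprecise empirical sampled description'.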
https://stats.stackexchange.com/questions/375510/histogram-and-probability-mass-function
Probability density functions
continuous
Cumulative distribution function
Expected value
A few distributions
https://link.springer.com/chapter/10.1007/978-1-4939-6572-4_3
Binomial distribution
discrete
Models the count of one outcome when something with two outcomes (with a fixed probability dividing them) is done a number of times in a row; a code sketch follows the notes below.
Note that
- since such a two-outcome event can be called a Bernoulli random variable, you can say that a binomial distribution is the sum of many independent such outcomes
- more precisely, that a binomial random variable is the sum of independent (but identically distributed) Bernoulli random variables. (verify)
- there's also Bernoulli distributions.
- all Bernoulli distributions are binomial distributions, though most binomial distributions are not Bernoulli distributions.
- gaussian and Poisson (and some others) can be derived as limiting cases of the binomial
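A minimal sketch of that sum-of-Bernoullis view in Python (assuming numpy; n, p, and the seed are just example choices): summing n independent 0/1 draws should behave the same as drawing from the binomial directly.

  import numpy as np

  rng = np.random.default_rng(0)
  n, p, trials = 10, 0.3, 100_000

  # each row is n independent Bernoulli(p) outcomes; the row sum is Binomial(n, p)
  bernoulli_sums = rng.binomial(1, p, size=(trials, n)).sum(axis=1)
  binomial_draws = rng.binomial(n, p, size=trials)

  # both should have mean near n*p = 3.0 and variance near n*p*(1-p) = 2.1
  print(bernoulli_sums.mean(), binomial_draws.mean())
  print(bernoulli_sums.var(), binomial_draws.var())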
https://en.wikipedia.org/wiki/Binomial_distribution
Bernoulli distribution
discrete
The Bernoulli distribution models a single yes-no experiment with a given success probability.
Basically binomial with n=1.
https://en.wikipedia.org/wiki/Bernoulli_distribution
Uniform Distribution
Discrete[1] and continuous[2] variants are both useful
While this one often falls away into 'well duh',
it can also be seen as a more complex case of binomial.
For example, a single fair die throw is n cases of 1/n probability (where n is the number of faces), and repeated throws should converge on uniform counts.
Note that the distribution of combined dice rolls starts resembling a gaussian.
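A quick way to see that with numpy (the die count and seed are just example choices): single die rolls come out flat, while sums of five dice already pile up in the middle.

  import numpy as np

  rng = np.random.default_rng(0)

  one_die = rng.integers(1, 7, size=100_000)                     # uniform over 1..6
  five_dice = rng.integers(1, 7, size=(100_000, 5)).sum(axis=1)  # sums range 5..30

  # counts per outcome: roughly flat for one die, bell-shaped for five dice
  print(np.bincount(one_die)[1:])
  print(np.bincount(five_dice)[5:])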
Exponential Distribution
continuous (note that the Geometric distribution is a very similar discrete distribution)
https://en.wikipedia.org/wiki/Exponential_distribution
Multinomial distribution
Generalization of the binomial distribution, easiest to introduce as such:
- where binomial may let you model the amount of successes for a coin (boolean outcomes) flipped n times...
- ...multinomial lets you model the probability of counts for each side of a k-sided die rolled n times.
- if k is 2 and n is 1, it is the Bernoulli distribution (verify)
- if k is 2 and n is >1, it is the binomial distribution
- if k >2 and n is 1, it is the categorical distribution
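A minimal sketch of those special cases with scipy (the counts and probabilities are just example numbers):

  from scipy.stats import binom, multinomial

  # a fair six-sided die (k=6) rolled n=10 times:
  # probability of seeing the faces 1, 2, 1, 2, 2, 2 times respectively
  print(multinomial.pmf([1, 2, 1, 2, 2, 2], n=10, p=[1 / 6] * 6))

  # with k=2 it reduces to the binomial: P(3 successes in 10 trials at p=0.3)
  print(multinomial.pmf([3, 7], n=10, p=[0.3, 0.7]))
  print(binom.pmf(3, n=10, p=0.3))  # same value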
Poisson distribution
discrete
Models the probability of a given number of events occurring, in a fixed interval of time or space, based on a known mean rate, independently of the time since the last event.
Used mainly when the events are discrete and relatively sparse, where the distribution sits against zero and noticeably isn't very symmetric.
Note that when not so sparse, the shape much resembles a normal distribution. In part just because of the CLT, though, and it may still make more sense to model it as Poisson for a few usually-minor reasons (normal is continuous).
Poisson
- tendency of rates: idealized distribution for counts of rare events
- support: non-negative integers
- often related to event counting
- useful to model likeliness of something happening in an interval -- that considers interval length, and the rarity of the events
- for high rates, it starts to look more gaussian
Examples:
- Size of groups coming into restaurants
- Poisson -- will be mostly two and one, sometimes three, more is fairly rare. The shape is not symmetric, and we want to model that rarity.
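A minimal sketch of both regimes with scipy (the rates are just example values):

  from scipy.stats import norm, poisson

  # sparse rate: the pmf sits against zero and is clearly asymmetric
  for k in range(6):
      print(k, round(poisson.pmf(k, mu=1.5), 3))

  # high rate: Poisson(mu) has mean mu and variance mu, and its pmf closely
  # tracks a normal density with that same mean and variance
  mu = 50
  for k in (40, 50, 60):
      print(k, poisson.pmf(k, mu), norm.pdf(k, loc=mu, scale=mu ** 0.5))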
https://en.wikipedia.org/wiki/Poisson_distribution
Gaussian/Normal distribution
continuous
Gaussian distribution, a.k.a. Normal distribution
Gaussian
- many independent identical events
- where things have a tendency to the average
- support: real numbers
Note that in various cases, more than one distribution can fit the data's nature, even when there is no real relation between the distributions. For example, for non-trivial counts both Poisson and gaussian may work, even though they are quite different.
Examples:
- height of body
Bell curve refers to any distribution of this sort of shape, but also frequently to the more specifically shaped gaussian.
Bell-shaped curves are everywhere, for a number of reasons.
Not least of which is that the combination of varying distributions converges on a gaussian/bell curve (see the central limit theorem[3]). This also means that if you don't know a distribution and have no reason to suspect a specific type, a gaussian is a decent go-to to get you started somewhere.
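A minimal numpy sketch of that convergence (the exponential source and the sample sizes are just example choices): even a strongly skewed distribution gives bell-shaped averages.

  import numpy as np

  rng = np.random.default_rng(0)

  # average 50 draws from a skewed (exponential) distribution, many times over
  means = rng.exponential(scale=1.0, size=(100_000, 50)).mean(axis=1)

  print(means.mean())  # near 1.0, the exponential's own mean
  print(means.std())   # near 1.0 / sqrt(50) ~ 0.141
  print(np.histogram(means, bins=15)[0])  # roughly symmetric, peaked in the middle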
https://math.stackexchange.com/questions/2379271/why-do-bell-curves-appear-everywhere/2379472
https://en.wikipedia.org/wiki/Normal_distribution
t distribution
A t-distribution is an approximation of a variable's distribution based on a small sample size.
(also known as Student's t-distribution, because its originator, William Gosset, was not allowed by Guinness, his employer, to publish under his own name, so he published as 'Student')
A t-distribution is strongly related to a normal/gaussian,
and in fact as the degrees of freedom increase,
the more the t-distribution tends towards a regular normal distribution.
We use a t-distribution when we know we probably don't have enough samples to assume a gaussian distribution, and still want to do hypothesis testing, with comparable output scale (verify) to other tests.
You can arguably see the t-distribution as generalizing its data a little - its thicker tails spread the probability density out more, which also means its peak is lower.
In part, you acknowledge that when sampling from unknown distributions
- the sample mean is probably not the same as the population mean
- your estimate of the variance is itself noisy, and may well be off
...without those estimates being structurally too different from things like z-scores.
Actually, after ~30 df, the t-statistic is basically the same as the z-statistic (from the normal distribution), because you're now starting to get enough samples, and above 50 you might as well just go normal.
t-distribution variance is S^2 * k/(k-2)
- since k/(k-2) will be slightly >1, this is part of what makes it wider
- since k/(k-2) tends to 1 for higher k, this is what makes it nearer to normal for higher df (where the variance is just S^2) (verify)
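A minimal scipy sketch of that convergence (the quantile 0.975 is the classic two-sided 5% cutoff): the t critical value shrinks toward the normal's ~1.96 as df grows.

  from scipy.stats import norm, t

  print("normal:", norm.ppf(0.975))  # ~1.96
  for df in (2, 5, 10, 30, 50, 100):
      print("t, df=%d:" % df, t.ppf(0.975, df))  # 4.30, 2.57, 2.23, 2.04, ...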
Logistic distribution
beta distribution
continuous
Defined over [0,1]
Dirichlet distribution
continuous
Multivariate
Generalization of the beta distribution to multiple variables(verify), which is why it is also sometimes called multivariate beta distribution.
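A minimal numpy sketch (the alpha values are just example choices): Dirichlet samples are probability vectors, and with two components the first component behaves like a beta variable.

  import numpy as np

  rng = np.random.default_rng(0)

  # each sample is a nonnegative vector summing to 1
  sample = rng.dirichlet(alpha=[2.0, 3.0, 5.0])
  print(sample, sample.sum())

  # with k=2, the first component is Beta(2, 5): its mean is 2/(2+5) ~ 0.286
  first = rng.dirichlet(alpha=[2.0, 5.0], size=100_000)[:, 0]
  print(first.mean())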
https://en.wikipedia.org/wiki/Dirichlet_distribution
Probability
Intro
Between permutation and subset combination, I always confuse how to calculate which, so this is here to remind me.
Permutations
Permutations of a whole set are the (amount of) unique-choice lists of all its elements.
- Non-overlapping choices, in which
- the order matters
Example: The permutations of characters a,b,c are:
abc acb bac bca cab cba
There are n! permutations of n elements, because choices depend on earlier ones.
For the abc example: there are 3 choices for the first letter, (for each of these choices there are) two left for the second, and one for the last.
Permutations of fewer elements than the whole set are also commonly needed.
Like basic permutation, but only the first so-many choices apply. In a way, only the first few terms in the would-be factorial apply, as you stop choosing after a while.
For example: "get all ways to order three pictures next to each other on a wall - any three from from six pictures"
- 6*5*4 - six options for the first, five for the second, four for the third.
The more general notation is nPc
And its value is nPc = n!/(n-c)!
...so the pictures example would be
- 6P3 = 6!/(6-3)! = 6!/3! = 120
- If it helps: (6*5*4*3*2*1)/(3*2*1)
All permutations of all equal or smaller lengths are also sometimes useful.
For example, given four words, all unique sentences you can make of length one, two, three and four.
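A minimal Python sketch tying the formula to brute-force enumeration (math.perm computes the same nPc directly):

  import math
  from itertools import permutations

  def npc(n, c):
      # nPc = n! / (n-c)!
      return math.factorial(n) // math.factorial(n - c)

  print(npc(6, 3))                             # 120, the pictures example
  print(len(list(permutations(range(6), 3))))  # 120 again, by enumeration

  # all permutations of length 1..4 from four words (the sentences example)
  print(sum(npc(4, r) for r in range(1, 5)))   # 4 + 12 + 24 + 24 = 64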
Rotations
When calculating "the amount of ways to seat people around a circular table", you likely care only about who is sitting next to each other, and have no obvious starting point. In these cases there are n solutions (where n is the number of chairs) that are not discernible (everyone shifts one seat, but you decide to consider it the same).
So same logic as above, but you divide by n.
Mirrors
You may also consider an exact mirror to be the same, e.g. abcde as equivalent to edcba.
Again, when seating people, the same people would be adjacent.
Half of the results are equal to the other half, so divide by two.
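A minimal Python sketch verifying both divisions by brute force (five seats as an example): put every permutation into a canonical form under rotation (and optionally mirroring), then count the distinct results.

  from itertools import permutations
  from math import factorial

  def canonical(seating, mirror=False):
      # all rotations (and optionally reflections) of a seating;
      # the smallest one serves as the canonical representative
      n = len(seating)
      variants = [seating[i:] + seating[:i] for i in range(n)]
      if mirror:
          variants += [v[::-1] for v in variants]
      return min(variants)

  people = (1, 2, 3, 4, 5)
  n = len(people)
  print(len({canonical(p) for p in permutations(people)}), factorial(n) // n)  # 24 24
  print(len({canonical(p, mirror=True) for p in permutations(people)}), factorial(n) // (2 * n))  # 12 12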
Combinations
Combinations of a subset finds the (amount of) possible sets which choose c unique elements from a set of n.
- Non-overlapping choices, in which
- order doesn't matter
- Notation for the amount: nCc
- regularly spoken out as "n choose c"
- nCc = n!/(c!(n-c)!)
- Example: of the letters a b c d, all combinations of three elements: (there are 4C3 = 4!/(3!1!) = 4)
abc abd acd bcd
Applies to questions such as "choose two different flavours for your ice cream cone, out of eight flavours", because you don't care about the order the scoops are stacked.
8C2 = 8!/((8-2)!2!) = 8!/(6!2!) = 28
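A minimal Python sketch of both examples (math.comb computes nCc directly):

  import math
  from itertools import combinations

  print(math.comb(4, 3))                                # 4
  print([''.join(c) for c in combinations('abcd', 3)])  # ['abc', 'abd', 'acd', 'bcd']

  # two scoops from eight flavours, stacking order irrelevant
  print(math.comb(8, 2))  # 28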