Statistics notes - on random variables, distributions, probability
Dependent versus independent
In experiments
- the independent variable is the thing you actively control and change,
- ...hoping to see a change in one or more dependent variables that also reveals some pattern.
In mathematics you get to define all the parts, so this more often works out as independent=input and dependent=output.
In statistics this isn't always so cleanly separated.
In science the dependent variable is the thing that you will have to painstakingly measure.
In exploratory methods you may initially not care which is which -- you just start by looking at whether there is any covariance at all, and worry about the how/why of relations later, or assume it outright before you even get started.
In part because of those variations, different fields have additional names that may be a little better suited, and more intuitive, for what they are doing:
- e.g. 'control variable' for the thing you control, 'explanatory variable' for the thing you hope explains the change;
- e.g. 'response variable' for the thing that varies in response, 'measured value' if you still had to measure it too.
There are in fact many such synonyms, most of them associated with specific fields:
Independent variables in specific contexts may also be known as:
- explanatory variable (regression analysis)
- manipulated variable
- predictor variable
- control(led) variable (econometrics)
- regressor (regression analysis)
- exposure variable (reliability theory)
- feature (machine learning, pattern recognition)
- condition (in behaviour experiment design, when you present things based on a condition) (verify)
- risk factor (medical statistics)
- input variable
- covariate (hypothesized predictor, in statistics)
Dependent variables are also known as
- response variable
- predicted variable
- measured variable
- explained variable
- regressand (regression analysis)
- experimental variable
- responding variable
- outcome variable
- output variable
https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics_synonyms
In statistics, there is the concept of a covariate.
This is also yet another near-synonym, because intuitively, a covariate is often meant as a hypothesized independent variable - though it may turn out to be better described as a confounding variable or, y'know, as not particularly related at all.
Confounding variable
There are many things in the real world that aren't one-way predictors of other things.
This leads to many variables being what you could call interacting variables, to point out that they interact and/or correlate with many things.
You can also call them confounding variables, or confounders, to point out that such correlations really mess with analysis that wants to conclude one thing causes another.
See also omitted-variable bias (leaving out an important factor).
https://en.wikipedia.org/wiki/Confounding
https://en.wikipedia.org/wiki/Covariate
Random variables, distributions
Probability function(/distribution)
Probability mass function
discrete
A discrete probability distribution, a.k.a. a probability mass function (pmf), is discrete in that it is essentially a list of possible values and their according probabilities.
Note that a histogram is a very similar idea.
Pedantically, the pmf is ideally the actual underlying distribution,
while the histogram is more typically a somewhat imprecise, empirically sampled description of that pmf.
(Also, a histogram can describe continuous data - approximating a pdf - by bucketing)
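To make the pmf-versus-histogram distinction concrete, here is a minimal sketch in Python (assuming numpy is available; the values, probabilities, and seed are just illustrative): sample from a known pmf and compare the empirical histogram frequencies to the true probabilities.

  import numpy as np

  # a known pmf: possible values and their probabilities (made-up example)
  values = np.array([0, 1, 2, 3])
  pmf = np.array([0.1, 0.4, 0.3, 0.2])

  # sample from it, then build the empirical (normalized) histogram
  rng = np.random.default_rng(0)
  samples = rng.choice(values, size=10_000, p=pmf)
  empirical = np.bincount(samples, minlength=len(values)) / len(samples)

  print("true pmf: ", pmf)
  print("empirical:", empirical)  # close to the pmf, but not exactly equal

The empirical frequencies wobble around the true pmf, which is exactly that 'imprecise empirical sampled description'.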
https://stats.stackexchange.com/questions/375510/histogram-and-probability-mass-function
Probability density functions
continuous
Cumulative distribution function
Expected value
A few distributions
https://link.springer.com/chapter/10.1007/978-1-4939-6572-4_3
Binomial distribution
discrete
Models the count of one outcome when something with two outcomes (with a fixed probability dividing them) is done a number of times in a row; a code sketch follows the notes below.
Note that
- since such a two-outcome event can be called a Bernoulli random variable, you can say that a binomial distribution is the sum of many independent such outcomes
- more precisely, that a binomial random variable is the sum of independent (but identically distributed) Bernoulli random variables. (verify)
- there's also Bernoulli distributions.
- all Bernoulli distributions are binomial distributions, though most binomial distributions are not Bernoulli distributions.
- gaussian and Poisson (and some others) can be derived as limiting cases of the binomial
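A minimal sketch of that sum-of-Bernoullis view in Python (assuming numpy; n, p, and the seed are just example choices): summing n independent 0/1 draws should behave the same as drawing from the binomial directly.

  import numpy as np

  rng = np.random.default_rng(0)
  n, p, trials = 10, 0.3, 100_000

  # each row is n independent Bernoulli(p) outcomes; the row sum is Binomial(n, p)
  bernoulli_sums = rng.binomial(1, p, size=(trials, n)).sum(axis=1)
  binomial_draws = rng.binomial(n, p, size=trials)

  # both should have mean near n*p = 3.0 and variance near n*p*(1-p) = 2.1
  print(bernoulli_sums.mean(), binomial_draws.mean())
  print(bernoulli_sums.var(), binomial_draws.var())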
https://en.wikipedia.org/wiki/Binomial_distribution
Bernoulli distribution
discrete
The Bernoulli distribution models a single yes-no experiment with a given success probability.
Basically binomial with n=1.
https://en.wikipedia.org/wiki/Bernoulli_distribution
Uniform Distribution
Discrete[1] and continuous[2] variants are both useful
While this one often falls away into 'well duh',
it can also be seen as a more complex case of binomial.
For example, a single fair die throw is n cases of 1/n probability (where n is the number of faces), and repeated throws should converge on uniform counts.
Note that the distribution of combined dice rolls starts resembling a gaussian.
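A quick way to see that with numpy (the die count and seed are just example choices): single die rolls come out flat, while sums of five dice already pile up in the middle.

  import numpy as np

  rng = np.random.default_rng(0)

  one_die = rng.integers(1, 7, size=100_000)                     # uniform over 1..6
  five_dice = rng.integers(1, 7, size=(100_000, 5)).sum(axis=1)  # sums range 5..30

  # counts per outcome: roughly flat for one die, bell-shaped for five dice
  print(np.bincount(one_die)[1:])
  print(np.bincount(five_dice)[5:])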
Exponential Distribution
continuous (note that the Geometric distribution is a very similar discrete distribution)
https://en.wikipedia.org/wiki/Exponential_distribution
Multinomial distribution
Generalization of the binomial distribution, easiest to introduce as such:
- where binomial may let you model the amount of successes for a coin (boolean outcomes) flipped n times...
- ...multinomial lets you model the probability of counts for each side of a k-sided die rolled n times.
- if k is 2 and n is 1, it is the Bernoulli distribution (verify)
- if k is 2 and n is >1, it is the binomial distribution
- if k >2 and n is 1, it is the categorical distribution
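A minimal sketch of those special cases with scipy (the counts and probabilities are just example numbers):

  from scipy.stats import binom, multinomial

  # a fair six-sided die (k=6) rolled n=10 times:
  # probability of seeing the faces 1, 2, 1, 2, 2, 2 times respectively
  print(multinomial.pmf([1, 2, 1, 2, 2, 2], n=10, p=[1 / 6] * 6))

  # with k=2 it reduces to the binomial: P(3 successes in 10 trials at p=0.3)
  print(multinomial.pmf([3, 7], n=10, p=[0.3, 0.7]))
  print(binom.pmf(3, n=10, p=0.3))  # same value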
Poisson distribution
discrete
Models the probability of a given number of events occurring, in a fixed interval of time or space, based on a known mean rate, independently of the time since the last event.
Used mainly when the events are discrete and relatively sparse, where the distribution sits against zero and noticeably isn't very symmetric.
Note that when not so sparse, the shape much resembles a normal distribution. In part just because of the CLT, though, and it may still make more sense to model it as Poisson for a few usually-minor reasons (normal is continuous).
Poisson
- tendency of rates: idealized distribution for counts of rare events
- support: non-negative integers
- often related to event counting
- useful to model likeliness of something happening in an interval -- that considers interval length, and the rarity of the events
- for high rates, it starts to look more gaussian
Examples:
- Size of groups coming into restaurants
- Poisson -- will be mostly two and one, sometimes three, more is fairly rare. The shape is not symmetric, and we want to model that rarity.
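A minimal sketch of both regimes with scipy (the rates are just example values):

  from scipy.stats import norm, poisson

  # sparse rate: the pmf sits against zero and is clearly asymmetric
  for k in range(6):
      print(k, round(poisson.pmf(k, mu=1.5), 3))

  # high rate: Poisson(mu) has mean mu and variance mu, and its pmf closely
  # tracks a normal density with that same mean and variance
  mu = 50
  for k in (40, 50, 60):
      print(k, poisson.pmf(k, mu), norm.pdf(k, loc=mu, scale=mu ** 0.5))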
https://en.wikipedia.org/wiki/Poisson_distribution
Gaussian/Normal distribution
continuous
Gaussian distribution, a.k.a. Normal distribution
Gaussian
- many independent identical events
- where things have a tendency to the average
- support: real numbers
Note that in various cases, more than one distribution can fit the data's nature, even when there is no real relation between the distributions. For example, for non-trivial counts both Poisson and gaussian may work, even though they are quite different.
Examples:
- height of body
Bell curve refers to any distribution of this sort of shape, but also frequently to the more specifically shaped gaussian.
Bell-shaped curves are everywhere, for a number of reasons.
Not least of which is that the combination of varying distributions converges on a gaussian/bell curve (see the central limit theorem[3]). This also means that if you don't know a distribution and have no reason to suspect a specific type, a gaussian is a decent go-to to get you started somewhere.
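A minimal numpy sketch of that convergence (the exponential source and the sample sizes are just example choices): even a strongly skewed distribution gives bell-shaped averages.

  import numpy as np

  rng = np.random.default_rng(0)

  # average 50 draws from a skewed (exponential) distribution, many times over
  means = rng.exponential(scale=1.0, size=(100_000, 50)).mean(axis=1)

  print(means.mean())  # near 1.0, the exponential's own mean
  print(means.std())   # near 1.0 / sqrt(50) ~ 0.141
  print(np.histogram(means, bins=15)[0])  # roughly symmetric, peaked in the middle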
https://math.stackexchange.com/questions/2379271/why-do-bell-curves-appear-everywhere/2379472
https://en.wikipedia.org/wiki/Normal_distribution
t distribution
A t-distribution is an approximation of a variable's distribution based on a small sample size.
(also known as Student's t-distribution, because its originator, William Gosset, was not allowed by Guinness, his employer, to publish under his own name, so he published as 'Student')
A t-distribution is strongly related to a normal/gaussian,
and in fact as the degrees of freedom increase,
the more the t-distribution tends towards a regular normal distribution.
We use a t-distribution when we know we probably don't have enough samples to assume a gaussian distribution, and still want to do hypothesis testing, with comparable output scale (verify) to other tests.
You can arguably see the t-distribution as generalizing its data a little - its thicker tails spread the probability density out more, which also means its peak is lower.
In part, you acknowledge that when sampling from unknown distributions
- the sample mean is probably not the same as the population mean
- your estimate of the variance is itself noisy, and may well be off
...without those estimates being structurally too different from things like z-scores.
Actually, after ~30 df, the t-statistic is basically the same as the z-statistic (from the normal distribution), because you're now starting to get enough samples, and above 50 you might as well just go normal.
t-distribution variance is S^2 * k/(k-2)
- since k/(k-2) will be slightly >1, this is part of what makes it wider
- since k/(k-2) tends to 1 for higher k, this is what makes it nearer to normal for higher df (where the variance is just S^2) (verify)
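A minimal scipy sketch of that convergence (the quantile 0.975 is the classic two-sided 5% cutoff): the t critical value shrinks toward the normal's ~1.96 as df grows.

  from scipy.stats import norm, t

  print("normal:", norm.ppf(0.975))  # ~1.96
  for df in (2, 5, 10, 30, 50, 100):
      print("t, df=%d:" % df, t.ppf(0.975, df))  # 4.30, 2.57, 2.23, 2.04, ...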
Logistic distribution
beta distribution
continuous
Defined over [0,1]
Dirichlet distribution
continuous
Multivariate
Generalization of the beta distribution to multiple variables(verify), which is why it is also sometimes called multivariate beta distribution.
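A minimal numpy sketch (the alpha values are just example choices): Dirichlet samples are probability vectors, and with two components the first component behaves like a beta variable.

  import numpy as np

  rng = np.random.default_rng(0)

  # each sample is a nonnegative vector summing to 1
  sample = rng.dirichlet(alpha=[2.0, 3.0, 5.0])
  print(sample, sample.sum())

  # with k=2, the first component is Beta(2, 5): its mean is 2/(2+5) ~ 0.286
  first = rng.dirichlet(alpha=[2.0, 5.0], size=100_000)[:, 0]
  print(first.mean())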
https://en.wikipedia.org/wiki/Dirichlet_distribution
Probability
Intro
Between permutation and subset combination, I always confuse how to calculate which, so this is here to remind me.
Permutations
Permutations of a whole set are the (amount of) unique-choice lists of all its elements.
- Non-overlapping choices, in which
- the order matters
Example: The permutations of characters a,b,c are:
abc acb bac bca cab cba
There are n! permutations of n elements, because choices depend on earlier ones.
For the abc example: there are 3 choices for the first letter, (for each of these choices there are) two left for the second, and one for the last.
Permutations of fewer elements than the whole set are also commonly needed.
Like basic permutation, but only the first so-many choices apply. In a way, only the first few terms in the would-be factorial apply, as you stop choosing after a while.
For example: "get all ways to order three pictures next to each other on a wall - any three from from six pictures"
- 6*5*4 - six options for the first, five for the second, four for the third.
The more general notation is nPc
And its value is nPc = n!/(n-c)!
...so the pictures example would be
- 6P3 = 6!/(6-3)! = 6!/3! = 120
- If it helps: (6*5*4*3*2*1)/(3*2*1)
All permutations of all equal or smaller lengths are also sometimes useful.
For example, given four words, all unique sentences you can make of length one, two, three and four.
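A minimal Python sketch tying the formula to brute-force enumeration (math.perm computes the same nPc directly):

  import math
  from itertools import permutations

  def npc(n, c):
      # nPc = n! / (n-c)!
      return math.factorial(n) // math.factorial(n - c)

  print(npc(6, 3))                             # 120, the pictures example
  print(len(list(permutations(range(6), 3))))  # 120 again, by enumeration

  # all permutations of length 1..4 from four words (the sentences example)
  print(sum(npc(4, r) for r in range(1, 5)))   # 4 + 12 + 24 + 24 = 64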
Rotations
When calculating "the amount of ways to seat people around a circular table", you likely care only about who is sitting next to each other, and have no obvious starting point. In these cases there are n solutions (where n is the number of chairs) that are not discernible (everyone shifts one seat, but you decide to consider it the same).
So same logic as above, but you divide by n.
Mirrors
You may also consider an exact mirror to be the same, e.g. abcde as equivalent to edcba.
Again, when seating people, the same people would be adjacent.
Half of the results are equal to the other half, so divide by two.
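A minimal Python sketch verifying both divisions by brute force (five seats as an example): put every permutation into a canonical form under rotation (and optionally mirroring), then count the distinct results.

  from itertools import permutations
  from math import factorial

  def canonical(seating, mirror=False):
      # all rotations (and optionally reflections) of a seating;
      # the smallest one serves as the canonical representative
      n = len(seating)
      variants = [seating[i:] + seating[:i] for i in range(n)]
      if mirror:
          variants += [v[::-1] for v in variants]
      return min(variants)

  people = (1, 2, 3, 4, 5)
  n = len(people)
  print(len({canonical(p) for p in permutations(people)}), factorial(n) // n)  # 24 24
  print(len({canonical(p, mirror=True) for p in permutations(people)}), factorial(n) // (2 * n))  # 12 12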
Combinations
Combinations of a subset finds the (amount of) possible sets which choose c unique elements from a set of n.
- Non-overlapping choices, in which
- order doesn't matter
- Notation for the amount: nCc
- regularly spoken out as "n choose c"
- nCc = n!/(c!(n-c)!)
- Example: of the letters a b c d, all combinations of three elements: (there are 4C3 = 4!/(3!1!) = 4)
abc abd acd bcd
Applies to questions such as "choose two different flavours for your ice cream cone, out of eight flavours", because you don't care about the order the scoops are stacked.
8C2 = 8!/((8-2)!2!) = 8!/(6!2!) = 28
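A minimal Python sketch of both examples (math.comb computes nCc directly):

  import math
  from itertools import combinations

  print(math.comb(4, 3))                                # 4
  print([''.join(c) for c in combinations('abcd', 3)])  # ['abc', 'abd', 'acd', 'bcd']

  # two scoops from eight flavours, stacking order irrelevant
  print(math.comb(8, 2))  # 28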