Statistics notes - on random variables, distributions, probability

⚠ This is early stages and needs a LOT of cleanup
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Dependent versus independent


In statistical experiments, the independent variable is the thing you actively control and change, hoping to see a related change in one or more dependent variables - which you hope really are dependent.


In mathematics you get to define all the parts, so this more often works out as independent=input and dependent=output; in statistics things aren't always so cleanly separated.

And in some exploratory methods, you just start by looking at whether there is any covariance at all, and worry about the how/why of relations later - or hard-assume a relation before you even get started.


Some people prefer terms like 'control variable' or 'explanatory variable' for independent, and 'response variable' for dependent, as more intuitive.

In fact there are many such synonyms, most of them associated with specific fields.

Independent variables in specific contexts may also be known as:

  • explanatory variable (regression analysis)
  • manipulated variable
  • predictor variable
  • control(led) variable (econometrics)
  • regressor (regression analysis)
  • exposure variable (reliability theory)
  • feature (machine learning, pattern recognition)
  • condition (in behaviour experiment design, when you present things based on a condition) (verify)
  • risk factor (medical statistics)
  • input variable
  • covariate (hypothesized predictor, in statistics)


Dependent variables are also known as

  • response variable
  • predicted variable
  • measured variable
  • explained variable
  • regressand (regression analysis)
  • experimental variable
  • responding variable
  • outcome variable
  • output variable

https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics_synonyms


In statistics, there is the concept of a covariate.

This is also yet another near-synonym: intuitively, a covariate is often meant as a hypothesized independent variable - though it may turn out to be better described as a confounding variable or, y'know, as not particularly related at all.


Confounding variable

There are many things in the real world that aren't one-way predictors of other things.


This leads to many variables being what you could call interacting variables, to point out they interact and/or correlate with many things.


You can also call them confounding variables, or confounders, to point out that such correlations really mess with analysis that wants to conclude one thing causes another.


See also omitted-variable bias (leaving out an important factor).


https://en.wikipedia.org/wiki/Confounding


https://en.wikipedia.org/wiki/Covariate

Random variables, distributions

Probability function(/distribution)

Probability mass function

discrete

A discrete probability distribution, a.k.a. a probability mass function (pmf), is discrete in that it is essentially a list of possible values and their corresponding probabilities.


Note that a histogram is a very similar idea. Pedantically, the pmf is ideally the actual underlying distribution, while a histogram is more typically a somewhat imprecise, empirically sampled description of that pmf.

(Also, a histogram can describe continuous data - a pdf - by bucketing.)



https://stats.stackexchange.com/questions/375510/histogram-and-probability-mass-function
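To make that concrete, a minimal Python sketch (not from these notes; the loaded-die probabilities are made up): sample from a known pmf, and the normalized histogram of samples should approximate that pmf.

  import random
  from collections import Counter

  # a known pmf for a loaded six-sided die (made-up probabilities)
  pmf = {1: 0.25, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.15}

  # draw a lot of samples according to that pmf
  samples = random.choices(list(pmf), weights=list(pmf.values()), k=10_000)

  # the normalized histogram is an empirical estimate of the pmf
  counts = Counter(samples)
  for value in sorted(pmf):
      print(value, pmf[value], counts[value] / len(samples))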

Probability density functions

continuous

Cumulative distribution function

Expected value

A few distributions

https://link.springer.com/chapter/10.1007/978-1-4939-6572-4_3


Binomial distribution

discrete


Models the number of successes when something with two outcomes (split by a fixed success probability) is done a number of times in a row.


Note that

  • since such a two-outcome event can be called a Bernoulli random variable, you can say that a binomial distribution is the sum of many independent such outcomes - more precisely, that a binomial random variable is the sum of independent (but identically distributed) Bernoulli random variables (verify) - see the sketch below
  • there's also the Bernoulli distribution: all Bernoulli distributions are binomial distributions, though most binomial distributions are not Bernoulli distributions
  • gaussian and Poisson (and some others) can be derived as limiting cases of the binomial


https://en.wikipedia.org/wiki/Binomial_distribution

https://math.stackexchange.com/questions/838107/what-is-the-difference-and-relationship-between-the-binomial-and-bernoulli-distr
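A quick illustration of that sum-of-Bernoullis view (a sketch; n, p, and the helper name binomial_draw are arbitrary choices):

  import random
  from collections import Counter

  n, p = 20, 0.3  # arbitrary example values

  def binomial_draw(n, p):
      # one binomial draw, as the sum of n independent Bernoulli(p) outcomes
      return sum(random.random() < p for _ in range(n))

  draws = Counter(binomial_draw(n, p) for _ in range(100_000))
  print(draws.most_common(5))  # the peak sits near n*p = 6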


Bernoulli distribution

discrete

The Bernoulli distribution models a single yes-no experiment: success (1) with probability p, failure (0) with probability 1-p.

Basically binomial with n=1.


https://en.wikipedia.org/wiki/Bernoulli_distribution


Uniform Distribution

Discrete and continuous variants are both useful


While this one often falls away into 'well duh', it can also be seen as a many-outcome cousin of the binomial (really a multinomial with equal probabilities).

For example, a single fair die throw is n outcomes of 1/n probability each (where n is the amount of faces), so the counts over many throws should converge on uniform.


Note that the distribution of combined dice rolls (e.g. the sum of several dice) starts resembling a gaussian.
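A quick sketch of that (illustrative; five dice and the bucket scaling are arbitrary):

  import random
  from collections import Counter

  # the sum of five fair dice, many times over
  sums = Counter(sum(random.randint(1, 6) for _ in range(5))
                 for _ in range(100_000))

  # crude text histogram: roughly bell-shaped, centered near 17.5
  for total in sorted(sums):
      print(f"{total:2d} {'#' * (sums[total] // 500)}")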

Exponential Distribution

continuous (note that the Geometric distribution is a very similar discrete distribution). Models e.g. the waiting time until the next event, for events that happen at a constant average rate.


https://en.wikipedia.org/wiki/Exponential_distribution

Multinomial distribution


Generalization of the binomial distribution, easiest to introduce as such:

where binomial may let you model the amount of successes for a coin (boolean outcomes) flipped n times...
...multinomial lets you model the probability of counts for each side of a k-sided die rolled n times.
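A minimal sketch of that die-rolling view (the number of rolls is arbitrary):

  import random
  from collections import Counter

  # roll a fair six-sided die 600 times; the vector of per-face counts
  # is a single draw from a multinomial distribution
  rolls = random.choices(range(1, 7), k=600)
  print(sorted(Counter(rolls).items()))  # each face's count should be near 100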


Poisson distribution

discrete

Models the probability of a given number of events occurring, in a fixed interval of time or space, based on a known mean rate, independently of the time since the last event.

Used mainly if the events are discrete, and relatively sparse, where the distribution sits against zero and noticeably isn't very symmetric.

Note that when not so sparse, the shape much resembles a normal distribution - in part just because of the CLT - though it may still make more sense to model it as Poisson for a few usually-minor reasons (e.g. normal is continuous, counts are not).


Poisson

  • tendency of rates: idealized distribution for counts of rare events
  • support: non-negative integers
  • often related to event counting
  • useful to model the likelihood of something happening in an interval -- in a way that considers interval length, and the rarity of the events
  • for high rates, it starts to look more gaussian


Examples:

  • Size of groups coming into restaurants
Poisson -- will be mostly two and one, sometimes three, more is fairly rare. Shape is not symmetric, and we want to model the rarity (see the sketch below).


https://en.wikipedia.org/wiki/Poisson_distribution
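A small sketch comparing the exact pmf with simulated per-interval counts (the rate lam is made up, and the helper poisson_pmf is ad hoc; the simulation chops each interval into many tiny slots, using the binomial-to-Poisson limit mentioned earlier):

  import math
  import random
  from collections import Counter

  lam = 2.5  # made-up average rate of events per interval

  def poisson_pmf(k, lam):
      return math.exp(-lam) * lam**k / math.factorial(k)

  # simulate counts per interval: many tiny slots, each with a small
  # chance of one event (binomial -> Poisson limit)
  slots, trials = 1_000, 2_000
  counts = Counter(sum(random.random() < lam / slots for _ in range(slots))
                   for _ in range(trials))

  for k in range(8):
      print(k, round(poisson_pmf(k, lam), 3), round(counts[k] / trials, 3))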

Gaussian/Normal distribution

continuous

Gaussian distribution, a.k.a. Normal distribution


Gaussian

  • many independent identical events
  • where things have a tendency to the average
  • support: real numbers

Note that in various cases, more than one distribution can fit the data's nature, even when there is no real relation between the distributions. For example, for non-trivial counts both Poisson and gaussian may work, and those are quite different.


Examples:

  • height of body



Bell curve refers to any distribution of this sort of shape, but also frequently to the more specifically shaped gaussian.

Bell-shaped curves are everywhere, for a number of reasons.

Not least of which is that the combination of varying distributions converges on a gaussian/bell curve - see the central limit theorem. This also means that if you don't know a distribution and have no reason to suspect a specific type, gaussian is a decent go-to to get you started somewhere.
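A sketch of that convergence (the particular mix of source distributions, and the helper name mixed_sum, are arbitrary choices):

  import random
  from collections import Counter

  def mixed_sum():
      # add up draws from deliberately different distributions
      return (sum(random.random() for _ in range(10))            # uniform
              + sum(random.expovariate(1.0) for _ in range(10))  # exponential
              + sum(random.randint(0, 1) for _ in range(10)))    # coin flips

  sums = Counter(round(mixed_sum()) for _ in range(50_000))

  # crude text histogram: despite the mixed sources, the shape is bell-like
  for bucket in sorted(sums):
      print(f"{bucket:3d} {'#' * (sums[bucket] // 300)}")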


https://math.stackexchange.com/questions/2379271/why-do-bell-curves-appear-everywhere/2379472


https://en.wikipedia.org/wiki/Normal_distribution


t distribution


a t-distribution describes an estimate based on a small sample size - roughly, the distribution of a standardized sample mean when the variance also has to be estimated from that same small sample.

(associated with the name Student because its originator, William Gosset, was not allowed to publish under his own name by Guinness, his employer, so he used 'Student' as a pseudonym - hence Student's t-distribution, and the related Student's t-test)


A t-distribution is strongly related to a normal/gaussian, and in fact as the degrees of freedom increase, the more the t-distribution tends towards a regular normal distribution.


We use a t-distribution when we know we probably don't have enough samples to assume a gaussian distribution, but still want to do hypothesis testing, with output on a comparable scale (verify) to other tests.


You can arguably see the t-distribution as generalizing its data a little: its thicker tails spread the probability density out more, which also means its peak is lower.

In part, you acknowledge that when sampling from unknown distributions:

  • the sample mean is probably not the same as the population mean
  • your estimate of the variance is itself noisy, and sometimes well off

...without those estimates being structurally too different from things like z-scores.

Actually, after roughly 30 degrees of freedom, the t-statistic is basically the same as the z-statistic (from the normal distribution), because you're now starting to have enough samples; above 50 you might as well just go normal (see the numeric check below).



t-distribution variance is S²·k/(k-2) (for k > 2 degrees of freedom), which, since k/(k-2) will be slightly >1:

  • is part of what makes it wider
  • is what makes it nearer to normal for higher k (where the variance tends to just S² (verify))
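A quick numeric check of that convergence toward normal (a sketch that assumes scipy is installed):

  from scipy import stats

  # two-sided 95% critical values: t approaches the normal's ~1.96 as df grows
  print("normal:", round(stats.norm.ppf(0.975), 3))
  for df in (2, 5, 10, 30, 50, 100):
      print(f"t, df={df}:", round(stats.t.ppf(0.975, df), 3))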

Logistic distribution

beta distribution

continuous

Defined over [0,1]


Dirichlet distribution

continuous

Multivariate

Generalization of the beta distribution to multiple variables (verify), which is why it is also sometimes called the multivariate beta distribution.

https://en.wikipedia.org/wiki/Dirichlet_distribution

Probability

Intro

Combinatorics and related probabilities

Between permutations and subset combinations, I always confuse which formula calculates which, so this is here to remind me.

Permutations

Permutations of a whole set are the (amount of) distinct ways to order all its elements - lists built from unique choices.

  • Non-overlapping choices, in which
  • the order matters

Example: The permutations of characters a,b,c are:

abc
acb
bac
bca
cab
cba

There are n! permutations of n elements, because choices depend on earlier ones.

For the abc example: there are 3 choices for the first letter, (for each of these choices there are) two left for the second, and one for the last.
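A tiny sketch of both the enumeration and the count:

  from itertools import permutations
  from math import factorial

  # all orderings of 'abc': unique choices, order matters
  for p in permutations('abc'):
      print(''.join(p))

  print(factorial(3))  # 6, matching the count above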



All permutations of fewer elements than the whole set are also commonly needed.

Like basic permutation, but only the first so-many choices apply. In a way, only the first few terms of the would-be factorial apply, because you stop choosing after a while.


For example: "get all ways to order three pictures next to each other on a wall - any three from from six pictures"

6*5*4 - six options for the first, five for the second, four for the third.


The more general notation is nPc

And its value is nPc = n!/(n-c)!

...so the pictures example would be

6P3 = 6!/(6-3)! = 6!/3! = 120
If it helps: (6*5*4*3*2*1)/(3*2*1)
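The same in Python (math.perm needs Python 3.8 or later):

  from itertools import permutations
  from math import perm

  print(perm(6, 3))  # 120: six options, then five, then four

  # permutations() takes an optional length, so you can also enumerate them
  print(len(list(permutations(range(6), 3))))  # also 120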



All permutations of all equal or smaller lengths are also sometimes useful.

For example, given four words, all unique sentences you can make of length one, two, three and four.



Rotations

When calculating "the amount of ways to seat people around a circular table", you likely care only about who is sitting next to whom, so there is no obvious starting point. In these cases, each arrangement has n variants (where n is the amount of chairs) that are not distinguishable: everyone shifts one seat, but you decide to consider it the same.

So same logic as above, but you divide by n (for a full table of n people, that gives n!/n = (n-1)!).


Mirrors

You may also consider an exact mirror to be the same, e.g. abcde as equivalent to edcba.

Again, when seating people, the same people would be adjacent.

Half of the results mirror the other half, so divide by two (for the full circular table: (n-1)!/2).
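A brute-force check of both rules (a sketch; the helper canonical is ad hoc, representing each seating by its smallest rotation/reflection):

  from itertools import permutations
  from math import factorial

  n = 5  # arbitrary table size

  def canonical(seating, mirror=False):
      # the lexicographically smallest rotation (and, optionally, reflection)
      # stands in for the whole equivalence class of a circular seating
      variants = [seating[i:] + seating[:i] for i in range(n)]
      if mirror:
          rev = seating[::-1]
          variants += [rev[i:] + rev[:i] for i in range(n)]
      return min(variants)

  rotations = {canonical(p) for p in permutations(range(n))}
  with_mirror = {canonical(p, mirror=True) for p in permutations(range(n))}

  print(len(rotations), factorial(n - 1))         # 24 24
  print(len(with_mirror), factorial(n - 1) // 2)  # 12 12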


Combinations

Combinations of a subset count the (amount of) possible sets that choose c unique elements from a set of n.

  • Non-overlapping choices, in which
  • order doesn't matter
  • Notation for the amount: nCc
  • regularly spoken out as "n choose c"
  • nCc = n!/c!(n-c)!
  • Example: of the letters a b c d, all combinations of three elements: (there are 4C3 = 4!/3!1! = 4)
abc
abd
acd
bcd

Applies to questions such as "choose two different flavours for your ice cream cone, out of eight flavours", because you don't care about the order the scoops are stacked.

8C2 = 8!/(8-2)!2! = 8!/6!2! = 28
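And in Python (math.comb needs Python 3.8 or later):

  from itertools import combinations
  from math import comb

  # order doesn't matter, so chocolate+vanilla = vanilla+chocolate
  print(comb(8, 2))  # 28

  for c in combinations('abcd', 3):
      print(''.join(c))  # abc, abd, acd, bcd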

...and its introduction to...

Dependent and independent events