# Statistics

## Basic Measures

The sample distribution has finite size and is what has been measured; the parent distribution is infinite and smooth and is the limit case of the sample distribution.

The mean, or average, is (of course):

$$\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N}x_i$$

The variance is:

$$s^{2}_x = \frac{1}{N-1}\sum^{N}_{i=1}\left(x_i-\langle x \rangle\right)^2$$

The standard deviation is the square root of the variance. For the parent distribution, the standard deviation is written $\sigma_x$ instead of $s_x$, and the mean is written $\mu$ instead of $\langle x \rangle$.
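A minimal sketch of these sample statistics in Python (standard library only; the function name is illustrative):

```python
import math

def sample_stats(xs):
    """Sample mean, variance (with the N-1 denominator), and standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    # N-1 in the denominator gives the unbiased sample variance
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var, math.sqrt(var)

mean, var, std = sample_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```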

## Binomial Distribution

If we are playing a yes/no game (e.g. flipping a coin), the binomial distribution gives the probability of getting ‘yes’ $x$ times out of $n$ attempts, where $p$ is the probability of ‘yes’ on a single attempt.

$$P(x;n,p) = \frac{n!}{x! (n-x)!} p^x (1-p)^{n-x}$$

The mean of this distribution is $\mu = np$, and $\sigma = \sqrt{np (1-p)}$.
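As a sketch, the binomial probability can be evaluated directly with Python's standard library (`math.comb` gives the binomial coefficient):

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)
```

For example, two heads in four fair coin flips has probability `binomial_pmf(2, 4, 0.5)` = 0.375.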

## Poisson Distribution

The limit of the binomial distribution when $n$ is large and $p$ is small with $\mu = np$ held fixed; it describes counting statistics for rare events.

$$P(x;\mu) = \frac{\mu^x}{x!} e^{-\mu}$$

The mean is $\mu$, and $\sigma=\sqrt{\mu}$.
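A one-line sketch in Python:

```python
import math

def poisson_pmf(x, mu):
    """Probability of counting x events when the mean count is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)
```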

## Gaussian Distribution

The classic! Also called a normal distribution.

$$P(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\left(\frac{(x-\mu)^2}{2\sigma^2}\right)}$$

The mean is $\mu$ and the standard deviation is $\sigma$; unlike the Poisson case, the two are independent parameters.
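A quick sketch of the normal density in Python, with the normalization $1/(\sigma\sqrt{2\pi})$:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return norm * math.exp(-((x - mu) ** 2) / (2 * sigma**2))
```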

## Lorentzian Distribution

This distribution represents damped resonance; it is also the Fourier transform of an exponentially decaying sinusoid.

$$P(x;\mu,\Gamma) = \frac{1}{\pi} \frac{\Gamma/2}{(x-\mu)^2 + (\Gamma/2)^2}$$

where the mean is $\mu$ and the linewidth $\Gamma$ is the full width of the peak at half its maximum.
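A sketch in Python; the peak value at $x=\mu$ is $2/(\pi\Gamma)$, falling to half that at $x = \mu \pm \Gamma/2$:

```python
import math

def lorentzian_pdf(x, mu, gamma):
    """Lorentzian density centered at mu with full width gamma."""
    half = gamma / 2.0
    return (1.0 / math.pi) * half / ((x - mu) ** 2 + half**2)
```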

## Error Analysis

For a given measurement, the error on the mean is not the standard deviation (which measures the spread of the data and stays roughly constant as $N$ gets very large) but $\frac{s_x}{\sqrt{N}}$, which shrinks as more measurements are added. More generally, if each measurement $x_i$ has its own error $\sigma_i$, the weighted mean and its error are:

$$\bar{x}=
\frac{ \sum_{i=1}^{N} x_i / \sigma_{i}^2}{\sum_{i=1}^{N} 1/\sigma_{i}^2}
\pm \sqrt{ \frac{1}{\sum_{i=1}^{N} 1/\sigma_{i}^2}}$$
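The inverse-variance weighted mean above can be sketched in Python (names are illustrative):

```python
import math

def weighted_mean(xs, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / wsum
    return mean, math.sqrt(1.0 / wsum)
```

With all $\sigma_i$ equal to $\sigma$, this reduces to the ordinary mean with uncertainty $\sigma/\sqrt{N}$.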

## $\chi^2$ Distribution

$\chi^2$ is often written "chi-squared" and is a metric for how well a fitted curve matches data with uncertainties.

$$\chi^2 = \sum_{i=1}^{N}\left(\frac{x_i-\mu_i}{\sigma_{i}}\right)^2$$

The number of degrees of freedom of the system is the number of measurements $N$ minus the number of variable parameters in the curve fit $N_c$: $\nu = N - N_c$.

The reduced $\chi^2$ value is $\chi^2_r = \chi^2/\nu$. You want $\chi^2_r$ to be around (but not exactly!) 1; if it is significantly larger than 1 the fit is bad (or the errors are underestimated), while if it is significantly smaller the errors are probably overestimated.
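A sketch of the reduced $\chi^2$ computation in Python (names are illustrative):

```python
def reduced_chi2(measured, model, sigmas, n_params):
    """chi^2 per degree of freedom for a fit with n_params free parameters."""
    chi2 = sum(((x - m) / s) ** 2 for x, m, s in zip(measured, model, sigmas))
    nu = len(measured) - n_params  # degrees of freedom
    return chi2 / nu
```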