Central Limit Theorem simulation

A well-known result in probability theory is the Central Limit Theorem (CLT), which states that the sum of i.i.d. random variables with finite variance, once centered and rescaled by \(\sqrt{n}\), converges in distribution to a normal distribution.

Let \(X_1, \dots, X_n\) be i.i.d. random variables in \(L^2\) (that is, with finite second moment), with expected value \(\mu\) and variance \(\sigma^2\). We are interested in their average

\[S_n := \frac{X_1 + \dots + X_n}{n}\]

By the Strong Law of Large Numbers, this quantity converges almost surely (and hence in probability) to the expected value \(\mu\). The CLT describes the fluctuations of \(S_n\) around \(\mu\): it states that

\[\sqrt n \left(S_n - \mu \right) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \sigma^2)\]

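In other words, the average of \(n\) independent realizations of a random variable converges to \(\mu\) at rate \(1/\sqrt{n}\), and the rescaled error \(\sqrt{n}(S_n - \mu)\) is approximately distributed as \(\mathcal N(0, \sigma^2)\) when \(n\) is large.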
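A minimal NumPy sketch of this statement, assuming (as an arbitrary example) that the \(X_i\) are exponential with rate 2, so that \(\mu = \sigma = 1/2\). Note that \(Y_n = \sqrt{n}(S_n - \mu)\) has mean 0 and variance \(\sigma^2\) for every \(n\); what the CLT adds is the shape of its distribution, so the sketch also checks that about 95% of the simulated values fall in \([-1.96\,\sigma, 1.96\,\sigma]\), as they would for \(\mathcal N(0, \sigma^2)\).

```python
import numpy as np

# Assumed example: X_i ~ Exponential(rate lam = 2), so mu = 1/2 and sigma = 1/2.
lam = 2.0
mu, sigma = 1.0 / lam, 1.0 / lam

rng = np.random.default_rng(12345)  # fixed seed so the run is reproducible

def sample_y(n, size):
    """Return `size` independent copies of Y_n = sqrt(n) * (S_n - mu)."""
    x = rng.exponential(scale=1.0 / lam, size=(size, n))  # one row = (X_1, ..., X_n)
    s_n = x.mean(axis=1)                                   # S_n for every row
    return np.sqrt(n) * (s_n - mu)

y = sample_y(n=200, size=10_000)

# Y_n has mean 0 and variance sigma^2 for every n; the CLT is about its shape.
# For N(0, sigma^2), about 95% of the mass lies in [-1.96 sigma, 1.96 sigma].
coverage = np.mean(np.abs(y) <= 1.96 * sigma)
print(y.mean(), y.var(), coverage)  # roughly 0, 0.25 and 0.95
```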

The simulation described here illustrates that, as \(n\) grows, the distribution of \(\sqrt{n}(S_n - \mu)\) is approximated more and more closely by a normal distribution.

Below we give the density function (or, for discrete variables, the probability mass function) of some well-known distributions.

Normal distribution with mean \(\mu\) and standard deviation \(\sigma\)

\[f(x) = \frac 1 {\sigma \sqrt{2 \pi}} e^{- \frac{(x-\mu)^2}{2 \sigma^2}}\]

Uniform distribution on the real interval \([a,b]\)

\[f(x) = \left\{ \begin{array}{cc} \frac 1 {b-a} & \text{if } x \in [a,b] \\ 0 & \text{otherwise} \end{array} \right.\]

Exponential distribution of parameter \(\lambda\)

\[f(x) = \left\{ \begin{array}{cc} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{array} \right.\]

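Gamma distribution of parameters \(\alpha > 0\) (shape) and \(\beta > 0\) (rate)

\[f(x) = \left\{ \begin{array}{cc} \frac {\beta^\alpha x^{\alpha-1} e^{-\beta x }}{\Gamma(\alpha)} & x > 0 \\ 0 & x \leq 0 \end{array} \right.\]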
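The continuous densities listed so far can be transcribed directly with the Python standard library. This is only a sketch: the default parameter values are arbitrary choices for illustration, and the helper `integrate` (a plain midpoint rule) is a hypothetical addition used to check numerically that each density integrates to 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def uniform_pdf(x, a=0.0, b=1.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam=1.0):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def gamma_pdf(x, alpha=2.0, beta=1.0):
    # Shape alpha, rate beta. Returning 0 at x = 0 avoids 0.0 ** (negative) when
    # alpha < 1; the value at a single point does not change any probability.
    if x <= 0:
        return 0.0
    return beta**alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

print(integrate(normal_pdf, -10, 10))                # close to 1
print(integrate(lambda x: uniform_pdf(x, 2, 5), 0, 6))  # close to 1
```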

Bernoulli distribution of parameter \(p\)

\[\mathbb{P}(X = 1) = 1 - \mathbb{P}(X = 0) = p\]

Binomial distribution of parameters \(n\) and \(p\)

\[\mathbb{P}(X = k) = \binom n k p^k(1-p)^{n-k}, \qquad k = 0, 1, \dots, n\]

Poisson distribution of parameter \(\lambda\)

\[\mathbb{P}(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \qquad k = 0, 1, 2, \dots\]
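The three mass functions above translate into a few lines of standard-library Python. This is a sketch for checking the formulas, not the simulation's own code; note that in `binomial_pmf` the argument `n` is the number of trials of the binomial law, not the sample size \(n\) of the CLT.

```python
import math

def bernoulli_pmf(k, p):
    # P(X = 1) = p, P(X = 0) = 1 - p, zero elsewhere.
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    # n = number of trials of the binomial law (not the CLT sample size).
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    if k < 0:
        return 0.0
    return lam**k / math.factorial(k) * math.exp(-lam)

# The probabilities of each law sum to 1 (Poisson truncated far in the tail).
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(poisson_pmf(k, 3.0) for k in range(60)))
```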

If the variable \(seed\) is set to 0, the seed used to initialize the pseudorandom number generator changes at each run of the algorithm. If \(seed\) is set to a nonzero integer, that value is used to initialize the generator. This is useful when we want to reproduce exactly the same simulation.
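The seed convention above can be sketched with NumPy's `default_rng`, which draws fresh entropy from the operating system when given `None`. The helper name `make_rng` is hypothetical, and the sketch assumes the nonzero seed is a positive integer, since NumPy rejects negative seeds.

```python
import numpy as np

def make_rng(seed=0):
    """Hypothetical helper mirroring the seed convention described above.

    seed == 0 -> fresh entropy from the OS, so every run differs;
    seed != 0 -> that integer seeds the generator, so the run can be repeated.
    NumPy requires the seed to be non-negative.
    """
    return np.random.default_rng(None if seed == 0 else seed)

a = make_rng(42).normal(size=3)
b = make_rng(42).normal(size=3)
print(np.array_equal(a, b))  # True: same nonzero seed, same draws
```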

Let \(Y_n = \sqrt n ( S_n - \mu )\) and let \((Y_n^k)_{k\geq 1}\) be i.i.d. copies of \(Y_n\), i.e. \(Y_n^k \sim Y_n\) for each \(k \geq 1\). To visualize the convergence to the normal distribution \(\mathcal N (0, \sigma^2)\), we simulate 1000 realizations \((Y_n^1, \ldots, Y_n^{1000})\) and plot them in a normalized histogram (total area equal to 1) with a fixed number of bins.
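A minimal NumPy sketch of this experiment. The function `clt_histogram` and its `sampler(rng, size)` argument are hypothetical interfaces chosen for illustration, not the tool's actual code; `np.histogram(..., density=True)` performs the normalization so that the bar areas sum to 1 and the histogram can be compared with the \(\mathcal N(0, \sigma^2)\) density.

```python
import numpy as np

def clt_histogram(sampler, mu, n, bins, n_sim=1000, seed=0):
    """Simulate n_sim copies of Y_n = sqrt(n) (S_n - mu); return them and a normalized histogram.

    `sampler(rng, size)` is a hypothetical callable returning i.i.d. draws of X
    with the given shape.
    """
    rng = np.random.default_rng(None if seed == 0 else seed)
    x = sampler(rng, (n_sim, n))              # row k holds X_1, ..., X_n for Y_n^k
    y = np.sqrt(n) * (x.mean(axis=1) - mu)    # (Y_n^1, ..., Y_n^{n_sim})
    # density=True rescales the bar heights so that the total area is 1.
    heights, edges = np.histogram(y, bins=bins, density=True)
    return y, heights, edges

# Assumed example: Bernoulli(p) variables, so mu = p and sigma^2 = p (1 - p).
p = 0.3
y, heights, edges = clt_histogram(lambda rng, size: rng.binomial(1, p, size),
                                  mu=p, n=100, bins=20, seed=7)
print(np.sum(heights * np.diff(edges)))  # total area of the histogram, 1 up to rounding
```

The heights can then be drawn as bars over `edges` with any plotting library, next to the \(\mathcal N(0, p(1-p))\) density.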

For \(n = 1\) we have \(Y_1 = X_1 - \mu\), so the histogram approximates the distribution of the centered random variable \(X_1 - \mu\) itself, which in general is not normal.
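One way to quantify this remark is to track the sample skewness of \(Y_n\), which is 0 for a normal distribution. In the assumed example below, \(X \sim\) Exponential(1), whose skewness is 2; since third cumulants add over independent summands, the skewness of \(Y_n\) is \(2/\sqrt{n}\), clearly non-normal at \(n = 1\) and shrinking as \(n\) grows. The helper names are hypothetical.

```python
import numpy as np

def sample_skewness(y):
    """Moment estimator of skewness: mean((y - ybar)^3) / mean((y - ybar)^2)^(3/2)."""
    d = y - y.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(2024)
mu = 1.0  # assumed example: X ~ Exponential(1), so mu = sigma = 1 and skewness(X) = 2

def sample_y(n, n_sim=20_000):
    """Return n_sim copies of Y_n = sqrt(n) (S_n - mu)."""
    x = rng.exponential(scale=1.0, size=(n_sim, n))
    return np.sqrt(n) * (x.mean(axis=1) - mu)

# Theoretical skewness of Y_n is 2 / sqrt(n): 2, about 0.63 and 0.2 below.
for n in (1, 10, 100):
    print(n, sample_skewness(sample_y(n)))
```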

The user has to set the number \(n\) of realizations of the random variable averaged in each \(S_n\), and the number of bins of the histogram.