We can generate pseudo-random numbers on $[0, 1]$ with an algorithm such as the Mersenne Twister. Essentially, this is all a computer can do, and every sampling method below is built on top of it.
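For example, NumPy's legacy global generator is a Mersenne Twister (MT19937), and the samplers below draw all of their randomness from it. A minimal sketch (the seed value is arbitrary):

import numpy as np

np.random.seed(1)              # seed the Mersenne Twister (MT19937) generator
u = np.random.uniform(size=5)  # pseudo-random draws from [0, 1)
print(u)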
Rejection Sampling
One approach to sampling from a distribution $p \, dm$ is to bound $p$ from above by a density $q$ that we can sample from, and then reject the proposed points that fall above the graph of $p$. We implement rejection sampling in Python, bounding the standard normal density by the box $[-5, 5] \times [0, 1]$; the normal has negligible mass outside $[-5, 5]$, so truncating the proposals to this region is harmless.
import numpy as np
import matplotlib.pyplot as plt

def rejectionSampler(target, N):
    Y = np.zeros(N)
    i = 0
    while i < N:
        # Propose X uniformly from [-5, 5] and u uniformly from [0, 1],
        # so (X, u) is uniform on the box enclosing the target density.
        X = np.random.uniform(low=-5, high=5)
        u = np.random.uniform()
        # Accept X if the point (X, u) falls under the target density.
        if u <= target(X):
            Y[i] = X
            i += 1
    return Y

def target(x):
    return 1 / np.sqrt(2 * np.pi) * np.exp(-1/2 * x**2)

X = rejectionSampler(target, 10000)

plt.figure(figsize=(8, 8))
plt.hist(X, density=True, bins=100)
plt.tight_layout()
plt.show()
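As a sanity check (a minimal sketch, reusing target from above): the box $[-5, 5] \times [0, 1]$ has area $10$ while the normal density integrates to $1$, so roughly one in ten proposals should be accepted.

# Empirically estimate the acceptance rate of the rejection sampler.
n_trials = 10**5
X_prop = np.random.uniform(low=-5, high=5, size=n_trials)
u = np.random.uniform(size=n_trials)
print(np.mean(u <= target(X_prop)))  # should be close to 1/10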
Markov Chain Monte Carlo Sampling
An extension of the Law of Large Numbers is the Pointwise Ergodic Theorem (Birkhoff), which says that if we have a measure-preserving system $(\mu, T)$ and $f \in L_1(\mu)$, then \[ \frac{1}{n} \sum_{k = 1}^n f \circ T^k \to E(f \mid \mathcal{S}) \quad \text{as } n \to \infty \] $\mu$-almost everywhere, where $\mathcal{S}$ is the $\sigma$-algebra of $T$-invariant sets. If $\mathcal{S}$ is trivial, then $E(f \mid \mathcal{S}) = E(f)$, which is our goal.

Let $(X_k)$ be a Markov chain with Markov operator $P$ and invariant distribution $\pi = p \, dm$. Since $\pi$ is invariant, the shift on the paths of the stationary chain is a measure-preserving system. If the Markov chain is irreducible, then $\mathcal{S}$ is trivial. Therefore, we can apply the ergodic theorem, and \[ \frac{1}{n} \sum_{k = 1}^n f(X_k) \to \int f \, p \, dm \quad \text{as } n \to \infty \] almost everywhere with respect to $p \, dm$, so we can approximate integrals this way. However, this says nothing about the rate of convergence: getting within $\epsilon$ of the limit may take $n = 10^{82}$ steps (roughly the number of atoms in the observable universe). Therefore, we need to be careful with theorems of this type in general.

We can put this theory to work using the Metropolis-Hastings algorithm, which can be thought of as an extension of rejection sampling. The idea is to move to the proposed sample if it is accepted, and otherwise stay put.
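Concretely, given a proposal density $q(y \mid x)$, Metropolis-Hastings accepts a proposed move from $x$ to $y$ with probability \[ \rho(x, y) = \min\left(1, \frac{p(y) \, q(x \mid y)}{p(x) \, q(y \mid x)}\right). \] For a symmetric random-walk proposal the $q$ terms cancel, leaving $\rho(x, y) = \min(1, p(y)/p(x))$, which is the ratio computed in the code below.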
We implement Metropolis-Hastings with a random-walk proposal, taking the target distribution to be a Gaussian mixture with two modes.
import numpy as np
import matplotlib.pyplot as plt

def randomWalkMH(target, dim, N=10**4, h=1):
    X = np.zeros((N, dim))
    X[0] = np.zeros(dim)
    for i in range(N - 1):
        # Random-walk proposal: a Gaussian step of size h around the current state.
        Z = np.random.multivariate_normal(np.zeros(dim), np.identity(dim))
        X_next = X[i] + h * Z
        # Acceptance probability min(1, p(y)/p(x)); the max() guards against
        # division by zero far out in the tails.
        rho = min(1, target(X_next) / max(10**(-20), target(X[i])))
        u = np.random.uniform()
        if u < rho:
            X[i + 1] = X_next
        else:
            X[i + 1] = X[i]
    return X

def target(x):
    # Equal-weight mixture of two standard normals centered at (-5, 0) and (5, 0).
    mu1 = np.array([-5, 0])
    mu2 = np.array([5, 0])
    f1 = 1 / np.sqrt(2 * np.pi)**2 * np.exp(-1/2 * np.linalg.norm(x - mu1)**2)
    f2 = 1 / np.sqrt(2 * np.pi)**2 * np.exp(-1/2 * np.linalg.norm(x - mu2)**2)
    return 1/2 * f1 + 1/2 * f2

dim = 2
X = randomWalkMH(target, dim, N=10**5, h=5)

plt.figure(figsize=(10, 5))
plt.scatter(X[:, 0], X[:, 1], s=1)
plt.tight_layout()
plt.show()
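To tie this back to the ergodic theorem, the running averages $\frac{1}{n} \sum_{k=1}^n X_k$ should settle near the mixture mean, which is $(0, 0)$ by symmetry. A minimal sketch, reusing X from the run above:

# Running ergodic averages of the chain; the last entry approximates E(X) = (0, 0).
running_mean = np.cumsum(X, axis=0) / np.arange(1, len(X) + 1)[:, None]
print(running_mean[-1])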
Gibbs Sampling
Gibbs Sampling is analogous to the coordinate descent algorithm in optimization: we update one coordinate at a time, sampling it from its conditional distribution given the remaining coordinates. It is straightforward to check that the ergodic theorem holds for this Markov chain as well. For a bivariate normal with mean $\mu$ and covariance $\Sigma$, the conditionals are themselves normal: \[ X_1 \mid X_2 = x_2 \sim N\left(\mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}}(x_2 - \mu_2), \; \Sigma_{11} - \frac{\Sigma_{12}\Sigma_{21}}{\Sigma_{22}}\right), \] and symmetrically for $X_2 \mid X_1$.

def bivariateNormalGibbsSampler(mu, E, N=1000):
    X = np.zeros((N, 2))
    for i in range(N - 1):
        # Sample the first coordinate from its conditional given the second.
        m0 = mu[0] + E[0, 1] / E[1, 1] * (X[i][1] - mu[1])
        s0 = np.sqrt(E[0, 0] - E[0, 1] * E[1, 0] / E[1, 1])
        X[i + 1][0] = np.random.normal(m0, s0)
        # Sample the second coordinate using the freshly updated first coordinate.
        m1 = mu[1] + E[1, 0] / E[0, 0] * (X[i + 1][0] - mu[0])
        s1 = np.sqrt(E[1, 1] - E[1, 0] * E[0, 1] / E[0, 0])
        X[i + 1][1] = np.random.normal(m1, s1)
    return X

mu = np.array([0, 0])
E = np.array([[1, -1/2], [-1/2, 1]])

X = bivariateNormalGibbsSampler(mu, E, N=10**4)

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], s=1)
plt.tight_layout()
plt.show()
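As a quick check (a sketch reusing X and E from above), the sample covariance of the Gibbs draws should be close to $\Sigma$; discarding the first few iterations as burn-in would be slightly more careful.

# Empirical covariance of the draws; should approximate E = [[1, -1/2], [-1/2, 1]].
print(np.cov(X.T))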
References.
Christian P. Robert and George Casella. 2005. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag, Berlin, Heidelberg.

Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. The MIT Press.