Variational Inference (VI)
Variational Inference (VI) approximates an intractable posterior distribution by replacing integration with optimization.
1. Setup
Given observed data $x$ and latent variables $z$, Bayesian inference requires the posterior
$$p(z|x) = \frac{p(x,z)}{p(x)}$$
Direct computation is difficult when the evidence
$$p(x) = \int p(x,z)\,dz$$
is intractable.
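To make the evidence integral concrete, here is a minimal sketch using an illustrative conjugate model ($z \sim \mathcal{N}(\mu_0, \tau^2)$, $x \mid z \sim \mathcal{N}(z, \sigma^2)$, with values chosen arbitrarily). In this special case $p(x)$ has a closed form, so we can check a numerical integration of $\int p(x|z)\,p(z)\,dz$ against it; in non-conjugate or high-dimensional models no such closed form exists, which is what motivates VI.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative conjugate model: z ~ N(mu0, tau^2), x | z ~ N(z, sigma^2).
mu0, tau, sigma = 0.0, 1.0, 0.5
x = 0.8

# Evidence by 1-D numerical integration of p(x|z) p(z) over z.
integrand = lambda z: norm.pdf(x, z, sigma) * norm.pdf(z, mu0, tau)
px_numeric, _ = quad(integrand, -10, 10)

# Closed form, available only because the model is conjugate:
# p(x) = N(x; mu0, sigma^2 + tau^2).
px_exact = norm.pdf(x, mu0, np.sqrt(sigma**2 + tau**2))

assert abs(px_numeric - px_exact) < 1e-8
```

In one dimension quadrature suffices; the point is that $p(x)$ requires integrating over all latent configurations, which scales poorly as $z$ grows in dimension.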
2. Variational Idea
Choose a tractable family of distributions $\mathcal{Q}$ and find
$$q^*(z) = \arg\min_{q(z)\in\mathcal{Q}} D_{\text{KL}}(q(z)\|p(z|x))$$
where
$$D_{\text{KL}}(q(z)\|p(z|x)) = \mathbb{E}_q\left[\log \frac{q(z)}{p(z|x)}\right]$$
3. ELBO
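The KL divergence is itself an expectation under $q$, so it can be estimated by sampling from $q$. A minimal sketch, using two arbitrary univariate Gaussians standing in for $q(z)$ and $p(z|x)$, compares the closed-form Gaussian KL with its Monte Carlo estimate of $\mathbb{E}_q[\log q(z) - \log p(z)]$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D Gaussians standing in for q(z) and p(z|x) (illustrative choice).
m_q, s_q = 0.0, 1.0
m_p, s_p = 1.0, 2.0

# Closed-form KL(q || p) for univariate Gaussians.
kl_exact = np.log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5

# Monte Carlo estimate of E_q[log q(z) - log p(z)] from samples of q.
z = rng.normal(m_q, s_q, size=200_000)
log_q = -0.5 * ((z - m_q) / s_q) ** 2 - np.log(s_q * np.sqrt(2 * np.pi))
log_p = -0.5 * ((z - m_p) / s_p) ** 2 - np.log(s_p * np.sqrt(2 * np.pi))
kl_mc = np.mean(log_q - log_p)

assert abs(kl_mc - kl_exact) < 0.01
```

Note the asymmetry: the expectation is taken under $q$, which is why this direction of the KL can be estimated with samples from the tractable family alone.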
Because $p(z|x)$ depends on the intractable $p(x)$, VI instead maximizes the Evidence Lower Bound (ELBO):
$$\mathcal{L}(q) = \mathbb{E}_q[\log p(x,z)] - \mathbb{E}_q[\log q(z)]$$
with the decomposition
$$\log p(x) = \mathcal{L}(q) + D_{\text{KL}}(q(z)\|p(z|x))$$
Since $\log p(x)$ is a constant with respect to $q$, maximizing $\mathcal{L}(q)$ is equivalent to minimizing $D_{\text{KL}}(q(z)\|p(z|x))$; and since the KL term is nonnegative, $\mathcal{L}(q)$ is indeed a lower bound on $\log p(x)$.
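The decomposition can be verified numerically in a model where everything is known in closed form. A minimal sketch, assuming the illustrative conjugate model $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$ (so the posterior and evidence are exact) and a deliberately imperfect $q$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative conjugate model: z ~ N(0, 1), x | z ~ N(z, 1).
x = 2.0
# Exact posterior z | x ~ N(x/2, 1/2) and exact evidence log N(x; 0, 2).
m_post, v_post = x / 2, 0.5
log_px = -0.5 * x**2 / 2 - 0.5 * np.log(2 * np.pi * 2)

# A deliberately imperfect variational choice q(z) = N(0.7, 0.8^2).
m_q, s_q = 0.7, 0.8

# Monte Carlo ELBO: E_q[log p(x, z)] - E_q[log q(z)].
z = rng.normal(m_q, s_q, size=400_000)
log_joint = (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)          # log p(z)
             - 0.5 * (x - z)**2 - 0.5 * np.log(2 * np.pi))  # log p(x|z)
log_q = -0.5 * ((z - m_q) / s_q) ** 2 - np.log(s_q * np.sqrt(2 * np.pi))
elbo = np.mean(log_joint - log_q)

# Closed-form KL(q || posterior) for univariate Gaussians.
s_post = np.sqrt(v_post)
kl = (np.log(s_post / s_q)
      + (s_q**2 + (m_q - m_post)**2) / (2 * v_post) - 0.5)

# log p(x) = ELBO + KL should hold up to Monte Carlo error,
# and the ELBO should sit strictly below log p(x).
assert abs((elbo + kl) - log_px) < 0.01
assert elbo < log_px
```

The gap between the ELBO and $\log p(x)$ is exactly the KL divergence, so tightening the bound and improving the approximation are the same operation.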
4. Mean-Field VI
A common assumption is factorization:
$$q(z) = \prod_{j=1}^m q_j(z_j)$$
This converts posterior approximation into coordinate-wise optimization over the individual factors $q_j$.
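The coordinate-wise scheme (coordinate ascent VI, CAVI) is easiest to see on a target whose updates are available in closed form. A minimal sketch, assuming an illustrative correlated bivariate Gaussian as the "posterior": for a Gaussian target with precision matrix $\Lambda$, the optimal factor $q_j$ is Gaussian with variance $1/\Lambda_{jj}$ and a mean that depends on the other factor's current mean, so we just iterate the two updates.

```python
import numpy as np

# Illustrative target "posterior": a correlated bivariate Gaussian.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix

# Mean-field family: q(z) = q1(z1) q2(z2), each Gaussian.
# CAVI updates for a Gaussian target, in closed form:
#   q_j = N(m_j, 1/Lam_jj),  m_j = mu_j - (Lam_jk / Lam_jj) (m_k - mu_k).
m = np.zeros(2)          # initialize the variational means
for _ in range(50):      # coordinate ascent sweeps
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

# The factorized optimum recovers the true mean but shrinks each variance
# to the conditional variance 1/Lam_jj, not the marginal Sigma_jj.
assert np.allclose(m, mu, atol=1e-8)
assert np.isclose(1 / Lam[0, 0], Sigma[0, 0] - Sigma[0, 1]**2 / Sigma[1, 1])
```

This example also exhibits a well-known property of mean-field VI: the factorized optimum matches the posterior mean here but underestimates the marginal variances, since it cannot represent the correlation between $z_1$ and $z_2$.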
5. Purpose
VI trades exactness for computational efficiency: it yields a biased but fast approximation, and is widely used when MCMC is too slow or the exact posterior is intractable.