Variational Inference (VI)

Variational Inference (VI) approximates an intractable posterior distribution by replacing integration with optimization.

1. Setup

Given observed data $x$ and latent variables $z$, Bayesian inference requires the posterior

$$p(z|x) = \frac{p(x,z)}{p(x)}$$

Direct computation is typically impossible because the normalizing constant, the marginal likelihood (evidence)

$$p(x) = \int p(x,z)\,dz$$

is intractable: for most models of interest the integral has no closed form and is too high-dimensional for numerical quadrature.
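To make this concrete, a minimal sketch with an assumed one-dimensional conjugate model (not from the text), where the evidence integral can still be done by quadrature and checked against a closed form:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

# Hypothetical conjugate model, chosen for illustration: z ~ N(0,1) and
# x|z ~ N(z,1). Here p(x) = \int p(x|z) p(z) dz happens to have the closed
# form N(x; 0, 2), so brute-force quadrature can be checked against it.
x = 1.5
zs = np.linspace(-10.0, 10.0, 10_001)
integrand = norm.pdf(x, loc=zs, scale=1.0) * norm.pdf(zs, loc=0.0, scale=1.0)
p_x_numeric = trapezoid(integrand, zs)            # numerical evidence
p_x_exact = norm.pdf(x, loc=0.0, scale=np.sqrt(2.0))  # closed form
print(p_x_numeric, p_x_exact)
```

In realistic models no closed form exists and $z$ is high-dimensional, so grid quadrature like this is hopeless; that is the gap VI fills.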

2. Variational Idea

Choose a tractable family of distributions $\mathcal{Q}$ and find

$$q^*(z) = \arg\min_{q(z)\in\mathcal{Q}} D_{\text{KL}}(q(z)\|p(z|x))$$

where

$$D_{\text{KL}}(q(z)\|p(z|x)) = \mathbb{E}_q\left[\log \frac{q(z)}{p(z|x)}\right]$$
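Because the KL divergence is an expectation under $q$, it can be estimated by sampling from $q$. A sketch for two assumed one-dimensional Gaussians (arbitrary parameters), checked against the Gaussian closed form:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Two arbitrary 1-D Gaussians: q = N(0, 1), p = N(1, 2^2).
mu_q, s_q, mu_p, s_p = 0.0, 1.0, 1.0, 2.0

# Monte Carlo: KL(q || p) = E_q[log q(z) - log p(z)], averaged over draws from q.
z = rng.normal(mu_q, s_q, size=200_000)
kl_mc = np.mean(norm.logpdf(z, mu_q, s_q) - norm.logpdf(z, mu_p, s_p))

# Closed form for two Gaussians:
# log(s_p/s_q) + (s_q^2 + (mu_q - mu_p)^2) / (2 s_p^2) - 1/2
kl_exact = np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2 * s_p**2) - 0.5
print(kl_mc, kl_exact)
```

The same sampling trick is what makes the ELBO optimizable in practice, since it too is an expectation under $q$.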

3. ELBO

Because $p(z|x)$ depends on the intractable $p(x)$, VI instead maximizes the Evidence Lower Bound (ELBO):

$$\mathcal{L}(q) = \mathbb{E}_q[\log p(x,z)] - \mathbb{E}_q[\log q(z)]$$

with the decomposition

$$\log p(x) = \mathcal{L}(q) + D_{\text{KL}}(q(z)\|p(z|x))$$

Thus maximizing $\mathcal{L}(q)$ is equivalent to minimizing $D_{\text{KL}}(q(z)\|p(z|x))$.
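The decomposition can be verified numerically. A sketch using an assumed conjugate Gaussian model (the same toy setup as above, not from the text), where the posterior and $\log p(x)$ are available in closed form:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical conjugate model: z ~ N(0,1), x|z ~ N(z,1). The posterior is
# N(x/2, 1/2) and log p(x) = log N(x; 0, 2), both in closed form, so the
# identity log p(x) = ELBO + KL can be checked directly.
x = 1.5
log_px = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

# An arbitrary variational candidate q = N(0.5, 0.6^2).
mu_q, s_q = 0.5, 0.6
z = rng.normal(mu_q, s_q, size=400_000)

# ELBO = E_q[log p(x,z)] - E_q[log q(z)], estimated by Monte Carlo.
log_joint = norm.logpdf(x, loc=z, scale=1.0) + norm.logpdf(z, loc=0.0, scale=1.0)
elbo = np.mean(log_joint - norm.logpdf(z, loc=mu_q, scale=s_q))

# KL(q || p(z|x)) between two Gaussians, in closed form.
mu_p, s_p = x / 2, np.sqrt(0.5)
kl = np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2 * s_p**2) - 0.5

print(elbo + kl, log_px)  # the two sides of the decomposition agree
```

Since $\log p(x)$ is a constant in $q$ and the KL term is nonnegative, the ELBO is indeed a lower bound on the evidence, tight exactly when $q(z) = p(z|x)$.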

4. Mean-Field VI

A common assumption is factorization:

$$q(z) = \prod_{j=1}^m q_j(z_j)$$

This converts posterior approximation into coordinate ascent: each factor $q_j(z_j)$ is updated in turn while the others are held fixed (CAVI, coordinate ascent variational inference).
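For a Gaussian target the optimal mean-field factors are themselves Gaussian and the coordinate updates are linear. A minimal CAVI sketch for an assumed correlated bivariate Gaussian posterior (all parameters are illustrative):

```python
import numpy as np

# Assumed target: p(z) = N(mu, Sigma), a correlated bivariate Gaussian,
# approximated by a factorized q(z) = q1(z1) q2(z2).
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)   # precision matrix

# For a Gaussian target each optimal factor q_j is Gaussian with variance
# 1/Lam[j,j]; coordinate ascent therefore only updates the means.
m = np.zeros(2)              # current variational means
for _ in range(50):
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)                     # converges to the true mean [1, -1]
print(1 / np.diag(Lam))      # mean-field variances, smaller than diag(Sigma)
```

The means converge to the exact posterior means, but the factorized variances $1/\Lambda_{jj}$ are smaller than the true marginal variances: the familiar tendency of mean-field VI to underestimate posterior uncertainty.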

5. Purpose

VI trades exactness for computational efficiency: it yields a biased but fast approximation to the posterior, and is widely used when MCMC is too slow or exact posterior computation is infeasible.