Expectation-Maximization (EM)

Expectation-Maximization (EM) is an iterative algorithm for Maximum Likelihood Estimation when the model contains latent variables or missing data.

1. Setup

Let $x$ be observed data, $z$ latent variables, and $\theta$ parameters. The target is

$$\ell(\theta) = \log p(x|\theta) = \log \int p(x,z|\theta)\,dz$$

The integral usually makes direct maximization difficult.

2. Two-Step Update

At iteration $t$:

E-step

Compute the expected complete-data log-likelihood under the current posterior of the latent variables:

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z|x,\theta^{(t)})}[\log p(x,z|\theta)]$$

M-step

Update the parameter by maximizing this quantity:

$$\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$$

3. Interpretation

  • E-step: infer the latent structure using the current parameter.
  • M-step: re-estimate the parameter as if the latent structure were known in expectation.

4. Core Property

Each EM iteration does not decrease the observed-data log-likelihood:

$$\ell(\theta^{(t+1)}) \ge \ell(\theta^{(t)})$$

EM therefore performs monotone ascent toward a local optimum.

5. Relation to VI

EM can be viewed as a special case of variational optimization where the variational distribution is set to the exact conditional distribution of the latent variables in the E-step.