Expectation-Maximization (EM)
Expectation-Maximization (EM) is an iterative algorithm for Maximum Likelihood Estimation when the model contains latent variables or missing data.
1. Setup
Let $x$ be observed data, $z$ latent variables, and $\theta$ parameters. The target is
$$\ell(\theta) = \log p(x|\theta) = \log \int p(x,z|\theta)\,dz$$The integral usually makes direct maximization difficult.
2. Two-Step Update
At iteration $t$:
E-step
Compute the expected complete-data log-likelihood under the current posterior of the latent variables:
$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim p(z|x,\theta^{(t)})}[\log p(x,z|\theta)]$$M-step
Update the parameter by maximizing this quantity:
$$\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$$3. Interpretation
- E-step: infer the latent structure using the current parameter.
- M-step: re-estimate the parameter as if the latent structure were known in expectation.
4. Core Property
Each EM iteration does not decrease the observed-data log-likelihood:
$$\ell(\theta^{(t+1)}) \ge \ell(\theta^{(t)})$$EM therefore performs monotone ascent toward a local optimum.
5. Relation to VI
EM can be viewed as a special case of variational optimization where the variational distribution is set to the exact conditional distribution of the latent variables in the E-step.