Exam Pattern: EM on Gaussian Mixture

This note consolidates the full EM procedure for a Gaussian Mixture Model (GMM) — the most likely long-answer exam question for STA414.

Setup

Given: $N$ observations $\{x_1, \dots, x_N\}$, $K$ mixture components, parameters $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^K$.

Log-likelihood:

$$\ell(\theta) = \sum_{n=1}^N \log \left(\sum_{k=1}^K \pi_k \,\mathcal{N}_m(x_n \mid \mu_k, \Sigma_k)\right)$$

E-Step: Compute Responsibilities

$$r_{nk} = p(z_n = k \mid x_n, \theta^{(t)}) = \frac{\pi_k^{(t)} \,\mathcal{N}_m(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^K \pi_j^{(t)} \,\mathcal{N}_m(x_n \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$$

Define effective counts: $N_k = \sum_{n=1}^N r_{nk}$
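The E-step above can be sketched directly in NumPy. This is a minimal illustration (function names are my own, not from any course code); a production version would work in log space for numerical stability:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Density of N_m(x | mu, Sigma) evaluated at each row of X, shape (N, m)."""
    m = mu.shape[0]
    diff = X - mu                                          # (N, m)
    inv = np.linalg.inv(Sigma)
    maha = np.einsum('ni,ij,nj->n', diff, inv, diff)       # squared Mahalanobis distances
    norm = (2 * np.pi) ** (-m / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * maha)

def e_step(X, pi, mus, Sigmas):
    """Responsibilities r (N, K) and effective counts N_k (K,)."""
    K = len(pi)
    # numerator pi_k * N(x_n | mu_k, Sigma_k) for every (n, k)
    dens = np.column_stack([pi[k] * gaussian_pdf(X, mus[k], Sigmas[k])
                            for k in range(K)])
    r = dens / dens.sum(axis=1, keepdims=True)             # normalize over k
    return r, r.sum(axis=0)                                # r_{nk}, N_k
```

Each row of `r` sums to 1, and the effective counts $N_k$ sum to $N$.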

M-Step: Update Parameters

Means:

$$\mu_k^{(t+1)} = \frac{1}{N_k}\sum_{n=1}^N r_{nk}\,x_n$$

Covariances:

$$\Sigma_k^{(t+1)} = \frac{1}{N_k}\sum_{n=1}^N r_{nk}\,(x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^\top$$

Mixing weights:

$$\pi_k^{(t+1)} = \frac{N_k}{N}$$
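The three M-step updates translate almost line-for-line into code. A sketch (assuming responsibilities `r` of shape `(N, K)` from an E-step; names are illustrative):

```python
import numpy as np

def m_step(X, r):
    """Given data X (N, m) and responsibilities r (N, K),
    return updated (pi, mus, Sigmas)."""
    N, m = X.shape
    Nk = r.sum(axis=0)                                 # effective counts N_k
    mus = (r.T @ X) / Nk[:, None]                      # weighted means, (K, m)
    Sigmas = []
    for k in range(r.shape[1]):
        diff = X - mus[k]                              # centered at the NEW mean
        Sigmas.append((r[:, k, None] * diff).T @ diff / Nk[k])
    pi = Nk / N                                        # mixing weights
    return pi, mus, np.array(Sigmas)
```

Sanity check: with hard 0/1 responsibilities, the updates reduce to per-cluster sample means and covariances.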

Common Exam Variations

Variation 1: Derive the M-step for $\mu_k$

Set $\frac{\partial}{\partial \mu_k}Q(\theta \mid \theta^{(t)}) = 0$ where $Q$ is the expected complete-data log-likelihood:

$$Q = \sum_{n=1}^N \sum_{k=1}^K r_{nk}\left[\log \pi_k - \frac{m}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x_n - \mu_k)^\top \Sigma_k^{-1}(x_n - \mu_k)\right]$$

Differentiating w.r.t. $\mu_k$:

$$\sum_{n=1}^N r_{nk}\,\Sigma_k^{-1}(x_n - \mu_k) = 0 \implies \mu_k = \frac{\sum_n r_{nk}\,x_n}{\sum_n r_{nk}}$$

Variation 2: What happens when $K=1$?

Every responsibility is $r_{n1} = 1$, so $N_1 = N$ and a single M-step returns the sample mean and the (MLE) sample covariance. No iteration is needed.
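This can be checked numerically. A sketch assuming the $K=1$ case (synthetic data; the MLE covariance divides by $N$, hence `bias=True`):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

r = np.ones((50, 1))                    # K=1: every responsibility equals 1
mu = (r * X).sum(axis=0) / 50           # M-step mean update
diff = X - mu
Sigma = (r * diff).T @ diff / 50        # M-step covariance update

# These coincide with the sample mean and the MLE sample covariance.
sample_mu = X.mean(axis=0)
sample_Sigma = np.cov(X.T, bias=True)   # bias=True -> divide by N, not N-1
```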

Variation 3: Connection to K-Means

When $\Sigma_k = \varepsilon I$ for all $k$ and $\varepsilon \to 0$, the responsibilities collapse to hard (0/1) assignments to the nearest mean, and EM reduces to the K-means algorithm.
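The hardening of responsibilities as $\varepsilon$ shrinks can be seen numerically. A sketch with isotropic covariance $\varepsilon I$ and equal mixing weights (the density is then proportional to $\exp(-\|x-\mu_k\|^2 / 2\varepsilon)$; the subtraction of `d2.min()` is a stabilization trick, not part of the formula):

```python
import numpy as np

def resp(x, mus, eps):
    """Responsibilities of x under a GMM with covariances eps*I and equal pi_k."""
    d2 = np.array([np.sum((x - mu) ** 2) for mu in mus])   # squared distances
    w = np.exp(-(d2 - d2.min()) / (2 * eps))               # stabilized exponentials
    return w / w.sum()

mus = [np.array([0., 0.]), np.array([3., 0.])]
x = np.array([1., 0.])
r_soft = resp(x, mus, 1.0)    # moderate eps: genuinely soft assignment
r_hard = resp(x, mus, 0.01)   # small eps: nearly one-hot on the nearest mean
```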

Checklist for Exam

  • Write the generative model (latent $z$, then $x \mid z$)
  • Write the log-likelihood with the sum inside the log
  • Write the responsibility formula (E-step)
  • Derive or state the M-step updates for $\mu_k$, $\Sigma_k$, $\pi_k$
  • State monotonicity: $\ell(\theta^{(t+1)}) \geq \ell(\theta^{(t)})$
  • Mention convergence to local optimum, not global
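The monotonicity property in the checklist can be verified numerically. A self-contained 1-D sketch (two components, synthetic data, all variable names illustrative) that asserts $\ell$ never decreases across iterations:

```python
import numpy as np

rng = np.random.default_rng(1)
# two well-separated 1-D clusters
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])[:, None]

def log_dens(x, mu, var):
    """log N(x | mu, var) for scalar-variance Gaussians."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

pi = np.array([0.5, 0.5])
mus = np.array([-1., 1.])
vars_ = np.array([1., 1.])

prev = -np.inf
for _ in range(20):
    # E-step: log of pi_k * N(x_n | mu_k, var_k)
    logp = np.log(pi) + np.column_stack([log_dens(X[:, 0], mus[k], vars_[k])
                                         for k in range(2)])
    ll = np.logaddexp(logp[:, 0], logp[:, 1]).sum()        # current log-likelihood
    assert ll >= prev - 1e-9                               # monotone increase
    prev = ll
    r = np.exp(logp - np.logaddexp(logp[:, [0]], logp[:, [1]]))
    # M-step
    Nk = r.sum(axis=0)
    mus = (r * X).sum(axis=0) / Nk
    vars_ = (r * (X - mus) ** 2).sum(axis=0) / Nk
    pi = Nk / len(X)
```

If the assertion never fires, each iteration weakly increased $\ell(\theta)$, as the theory guarantees; the fitted means drift toward the true cluster centers, a local (not necessarily global) optimum.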