Hidden Markov Model (HMM)

A Hidden Markov Model is a Latent Variable Model for sequential data where the latent states form a Markov chain and each observation depends only on its corresponding latent state.

1. Three Distributions

An HMM with $K$ hidden states and $T$ time steps is specified by:

  1. Initial distribution: $\pi_k = p(z_1 = k)$ for $k = 1, \dots, K$
  2. Transition matrix: $A_{ij} = p(z_t = j \mid z_{t-1} = i)$, with rows summing to 1
  3. Emission distribution: $p(x_t \mid z_t = k)$ for each state $k$; for discrete observations this is often collected into an emission matrix $B_{kv} = p(x_t = v \mid z_t = k)$, with rows summing to 1
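As a concrete toy instance of these three distributions (all numeric values are hypothetical, chosen only for illustration), a 2-state HMM with 3 discrete observation symbols can be written down directly:

```python
import numpy as np

# Hypothetical 2-state HMM with 3 discrete observation symbols.
pi = np.array([0.6, 0.4])                # initial distribution: pi_k = p(z_1 = k)
A = np.array([[0.7, 0.3],                # transition matrix: A_ij = p(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],           # emission matrix: B_kv = p(x_t = v | z_t = k)
              [0.6, 0.3, 0.1]])

# Each row of A and B is a conditional distribution, so rows must sum to 1.
assert np.allclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```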

2. Joint Distribution

The joint probability of a sequence of observations $x_{1:T}$ and latent states $z_{1:T}$ factorizes as

$$p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^T p(z_t \mid z_{t-1}) \prod_{t=1}^T p(x_t \mid z_t)$$

This factorization follows from the directed graphical model structure: $z_1 \to z_2 \to \cdots \to z_T$ with each $z_t \to x_t$.
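The factorization can be evaluated term by term for any single state path. A minimal sketch, using hypothetical parameter values:

```python
import numpy as np

# Hypothetical parameters for a 2-state, 3-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])

def joint_prob(x, z, pi, A, B):
    """p(x_{1:T}, z_{1:T}) = p(z_1) * prod_t p(z_t | z_{t-1}) * prod_t p(x_t | z_t)."""
    p = pi[z[0]] * B[z[0], x[0]]                 # p(z_1) p(x_1 | z_1)
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]   # p(z_t | z_{t-1}) p(x_t | z_t)
    return p

x = [0, 1, 2]       # observation sequence
z = [1, 1, 0]       # one candidate state path
p = joint_prob(x, z, pi, A, B)
```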

3. Key Inference Tasks

| Task | Quantity | Algorithm |
| --- | --- | --- |
| Filtering | $p(z_t \mid x_{1:t})$ | Forward-backward algorithm (forward pass only) |
| Smoothing | $p(z_t \mid x_{1:T})$ | Forward-backward algorithm (both passes) |
| MAP sequence | $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$ | Viterbi algorithm |
| Marginal likelihood | $p(x_{1:T})$ | Forward algorithm |
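The filtering and marginal-likelihood tasks share one computation: a normalized forward pass. A minimal NumPy sketch (parameter values hypothetical), with a brute-force check that the recursion really computes $p(x_{1:T})$:

```python
import numpy as np
from itertools import product

# Hypothetical parameters for a 2-state, 3-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])

def forward(x, pi, A, B):
    """Normalized forward pass: filtered marginals p(z_t | x_{1:t}) and log p(x_{1:T})."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    loglik = 0.0
    for t in range(T):
        # Predict (propagate through A), then condition on the new observation.
        alpha[t] = (pi if t == 0 else alpha[t - 1] @ A) * B[:, x[t]]
        c = alpha[t].sum()           # normalizer = p(x_t | x_{1:t-1})
        loglik += np.log(c)          # log p(x_{1:T}) accumulates the normalizers
        alpha[t] /= c                # alpha[t, k] = p(z_t = k | x_{1:t})
    return alpha, loglik

x = [0, 1, 2]
alpha, loglik = forward(x, pi, A, B)

# Sanity check: p(x_{1:T}) equals the brute-force sum over all K^T state paths.
brute = sum(
    pi[z[0]] * B[z[0], x[0]]
    * np.prod([A[z[t - 1], z[t]] * B[z[t], x[t]] for t in range(1, len(x))])
    for z in product(range(2), repeat=len(x))
)
assert np.isclose(np.exp(loglik), brute)
```

Normalizing at each step keeps the recursion numerically stable for long sequences, and the log of each normalizer sums to the log marginal likelihood for free.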

4. Observation Models

Common emission distributions include:

  • Discrete: $p(x_t \mid z_t = k)$ is a categorical distribution (e.g., dice, words)
  • Gaussian: $p(x_t \mid z_t = k) = \mathcal{N}(x_t \mid \mu_k, \Sigma_k)$
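Switching the observation model only changes how the per-state emission likelihoods are computed; the rest of the machinery is unchanged. A sketch for 1-D Gaussian emissions (the means and standard deviations here are hypothetical):

```python
import numpy as np

# Hypothetical 1-D Gaussian emission parameters for K = 2 states.
mu = np.array([-1.0, 2.0])
sigma = np.array([0.5, 1.0])

def gauss_emission(x_t, mu, sigma):
    """Vector of emission likelihoods N(x_t | mu_k, sigma_k^2) for all states k."""
    return np.exp(-0.5 * ((x_t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

lam = gauss_emission(0.3, mu, sigma)   # one likelihood per hidden state
```

In a forward pass, this vector plays exactly the role that the column lookup `B[:, x_t]` plays in the discrete case.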