Metropolis-Hastings Algorithm
Metropolis-Hastings (MH) is a Markov chain Monte Carlo method that generates samples from a target distribution $p(x)$ known only up to a normalizing constant.
1. Setup
Given:
- An unnormalized target $\bar{p}(x)$, where $p(x) = \bar{p}(x)/Z$ and the normalizing constant $Z$ is unknown
- Proposal distribution $q(x' \mid x^{(t)})$
2. Algorithm
Starting from $x^{(0)}$, at each iteration $t$:
- Propose: Draw $x' \sim q(x' \mid x^{(t)})$
- Accept/Reject: Compute the acceptance probability
$$A(x' \mid x^{(t)}) = \min\left\{1,\ \frac{\bar{p}(x')\,q(x^{(t)} \mid x')}{\bar{p}(x^{(t)})\,q(x' \mid x^{(t)})}\right\}$$
Note that the unknown constant $Z$ cancels in the ratio, so only $\bar{p}$ is needed.
- Draw $u \sim \text{Uniform}(0,1)$. If $u \leq A$, set $x^{(t+1)} = x'$; otherwise $x^{(t+1)} = x^{(t)}$.
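The loop above can be sketched in a few lines of NumPy. The names `p_bar`, `q_sample`, and `q_density` are illustrative placeholders for the unnormalized target, the proposal sampler, and the proposal density; this is a minimal sketch, not a tuned implementation.

```python
import numpy as np

def metropolis_hastings(p_bar, q_sample, q_density, x0, n_iters, rng=None):
    """Generic MH sampler.

    p_bar(x)        -- unnormalized target density
    q_sample(x)     -- draws a proposal x' ~ q(. | x)
    q_density(a, b) -- evaluates q(a | b)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = []
    for _ in range(n_iters):
        xp = q_sample(x)
        # A = min{1, p_bar(x') q(x | x') / (p_bar(x) q(x' | x))}
        ratio = (p_bar(xp) * q_density(x, xp)) / (p_bar(x) * q_density(xp, x))
        if rng.uniform() <= min(1.0, ratio):
            x = xp            # accept the proposal
        samples.append(x)     # on rejection, the current state is repeated
    return np.array(samples)
```

For example, with a Gaussian random-walk proposal and an unnormalized standard-normal target, the empirical mean and standard deviation of the chain approach 0 and 1.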
3. Detailed Balance
MH satisfies the detailed balance condition:
$$p(x)\,q(x' \mid x)\,A(x' \mid x) = p(x')\,q(x \mid x')\,A(x \mid x')$$

Proof sketch: Suppose $p(x)q(x' \mid x) \leq p(x')q(x \mid x')$. Then $A(x' \mid x) = 1$ and $A(x \mid x') = \frac{p(x)q(x' \mid x)}{p(x')q(x \mid x')}$. Substituting into both sides yields equality. The symmetric case follows identically.
Detailed balance implies $p(x)$ is the stationary distribution of the MH chain.
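Both properties can be verified numerically on a small discrete example. Below, a hypothetical 3-state target with a uniform proposal: the MH transition matrix `T` satisfies detailed balance entrywise, and $p$ is its stationary distribution.

```python
import numpy as np

# Hypothetical 3-state chain: target p proportional to (1, 2, 3),
# uniform proposal q(j | i) = 1/3. Convention: q[i, j] = q(j | i).
p = np.array([1.0, 2.0, 3.0]); p /= p.sum()
q = np.full((3, 3), 1.0 / 3.0)

# A[i, j] = min{1, p[j] q(i | j) / (p[i] q(j | i))}
A = np.minimum(1.0, (p[None, :] * q.T) / (p[:, None] * q))

T = q * A                               # off-diagonal: propose then accept
np.fill_diagonal(T, 0.0)
np.fill_diagonal(T, 1.0 - T.sum(axis=1))  # rejected mass stays put

flow = p[:, None] * T                   # flow[i, j] = p_i T_ij
```

Detailed balance says `flow` is symmetric, and summing it over rows recovers stationarity $pT = p$.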
4. Special Cases
- Random-walk MH: $q(x' \mid x) = q(x' - x)$ is symmetric, so the ratio $q(x^{(t)} \mid x')/q(x' \mid x^{(t)}) = 1$ and $A$ simplifies to $\min\{1, \bar{p}(x')/\bar{p}(x^{(t)})\}$
- Independence MH: $q(x' \mid x) = q(x')$ does not depend on current state
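The random-walk case is worth a concrete sketch: since the $q$-ratio cancels, only $\bar{p}$ ratios are needed. The version below (names illustrative) works with $\log \bar{p}$ to avoid numerical underflow for small densities.

```python
import numpy as np

def random_walk_mh(log_p_bar, x0, step, n_iters, seed=0):
    """Random-walk MH with a symmetric Gaussian proposal.

    The proposal ratio cancels, so A = min{1, p_bar(x') / p_bar(x)};
    equivalently, accept when log u <= log p_bar(x') - log p_bar(x).
    """
    rng = np.random.default_rng(seed)
    x, lp = x0, log_p_bar(x0)
    out = np.empty(n_iters)
    for t in range(n_iters):
        xp = x + step * rng.standard_normal()
        lpp = log_p_bar(xp)
        if np.log(rng.uniform()) <= lpp - lp:   # accept w.p. min{1, ratio}
            x, lp = xp, lpp
        out[t] = x
    return out
```

Sampling from $\bar{p}(x) = e^{-x^2/2}$ with a unit step size recovers the standard normal's moments.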
5. Comparison with Rejection Sampling
| | Rejection Sampling | Metropolis-Hastings |
|---|---|---|
| Requires envelope $Cg(x) \geq f(x)$ | Yes | No |
| Samples are independent | Yes | No (correlated) |
| Needs normalizing constant | Yes | No |
| Works in high dimensions | Poorly | Better |
6. Practical Considerations
- Burn-in: Discard initial samples before the chain converges to stationarity
- Thinning: Keep every $k$-th sample to reduce autocorrelation
- Proposal tuning: too narrow a proposal gives high acceptance but slow exploration; too wide a proposal gives low acceptance and many wasted proposals
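Burn-in and thinning can be illustrated on a synthetic chain. Below, an AR(1) process stands in for correlated MH output; the burn-in length and thinning interval are arbitrary illustrative values and are problem-dependent in practice.

```python
import numpy as np

# Stand-in for correlated MH output: an AR(1) chain (coefficient 0.8)
# started far from its stationary mean.
rng = np.random.default_rng(1)
chain = np.empty(10_000)
chain[0] = 5.0                        # deliberately poor starting point
for t in range(1, len(chain)):
    chain[t] = 0.8 * chain[t - 1] + rng.standard_normal()

burn_in, thin = 1_000, 10             # illustrative tuning values
kept = chain[burn_in::thin]           # drop warm-up, keep every 10th draw

def lag1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))
```

After thinning, the lag-1 autocorrelation of `kept` is far below that of the raw post-burn-in chain, at the cost of fewer retained samples.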