Importance Sampling

Importance sampling estimates expectations under a target distribution $p(x)$ using samples from a different proposal distribution $q(x)$.

1. Core Identity

For any function $f(x)$, and any proposal $q$ with $q(x) > 0$ wherever $f(x)\,p(x) \neq 0$:

$$\mathbb{E}_p[f(x)] = \int f(x)\,p(x)\,dx = \int f(x)\frac{p(x)}{q(x)}\,q(x)\,dx = \mathbb{E}_q\left[f(x)\frac{p(x)}{q(x)}\right]$$

2. Importance Weights

The ratio $w(x) = \frac{p(x)}{q(x)}$ is called the importance weight. Given $S$ independent samples $x^{(s)} \sim q$, the Monte Carlo estimate is:

$$\mathbb{E}_p[f(x)] \approx \frac{1}{S}\sum_{s=1}^S f(x^{(s)})\,w(x^{(s)})$$
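
As a minimal sketch of this estimator, the NumPy snippet below estimates $\mathbb{E}_p[x^2]$ for a standard normal target. The target $p = \mathcal{N}(0, 1)$, the proposal $q = \mathcal{N}(0, 2^2)$, the test function, and the helper names `normal_pdf` and `importance_sampling_estimate` are all illustrative assumptions, not part of the notes above.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))


def importance_sampling_estimate(f, p_pdf, q_pdf, q_sample, num_samples, rng):
    """Estimate E_p[f(x)] as the average of f(x) * p(x) / q(x) over x ~ q."""
    x = q_sample(rng, num_samples)   # samples from the proposal q
    weights = p_pdf(x) / q_pdf(x)    # importance weights w(x) = p(x) / q(x)
    return np.mean(f(x) * weights)


# Assumed example: target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2.
rng = np.random.default_rng(0)
estimate = importance_sampling_estimate(
    f=lambda x: x**2,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sample=lambda rng, n: rng.normal(0.0, 2.0, size=n),
    num_samples=100_000,
    rng=rng,
)
print(estimate)  # should be close to the exact value E_p[x^2] = 1
```

The proposal here is deliberately wider than the target, so the weights stay bounded and the estimate is stable.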

3. Unnormalized Case

When $p(x) = \bar{p}(x)/Z$ with unknown $Z$, use self-normalized importance sampling:

$$\mathbb{E}_p[f(x)] \approx \frac{\sum_{s=1}^S f(x^{(s)})\,\bar{w}(x^{(s)})}{\sum_{s=1}^S \bar{w}(x^{(s)})}$$

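where $\bar{w}(x) = \bar{p}(x)/q(x)$ are the unnormalized weights. The unknown $Z$ appears in both numerator and denominator and cancels. The resulting estimator is biased for finite $S$ but consistent as $S \to \infty$.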
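
The self-normalized estimator can be sketched in a few lines of NumPy. The example below assumes an unnormalized target $\bar{p}(x) = \exp(-x^2/2)$ (a standard normal without its constant $Z = \sqrt{2\pi}$) and a proposal $q = \mathcal{N}(0, 2^2)$; both choices, and the function name `self_normalized_estimate`, are illustrative. Working with log-weights is a common numerical precaution rather than something the formula requires.

```python
import numpy as np


def self_normalized_estimate(f, log_p_bar, log_q, x):
    """Self-normalized importance sampling estimate of E_p[f(x)] from samples x ~ q."""
    log_w = log_p_bar(x) - log_q(x)
    # Subtracting the maximum rescales every weight by the same constant,
    # which cancels in the ratio below and avoids overflow in exp.
    w = np.exp(log_w - np.max(log_w))
    return np.sum(f(x) * w) / np.sum(w)


# Assumed example: p_bar(x) = exp(-x^2 / 2), so Z = sqrt(2 * pi) is never used.
q_std = 2.0
rng = np.random.default_rng(0)
x = rng.normal(0.0, q_std, size=100_000)  # samples from q = N(0, q_std^2)

snis_estimate = self_normalized_estimate(
    f=lambda x: x**2,
    log_p_bar=lambda x: -0.5 * x**2,
    log_q=lambda x: -0.5 * (x / q_std) ** 2 - np.log(q_std * np.sqrt(2.0 * np.pi)),
    x=x,
)
print(snis_estimate)  # should be close to the exact value E_p[x^2] = 1
```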

4. Variance

The variance of the importance sampling estimator depends heavily on the choice of $q$:

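  • If $q$ has lighter tails than $p$, the weights $p(x)/q(x)$ are unbounded in the tails and their variance can be very large or even infinite
  • The optimal proposal (minimum variance) is $q^*(x) \propto |f(x)|\,p(x)$, which is rarely usable directly because normalizing it requires an integral as hard as the original one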
  • In practice, choose $q$ to cover the regions where $|f(x)|p(x)$ is large
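
To illustrate the first point, the sketch below compares the empirical variance of the weights for two Gaussian proposals against a standard normal target. The target $p = \mathcal{N}(0, 1)$, the proposal standard deviations 0.5 and 2.0, and the helper names are assumptions chosen for illustration. For the narrow proposal, $\mathbb{E}_q[w^2] \propto \int e^{x^2}\,dx$ diverges, so its true weight variance is infinite and the empirical value changes noticeably from run to run.

```python
import numpy as np


def normal_pdf(x, std):
    """Density of a zero-mean univariate normal distribution."""
    return np.exp(-0.5 * (x / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def weight_variance(q_std, num_samples, rng):
    """Empirical variance of w = p/q for p = N(0, 1) and q = N(0, q_std^2)."""
    x = rng.normal(0.0, q_std, size=num_samples)
    w = normal_pdf(x, 1.0) / normal_pdf(x, q_std)
    return np.var(w)


rng = np.random.default_rng(0)
# Assumed proposals: one with lighter tails than p, one with heavier tails.
narrow_var = weight_variance(0.5, 100_000, rng)  # lighter tails: unbounded weights
wide_var = weight_variance(2.0, 100_000, rng)    # heavier tails: weights bounded by 2
print(narrow_var, wide_var)
```

With the wide proposal the weights lie in $(0, 2]$, so their variance is small and stable; with the narrow one a handful of samples in the tails dominate the sum.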

5. Comparison with Other Methods

| Method | Requires envelope | Samples are weighted | Samples are independent |
|---|---|---|---|
| Rejection Sampling | Yes | No | Yes |
| Importance Sampling | No | Yes | Yes |
| Metropolis-Hastings Algorithm | No | No | No |