Q1. Variational Inference — 10 pts

Let $p(x,z)$ be a joint distribution and let $q(z \mid x)$ be any distribution that is positive wherever $p(x,z)$ is positive.

1. Starting from

$$ p(x)=\int p(x,z)\,\mathrm{d}z, $$

derive the inequality

$$ \mathcal{L}(q)=\mathbb{E}_{q(z \mid x)}[\log p(x,z)-\log q(z \mid x)]\le \log p(x). $$

2. State the exact condition under which equality holds.

3. In one sentence, explain why replacing $q(z \mid x)$ by a distribution $q(z)$ that does not depend on $x$ is generally not an adequate posterior approximation.
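As an optional sanity check (not part of the required derivation), the bound in part 1 can be verified numerically on a toy discrete model with a binary latent $z$; all probability values below are illustrative choices, not part of the question:

```python
import math

# A toy discrete model: z in {0, 1}, one fixed observation x.
p_z = [0.3, 0.7]               # prior p(z) (arbitrary illustrative values)
p_x_given_z = [0.2, 0.9]       # likelihood p(x | z) at the fixed x

p_xz = [p_z[k] * p_x_given_z[k] for k in range(2)]   # joint p(x, z)
log_px = math.log(sum(p_xz))                         # log evidence

def elbo(q):
    """E_q[log p(x, z) - log q(z)] for a distribution q over z."""
    return sum(q[k] * (math.log(p_xz[k]) - math.log(q[k])) for k in range(2))

# Any valid q gives a lower bound on log p(x)...
assert elbo([0.5, 0.5]) <= log_px + 1e-12

# ...and the bound is tight exactly when q is the posterior p(z | x).
posterior = [p / sum(p_xz) for p in p_xz]
assert abs(elbo(posterior) - log_px) < 1e-12
```

The second assertion illustrates the equality condition asked for in part 2.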


Q2. ELBO algebra and KL direction — 10 pts

Assume the latent-variable model

$$ p(x,z)=p(x \mid z)p(z). $$

1. Show that

$$ \mathcal{L}(q)=\mathbb{E}_q[\log p(x \mid z)]-D_{\mathrm{KL}}(q(z \mid x)\Vert p(z)). $$

2. Starting from Bayes’ rule, prove that

$$ \log p(x)=\mathcal{L}(q)+D_{\mathrm{KL}}(q(z \mid x)\Vert p(z \mid x)). $$

3. Which KL direction is minimized by standard variational inference:

$$ D_{\mathrm{KL}}(q\Vert p)\quad \text{or}\quad D_{\mathrm{KL}}(p\Vert q)? $$

Give one consequence of that choice in terms of support-covering or mode-seeking behavior.
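Both identities in parts 1 and 2 can be checked numerically on the same kind of toy binary-latent model (the probability values below are illustrative, not given in the question):

```python
import math

p_z = [0.3, 0.7]               # prior p(z) (illustrative values)
p_x_given_z = [0.2, 0.9]       # likelihood p(x | z) at one fixed x
p_xz = [p_z[k] * p_x_given_z[k] for k in range(2)]
px = sum(p_xz)
posterior = [p / px for p in p_xz]

def kl(q, p):
    """KL divergence D_KL(q || p) for two distributions over {0, 1}."""
    return sum(q[k] * math.log(q[k] / p[k]) for k in range(2))

q = [0.4, 0.6]                 # an arbitrary variational distribution

elbo_a = sum(q[k] * (math.log(p_xz[k]) - math.log(q[k])) for k in range(2))

# Part 1: ELBO = E_q[log p(x|z)] - KL(q || prior)
elbo_b = sum(q[k] * math.log(p_x_given_z[k]) for k in range(2)) - kl(q, p_z)
assert abs(elbo_a - elbo_b) < 1e-12

# Part 2: log p(x) = ELBO + KL(q || posterior)
assert abs(math.log(px) - (elbo_a + kl(q, posterior))) < 1e-12
```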


Q3. Mean-field update — 10 pts

Let $z=(z_1,z_2,z_3)$, and suppose we restrict the variational family to

$$ q(z)=q_1(z_1)q_2(z_2)q_3(z_3). $$

1. Derive the optimal coordinate update for $q_2^*(z_2)$ up to proportionality.

2. Your final answer must have the form

$$ q_2^*(z_2)\propto \exp\Big(\mathbb{E}_{q_1 q_3}[\log p(x,z)]\Big). $$

3. State clearly what is treated as a constant when deriving this update.
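The update in part 2 can be exercised on a small hypothetical example with binary $z_1,z_2,z_3$ and an arbitrary log-joint table (all numbers below are made up for illustration); the check also confirms that the coordinate update does not decrease the ELBO:

```python
import math
import itertools

# Hypothetical log p(x, z) for binary z = (z1, z2, z3), x held fixed,
# stored as a 2x2x2 table of arbitrary illustrative log-values.
logp = {z: v for z, v in zip(itertools.product([0, 1], repeat=3),
                             [-2.3, -1.2, -0.7, -3.1, -1.9, -0.4, -2.8, -1.5])}

q1, q3 = [0.5, 0.5], [0.2, 0.8]   # current factors for z1 and z3 (held fixed)

# Optimal update: q2*(z2) proportional to exp(E_{q1 q3}[log p(x, z)]).
expect = [sum(q1[a] * q3[c] * logp[(a, b, c)] for a in (0, 1) for c in (0, 1))
          for b in (0, 1)]
unnorm = [math.exp(e) for e in expect]
q2_star = [u / sum(unnorm) for u in unnorm]
assert abs(sum(q2_star) - 1.0) < 1e-12

def elbo(q2):
    """ELBO under q = q1 q2 q3 (up to an additive constant in the joint)."""
    total = 0.0
    for (a, b, c), lp in logp.items():
        w = q1[a] * q2[b] * q3[c]
        total += w * (lp - math.log(q1[a]) - math.log(q2[b]) - math.log(q3[c]))
    return total

# With q1 and q3 fixed, the coordinate update never decreases the ELBO.
assert elbo(q2_star) >= elbo([0.5, 0.5]) - 1e-12
```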


Q4. EM for a two-component Gaussian mixture — 14 pts

Let $x_1,\dots,x_N$ be observed. Introduce latent variables $z_n\in\{0,1\}$ with

$$ P(z_n=1)=\pi,\qquad P(z_n=0)=1-\pi. $$

Conditionally,

$$ x_n\mid z_n=0 \sim \mathcal{N}(\mu_0,\sigma^2),\qquad x_n\mid z_n=1 \sim \mathcal{N}(\mu_1,\sigma^2), $$

where $\sigma^2$ is known.

1. Write the complete-data joint distribution $p(X,Z \mid \theta)$, where $\theta=(\pi,\mu_0,\mu_1)$.

2. Derive the E-step responsibilities

$$ r_n=P(z_n=1 \mid x_n,\theta^{\text{old}}). $$

3. Write the $Q$-function

$$ Q(\theta,\theta^{\text{old}})=\mathbb{E}_{p(Z \mid X,\theta^{\text{old}})}[\log p(X,Z \mid \theta)] $$

up to additive constants independent of $\theta$.

4. Derive the M-step updates for $\pi$, $\mu_0$, and $\mu_1$.
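For reference, the E- and M-steps asked for above can be sketched end to end on synthetic data; the data values and initialization below are hypothetical, and the assertion checks EM's monotonicity guarantee:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Synthetic observations (hypothetical, two loose clusters); sigma^2 known.
X = [-2.1, -1.8, -2.5, -1.9, 1.7, 2.2, 1.9, 2.4]
var = 1.0

def loglik(pi, mu0, mu1):
    return sum(math.log((1 - pi) * normal_pdf(x, mu0, var)
                        + pi * normal_pdf(x, mu1, var)) for x in X)

pi, mu0, mu1 = 0.5, -1.0, 1.0          # arbitrary initial guess
prev = loglik(pi, mu0, mu1)
for _ in range(20):
    # E-step: responsibilities r_n = P(z_n = 1 | x_n, theta_old)
    r = [pi * normal_pdf(x, mu1, var)
         / (pi * normal_pdf(x, mu1, var) + (1 - pi) * normal_pdf(x, mu0, var))
         for x in X]
    # M-step: closed-form maximizers of Q(theta, theta_old)
    pi = sum(r) / len(X)
    mu1 = sum(rn * x for rn, x in zip(r, X)) / sum(r)
    mu0 = sum((1 - rn) * x for rn, x in zip(r, X)) / sum(1 - rn for rn in r)
    cur = loglik(pi, mu0, mu1)
    assert cur >= prev - 1e-9          # EM never decreases the likelihood
    prev = cur
```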


Q5. EM as variational inference — 10 pts

For the same model as in Q4, define a variational distribution

$$ q(Z)=\prod_{n=1}^N q_n(z_n),\qquad q_n(z_n=1)=r_n. $$

1. Show that

$$ \log p(X \mid \theta)=\mathcal{L}(q,\theta)+D_{\mathrm{KL}}(q(Z)\Vert p(Z \mid X,\theta)). $$

2. Show that for fixed $\theta$, the maximizer over all $q$ is

$$ q^*(Z)=p(Z \mid X,\theta). $$

3. Explain why the E-step and M-step can be viewed as coordinate ascent on $\mathcal{L}(q,\theta)$.
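The coordinate-ascent picture in part 3 can be made concrete numerically: after an E-step the KL term vanishes, so $\mathcal{L}(q,\theta)=\log p(X\mid\theta)$ exactly, and an M-step with $q$ fixed cannot decrease $\mathcal{L}$. The data and parameter values below are hypothetical:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

X = [-2.0, -1.5, 1.6, 2.1]             # hypothetical observations
var = 1.0

def log_joint(x, z, pi, mu0, mu1):
    """log p(x_n, z_n | theta) for z_n in {0, 1}."""
    if z == 1:
        return math.log(pi) + math.log(normal_pdf(x, mu1, var))
    return math.log(1 - pi) + math.log(normal_pdf(x, mu0, var))

def elbo(r, theta):
    """L(q, theta) with q(Z) = prod_n Bernoulli(z_n; r_n)."""
    pi, mu0, mu1 = theta
    total = 0.0
    for x, rn in zip(X, r):
        for z, qz in ((0, 1 - rn), (1, rn)):
            if qz > 0:
                total += qz * (log_joint(x, z, pi, mu0, mu1) - math.log(qz))
    return total

def loglik(theta):
    return sum(math.log(math.exp(log_joint(x, 0, *theta))
                        + math.exp(log_joint(x, 1, *theta))) for x in X)

theta = (0.5, -1.0, 1.0)
# E-step: q* is the exact posterior, so the KL term is zero and L = log p(X|theta).
r = [math.exp(log_joint(x, 1, *theta))
     / (math.exp(log_joint(x, 0, *theta)) + math.exp(log_joint(x, 1, *theta)))
     for x in X]
assert abs(elbo(r, theta) - loglik(theta)) < 1e-9

# M-step: updating theta with r fixed cannot decrease the ELBO.
pi = sum(r) / len(X)
mu1 = sum(rn * x for rn, x in zip(r, X)) / sum(r)
mu0 = sum((1 - rn) * x for rn, x in zip(r, X)) / sum(1 - rn for rn in r)
assert elbo(r, (pi, mu0, mu1)) >= elbo(r, theta) - 1e-9
```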


Q6. Short diagnostic — 6 pts

For each statement, write True or False, and give a one-line justification.

1. In the ELBO derivation,

$$ p(x)=\int p(x,z)\,\mathrm{d}x. $$

2. Since $\log$ is concave,

$$ \log \mathbb{E}_q[f(z)]\le \mathbb{E}_q[\log f(z)]. $$

3. If $q(z \mid x)=p(z \mid x)$, then

$$ \mathcal{L}(q)=\log p(x). $$

4. The identity

$$ \begin{aligned} &\mathbb{E}_{p(x \mid \theta)}[\log p(x \mid \theta)]-\mathbb{E}_{p(x \mid \theta)}[\log p(x \mid \hat{\theta})] \\ &\quad = D_{\mathrm{KL}}(p(x \mid \theta)\Vert p(x \mid \hat{\theta})) \end{aligned} $$

is valid.

5. In the E-step of EM, the distribution over latent variables depends on $\theta^{\text{old}}$.

6. In the M-step of EM, the responsibilities $r_n$ are treated as fixed.


Q7. One mixed long question — 10 pts

Consider again the model in Q4.

1. Starting from

$$ \log p(X \mid \theta)=\log \sum_Z p(X,Z \mid \theta), $$

insert an arbitrary $q(Z)$, derive an ELBO, and identify the KL remainder term.

2. Specialize $q(Z)$ to the factorized form

$$ q(Z)=\prod_{n=1}^N r_n^{z_n}(1-r_n)^{1-z_n}, $$

i.e. independent $z_n \sim \mathrm{Bernoulli}(r_n)$.

3. Show that optimizing the ELBO with respect to $r_n$ recovers the EM responsibility formula.

4. State in one sentence what prevents this optimization from being a closed-form exact posterior update in a generic latent-variable model.
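The claim in part 3 can be checked numerically for a single data point: the per-point ELBO as a function of $r_n$ is maximized exactly at the E-step responsibility. All parameter values below are hypothetical:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# One data point under the Q4 model (hypothetical parameter values).
x, pi, mu0, mu1, var = 0.8, 0.4, -1.0, 1.5, 1.0
j0 = (1 - pi) * normal_pdf(x, mu0, var)    # joint p(x, z=0)
j1 = pi * normal_pdf(x, mu1, var)          # joint p(x, z=1)

def per_point_elbo(r):
    """This point's contribution to L(q) when q(z=1) = r."""
    entropy = -r * math.log(r) - (1 - r) * math.log(1 - r)
    return r * math.log(j1) + (1 - r) * math.log(j0) + entropy

r_star = j1 / (j0 + j1)                    # the EM responsibility

# r_star beats every other candidate on a fine grid: the ELBO is
# maximized exactly at the E-step responsibility.
assert all(per_point_elbo(r_star) >= per_point_elbo(k / 100)
           for k in range(1, 100))
```

At the maximizer the per-point ELBO equals $\log p(x_n\mid\theta)$, which is precisely the tightness condition from Q1.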