Best Predictor — Conditional Expectation

Problem

Given random variables $X$ and $Y$ with $E(Y) = \mu$ and $\text{Var}(Y) < \infty$, find the function $f(X)$ that minimizes the mean squared error:

$$\text{MSE} = E[(Y - f(X))^2]$$

Solution

$$f(X) = E(Y \mid X)$$

The conditional expectation is the best predictor under squared error loss. To see why, decompose the MSE for any candidate $f$:

$$E[(Y - f(X))^2] = E[(Y - E(Y \mid X))^2] + E[(E(Y \mid X) - f(X))^2],$$

where the cross term vanishes by iterated expectations. The first term does not depend on $f$, and the second is minimized (at zero) by $f(X) = E(Y \mid X)$. More generally, the function $f(X_1, \ldots, X_n)$ minimizing $E[(Y - f(X_1, \ldots, X_n))^2]$ is $E[Y \mid X_1, \ldots, X_n]$.
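A quick Monte Carlo check of this claim, using a toy joint distribution chosen so that $E(Y \mid X)$ has a known closed form (the model $Y = X^2 + \varepsilon$ below is purely illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy joint distribution: X ~ N(0, 1), Y = X^2 + eps with eps ~ N(0, 1),
# so E(Y | X) = X^2 and the irreducible MSE is Var(eps) = 1.
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)

def mse(pred):
    return np.mean((y - pred) ** 2)

mse_cond = mse(x**2)         # the conditional expectation E(Y | X)
mse_const = mse(y.mean())    # best constant predictor, E(Y)
mse_other = mse(x)           # an arbitrary competing function of X

# mse_cond is the smallest of the three, approximately Var(eps) = 1.
```

Any other function of $X$, including the best constant $E(Y)$, incurs strictly larger MSE whenever it differs from $E(Y \mid X)$ on a set of positive probability.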

Why We Don’t Use It Directly

Computing $E(Y \mid X)$ requires knowing the joint distribution of $X$ and $Y$, which is rarely available in practice and often intractable to model. Instead, we restrict attention to linear functions of $X$ → Best Linear Predictor — General (Theorem 1.2).
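The linear restriction is what makes the problem tractable: the best linear predictor $\alpha + \beta X$ needs only first and second moments, all estimable from data, via the standard formulas $\beta = \text{Cov}(X, Y)/\text{Var}(X)$ and $\alpha = E(Y) - \beta\, E(X)$. A minimal sketch (the nonlinear data-generating model below is a hypothetical stand-in for an unknown joint distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Pretend the joint law of (X, Y) is unknown; we only observe samples.
x = rng.uniform(-1, 1, n)
y = np.exp(x) + 0.5 * rng.standard_normal(n)

# Best linear predictor from moments alone:
# slope beta = Cov(X, Y) / Var(X), intercept alpha = E(Y) - beta * E(X).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
beta = cov_xy / np.var(x)
alpha = y.mean() - beta * x.mean()

mse_lin = np.mean((y - (alpha + beta * x)) ** 2)
mse_const = np.mean((y - y.mean()) ** 2)  # Var(Y): ignoring X entirely
# The linear predictor beats the best constant whenever Cov(X, Y) != 0.
```

Even though $E(Y \mid X) = e^X$ is nonlinear here, the linear predictor is computable from sample moments alone and still improves on ignoring $X$.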