Best Predictor — Conditional Expectation
Problem
Given random variables $X$ and $Y$ with $E(Y) = \mu$ and $\text{Var}(Y) < \infty$, find the function $f(X)$ that minimizes the mean squared error:
$$\text{MSE} = E[(Y - f(X))^2]$$

Solution
$$f(X) = E(Y \mid X)$$

The conditional expectation is the best predictor under squared error loss. More generally, the function $f(X_1, \ldots, X_n)$ minimizing $E[(Y - f(X_1, \ldots, X_n))^2]$ is $E[Y \mid X_1, \ldots, X_n]$.
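A quick simulation illustrates the claim. Here is a minimal sketch (the specific model $Y = X^2 + \varepsilon$ is an illustrative assumption, chosen so that $E(Y \mid X) = X^2$ is known exactly): the conditional-mean predictor achieves a smaller MSE than any competing predictor, such as the best-fitting line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
eps = rng.normal(size=n)          # noise with variance 1
y = x**2 + eps                    # by construction, E[Y | X] = X^2

# MSE of the conditional-mean predictor f(X) = X^2
mse_cond = np.mean((y - x**2) ** 2)    # approaches Var(eps) = 1

# MSE of the best-fitting linear predictor, for comparison
a, b = np.polyfit(x, y, 1)
mse_lin = np.mean((y - (a * x + b)) ** 2)

print(mse_cond, mse_lin)   # conditional mean wins
```

Because $X$ is symmetric here, $\text{Cov}(X, X^2) = 0$, so the best line is nearly flat and its MSE is roughly $\text{Var}(X^2) + \text{Var}(\varepsilon)$, well above the conditional predictor's.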
Why We Don’t Use It Directly
Computing $E(Y \mid X)$ requires knowing the joint distribution of $X$ and $Y$, which is often unknown or intractable. Instead, we restrict attention to linear functions of $X$ → Best Linear Predictor — General (Theorem 1.2).
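The payoff of the linear restriction is that the optimal coefficients depend only on first and second moments, not on the full joint distribution. A minimal sketch, using the standard formulas $\beta = \text{Cov}(X, Y)/\text{Var}(X)$ and $\alpha = E(Y) - \beta\, E(X)$ (the simulated model with true slope 2 is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.exponential(size=n)            # any distribution with finite variance
y = 2.0 * x + rng.normal(size=n)       # true linear relationship, slope 2

# Best linear predictor needs only moments: Cov(X, Y), Var(X), E[X], E[Y]
beta = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
alpha = y.mean() - beta * x.mean()

print(alpha, beta)   # recovers intercept ~0 and slope ~2
```

No density or joint CDF is ever estimated; sample means, variances, and a covariance suffice.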