Prediction Error Properties
Two Key Properties
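For the best linear predictor $P_n X_{n+h}$ of $X_{n+h}$, i.e. the linear combination of $1, X_1, \ldots, X_n$ with the smallest mean squared error: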
1. Zero mean error:
$$E(X_{n+h} - P_n X_{n+h}) = 0$$
The predictor is unbiased, i.e. $E(P_n X_{n+h}) = E(X_{n+h})$.
2. Error uncorrelated with the observed data:
$$E[(X_{n+h} - P_n X_{n+h}) \cdot X_j] = 0, \qquad j = 1, 2, \ldots, n$$
Because the error has mean zero (Property 1), this is equivalent to $\text{Cov}(X_{n+h} - P_n X_{n+h},\; X_j) = 0$ for $j = 1, \ldots, n$.
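As a rough numerical illustration, both properties can be checked by simulation. The sketch below assumes a zero-mean stationary AR(1) model $X_t = \phi X_{t-1} + Z_t$, for which the best linear predictor is known to be $P_n X_{n+h} = \phi^h X_n$. The model, the values of `phi`, `sigma2`, `n`, `h`, `num_paths`, and the seed are illustrative choices, not from the text. Sample averages over many paths should be close to, but not exactly, zero.

```python
import numpy as np

# Assumption (illustrative, not from the text): zero-mean stationary AR(1)
# X_t = phi * X_{t-1} + Z_t with Z_t ~ N(0, sigma2). For this model the best
# linear predictor has the closed form P_n X_{n+h} = phi**h * X_n.
phi, sigma2 = 0.6, 1.0
n, h = 5, 2
num_paths = 200_000

rng = np.random.default_rng(0)

# Start every path from the stationary distribution N(0, sigma2 / (1 - phi**2)).
X = np.empty((num_paths, n + h))
X[:, 0] = rng.normal(0.0, np.sqrt(sigma2 / (1 - phi**2)), size=num_paths)
for t in range(1, n + h):
    X[:, t] = phi * X[:, t - 1] + rng.normal(0.0, np.sqrt(sigma2), size=num_paths)

# Column j-1 holds X_j, so X_n is column n-1 and X_{n+h} is column n+h-1.
prediction = phi**h * X[:, n - 1]
error = X[:, n + h - 1] - prediction

# Property 1: the sample mean of the error should be near 0.
mean_err = error.mean()
# Property 2: the sample average of error * X_j, j = 1..n, should be near 0.
cross_moments = np.array([(error * X[:, j]).mean() for j in range(n)])

print(f"mean error: {mean_err:.4f}")
print("E[error * X_j]:", np.round(cross_moments, 4))
```

Here the error equals $Z_{n+h} + \phi Z_{n+h-1}$, which is independent of $X_1, \ldots, X_n$, so the printed values only reflect Monte Carlo noise.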
Interpretation
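The prediction error contains no linear information that can be extracted from the observed data. If the error had nonzero mean, or were correlated with some $X_j$, adding a suitable constant or a multiple of $X_j$ to $P_n X_{n+h}$ would give another linear predictor with smaller mean squared error, contradicting optimality.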
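To make the improvement argument quantitative, let $U = X_{n+h} - P_n X_{n+h}$ and consider the perturbed linear predictor $P_n X_{n+h} + c X_j$ for a real constant $c$. Its mean squared error is a quadratic in $c$:

$$E[(U - cX_j)^2] = E(U^2) - 2c\,E(UX_j) + c^2 E(X_j^2)$$

Assuming $E(X_j^2) > 0$, the minimizing choice $c^* = E(UX_j)/E(X_j^2)$ gives

$$E[(U - c^*X_j)^2] = E(U^2) - \frac{[E(UX_j)]^2}{E(X_j^2)}$$

so any nonzero $E(UX_j)$ strictly lowers the error. Replacing $X_j$ by the constant $1$ gives the same argument for the mean: shifting the predictor by $m = E(U)$ reduces the mean squared error by $m^2$.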
Derivation Sketch
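Write $P_n X_{n+h} = a_0 + a_1 X_n + \cdots + a_n X_1$ and $L = E[(X_{n+h} - P_n X_{n+h})^2]$. Setting $\partial L / \partial a_0 = -2E(X_{n+h} - P_n X_{n+h}) = 0$ gives Property 1. Setting $\partial L / \partial a_i = -2E[(X_{n+h} - P_n X_{n+h}) X_{n+1-i}] = 0$ for $i = 1, \ldots, n$ gives Property 2, since $X_{n+1-i}$ runs over $X_n, \ldots, X_1$ as $i$ runs from $1$ to $n$.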
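For a zero-mean stationary process ($a_0 = 0$), the conditions $\partial L / \partial a_i = 0$ become a linear system in $a_1, \ldots, a_n$ involving only the autocovariance function $\gamma$. The sketch below solves that system with NumPy and then evaluates $\text{Cov}(X_{n+h} - P_n X_{n+h}, X_j)$ exactly from $\gamma$. The AR(1) autocovariance and all parameter values are illustrative assumptions, not from the text; any valid autocovariance function could be substituted for `gamma`.

```python
import numpy as np

# Assumption (illustrative): zero-mean stationary AR(1) with autocovariance
# gamma(k) = sigma2 * phi**|k| / (1 - phi**2).
phi, sigma2 = 0.6, 1.0
n, h = 5, 2

def gamma(k):
    """Autocovariance at lag k of the assumed AR(1) process."""
    return sigma2 * phi ** abs(k) / (1 - phi**2)

# Row/column i (0-based) corresponds to a_{i+1}, which multiplies X_{n-i}.
# dL/da_{i+1} = 0 gives  sum_k a_{k+1} gamma(i - k) = gamma(h + i).
Gamma_n = np.array([[gamma(i - k) for k in range(n)] for i in range(n)])
gamma_h = np.array([gamma(h + i) for i in range(n)])
a = np.linalg.solve(Gamma_n, gamma_h)

# Property 2, evaluated exactly from gamma (1-based i and j):
# E[(X_{n+h} - sum_i a_i X_{n+1-i}) X_j] = gamma(n+h-j) - sum_i a_i gamma(n+1-i-j)
cov_error_obs = np.array([
    gamma(n + h - j) - sum(a[i - 1] * gamma(n + 1 - i - j) for i in range(1, n + 1))
    for j in range(1, n + 1)
])

print("a_1..a_n:", np.round(a, 6))
print("Cov(error, X_j), j=1..n:", cov_error_obs)
```

With these values the solve returns $a \approx (0.36, 0, 0, 0, 0)$, i.e. $\phi^h X_n$ as in the AR(1) closed form, and every entry of `cov_error_obs` is zero up to floating-point rounding.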