Best Linear Predictor — General (Theorem 1.2)
Setup
Let $U$ be a scalar random variable with $E(U^2) < \infty$, and $\mathbf{W}$ an $n$-dimensional random vector with finite second moments and variance-covariance matrix $\Gamma = \text{Cov}(\mathbf{W}, \mathbf{W})$.
The best linear predictor $P(U|\mathbf{W})$ minimizes $E[(U - g(\mathbf{W}))^2]$ over all affine functions $g(\mathbf{W}) = a_0 + \mathbf{a}'\mathbf{W}$.
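Property (i) below comes from the first-order conditions of this minimization; a standard derivation sketch (not spelled out above):

```latex
S(a_0, \mathbf{a}) = E\big[(U - a_0 - \mathbf{a}'\mathbf{W})^2\big],
\qquad
\frac{\partial S}{\partial a_0} = 0
\;\Longrightarrow\;
a_0 = E(U) - \mathbf{a}'E(\mathbf{W}).
% Substituting a_0 back in and setting the gradient in \mathbf{a} to zero:
\nabla_{\mathbf{a}} S = \mathbf{0}
\;\Longrightarrow\;
\text{Cov}(\mathbf{W}, \mathbf{W})\,\mathbf{a} = \text{Cov}(U, \mathbf{W}),
\quad\text{i.e.}\quad
\Gamma\mathbf{a} = \boldsymbol{\gamma}.
```

The second equation is exactly the linear system in property (i).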
Seven Properties
(i) $P(U|\mathbf{W}) = E(U) + \mathbf{a}'(\mathbf{W} - E(\mathbf{W}))$ where $\Gamma\mathbf{a} = \text{Cov}(U, \mathbf{W})$
(ii) $E[(U - P(U|\mathbf{W}))^2] = \text{Var}(U) - \mathbf{a}'\text{Cov}(U, \mathbf{W})$ (MSE formula)
(iii) $P(U|\mathbf{W}) = E(U)$ if $\text{Cov}(U, \mathbf{W}) = \mathbf{0}$ (no linear info → predict the mean)
(iv) $E[U - P(U|\mathbf{W})] = 0$ (unbiased)
(v) $E([U - P(U|\mathbf{W})]\mathbf{W}) = \mathbf{0}$ (error uncorrelated with information)
(vi) $P(\alpha_0 + \alpha_1 U_1 + \alpha_2 U_2|\mathbf{W}) = \alpha_0 + \alpha_1 P(U_1|\mathbf{W}) + \alpha_2 P(U_2|\mathbf{W})$ (linearity in $U$)
(vii) $P(W_i|\mathbf{W}) = W_i$ (projecting information onto itself)
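Properties (iv) and (v) can be checked numerically on simulated data. A minimal sketch, assuming a hypothetical Gaussian $\mathbf{W}$ and a $U$ built from $\mathbf{W}$ plus independent noise (none of these numbers come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint sample: W is 2-d Gaussian, U = affine in W + noise.
n = 10_000
W = rng.multivariate_normal([0.5, -0.2], [[2.0, 0.6], [0.6, 1.5]], size=n)
U = 1.0 + 0.4 * W[:, 0] - 0.3 * W[:, 1] + rng.normal(scale=0.5, size=n)

# Sample analogue of property (i): solve Gamma a = Cov(U, W),
# then P(U|W) = mean(U) + a'(W - mean(W)).
Gamma = np.cov(W, rowvar=False)
gamma = np.array([np.cov(U, W[:, j])[0, 1] for j in range(W.shape[1])])
a = np.linalg.solve(Gamma, gamma)
pred = U.mean() + (W - W.mean(axis=0)) @ a
resid = U - pred

print(resid.mean())                        # property (iv): ~ 0
print((resid[:, None] * W).mean(axis=0))   # property (v): ~ (0, 0)
```

In the sample analogue both quantities vanish up to floating-point error, mirroring the population statements (iv) and (v).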
Interpretation of (i)
Solve the linear system $\Gamma\mathbf{a} = \boldsymbol{\gamma}$, where $\boldsymbol{\gamma} = \text{Cov}(U, \mathbf{W})$. The BLP then reads: mean of the target, plus a weighted sum of the deviations of the information variables from their means. When $\Gamma$ is nonsingular the coefficient vector $\mathbf{a} = \Gamma^{-1}\boldsymbol{\gamma}$ is unique; when $\Gamma$ is singular, any solution of $\Gamma\mathbf{a} = \boldsymbol{\gamma}$ yields the same predictor almost surely, since two solutions differ by a vector $\mathbf{v}$ with $\mathbf{v}'\Gamma\mathbf{v} = 0$.
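The recipe above can be sketched directly from known moments. A minimal example, assuming hypothetical values for $E(U)$, $E(\mathbf{W})$, $\Gamma$, $\boldsymbol{\gamma}$, and $\text{Var}(U)$ (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical 2-dimensional moments (not from the notes).
mu_U = 1.0                        # E(U)
var_U = 1.2                       # Var(U)
mu_W = np.array([0.5, -0.2])      # E(W)
Gamma = np.array([[2.0, 0.6],     # Cov(W, W), positive definite here
                  [0.6, 1.5]])
gamma = np.array([0.8, 0.3])      # Cov(U, W)

# Property (i): solve the normal equations Gamma a = gamma.
a = np.linalg.solve(Gamma, gamma)

def blp(w):
    """Best linear predictor P(U|W=w) = E(U) + a'(w - E(W))."""
    return mu_U + a @ (w - mu_W)

# Property (ii): MSE = Var(U) - a' Cov(U, W).
mse = var_U - a @ gamma
```

Note that `blp(mu_W)` returns `mu_U`, which is property (iii) in the degenerate case of zero deviation, and `mse` is strictly smaller than `var_U` whenever $\boldsymbol{\gamma} \neq \mathbf{0}$.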