Best Linear Predictor — General (Theorem 1.2)
Setup
Let $U$ be a scalar random variable with $E(U^2) < \infty$, and $\mathbf{W}$ an $n$-dimensional random vector with finite second moments and variance-covariance matrix $\Gamma = \text{Cov}(\mathbf{W}, \mathbf{W})$.
The best linear predictor $P(U|\mathbf{W})$ minimizes $E[(U - g(\mathbf{W}))^2]$ over all affine functions $g(\mathbf{W}) = a_0 + \mathbf{a}'\mathbf{W}$.
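Property (i) below comes from the first-order conditions of this minimization; a standard derivation sketch (not spelled out above):

```latex
S(a_0, \mathbf{a}) = E\big[(U - a_0 - \mathbf{a}'\mathbf{W})^2\big],
\qquad
\frac{\partial S}{\partial a_0} = 0
\;\Longrightarrow\;
a_0 = E(U) - \mathbf{a}'E(\mathbf{W}).
% Substituting a_0 back in and setting the gradient in \mathbf{a} to zero:
\nabla_{\mathbf{a}} S = \mathbf{0}
\;\Longrightarrow\;
\text{Cov}(\mathbf{W}, \mathbf{W})\,\mathbf{a} = \text{Cov}(U, \mathbf{W}),
\quad\text{i.e.}\quad
\Gamma\mathbf{a} = \boldsymbol{\gamma}.
```

The second equation is exactly the linear system in property (i).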
Seven Properties
(i) $P(U|\mathbf{W}) = E(U) + \mathbf{a}'(\mathbf{W} - E(\mathbf{W}))$ where $\Gamma\mathbf{a} = \text{Cov}(U, \mathbf{W})$
(ii) $E[(U - P(U|\mathbf{W}))^2] = \text{Var}(U) - \mathbf{a}'\text{Cov}(U, \mathbf{W})$ (MSE formula)
(iii) $P(U|\mathbf{W}) = E(U)$ if $\text{Cov}(U, \mathbf{W}) = \mathbf{0}$ (no linear info → predict the mean)
(iv) $E[U - P(U|\mathbf{W})] = 0$ (unbiased)
(v) $E([U - P(U|\mathbf{W})]\mathbf{W}) = \mathbf{0}$ (error uncorrelated with information)
(vi) $P(\alpha_0 + \alpha_1 U_1 + \alpha_2 U_2|\mathbf{W}) = \alpha_0 + \alpha_1 P(U_1|\mathbf{W}) + \alpha_2 P(U_2|\mathbf{W})$ (linearity in $U$)
(vii) $P(W_i|\mathbf{W}) = W_i$ (projecting information onto itself)
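Properties (iv) and (v) can be checked numerically on simulated data. A minimal sketch, assuming a hypothetical Gaussian $\mathbf{W}$ and a $U$ built from $\mathbf{W}$ plus independent noise (none of these numbers come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint sample: W is 2-d Gaussian, U = affine in W + noise.
n = 10_000
W = rng.multivariate_normal([0.5, -0.2], [[2.0, 0.6], [0.6, 1.5]], size=n)
U = 1.0 + 0.4 * W[:, 0] - 0.3 * W[:, 1] + rng.normal(scale=0.5, size=n)

# Sample analogue of property (i): solve Gamma a = Cov(U, W),
# then P(U|W) = mean(U) + a'(W - mean(W)).
Gamma = np.cov(W, rowvar=False)
gamma = np.array([np.cov(U, W[:, j])[0, 1] for j in range(W.shape[1])])
a = np.linalg.solve(Gamma, gamma)
pred = U.mean() + (W - W.mean(axis=0)) @ a
resid = U - pred

print(resid.mean())                        # property (iv): ~ 0
print((resid[:, None] * W).mean(axis=0))   # property (v): ~ (0, 0)
```

In the sample analogue both quantities vanish up to floating-point error, mirroring the population statements (iv) and (v).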
Interpretation of (i)
Solve the linear system $\Gamma\mathbf{a} = \boldsymbol{\gamma}$, where $\boldsymbol{\gamma} = \text{Cov}(U, \mathbf{W})$. The BLP then reads: mean of the target, plus a weighted sum of the deviations of the information variables from their means. When $\Gamma$ is nonsingular the coefficient vector $\mathbf{a} = \Gamma^{-1}\boldsymbol{\gamma}$ is unique; when $\Gamma$ is singular, any solution of $\Gamma\mathbf{a} = \boldsymbol{\gamma}$ yields the same predictor almost surely, since two solutions differ by a vector $\mathbf{v}$ with $\mathbf{v}'\Gamma\mathbf{v} = 0$.
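The recipe above can be sketched directly from known moments. A minimal example, assuming hypothetical values for $E(U)$, $E(\mathbf{W})$, $\Gamma$, $\boldsymbol{\gamma}$, and $\text{Var}(U)$ (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical 2-dimensional moments (not from the notes).
mu_U = 1.0                        # E(U)
var_U = 1.2                       # Var(U)
mu_W = np.array([0.5, -0.2])      # E(W)
Gamma = np.array([[2.0, 0.6],     # Cov(W, W), positive definite here
                  [0.6, 1.5]])
gamma = np.array([0.8, 0.3])      # Cov(U, W)

# Property (i): solve the normal equations Gamma a = gamma.
a = np.linalg.solve(Gamma, gamma)

def blp(w):
    """Best linear predictor P(U|W=w) = E(U) + a'(w - E(W))."""
    return mu_U + a @ (w - mu_W)

# Property (ii): MSE = Var(U) - a' Cov(U, W).
mse = var_U - a @ gamma
```

Note that `blp(mu_W)` returns `mu_U`, which is property (iii) in the degenerate case of zero deviation, and `mse` is strictly smaller than `var_U` whenever $\boldsymbol{\gamma} \neq \mathbf{0}$.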