Multivariate Normal Distributions
Basic Definition:
- \(X\) is \(p\)-dimensional MVN if \(V^T X\) is univariate normal for any \(V \in \mathbb{R}^p\).
- Density for \(X \sim N_p(\mu, \Sigma)\) is \( f_X(x) = \frac{1}{(2\pi)^{p/2} \det(\Sigma)^{1/2}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1}(x-\mu)} \)
- Density contours are ellipsoids \(d(x, \mu) = (x-\mu)^T \Sigma^{-1} (x-\mu) = C\)
- centered at \(\mu\)
- axes align with eigenvectors of \(\Sigma\)
- axis lengths proportional to the square roots of the eigenvalues of \(\Sigma\)
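Below is a minimal numerical sketch (assuming NumPy; the 2-D \(\mu\) and \(\Sigma\) are illustrative) of how the contour geometry follows from the eigendecomposition of \(\Sigma\): the eigenvectors give the axis directions and the semi-axis lengths scale as \(\sqrt{C \lambda_i}\).

```python
# Minimal sketch (illustrative 2-D mu and Sigma): the contour axes of
# N(mu, Sigma) come from the eigendecomposition Sigma = Q diag(lambda) Q^T.
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)   # columns of eigvecs are the axis directions
C = 1.0                                    # contour level d(x, mu) = C
semi_axes = np.sqrt(C * eigvals)           # semi-axis lengths scale with sqrt(eigenvalues)

print("axis directions (columns):\n", eigvecs)
print("semi-axis lengths:", semi_axes)
```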
Confidence ellipses
Properties of \(X \sim N_p(\mu, \Sigma)\)
\(X_i \sim N(\mu_i, \Sigma_{ii})\)
\(\underset{q \times 1}{y} = \underset{q \times p}{A} X + b \sim N_q(A\mu + b, A\Sigma A^T)\),
- in particular, \(y = \Sigma^{-1/2}(X-\mu) \sim N_p(0, I)\)
\( \begin{pmatrix} X \\ y \end{pmatrix} \sim N_{p+q} \left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{pmatrix} \right) \)
\(X \text{ and } y \text{ are independent} \Leftrightarrow X \text{ and } y \text{ are uncorrelated} \Leftrightarrow \Sigma = \begin{pmatrix} \Sigma_x & 0 \\ 0 & \Sigma_y \end{pmatrix}\)
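As a quick check of the affine-transformation and whitening properties above, here is a minimal simulation sketch (assuming NumPy; the particular \(\mu\), \(\Sigma\), \(A\), \(b\), and sample size are illustrative).

```python
# Minimal sketch: empirically check that A X + b has mean A mu + b and
# covariance A Sigma A^T, and that Sigma^{-1/2}(X - mu) is approximately N(0, I).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)          # rows are draws of X

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])                              # q x p with q = 2
b = np.array([3.0, -2.0])
Y = X @ A.T + b
print(np.allclose(Y.mean(axis=0), A @ mu + b, atol=0.05))     # mean ~ A mu + b
print(np.allclose(np.cov(Y.T), A @ Sigma @ A.T, atol=0.2))    # covariance ~ A Sigma A^T

# Whitening: Sigma^{-1/2}(X - mu) should be approximately N_p(0, I).
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T           # Sigma^{-1/2}
Z = (X - mu) @ W
print(np.allclose(np.cov(Z.T), np.eye(3), atol=0.05))         # covariance ~ I
```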
Marginal and Conditional Distributions
A random vector \(X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}\) has
- joint density \(f_X(\mathbf{x}) = f_X(x_1, x_2)\)
- marginal densities $$ f_{X_1}(x_1) = \int_{\mathbb{R}} f_X(x_1, x_2) \,dx_2 $$ $$ f_{X_2}(x_2) = \int_{\mathbb{R}} f_X(x_1, x_2) \,dx_1 $$
- conditional densities $$ f_{X_1|X_2}(x_1|x_2) = \frac{f_X(x_1, x_2)}{f_{X_2}(x_2)} $$ $$ f_{X_2|X_1}(x_2|x_1) = \frac{f_X(x_1, x_2)}{f_{X_1}(x_1)} $$
A random vector \(X = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix}\) has
- joint density \(f_X(\mathbf{x}) = f_X(x_1, \dots, x_p)\)
- marginal densities $$ f_{X_i}(x_i) = \int_{\mathbb{R}^{p-1}} f_X(x_1, \dots, x_p) \,dx_1 \dots dx_{i-1} dx_{i+1} \dots dx_p $$
- conditional densities for \(I = \{i_1, \dots, i_r\}\), \(J = \{j_1, \dots, j_{p-r}\}\) such that \(J = \{1, \dots, p\} \setminus I\) $$ f_{X_I|X_J}(\mathbf{x}_I | \mathbf{x}_J) = \frac{f_X(x_1, \dots, x_p)}{f_{X_J}(\mathbf{x}_J)} $$ Here $$ X_I = (X_{i_1}, \dots, X_{i_r}) \quad \mathbf{x}_I = (x_{i_1}, \dots, x_{i_r}) $$ $$ X_J = (X_{j_1}, \dots, X_{j_{p-r}}) \quad \mathbf{x}_J = (x_{j_1}, \dots, x_{j_{p-r}}) $$
Conditional Distribution
Let
$$ \begin{pmatrix} X \\ y \end{pmatrix} \sim N_{p+q} \left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{pmatrix} \right) $$denote \(Z = y - \Sigma_{yx} \Sigma_x^{-1} X\)
- \(X\) and \(Z\) are normal: with \(A = (I_p, O_{p \times q})\) we get \(A \begin{pmatrix} X \\ y \end{pmatrix} = X\), and with \(B = (-\Sigma_{yx} \Sigma_x^{-1}, I_q)\) we get \(B \begin{pmatrix} X \\ y \end{pmatrix} = y - \Sigma_{yx} \Sigma_x^{-1} X = Z\), so both are affine transformations of a jointly MVN vector.
- \(X \sim N_p(\mu_x, \Sigma_x)\)
- \(Z \sim N_q(\mu_y - \Sigma_{yx} \Sigma_x^{-1} \mu_x, \Sigma_y - \Sigma_{yx} \Sigma_x^{-1} \Sigma_{xy})\)
- \(X\) and \(Z\) are independent $$ \begin{aligned} Cov(X, Z) &= Cov\left(A \begin{pmatrix} X \\ y \end{pmatrix}, B \begin{pmatrix} X \\ y \end{pmatrix}\right) = A \cdot Var\left(\begin{pmatrix} X \\ y \end{pmatrix}\right) \cdot B^T \\ &= (I_p, O) \begin{pmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{pmatrix} \begin{pmatrix} -\Sigma_x^{-1} \Sigma_{xy} \\ I \end{pmatrix} \\ &= (\Sigma_x, \Sigma_{xy}) \begin{pmatrix} -\Sigma_x^{-1} \Sigma_{xy} \\ I \end{pmatrix} \\ &= -\Sigma_x \Sigma_x^{-1} \Sigma_{xy} + \Sigma_{xy} = -\Sigma_{xy} + \Sigma_{xy} = 0 \end{aligned} $$ \(X\) and \(Z\) are uncorrelated MVNs \(\Rightarrow\) \(X\) and \(Z\) are independent
- \(y \mid X=x \sim N_q(\mu_y + \Sigma_{yx} \Sigma_x^{-1}(x - \mu_x), \Sigma_y - \Sigma_{yx} \Sigma_x^{-1} \Sigma_{xy})\)
- \(Z = y - \Sigma_{yx} \Sigma_x^{-1} X\) implies \(y = Z + \Sigma_{yx} \Sigma_x^{-1} X\)
- \(Z\) and \(X\) are independent then $$ \begin{aligned} E(y|X=x) &= E(Z|X=x) + E(\Sigma_{yx} \Sigma_x^{-1} x | X=x) \\ &= E(Z) + \Sigma_{yx} \Sigma_x^{-1} x \\ &= (\mu_y - \Sigma_{yx} \Sigma_x^{-1} \mu_x) + \Sigma_{yx} \Sigma_x^{-1} x \\ &= \mu_y + \Sigma_{yx} \Sigma_x^{-1}(x - \mu_x) \end{aligned} $$
- \(Var(y|X=x) = Var(Z) = \Sigma_y - \Sigma_{yx} \Sigma_x^{-1} \Sigma_{xy}\)
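The conditional mean and covariance formulas above translate directly into code. The sketch below (assuming NumPy; the block sizes and numbers are illustrative) computes \(E(y \mid X = x)\) and \(Var(y \mid X = x)\) from a partitioned covariance matrix.

```python
# Minimal sketch: conditional mean and covariance of y | X = x for a jointly
# MVN (X, y), using the formulas above. Values are illustrative; X is 2-D, y is 1-D.
import numpy as np

mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0])
Sigma_x  = np.array([[1.0, 0.3],
                     [0.3, 2.0]])
Sigma_xy = np.array([[0.5],
                     [0.4]])
Sigma_yx = Sigma_xy.T
Sigma_y  = np.array([[1.5]])

x = np.array([0.5, 0.0])                              # observed value of X
K = Sigma_yx @ np.linalg.inv(Sigma_x)                 # Sigma_yx Sigma_x^{-1}
cond_mean = mu_y + K @ (x - mu_x)                     # mu_y + Sigma_yx Sigma_x^{-1} (x - mu_x)
cond_cov  = Sigma_y - K @ Sigma_xy                    # Sigma_y - Sigma_yx Sigma_x^{-1} Sigma_xy
print(cond_mean, cond_cov)
```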
Block Inverse Via Schur Complement
Given a block matrix \(M\) :
\[ M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix} \in \mathbb{R}^{(p+q) \times (p+q)} \]where \(A \in \mathbb{R}^{p \times p}\) and \(C \in \mathbb{R}^{q \times q}\) are invertible matrices.
The Schur complement of @@M@@ with respect to @@C@@ is defined as:
\[ M/C = A - BC^{-1}B^T \in \mathbb{R}^{p \times p} \]The Schur complement of @@M@@ with respect to @@A@@ is defined as:
\[ M/A = C - B^T A^{-1} B \in \mathbb{R}^{q \times q} \]Both @@M/C@@ and @@M/A@@ are invertible if @@M@@ is invertible.
For the same block matrix \(M\), the inverse \(M^{-1}\) can be expressed in terms of the Schur complements. We define \(M^{-1}\) with corresponding blocks:
\[ M^{-1} = \begin{pmatrix} \tilde{A} & \tilde{B} \\ \tilde{B}^T & \tilde{C} \end{pmatrix} \]The blocks of the inverse are given by:
\[ M^{-1} = \begin{pmatrix} (M/C)^{-1} & -A^{-1}B(M/A)^{-1} \\ -C^{-1}B^T(M/C)^{-1} & (M/A)^{-1} \end{pmatrix} \]For the proof, see the second proof in Proof Of Multivariable Statistics.
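A quick numerical sanity check of the block-inverse formula (assuming NumPy; the symmetric positive-definite \(M\) below is a randomly generated example):

```python
# Minimal sketch: verify the Schur-complement block-inverse formula numerically.
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
G = rng.standard_normal((p + q, p + q))
M = G @ G.T + (p + q) * np.eye(p + q)        # illustrative SPD block matrix
A, B, C = M[:p, :p], M[:p, p:], M[p:, p:]

M_over_C = A - B @ np.linalg.inv(C) @ B.T    # Schur complement M/C
M_over_A = C - B.T @ np.linalg.inv(A) @ B    # Schur complement M/A

Minv_blocks = np.block([
    [np.linalg.inv(M_over_C),                           -np.linalg.inv(A) @ B @ np.linalg.inv(M_over_A)],
    [-np.linalg.inv(C) @ B.T @ np.linalg.inv(M_over_C),  np.linalg.inv(M_over_A)],
])
print(np.allclose(Minv_blocks, np.linalg.inv(M)))    # True
```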
Given random vector @@X \sim N_p(\mu, \Sigma)@@.
$$ X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \begin{matrix} \left. \right\} A \\ \left. \right\} B \end{matrix} = \begin{pmatrix} X_A \\ X_B \end{pmatrix} \sim N_p \left( \begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix}, \begin{pmatrix} \Sigma_A & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_B \end{pmatrix} \right) $$Consider precision matrix @@\Sigma^{-1}@@.
If @@(\Sigma^{-1})_{12} = (\Sigma^{-1})_{21} = 0@@, i.e.
$$ \Sigma^{-1} = \begin{pmatrix} \ast & 0 & \ast & \cdots & \ast \\ 0 & \ast & \ast & \cdots & \ast \\ \ast & \ast & \ast & \cdots & \ast \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ast & \ast & \ast & \cdots & \ast \end{pmatrix} $$then @@X_1@@ and @@X_2@@ are independent given @@X_3, ..., X_p@@.
Relationship With MVN
@@X_A | X_B = x_B \sim N_2(\mu_A + \Sigma_{AB} \Sigma_B^{-1}(x_B - \mu_B), \Sigma_A - \Sigma_{AB} \Sigma_B^{-1} \Sigma_{BA})@@
$$ \Sigma^{-1} = \begin{pmatrix} (\Sigma / \Sigma_B)^{-1} & \ast \\ \ast & \ast \end{pmatrix} $$Let @@\Sigma / \Sigma_B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathbb{R}^{2 \times 2}@@, where @@b = c@@ by symmetry. Then
$$ (\Sigma / \Sigma_B)^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} $$@@b = 0 \iff (\Sigma / \Sigma_B)^{-1} = \begin{pmatrix} \ast & 0 \\ 0 & \ast \end{pmatrix} \iff \Sigma^{-1} = \left( \begin{array}{cc|c} \ast & 0 & \ast \\ 0 & \ast & \ast \\ \hline \ast & \ast & \ast \end{array} \right) \iff \Sigma / \Sigma_B = \begin{pmatrix} \ast & 0 \\ 0 & \ast \end{pmatrix} \iff X_1 \text{ and } X_2 \text{ given } X_B \text{ are independent.}@@
Note: the inverse of a diagonal matrix is itself diagonal, which is one of the basic properties used here.
Since $\Sigma / \Sigma_B$ is diagonal, its inverse $(\Sigma / \Sigma_B)^{-1}$ must also be diagonal.
Therefore, $(\Sigma / \Sigma_B)^{-1} = \begin{pmatrix} * & 0 \\ 0 & * \end{pmatrix}$.
The upper-left block of the full precision matrix $\Sigma^{-1}$ is exactly $(\Sigma / \Sigma_B)^{-1}$, so the upper-left block of $\Sigma^{-1}$ must also be this diagonal matrix $\begin{pmatrix} * & 0 \\ 0 & * \end{pmatrix}$.
We do not need to know the remaining entries of the inverse: because the precision matrix arises in exactly this way, once this block is known to be diagonal we can read off, directly from the precision matrix, that the corresponding variables are conditionally independent given the other variables.
Summary:
If @@(\Sigma^{-1})_{ij} = (\Sigma^{-1})_{ji} = 0@@, i.e.
$$ \Sigma^{-1} = \begin{pmatrix} \ast & \cdots & \ast & \cdots & \ast & \cdots & \ast \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ \ast & \cdots & \ast & \cdots & 0 & \cdots & \ast \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ \ast & \cdots & 0 & \cdots & \ast & \cdots & \ast \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ \ast & \cdots & \ast & \cdots & \ast & \cdots & \ast \end{pmatrix} $$with the zeros in positions @@(i, j)@@ and @@(j, i)@@ for some @@i \ne j@@,
then @@X_i@@ and @@X_j@@ are independent given @@X_{1}, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_p@@.
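The following sketch (assuming NumPy; the precision matrix @@P@@ is an illustrative example) builds @@\Sigma@@ from a precision matrix with @@P_{12} = P_{21} = 0@@ and checks that the conditional covariance of @@(X_1, X_2)@@ given the remaining coordinates has zero off-diagonal, matching the conditional-independence statement above.

```python
# Minimal sketch: a zero in the precision matrix corresponds to zero conditional
# covariance (hence conditional independence, for a MVN). Numbers are illustrative.
import numpy as np

P = np.array([[2.0, 0.0, 0.5, 0.3],      # precision matrix with P[0, 1] = P[1, 0] = 0
              [0.0, 1.5, 0.4, 0.2],
              [0.5, 0.4, 2.5, 0.1],
              [0.3, 0.2, 0.1, 1.8]])
Sigma = np.linalg.inv(P)

A, B = [0, 1], [2, 3]                     # condition X_A = (X_1, X_2) on X_B = (X_3, X_4)
Sigma_A  = Sigma[np.ix_(A, A)]
Sigma_AB = Sigma[np.ix_(A, B)]
Sigma_B  = Sigma[np.ix_(B, B)]
cond_cov = Sigma_A - Sigma_AB @ np.linalg.inv(Sigma_B) @ Sigma_AB.T
print(np.round(cond_cov, 6))              # off-diagonal entry is 0: X_1 and X_2 indep. given X_B
```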
Maximum likelihood estimates
For observations @@x_1, \ldots, x_n \in \mathbb{R}^p@@, maximize the likelihood that the @@x_i@@ came from @@X \sim N_p(\mu, \Sigma)@@.
$$ \ell(x_1, \ldots, x_n; \mu, \Sigma) = -\frac{n \cdot p}{2} \log(2\pi) - \frac{n}{2} \log \det(\Sigma) - \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1}(x_i - \mu) $$$$ \begin{aligned} L(x_1, \ldots, x_n; \mu, \Sigma) &= \prod_{i=1}^n f_X(x_i) = \prod_{i=1}^n \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} e^{-\frac{1}{2}(x_i - \mu)^T \Sigma^{-1}(x_i - \mu)} \\ \ell(x_1, \ldots, x_n; \mu, \Sigma) &= \log L(x_1, \ldots, x_n; \mu, \Sigma) \\ &= \sum_{i=1}^n \left( -\log \left[ (2\pi)^{p/2} (\det(\Sigma))^{1/2} \right] - \frac{1}{2}(x_i - \mu)^T \Sigma^{-1}(x_i - \mu) \right) \\ &= -\frac{n \cdot p}{2} \log(2\pi) - \frac{n}{2} \log \det(\Sigma) - \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1}(x_i - \mu) \end{aligned} $$MLE Estimator Optimization
1. Optimizing @@\mu@@
Optimizing over @@\mu@@ implies @@\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i@@.
$$ \begin{aligned} \nabla_{\mu} \ell(x_1, \ldots, x_n; \mu, \Sigma) &= \nabla_{\mu} \left[ -\frac{1}{2} \sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1}(x_i - \mu) \right] \\ &= \sum_{i=1}^n \left[ \Sigma^{-1}(x_i - \mu) \right] \\ &= \Sigma^{-1} \left( \sum_{i=1}^n x_i - \sum_{i=1}^n \mu \right) \\ &= n \Sigma^{-1} (\bar{x} - \mu) = 0 \end{aligned} $$Then @@\hat{\mu} = \bar{x}@@.
2. Optimizing @@\Sigma@@
Optimizing over @@\Sigma@@ gives @@\hat{\Sigma} = S = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T@@.
We substitute @@\mu = \bar{x}@@ into the likelihood function:
$$ \ell(x_1, \ldots, x_n; \bar{x}, \Sigma) = -\frac{n}{2} \log \det(\Sigma) - \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^T \Sigma^{-1}(x_i - \bar{x}) + \text{const} $$Using the property @@c = \text{tr}(c)@@ if @@c@@ is a scalar constant:
$$ \begin{aligned} \sum_{i=1}^n (x_i - \bar{x})^T \Sigma^{-1}(x_i - \bar{x}) &= \sum_{i=1}^n \text{tr}\left[ (x_i - \bar{x})^T \Sigma^{-1}(x_i - \bar{x}) \right] \\ &= \sum_{i=1}^n \text{tr}\left[ \Sigma^{-1}(x_i - \bar{x})(x_i - \bar{x})^T \right] \\ &= \text{tr}\left[ \Sigma^{-1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T \right] \\ &= \text{tr}\left[ \Sigma^{-1} (nS) \right] = n \text{tr}(\Sigma^{-1} S) \end{aligned} $$Thus, the simplified log-likelihood is:
$$ \ell(x_1, \ldots, x_n; \bar{x}, \Sigma) = -\frac{n}{2} \log \det(\Sigma) - \frac{n}{2} \text{tr}(\Sigma^{-1} S) + \text{const} $$Restate log-likelihood in terms of Precision Matrix
Restate log-likelihood in terms of @@P = \Sigma^{-1}@@. (Note: @@\log \det(\Sigma^{-1}) = \log \det(P) = - \log \det(\Sigma)@@, and @@\text{tr}(\Sigma^{-1} S) = \text{tr}(PS)@@)
$$ \ell(x_1, \ldots, x_n; \bar{x}, P) = \frac{n}{2} \log \det(P) - \frac{n}{2} \text{tr}(PS) $$The gradient with respect to @@P@@ is set to zero:
$$ \nabla_P \ell(x_1, \ldots, x_n; \bar{x}, P) = \frac{n}{2} P^{-1} - \frac{n}{2} S = 0 \implies \hat{P} = S^{-1} \implies \hat{\Sigma} = S $$Here we used the matrix derivative identities @@\nabla_P \log \det(P) = P^{-1}@@ and @@\nabla_P \text{tr}(PS) = S@@ (for symmetric @@P@@ and @@S@@).
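As a quick check of the MLE formulas, the sketch below (assuming NumPy; the simulated data are illustrative) computes @@\hat{\mu} = \bar{x}@@ and @@\hat{\Sigma} = S@@ and compares @@S@@ with NumPy's @@1/n@@-normalized covariance.

```python
# Minimal sketch: the MLEs derived above are the sample mean and the
# 1/n-normalized sample covariance. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
x = rng.multivariate_normal(mu, Sigma, size=500)     # n x p data matrix

mu_hat = x.mean(axis=0)                              # \hat{mu} = xbar
S = (x - mu_hat).T @ (x - mu_hat) / len(x)           # \hat{Sigma} = S (divides by n)
print(mu_hat)
print(np.allclose(S, np.cov(x.T, bias=True)))        # np.cov with bias=True also divides by n
```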
Independence of MLEs
If @@x_1, \ldots, x_n@@ are i.i.d. @@N_p(\mu, \Sigma)@@ then @@\bar{x}@@ and @@S@@ are independent.
Let @@X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}@@ be a random @@n \times p@@ matrix.
Let @@\tilde{X} = CX = \begin{pmatrix} (x_1 - \bar{x})^T \\ \vdots \\ (x_n - \bar{x})^T \end{pmatrix}@@, where @@C = I_n - \frac{1}{n} \mathbf{1}\mathbf{1}^T@@ is the centering matrix.
$$ S = \frac{1}{n} \tilde{X}^T \tilde{X} = \frac{1}{n} (CX)^T (CX) = \frac{1}{n} X^T C^T C X = \frac{1}{n} X^T C X \text{ (since } C \text{ is symmetric and idempotent)} $$
The mean vector is @@\bar{x} = \frac{1}{n} X^T \mathbf{1}@@, where $\mathbf{1}$ is the vector of ones.
$$ \text{cov}(x_i - \bar{x}, \bar{x}) = \text{cov}(x_i, \bar{x}) - \text{var}(\bar{x}) = \frac{1}{n} \sum_{j=1}^n \text{cov}(x_i, x_j) - \frac{1}{n} \Sigma $$Since @@x_i, x_j@@ are i.i.d., @@\text{cov}(x_i, x_j) = 0@@ for @@i \ne j@@ and @@\Sigma@@ for @@i=j@@.
$$ \text{cov}(x_i - \bar{x}, \bar{x}) = \frac{1}{n} \text{cov}(x_i, x_i) - \frac{1}{n} \Sigma = \frac{1}{n} \Sigma - \frac{1}{n} \Sigma = 0 $$Since @@x_i - \bar{x}@@ and @@\bar{x}@@ are jointly multivariate normal, zero covariance implies independence. Thus, @@\tilde{X}@@ and @@\bar{x}@@ are independent, which implies @@S@@ and @@\bar{x}@@ are also independent.
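A small numerical sketch of the centering-matrix identity used above (assuming NumPy; the data are illustrative): @@C@@ is symmetric and idempotent, and @@S = \frac{1}{n} X^T C X@@ matches the directly computed sample covariance.

```python
# Minimal sketch: the centering matrix C = I_n - (1/n) 1 1^T is idempotent and
# S = (1/n) X^T C X reproduces the 1/n-normalized sample covariance.
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.standard_normal((n, p))                      # illustrative n x p data matrix

C = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(C @ C, C))                         # idempotent

S_via_C = X.T @ C @ X / n
xbar = X.mean(axis=0)
S_direct = (X - xbar).T @ (X - xbar) / n
print(np.allclose(S_via_C, S_direct))                # same S
```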
Distribution of the estimates
If @@x_1, \ldots, x_n \stackrel{\text{i.i.d.}}{\sim} N_p(\mu, \Sigma)@@ then @@\bar{x} \sim N_p(\mu, \frac{\Sigma}{n})@@.
If @@x_1, \ldots, x_n \stackrel{\text{i.i.d.}}{\sim} N_p(0, \Sigma)@@, then @@M = X^T X = \sum_{i=1}^n x_i x_i^T@@ follows a Wishart distribution:
$$ M \sim W_p(\Sigma, n) $$(@@\Sigma@@ is the scaling matrix, @@n@@ is the degrees-of-freedom.)
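A minimal simulation sketch (assuming NumPy and scipy.stats.wishart; the sizes are illustrative): the Monte Carlo mean of @@M = X^T X@@ is approximately @@n\Sigma@@, which is also the mean of draws from @@W_p(\Sigma, n)@@.

```python
# Minimal sketch: M = X^T X for rows x_i ~ N_p(0, Sigma) behaves like a
# W_p(Sigma, n) draw; compare Monte Carlo means with scipy.stats.wishart samples.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(4)
n, p = 10, 2
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

reps = 5_000
M_mean = np.zeros((p, p))
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    M_mean += X.T @ X / reps
print(M_mean)                                                          # approx n * Sigma
print(wishart.rvs(df=n, scale=Sigma, size=reps, random_state=5).mean(axis=0))
```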
Wishart distribution generalizes the @@\chi^2@@ distribution, as:
$$ W_1(\sigma^2, n) \equiv \sigma^2 \chi^2(n) $$For @@p=1@@, let @@X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \text{ where } x_i \sim N(0, \sigma^2)@@.
$$ X^T X = \sum_{i=1}^n x_i^2 \sim \sigma^2 \chi^2(n) $$Example: Sample Covariance
Univariate case @@x_1, \ldots, x_n \stackrel{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)@@
$$ n S^2 = \sum_{i=1}^n (x_i - \bar{x})^2 \sim \sigma^2 \chi^2(n-1) $$Multivariate case @@x_1, \ldots, x_n \stackrel{\text{i.i.d.}}{\sim} N_p(\mu, \Sigma)@@
$$ n S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T \sim W_p(\Sigma, n-1) $$Hotelling $T^2$ distribution
Univariate Case (Student’s t-distribution)
If @@Z \sim N(0, 1)@@ and @@M \sim \chi^2(n)@@ are independent, then
$$ \tau = \frac{Z}{\sqrt{M/n}} \text{ has Student's } t\text{-distribution} $$That is, @@\tau \sim t(n)@@ with @@n@@ degrees of freedom.
Multivariate Case (Hotelling $T^2$ distribution)
If @@Z \sim N_p(0, I)@@ and @@M \sim W_p(I, n)@@ are independent, then
$$ \tau^2 = n Z^T M^{-1} Z \text{ has Hotelling } T^2\text{-distribution} $$That is, @@\tau^2 \sim T^2(p, n)@@.
When @@p=1@@, then @@Z \sim N(0, 1)@@ and @@M \sim W_1(1, n) \equiv \chi^2(n)@@. The statistic simplifies:
$$ \tau^2 = n \cdot Z^T M^{-1} Z = n \cdot \frac{Z \cdot 1 \cdot Z}{M} = \frac{Z^2}{M/n} = \left(\frac{Z}{\sqrt{M/n}}\right)^2 $$Example: $t$-statistics
Univariate case
$$ \frac{\bar{x} - \mu}{S / \sqrt{n-1}} \sim t(n-1) $$- Numerator: We have @@\bar{x} \sim N(\mu, \sigma^2/n)@@. Therefore, @@\sqrt{n} \frac{\bar{x} - \mu}{\sigma} \sim N(0, 1)@@.
- Denominator ($\chi^2$ part): We have @@nS^2 \sim \sigma^2 \chi^2(n-1)@@. Therefore, @@\frac{nS^2}{\sigma^2} \sim \chi^2(n-1)@@.
- Putting it together: The $t$-statistic is the ratio of the $N(0, 1)$ part to the square root of the $\chi^2(n-1)$ part divided by its degrees of freedom $(n-1)$: $$ \frac{\sqrt{n} \frac{\bar{x} - \mu}{\sigma}}{\sqrt{\frac{nS^2}{\sigma^2} / (n-1)}} = \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \cdot \frac{\sigma \sqrt{n-1}}{\sqrt{n}\, S} = \frac{\bar{x} - \mu}{S / \sqrt{n-1}} \sim t(n-1) $$
Multivariate case
$$ (n-1) (\bar{x} - \mu)^T S^{-1} (\bar{x} - \mu) \sim T^2(p, n-1) $$- Vector $Z$ part: We have @@\bar{x} \sim N_p(\mu, \Sigma/n)@@. Therefore, @@\sqrt{n} \Sigma^{-1/2} (\bar{x} - \mu) \sim N_p(0, I)@@.
- Matrix $M$ part: We have @@n \Sigma^{-1/2} S \Sigma^{-1/2} \sim W_p(I, n-1)@@, which is independent of $\bar{x}$.
- Putting it together (the full expression): $$ (n-1) \left( \sqrt{n} \Sigma^{-1/2} (\bar{x} - \mu) \right)^T \left( n \Sigma^{-1/2} S \Sigma^{-1/2} \right)^{-1} \left( \sqrt{n} \Sigma^{-1/2} (\bar{x} - \mu) \right) $$
- Simplifying the middle inverse term (The Step Skipped in the Note): $$ \left( n \Sigma^{-1/2} S \Sigma^{-1/2} \right)^{-1} = \frac{1}{n} (\Sigma^{-1/2})^{-1} S^{-1} (\Sigma^{-1/2})^{-1} = \frac{1}{n} \Sigma^{1/2} S^{-1} \Sigma^{1/2} $$
- Final Simplification: Substituting the simplified inverse back into the expression: $$ (n-1) \cdot n \cdot (\bar{x} - \mu)^T \Sigma^{-1/2} \left[ \frac{1}{n} \Sigma^{1/2} S^{-1} \Sigma^{1/2} \right] \Sigma^{-1/2} (\bar{x} - \mu) = (n-1) (\bar{x} - \mu)^T S^{-1} (\bar{x} - \mu) \sim T^2(p, n-1) $$
Univariate ($p=1$)
$$ x_1, \ldots, x_n \sim N(\mu, \sigma^2) $$Sample mean:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i $$$$ \bar{x} \sim N(\mu, \frac{\sigma^2}{n}) $$Sample variance
$$ s^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 $$$$ n s^2 \sim \sigma^2 \chi^2(n-1) $$$T$-statistics
$$ \frac{\bar{x} - \mu}{s / \sqrt{n-1}} \sim t(n-1) $$Multivariate
$$ x_1, \ldots, x_n \sim N_p(\mu, \Sigma) $$Sample mean vector:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i $$$$ \bar{x} \sim N_p(\mu, \frac{\Sigma}{n}) $$Sample covariance matrix:
$$ S = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T $$$$ n S \sim W_p(\Sigma, n-1) $$$T$-statistics
$$ (n-1) (\bar{x} - \mu)^T S^{-1} (\bar{x} - \mu) \sim T^2(p, n-1) $$Testing
Univariate Case: $\sigma^2$ is Known ($Z$-Test)
$$ x_1, \ldots, x_n \sim N(\mu, \sigma^2) $$- Hypotheses: @@H_0: \mu = a \quad \text{vs} \quad H_1: \mu \ne a@@
- Compute $z$-statistic: $$ Z = \frac{\bar{x} - a}{\sigma / \sqrt{n}} \stackrel{\text{under } H_0}{\sim} N(0, 1) $$
- Compute $p$-value: @@P(|Z| \ge |Z_{\text{obs}}| \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
Univariate Case: $\sigma^2$ is Unknown ($t$-Test)
$$ x_1, \ldots, x_n \sim N(\mu, ?) $$- Hypotheses: @@H_0: \mu = a \quad \text{vs} \quad H_1: \mu \ne a@@
- Compute $t$-statistic: $$ t = \frac{\bar{x} - a}{s / \sqrt{n-1}} \stackrel{\text{under } H_0}{\sim} t(n-1) $$
- Compute $p$-value: @@P(|t| \ge |t_{\text{obs}}| \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
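A minimal sketch of this $t$-test (assuming NumPy/SciPy; the data and @@a@@ are illustrative), checking the hand-computed statistic against scipy.stats.ttest_1samp:

```python
# Minimal sketch: one-sample t-test via the formula above, checked against SciPy.
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

rng = np.random.default_rng(6)
x = rng.normal(loc=0.3, scale=1.0, size=25)     # illustrative data
a = 0.0                                         # hypothesized mean under H0

n = len(x)
xbar = x.mean()
s2 = ((x - xbar) ** 2).mean()                   # s^2 = (1/n) sum (x_i - xbar)^2
t_obs = (xbar - a) / np.sqrt(s2 / (n - 1))      # (xbar - a) / (s / sqrt(n-1))
p_value = 2 * t_dist.sf(abs(t_obs), df=n - 1)
print(t_obs, p_value)
print(ttest_1samp(x, popmean=a))                # same t and p-value
```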
Univariate Case: Two-Sample Test (Equal Variance)
$$ x_1, \ldots, x_n \sim N(\mu_x, ?), \quad y_1, \ldots, y_m \sim N(\mu_y, ?), \quad \text{equal variance} $$- Hypotheses: @@H_0: \mu_x = \mu_y \quad \text{vs} \quad H_1: \mu_x \ne \mu_y@@
- Compute Pooled Variance ($S_{\text{pooled}}^2$): $$ \begin{aligned} S_{\text{pooled}}^2 &= \frac{1}{n+m-2} \left( \sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^m (y_i - \bar{y})^2 \right) \\ &= \frac{1}{n+m-2} \left( n S_x^2 + m S_y^2 \right) \end{aligned} $$
- Compute $t$-statistic: $$ t = \frac{\bar{x} - \bar{y}}{S_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}}} \stackrel{\text{under } H_0}{\sim} t(n+m-2) $$
- Compute $p$-value: @@P(|t| \ge |t_{\text{obs}}| \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
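A minimal sketch of the pooled two-sample $t$-test (assuming NumPy/SciPy; the data are illustrative), checked against scipy.stats.ttest_ind with equal_var=True:

```python
# Minimal sketch: pooled two-sample t-test via the formulas above, checked against SciPy.
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=30)               # illustrative samples
y = rng.normal(0.5, 1.0, size=40)
n, m = len(x), len(y)

s2_pooled = (((x - x.mean()) ** 2).sum() + ((y - y.mean()) ** 2).sum()) / (n + m - 2)
t_obs = (x.mean() - y.mean()) / np.sqrt(s2_pooled * (1 / n + 1 / m))
p_value = 2 * t_dist.sf(abs(t_obs), df=n + m - 2)
print(t_obs, p_value)
print(ttest_ind(x, y, equal_var=True))          # same t and p-value
```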
Multivariate Testing
Multivariate Case: $\Sigma$ is Known ($\chi^2$-Test)
$$ x_1, \ldots, x_n \sim N_p(\mu, \Sigma) $$- Hypotheses: @@H_0: \mu = a \quad \text{vs} \quad H_1: \mu \ne a@@
- Compute $\chi^2$-statistic: $$ \chi^2 = n (\bar{x} - a)^T \Sigma^{-1} (\bar{x} - a) \stackrel{\text{under } H_0}{\sim} \chi^2(p) $$
- Derivation (The Step Skipped in the Note): We have @@\bar{x} \sim N_p(\mu, \Sigma/n)@@. Under @@H_0@@, @@\mu=a@@. $$ y = \sqrt{n} \Sigma^{-1/2} (\bar{x} - a) \stackrel{\text{under } H_0}{\sim} N_p(0, I_p) $$ Let @@y = (y_1, \ldots, y_p)^T@@. Then the statistic is: $$ \chi^2 = y^T y = \sum_{i=1}^p y_i^2 \sim \chi^2(p) $$
- Compute $p$-value: @@P(\chi^2 \ge \chi^2_{\text{obs}} \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
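A minimal sketch of the @@\chi^2@@-test for the mean with known @@\Sigma@@ (assuming NumPy/SciPy; the data, @@a@@, and @@\Sigma@@ are illustrative):

```python
# Minimal sketch: chi-squared test for the mean when Sigma is known, following
# the steps above.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
p = 3
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.5, 0.3],
                  [0.0, 0.3, 2.0]])
a = np.zeros(p)                                           # hypothesized mean under H0
x = rng.multivariate_normal(a + 0.2, Sigma, size=50)      # illustrative data (true mean differs from a)
n = len(x)

xbar = x.mean(axis=0)
chi2_obs = n * (xbar - a) @ np.linalg.inv(Sigma) @ (xbar - a)
p_value = chi2.sf(chi2_obs, df=p)
print(chi2_obs, p_value)                                  # reject H0 if p_value < alpha
```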
Multivariate Case: $\Sigma$ is Unknown (Hotelling $T^2$-Test)
$$ x_1, \ldots, x_n \sim N_p(\mu, ?) $$- Hypotheses: @@H_0: \mu = a \quad \text{vs} \quad H_1: \mu \ne a@@
- Compute Hotelling $T^2$-statistic: $$ T^2 = (n-1) (\bar{x} - a)^T S^{-1} (\bar{x} - a) \stackrel{\text{under } H_0}{\sim} T^2(p, n-1) $$
- Compute $p$-value: @@P(T^2 \ge T^2_{\text{obs}} \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
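A minimal sketch of the one-sample Hotelling @@T^2@@-test (assuming NumPy/SciPy; the data and @@a@@ are illustrative). The note does not spell out the reference distribution's quantiles; the sketch uses the standard fact that @@\frac{n-p}{(n-1)p} T^2 \sim F(p, n-p)@@ under @@H_0@@ to obtain a p-value.

```python
# Minimal sketch: one-sample Hotelling T^2 test, with a p-value obtained via the
# standard T^2-to-F conversion (n - p) / ((n - 1) p) * T^2 ~ F(p, n - p) under H0.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(9)
p = 2
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
a = np.zeros(p)                                           # hypothesized mean under H0
x = rng.multivariate_normal(a + 0.3, Sigma, size=40)      # illustrative data
n = len(x)

xbar = x.mean(axis=0)
S = (x - xbar).T @ (x - xbar) / n                         # 1/n-normalized sample covariance
T2 = (n - 1) * (xbar - a) @ np.linalg.inv(S) @ (xbar - a)
F_obs = (n - p) / ((n - 1) * p) * T2
p_value = f_dist.sf(F_obs, dfn=p, dfd=n - p)
print(T2, p_value)                                        # reject H0 if p_value < alpha
```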
Univariate Case: Two-Sample Test (Equal Variance), Derivation
$$ \bar{x} \sim N(\mu_x, \frac{\sigma^2}{n}), \quad \bar{y} \sim N(\mu_y, \frac{\sigma^2}{m}) $$$$ n s_x^2 \sim \sigma^2 \chi^2(n-1), \quad m s_y^2 \sim \sigma^2 \chi^2(m-1) $$Derivation of $t$-statistic:
- Numerator: Under @@H_0: \mu_x = \mu_y@@, the difference of means is: $$ \bar{x} - \bar{y} \sim N\left(0, \sigma^2 \left(\frac{1}{n} + \frac{1}{m}\right)\right) $$ Thus, the standard normal variable $Z$ is: $$ Z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma^2 (\frac{1}{n} + \frac{1}{m})}} \stackrel{\text{under } H_0}{\sim} N(0, 1) $$
- Denominator (Chi-squared part): The pooled chi-squared variable is: $$ \frac{(n+m-2) S_{\text{pooled}}^2}{\sigma^2} = \frac{n s_x^2 + m s_y^2}{\sigma^2} \sim \chi^2(n+m-2) $$
- Final $t$-statistic: The $t$-statistic is $Z$ divided by the square root of the $\chi^2$ variable divided by its degrees of freedom: $$ t = \frac{\frac{\bar{x} - \bar{y}}{\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}}}{\sqrt{\frac{(n+m-2) S_{\text{pooled}}^2}{\sigma^2} / (n+m-2)}} = \frac{\bar{x} - \bar{y}}{\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}} \cdot \frac{\sigma}{S_{\text{pooled}}} = \frac{\bar{x} - \bar{y}}{S_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}}} \stackrel{\text{under } H_0}{\sim} t(n+m-2) $$
Multivariate Case: Two-Sample Test (Equal Covariance)
$$ \bar{x} \sim N_p\left(\mu_x, \frac{\Sigma}{n}\right), \quad \bar{y} \sim N_p\left(\mu_y, \frac{\Sigma}{m}\right) $$$$ n S_x \sim W_p(\Sigma, n-1), \quad m S_y \sim W_p(\Sigma, m-1) $$$T^2$ Statistic and Distribution:
The Hotelling $T^2$-statistic for two samples is:
$$ T^2 = \frac{1}{ \frac{1}{n} + \frac{1}{m} } (\bar{x} - \bar{y})^T S_{\text{pooled}}^{-1} (\bar{x} - \bar{y}) \stackrel{\text{under } H_0}{\sim} T^2(p, n+m-2) $$- Hypotheses: @@H_0: \mu_x = \mu_y \quad \text{vs} \quad H_1: \mu_x \ne \mu_y@@
- Compute $p$-value: @@P(T^2 \ge T^2_{\text{obs}} \mid H_0)@@
- Decision: Reject @@H_0@@ if @@p\text{-value} < \alpha@@
Derivation of $T^2$-statistic:
Pooled Covariance:
$$ (n+m-2) S_{\text{pooled}} = n S_x + m S_y \sim W_p(\Sigma, n+m-2) $$Difference of Means (Vector $Z$):
$$ \bar{x} - \bar{y} \sim N_p\left(\mu_x - \mu_y, \Sigma \left(\frac{1}{n} + \frac{1}{m}\right)\right) $$Under @@H_0: \mu_x = \mu_y@@, the standardized $Z$ vector is:
$$ Z = \sqrt{\frac{1}{\frac{1}{n} + \frac{1}{m}}} \Sigma^{-1/2} (\bar{x} - \bar{y}) \stackrel{\text{under } H_0}{\sim} N_p(0, I_p) $$Wishart Matrix ($M$): The Wishart matrix with identity scaling is:
$$ M = (n+m-2) \Sigma^{-1/2} S_{\text{pooled}} \Sigma^{-1/2} \sim W_p(I_p, n+m-2) $$$T^2$ Expression and Simplification (The Steps Skipped in the Note): The $T^2$ statistic is defined as $\tau^2 = (n+m-2) Z^T M^{-1} Z$:
$$ \begin{aligned} T^2 &= (n+m-2) Z^T M^{-1} Z \\ &= (n+m-2) \left[ \sqrt{\frac{nm}{n+m}} \Sigma^{-1/2} (\bar{x} - \bar{y}) \right]^T \left[ (n+m-2) \Sigma^{-1/2} S_{\text{pooled}} \Sigma^{-1/2} \right]^{-1} \left[ \sqrt{\frac{nm}{n+m}} \Sigma^{-1/2} (\bar{x} - \bar{y}) \right] \\ &= (n+m-2) \cdot \frac{nm}{n+m} \cdot (\bar{x} - \bar{y})^T \Sigma^{-1/2} \left[ \frac{1}{n+m-2} \Sigma^{1/2} S_{\text{pooled}}^{-1} \Sigma^{1/2} \right] \Sigma^{-1/2} (\bar{x} - \bar{y}) \\ &= \frac{nm}{n+m} \cdot (\bar{x} - \bar{y})^T S_{\text{pooled}}^{-1} (\bar{x} - \bar{y}) \\ &= \frac{1}{\frac{1}{n} + \frac{1}{m}} (\bar{x} - \bar{y})^T S_{\text{pooled}}^{-1} (\bar{x} - \bar{y}) \stackrel{\text{under } H_0}{\sim} T^2(p, n+m-2) \end{aligned} $$(in which $\frac{1}{\frac{1}{n} + \frac{1}{m}} = \frac{nm}{n+m}$)
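A minimal sketch of the two-sample Hotelling @@T^2@@-test derived above (assuming NumPy/SciPy; the data are illustrative). For the p-value it uses the standard fact, not stated in the note, that @@\frac{n+m-p-1}{(n+m-2)p} T^2 \sim F(p, n+m-p-1)@@ under @@H_0@@.

```python
# Minimal sketch: two-sample Hotelling T^2 statistic with pooled covariance,
# and a p-value from the standard T^2-to-F conversion.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(10)
p = 2
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.5]])
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=30)   # illustrative samples
y = rng.multivariate_normal([0.5, 0.2], Sigma, size=35)
n, m = len(x), len(y)

xbar, ybar = x.mean(axis=0), y.mean(axis=0)
S_pooled = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n + m - 2)
d = xbar - ybar
T2 = (1 / (1 / n + 1 / m)) * d @ np.linalg.inv(S_pooled) @ d
F_obs = (n + m - p - 1) / ((n + m - 2) * p) * T2
p_value = f_dist.sf(F_obs, dfn=p, dfd=n + m - p - 1)
print(T2, p_value)                                        # reject H0 if p_value < alpha
```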