Sample Parameters

1. Scalar Sample Parameters

For a univariate sample:

  • Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$
  • Sample Variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Note: $s^2 \to \sigma^2$ as $n \to \infty$; that is, $s^2$ is a consistent estimator of $\sigma^2$.
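
A minimal NumPy sketch of the two estimators above (the sample values are arbitrary; note that `np.var` uses the same $1/n$ convention by default):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # univariate sample
n = x.size

x_bar = x.sum() / n                  # sample mean
s2 = ((x - x_bar) ** 2).sum() / n    # sample variance with the 1/n convention

assert np.isclose(x_bar, np.mean(x))
assert np.isclose(s2, np.var(x))     # np.var divides by n (ddof=0) by default
```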

2. Vector Sample Parameters

Let the data matrix be $X$:

$$ X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} = \begin{pmatrix} f_{11} & \dots & f_{1p} \\ \vdots & \ddots & \vdots \\ f_{n1} & \dots & f_{np} \end{pmatrix} \in \mathbb{R}^{n \times p} $$
  • Each row represents an observation (a sample).
  • Each column represents a feature.

Table: Statistics Notation Comparison

| Statistic | Scalar (Notation) | Vector $\mathbb{R}^2$ (Notation) | Vector $\mathbb{R}^p$ (Notation) |
|---|---|---|---|
| Sample Mean | $\bar{f}_1, \bar{f}_2$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i = \begin{pmatrix} \bar{f}_1 \\ \bar{f}_2 \end{pmatrix}$ | $\bar{x} = \begin{pmatrix} \bar{f}_1 \\ \vdots \\ \bar{f}_p \end{pmatrix} \in \mathbb{R}^p$ |
| Sample Variance / Covariance | $s_{f_1}^2,\ s_{f_2}^2$; $s_{f_1, f_2} = \frac{1}{n}\sum (f_{i1}-\bar{f}_1)(f_{i2}-\bar{f}_2)$ | $S = \begin{bmatrix} s_{f_1}^2 & s_{f_1, f_2} \\ s_{f_1, f_2} & s_{f_2}^2 \end{bmatrix}$ | $S = \begin{bmatrix} s_{f_1}^2 & \dots & s_{f_1 f_p} \\ \vdots & \ddots & \vdots \\ s_{f_p f_1} & \dots & s_{f_p}^2 \end{bmatrix} \in \mathbb{R}^{p \times p}$ |
| Sample Correlation | $r_{f_1, f_2} = \frac{s_{f_1, f_2}}{s_{f_1}s_{f_2}}$ | $\hat{R} = \begin{bmatrix} 1 & r_{f_1, f_2} \\ r_{f_1, f_2} & 1 \end{bmatrix}$ | $\hat{R} = \begin{bmatrix} 1 & r_{f_1, f_2} & \dots \\ r_{f_2, f_1} & 1 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \in \mathbb{R}^{p \times p}$ |

Sample Covariance Matrix Between $X$ and $Y$

Given $X \in \mathbb{R}^{n \times p}$ with features $f_1, \dots, f_p$ and $Y \in \mathbb{R}^{n \times q}$ with features $g_1, \dots, g_q$, observed on the same $n$ samples:

$$ S_{xy} \in \mathbb{R}^{p \times q} \quad \text{with} \quad (S_{xy})_{ij} = s_{f_i, g_j} $$
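
A short NumPy sketch of this cross-covariance matrix; the random matrices and the explicit column-centering below are illustrative choices, not part of the notation above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 2
X = rng.normal(size=(n, p))   # n samples, p features f_1..f_p
Y = rng.normal(size=(n, q))   # same n samples, q features g_1..g_q

Xc = X - X.mean(axis=0)       # center each feature column
Yc = Y - Y.mean(axis=0)
S_xy = Xc.T @ Yc / n          # (S_xy)_{ij} = s_{f_i, g_j}, shape (p, q)
print(S_xy.shape)             # (3, 2)
```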

Population Parameters (Expectation, Mean, Variance)

1D Case (Univariate)

  • Expectation: $\mathbb{E}[g(x_1)] = \int_{\mathbb{R}} g(x_1)f(x_1)dx_1$
    • Where $f(x_1)$ is the density function.
  • Mean: $\mu = \mathbb{E}(x_1) = \int_{\mathbb{R}} x_1 f(x_1) dx_1$
  • Variance: $\sigma^2 = Var(x_1) = \mathbb{E}[(x_1 - \mathbb{E}x_1)^2] = \mathbb{E}[x_1^2] - (\mathbb{E}[x_1])^2$
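
A small numerical sketch of these definitions, taking a normal density as an illustrative choice of $f$ and integrating with `scipy.integrate.quad`:

```python
import numpy as np
from scipy.integrate import quad

mu_true, sigma_true = 1.0, 2.0
# density f(x1) of N(mu_true, sigma_true^2), used here only as an example
f = lambda x: np.exp(-(x - mu_true) ** 2 / (2 * sigma_true ** 2)) / np.sqrt(2 * np.pi * sigma_true ** 2)

mu = quad(lambda x: x * f(x), -np.inf, np.inf)[0]                  # E(x1)
second_moment = quad(lambda x: x ** 2 * f(x), -np.inf, np.inf)[0]  # E(x1^2)
var = second_moment - mu ** 2                                      # Var(x1) = E(x1^2) - (E x1)^2

print(mu, var)   # ~ 1.0 and ~ 4.0
```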

2D Case (Joint Density)

Random vector $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ with joint density $f(x)$.

  • Covariance: $Cov(x_1, x_2) = \mathbb{E}[(x_1 - \mathbb{E}x_1)(x_2 - \mathbb{E}x_2)]$
  • Correlation: $Corr(x_1, x_2) = \frac{Cov(x_1, x_2)}{\sqrt{Var(x_1)Var(x_2)}}$
  • Function Expectation: $\mathbb{E}[g(x)] = \int_{\mathbb{R}^2} g(x)f(x)dx$

Matrix Forms:

  • Mean Vector: $\mu = \mathbb{E}(x) = \begin{pmatrix} \mathbb{E}(x_1) \\ \mathbb{E}(x_2) \end{pmatrix}$
  • Covariance Matrix: $\Sigma = Var(x) = \begin{pmatrix} Var(x_1) & Cov(x_1, x_2) \\ Cov(x_1, x_2) & Var(x_2) \end{pmatrix}$
  • Correlation Matrix: $Corr(x) = \begin{pmatrix} 1 & Corr(x_1, x_2) \\ Corr(x_1, x_2) & 1 \end{pmatrix}$

Note on Joint Density:

  • If $x_1, x_2$ are independent: $f_{x_1, x_2}(x_1, x_2) = f_{x_1}(x_1)f_{x_2}(x_2)$
  • Marginal Density: $f_{x_2}(x_2) = \int f_{x_1, x_2}(x_1, x_2) dx_1$
  • Conditional Density: $f_{x_1|x_2}(x_1|x_2) = \frac{f_{x_1, x_2}(x_1, x_2)}{f_{x_2}(x_2)}$

Higher Dimension ($x \in \mathbb{R}^p$)

For $x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}$ with joint density $f(x)$:

  • Mean Vector: $\mu = \mathbb{E}(x) \in \mathbb{R}^p$
  • Covariance Matrix: $\Sigma = Var(x) \in \mathbb{R}^{p \times p}$, where $\Sigma_{ij} = Cov(x_i, x_j)$ and $\Sigma_{ii} = Var(x_i)$.
  • Correlation Matrix: $R = Corr(x) \in \mathbb{R}^{p \times p}$, where $R_{ij} = Corr(x_i, x_j)$ and $R_{ii} = 1$.

Covariance Between Vectors: For $x \in \mathbb{R}^p, y \in \mathbb{R}^q$:

$$ \Sigma_{xy} = Cov(x, y) \in \mathbb{R}^{p \times q}, \quad (\Sigma_{xy})_{ij} = Cov(x_i, y_j) $$

Computation of Sample Estimates

Given data matrix $X \in \mathbb{R}^{n \times p}$:

$$ X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} = (f_1, \dots, f_p) $$

1. Sample Mean

The sample mean can be computed using matrix multiplication with the vector of ones ($1_n$):

$$ \bar{x} = \frac{X^T 1_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

Verification:

$$ \frac{X^T 1_n}{n} = \frac{1}{n} \begin{pmatrix} f_1^T \\ \vdots \\ f_p^T \end{pmatrix} 1_n = \begin{pmatrix} f_1^T 1_n / n \\ \vdots \\ f_p^T 1_n / n \end{pmatrix} = \begin{pmatrix} \bar{f}_1 \\ \vdots \\ \bar{f}_p \end{pmatrix} $$
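
This identity is easy to check numerically; a small sketch with random data as a stand-in for $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))

ones = np.ones(n)
x_bar = X.T @ ones / n                     # X^T 1_n / n

assert np.allclose(x_bar, X.mean(axis=0))  # equals the column means f_bar_1..f_bar_p
```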

2. Sample Covariance

The sample covariance matrix $S$ is defined as:

$$ S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T $$

Using matrix notation and the Centering Operator $C$:

$$ S = \frac{X^T C X}{n} $$

The Centering Operator ($C$):

$$ C = I_n - \frac{1_n 1_n^T}{n} $$
  • $C$ is a projection matrix (idempotent: $C^2 = C$) and symmetric ($C^T = C$).
  • It “centers” the data by removing the mean.

Derivation: Let $\tilde{X} = CX = (f_1 - \bar{f}_1 1_n, \dots, f_p - \bar{f}_p 1_n)$, i.e. each feature column with its mean subtracted. Then:

$$ \frac{\tilde{X}^T \tilde{X}}{n} = \frac{(CX)^T (CX)}{n} = \frac{X^T C^T C X}{n} = \frac{X^T C X}{n} $$

(Since $C^T C = C^2 = C$).

Understanding $CX$: If $X$ is a column vector (univariate sample), $CX = X - \frac{1_n 1_n^T}{n}X$. Since $\frac{1_n^T}{n}X = \bar{x}$ (scalar mean), then $\frac{1_n 1_n^T}{n}X$ results in a vector where every element is the mean. Thus, $CX$ subtracts the mean from every observation in $X$.
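
A sketch verifying $S = X^T C X / n$ against the outer-product definition (building $C$ explicitly is only for illustration; in practice one would center the columns directly):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))

C = np.eye(n) - np.ones((n, n)) / n      # centering operator: idempotent and symmetric
assert np.allclose(C @ C, C)

S_matrix = X.T @ C @ X / n               # S = X^T C X / n

x_bar = X.mean(axis=0)
S_sum = sum(np.outer(xi - x_bar, xi - x_bar) for xi in X) / n  # definition of S

assert np.allclose(S_matrix, S_sum)
```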

Properties of Sample Estimates

Let $x_1, \dots, x_n$ be p-dimensional i.i.d. random vectors with mean $\mu$ and covariance $\Sigma$.

The sample statistics are random variables:

  • Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$
  • Sample Covariance: $S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T$

1. Expectation and Variance of $\bar{x}$

  • Unbiasedness: $\mathbb{E}(\bar{x}) = \mu$. Thus, $\bar{x}$ is an unbiased estimator of $\mu$.

    $$ \mathbb{E}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}(x_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu $$
  • Variance: $Var(\bar{x}) = \frac{1}{n}\Sigma$.

    $$ Var\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}Var(x_i) = \frac{1}{n^2}\sum_{i=1}^{n}\Sigma = \frac{1}{n}\Sigma $$
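
A quick Monte Carlo sanity check of both facts; the dimensions, $\Sigma$, and the number of replications below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
n, reps = 25, 20000

# draw `reps` independent samples of size n and record each sample mean
means = rng.multivariate_normal(mu, Sigma, size=(reps, n)).mean(axis=1)

print(means.mean(axis=0))          # ~ mu          (unbiasedness)
print(np.cov(means.T, bias=True))  # ~ Sigma / n   (variance of the sample mean)
```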

2. Expectation of $S$ (Sample Covariance)

$S$ is not an unbiased estimator of $\Sigma$. The expectation is:

$$ \mathbb{E}(S) = \frac{n-1}{n}\Sigma $$

Derivation: Recall the decomposition:

$$ \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T = \sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T - n(\bar{x} - \mu)(\bar{x} - \mu)^T $$

Taking expectations:

$$ \begin{aligned} \mathbb{E}(S) &= \mathbb{E}\left[ \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T \right] - \mathbb{E}[(\bar{x} - \mu)(\bar{x} - \mu)^T] \\ &= \frac{1}{n}\sum_{i=1}^{n}Var(x_i) - Var(\bar{x}) \\ &= \Sigma - \frac{1}{n}\Sigma = \frac{n-1}{n}\Sigma \end{aligned} $$

Note: The unbiased estimator is $S_{unbiased} = \frac{n}{n-1}S = \frac{1}{n-1}\sum (x_i - \bar{x})(x_i - \bar{x})^T$.
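
For reference, NumPy's `np.cov` uses the unbiased $1/(n-1)$ normalization by default; a short sketch contrasting the two conventions on random data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))            # n = 30 observations, p = 3 features
n = X.shape[0]

Xc = X - X.mean(axis=0)
S_biased = Xc.T @ Xc / n                # the 1/n estimator S used above

S_unbiased = np.cov(X, rowvar=False)    # np.cov defaults to 1/(n-1)
assert np.allclose(S_unbiased, S_biased * n / (n - 1))
```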


Multivariate Normal (MVN) Distribution

Univariate Review

A random variable $X \sim N(\mu, \sigma^2)$ has density:

$$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Multivariate Definition

A random vector $X \in \mathbb{R}^p$ is Multivariate Normal (MVN) if and only if $v^T X$ is a univariate normal random variable for all $v \in \mathbb{R}^p$.

Properties

If $X \sim N_p(\mu, \Sigma)$:

  1. Linear Transformation: For $A \in \mathbb{R}^{q \times p}$ and $b \in \mathbb{R}^q$, the vector $Y = AX + b$ is q-variate normal.
  2. Marginals: Each component $x_i$ is univariate normal.

MVN Density Function

Let $X \sim N_p(\mu, \Sigma)$ where $\Sigma$ is positive definite (PD). The density is:

$$ f_X(x) = \frac{1}{(2\pi)^{p/2} (\det(\Sigma))^{1/2}} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)} $$
  • Mean: $\mathbb{E}(x) = \mu$
  • Variance: $Var(x) = \Sigma$

Example (Independent Components): If $x_1, \dots, x_p$ are independent with $x_i \sim N(\mu_i, \sigma_i^2)$, then $\Sigma = \text{diag}(\sigma_1^2, \dots, \sigma_p^2)$. The joint density factors into the product of marginals:

$$ f_X(x) = \prod_{i=1}^{p} f_{x_i}(x_i) = \frac{1}{(2\pi)^{p/2} \prod \sigma_i} e^{-\frac{1}{2} \sum \frac{(x_i-\mu_i)^2}{\sigma_i^2}} $$

This matches the matrix form since $\det(\Sigma) = \prod \sigma_i^2$ and the exponent term becomes a sum.
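
A sketch evaluating the density formula directly and comparing it against `scipy.stats.multivariate_normal` (the particular $\mu$, $\Sigma$, and evaluation point are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])

p = mu.size
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
density = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

assert np.isclose(density, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```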


Transformations and Marginals

Linear Combinations

If $X \sim N_p(\mu, \Sigma)$, then for $Y = AX + b$ with $A \in \mathbb{R}^{q \times p}$ and $b \in \mathbb{R}^q$:

$$ Y \sim N_q(A\mu + b, A\Sigma A^T) $$
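
An empirical sketch of this rule: sample $X$, apply the transformation, and compare the sample moments of $Y$ with $A\mu + b$ and $A\Sigma A^T$ (all numeric values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])
b = np.array([2.0, -3.0])

X = rng.multivariate_normal(mu, Sigma, size=100_000)   # rows are samples of X
Y = X @ A.T + b                                        # y_i = A x_i + b for each row

print(Y.mean(axis=0), A @ mu + b)                      # should be close
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)        # should be close
```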

Marginal Distributions

If $X \sim N_p(\mu, \Sigma)$, then any subset of variables is also normal.

  • Single variable: $x_i \sim N(\mu_i, \Sigma_{ii})$
    • Proof using transformation: Let $A = (0, \dots, 1, \dots, 0)$ (1 at $i$-th position). Then $AX = x_i$. Mean is $A\mu = \mu_i$, Variance is $A\Sigma A^T = \Sigma_{ii}$.
  • Subset vector: For index set $I$, $X_I \sim N(\mu_I, \Sigma_{II})$.

Standardization (Whitening)

  • If $X \sim N_p(0, I_p)$ and $Y = \Sigma^{1/2}X + \mu$, then $Y \sim N_p(\mu, \Sigma)$.
  • If $X \sim N_p(\mu, \Sigma)$, then $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$.
  • Here $\Sigma^{1/2}$ denotes the symmetric positive definite square root of $\Sigma$, and $\Sigma^{-1/2}$ its inverse.
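
A sketch of both directions using the symmetric square root $\Sigma^{1/2} = U\Lambda^{1/2}U^T$ from an eigen-decomposition (other square roots, e.g. a Cholesky factor, would work equally well; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([2.0, -1.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

lam, U = np.linalg.eigh(Sigma)                   # Sigma = U diag(lam) U^T
Sigma_half = U @ np.diag(np.sqrt(lam)) @ U.T     # symmetric square root
Sigma_half_inv = U @ np.diag(1 / np.sqrt(lam)) @ U.T

Z = rng.multivariate_normal(np.zeros(2), np.eye(2), size=50_000)  # Z ~ N(0, I)
Y = Z @ Sigma_half + mu                                           # Y ~ N(mu, Sigma)
W = (Y - mu) @ Sigma_half_inv                                     # whitened back to N(0, I)

print(np.cov(Y, rowvar=False))   # ~ Sigma
print(np.cov(W, rowvar=False))   # ~ identity
```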

Independence and Density Contours

Independence

Let the partitioned vector $Z = \begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{p+q} \left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_X & 0 \\ 0 & \Sigma_Y \end{pmatrix} \right)$. Since the off-diagonal block is zero ($X$ and $Y$ are uncorrelated), $X$ and $Y$ are independent and the joint density factors:

$$ f_Z(x, y) = f_X(x)f_Y(y) $$

Property: For jointly MVN vectors, independence $\iff$ zero correlation (this equivalence does not hold for general distributions).

Density Contours and Mahalanobis Distance

The density $f_X(x)$ is constant where the quadratic form is constant:

$$ (x-\mu)^T \Sigma^{-1} (x-\mu) = c $$

This quadratic form is called the squared Mahalanobis distance $d^2(x, \mu)$ between $x$ and $\mu$.

Confidence Ellipses

If $X \sim N_p(\mu, \Sigma)$ with $\Sigma$ positive definite, the quadratic form follows a Chi-square distribution with $p$ degrees of freedom:

$$ (X-\mu)^T \Sigma^{-1} (X-\mu) \sim \chi^2(p) $$

To draw a $95\%$ confidence ellipse (for $p=2$):

  1. Find $t$ such that $P(\chi^2(2) < t) = 0.95$.
  2. The ellipse is defined by $(x-\mu)^T \Sigma^{-1} (x-\mu) = t$.
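
A sketch of these two steps for $p = 2$, using `scipy.stats.chi2.ppf` for the cutoff and the eigen-decomposition of $\Sigma$ (discussed in the next subsection) to trace the boundary; $\mu$ and $\Sigma$ are illustrative:

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 1.5]])

t = chi2.ppf(0.95, df=2)                      # step 1: P(chi2(2) < t) = 0.95

# step 2: trace the boundary (x - mu)^T Sigma^{-1} (x - mu) = t by mapping
# the unit circle through sqrt(t) * U * Lambda^{1/2}
lam, U = np.linalg.eigh(Sigma)                # Sigma = U diag(lam) U^T
theta = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)])          # unit circle, shape (2, 200)
ellipse = mu[:, None] + np.sqrt(t) * U @ np.diag(np.sqrt(lam)) @ circle

# sanity check: every boundary point has squared Mahalanobis distance t
diff = ellipse - mu[:, None]
d2 = np.sum(diff * np.linalg.solve(Sigma, diff), axis=0)
assert np.allclose(d2, t)
```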

Geometry of Contours

The shape of the contours $(x-\mu)^T \Sigma^{-1} (x-\mu) = c$ depends on $\Sigma$:

  1. Identity Covariance ($\Sigma = I$): Contours are circles (hyperspheres) centered at $\mu$.

    $$ (x-\mu)^T I (x-\mu) = ||x-\mu||^2 = c $$
  2. Diagonal Covariance ($\Sigma = \text{diag}(\sigma_1^2, \sigma_2^2)$): Contours are ellipses centered at $\mu$ with axes aligned to the coordinate axes.

    $$ \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} = c $$

    Axes lengths are proportional to $\sigma_1$ and $\sigma_2$.

  3. General Covariance ($\Sigma$ is symmetric positive definite): Use Eigen-decomposition $\Sigma = U \Lambda U^T$. Contours are ellipses centered at $\mu$, but rotated.

    • Axes directions: Aligned with eigenvectors $u_1, \dots, u_p$.
    • Axes lengths: Proportional to $\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p}$.

Marginal & Conditional Distributions

Definitions

For a random vector $X = (X_1, X_2)^T$:

  • Marginal Density: $f_{X_1}(x_1) = \int f_X(x_1, x_2) dx_2$.
  • Conditional Density: $f_{X_1|X_2}(x_1|x_2) = \frac{f_X(x_1, x_2)}{f_{X_2}(x_2)}$.

Conditional Distribution of MVN

Let $\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_{p+q} \left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix} \right)$.

Consider the linear transformation $Z = Y - \Sigma_{YX}\Sigma_X^{-1}X$. Then $Z$ and $X$ are uncorrelated, and hence independent, since they are jointly normal:

$$ Cov(X, Z) = Cov(X, Y) - Cov(X, X)(\Sigma_{YX}\Sigma_X^{-1})^T = \Sigma_{XY} - \Sigma_X\Sigma_X^{-1}\Sigma_{XY} = 0 $$

This leads to the conditional distribution formulas.


Conditional Formulas

The conditional distribution of $Y$ given $X=x$ is multivariate normal:

$$ Y | X=x \sim N_q(\mu_{Y|X}, \Sigma_{Y|X}) $$

Conditional Mean:

$$ \mathbb{E}(Y | X=x) = \mu_Y + \Sigma_{YX}\Sigma_X^{-1}(x - \mu_X) $$

Conditional Variance:

$$ Var(Y | X=x) = \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY} $$

Note: The conditional variance does not depend on the specific value of $x$.
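
A sketch computing the conditional mean and variance for a partitioned covariance matrix; the block sizes, $\mu$, $\Sigma$, and the conditioning value are illustrative:

```python
import numpy as np

# partition: X is the first 2 coordinates, Y the last coordinate (p = 2, q = 1)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.4, 0.6],
                  [0.4, 1.0, 0.3],
                  [0.6, 0.3, 1.5]])

p = 2
mu_X, mu_Y = mu[:p], mu[p:]
S_X, S_XY = Sigma[:p, :p], Sigma[:p, p:]
S_YX, S_Y = Sigma[p:, :p], Sigma[p:, p:]

x = np.array([0.5, 0.0])                                   # observed value of X

cond_mean = mu_Y + S_YX @ np.linalg.solve(S_X, x - mu_X)   # E(Y | X = x)
cond_var = S_Y - S_YX @ np.linalg.solve(S_X, S_XY)         # Var(Y | X = x)
print(cond_mean, cond_var)
```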

Example: Bivariate Case ($p=1, q=1$)

Let $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_2(\mu, \Sigma)$. The conditional expectation of $X_2$ given $X_1 = x$ is a line (regression line):

$$ \mathbb{E}(X_2 | X_1=x) = \mu_2 + \frac{\Sigma_{21}}{\Sigma_{11}}(x - \mu_1) $$
  • Slope: $\frac{\Sigma_{21}}{\Sigma_{11}}$
  • Intercept: $\mu_2 - \frac{\Sigma_{21}}{\Sigma_{11}}\mu_1$, so the line passes through the mean point $(\mu_1, \mu_2)$.

Conditional Variance:

$$ Var(X_2 | X_1=x) = \Sigma_{22} - \frac{\Sigma_{21}\Sigma_{12}}{\Sigma_{11}} = \Sigma_{22} - \frac{\Sigma_{12}^2}{\Sigma_{11}} $$