Question 2 examines the basic statistical properties of a data matrix, in particular the calculation of the sample mean and sample covariance matrix, and how a linear transformation affects them.
Core settings of the question:
- $X \in \mathbb{R}^{10 \times 3}$: 10 samples, 3 variables. $n=10, p=3$.
- Column-centered: The mean of each column is 0.
- Column-orthogonal (in fact orthonormal, since $X^\top X = I_3$ also fixes each column's length at 1).
(a) Find the sample mean vector $\bar{x}$ and the sample covariance matrix $S$ of $X$.
These two quantities describe the "location" and the "interrelationships" of the matrix's column vectors, respectively. They are central to data preprocessing (e.g. PCA, principal component analysis) and to the linear algebra below.
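Putting the two given properties together, part (a) can in fact be answered in one line (a sketch; the sections below unpack each step in detail):

$$\bar{x} = \frac{1}{10} X^\top \mathbf{1} = \mathbf{0} \quad (\text{column-centered}), \qquad S = \frac{1}{10} X^\top X = \frac{1}{10} I_3 \quad (X^\top X = I_3).$$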
We think of a matrix $A$ as consisting of several column vectors: $A = [c_1, c_2, ..., c_n]$.
Column Centered / Orthogonal
1. Column Centered
**One-sentence explanation: the average of each column is 0.**
**Intuitive meaning (statistical perspective):** imagine each column of the matrix represents a feature (for example, column 1 is "height" and column 2 is "weight"). "Column centering" moves the origin of the data to its center (the mean point).
Say your height data is $[170, 180, 160]$ (mean 170).
After centering, it becomes $[0, 10, -10]$ (mean 0).
This is exactly what the centering matrix $I - J$, which you saw earlier, does.
Mathematical definition: For each column $c_j$ of matrix $A$, the sum of all its elements is 0:
$$\sum_{i=1}^m a_{ij} = 0 \quad \text{or, in vector form,} \quad \mathbf{1}^\top c_j = 0$$

This means the column vector $c_j$ is orthogonal to the all-ones vector $\mathbf{1}$.
2. Column Orthogonal
**One-sentence explanation: any two columns are perpendicular to each other.**
**Intuitive meaning (geometric perspective):** the column vectors do not interfere with each other; the angle between any two of them is 90 degrees. If you draw a graph with the first column along the X-axis direction and the second along the Y-axis direction, they are orthogonal.
Statistical significance: if the data has already been column-centered, then column orthogonality means the two variables are uncorrelated: the variation in variable A tells you nothing about variable B.
Mathematical definition: For any two columns $c_i$ and $c_j$ ($i \neq j$) of matrix $A$, their dot product is 0:
$$c_i^\top c_j = 0$$

In matrix form: $A^\top A$ is a diagonal matrix.
An important distinction (Orthogonal vs Orthonormal):
Column orthogonal: the columns only need to be perpendicular; their lengths are arbitrary.
Column orthonormal: the columns are perpendicular to each other AND each column has length (norm) 1. A square matrix satisfying this is the legendary orthogonal matrix, with $Q^\top Q = I$.
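A quick numerical check of this distinction (a sketch in NumPy; the matrix `A` below is just an illustrative example, not from the question):

```python
import numpy as np

# Two columns that are orthogonal but NOT orthonormal:
# their dot product is 0, but their lengths are not 1.
A = np.array([[1.0,  2.0],
              [1.0, -2.0]])
print(A.T @ A)   # diagonal (columns orthogonal), but not the identity

# Normalizing each column to unit length makes the columns orthonormal.
Q = A / np.linalg.norm(A, axis=0)
print(Q.T @ Q)   # now (numerically) the identity
```

Note how `A.T @ A` being *diagonal* certifies orthogonality, while equalling *the identity* certifies orthonormality, exactly the distinction above.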
How the Centering Operator works
Okay, let’s demonstrate this with a concrete numerical example.
The centering operator is usually denoted $C$; its mathematical essence is $C = I - J$, where $J = \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. What it does is very intuitive: "raw data minus the mean".
Take a $3 \times 2$ matrix $X$: 3 samples (rows), each with 2 features (columns).
1. Set data matrix $X$
To keep the arithmetic easy to do by hand, we choose a simple set of integers:
$$X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix}$$

- Column 1 (feature 1): the data is 1, 2, 3; the mean is 2.
- Column 2 (Feature 2): The data is 2, 4, 6. The average is 4.
Our goal is to make the first column $[-1, 0, 1]^\top$ and the second column $[-2, 0, 2]^\top$ (each column minus its own mean).
2. Construct the centering operator $C$
Because the data has 3 rows, $n=3$, and the centering operator $C$ is a $3 \times 3$ square matrix:
$$C = I - J = I - \frac{1}{3}\mathbf{1}\mathbf{1}^\top$$

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad J = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix}$$

Subtracting gives $C$:
$$C = \begin{bmatrix} 2/3 & -1/3 & -1/3 \\ -1/3 & 2/3 & -1/3 \\ -1/3 & -1/3 & 2/3 \end{bmatrix}$$

3. Witness the miracle: the matrix product $CX$
Now we use operator $C$ to left multiply the data matrix $X$:
$$X_{\text{centered}} = C \cdot X$$

$$= \begin{bmatrix} 2/3 & -1/3 & -1/3 \\ -1/3 & 2/3 & -1/3 \\ -1/3 & -1/3 & 2/3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix}$$

Let's compute the entry in row 1, column 1 (the first feature of the first sample):
$$\left(\frac{2}{3} \times 1\right) + \left(-\frac{1}{3} \times 2\right) + \left(-\frac{1}{3} \times 3\right) = \frac{2}{3} - \frac{2}{3} - \frac{3}{3} = -1$$

(Check: the original value is 1, the column mean is 2, and indeed $1 - 2 = -1$.)
Now the entry in row 2, column 1:
$$\left(-\frac{1}{3} \times 1\right) + \left(\frac{2}{3} \times 2\right) + \left(-\frac{1}{3} \times 3\right) = -\frac{1}{3} + \frac{4}{3} - \frac{3}{3} = 0$$

(Check: the original value is 2, the column mean is 2, and indeed $2 - 2 = 0$.)
By analogy, the final result is:
$$X_{\text{centered}} = \begin{bmatrix} -1 & -2 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}$$

4. A more intuitive way to understand it: decompose the action
Multiplying by $C$ directly is rigorous but tedious. A smarter way to understand it is to use $C = I - J$ to decompose the operation:
$$CX = (I - J)X = \underbrace{IX}_{\text{original data}} - \underbrace{JX}_{\text{mean matrix}}$$

Now look at $JX$: $J$ is an "averaging" matrix. Applied to $X$, it computes the mean of each column and fills the whole column with it.
$$JX = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 2 & 4 \\ 2 & 4 \end{bmatrix}$$

(Each column's mean, repeated 3 times.)
Finally do the subtraction:
$$X - JX = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} - \begin{bmatrix} 2 & 4 \\ 2 & 4 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -1 & -2 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}$$

Summary
The centering operator $C$ works by **keeping the data as-is ($I$) and subtracting its center of gravity ($J$)**.
The result is that the mean values of all data columns become 0, achieving Column Centered.
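The whole demonstration above fits in a few lines of NumPy (the numbers are taken straight from the worked example):

```python
import numpy as np

# The worked example: 3 samples, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
n = X.shape[0]

# Centering operator C = I - (1/n) * 1 1^T
ones = np.ones((n, 1))
C = np.eye(n) - (ones @ ones.T) / n

X_centered = C @ X
print(X_centered)               # [[-1. -2.] [ 0.  0.] [ 1.  2.]]
print(X_centered.mean(axis=0))  # column means are now 0
```

In practice one would just write `X - X.mean(axis=0)`, which is exactly the $X - JX$ decomposition; the explicit matrix $C$ is useful for proofs, not computation.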
An Outer-Product calculation example, and why its rank is always 1
To help you see the pattern, this time we won't use all-ones vectors but two completely different vectors.
It’s like an expanded version of the multiplication table.
Let’s say we have two vectors:
- Column vector $u$ (put on the left): $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
- Row vector $v^\top$ (put on the right): $\begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$
1. Get ready
$$u v^\top = \begin{bmatrix} \mathbf{1} \\ \mathbf{2} \\ \mathbf{3} \end{bmatrix} \times \begin{bmatrix} \color{blue}4 & \color{blue}5 & \color{blue}6 \end{bmatrix}$$

2. Calculation steps: take what you need
The calculation rule is still: $C_{ij} = u_i \times v_j$ (the $i$th number on the left $\times$ the $j$th number on the right).
**First row (driven by the 1 on the left):** it is $1$ times $v^\top$.
$$1 \times [4, 5, 6] \rightarrow \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$$

**Second row (driven by the 2 on the left):** it is $2$ times $v^\top$.
$$2 \times [4, 5, 6] \rightarrow \begin{bmatrix} 8 & 10 & 12 \end{bmatrix}$$

**Third row (driven by the 3 on the left):** it is $3$ times $v^\top$.
$$3 \times [4, 5, 6] \rightarrow \begin{bmatrix} 12 & 15 & 18 \end{bmatrix}$$
3. Final result
$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{bmatrix}$$

4. The core pattern (visible at a glance)
Take a closer look at the final $3\times3$ matrix and you will notice a striking pattern that matters a lot in linear algebra:
- Look at “Rows”:
  - Row 2 $(8, 10, 12)$ is exactly 2 times row 1.
  - Row 3 $(12, 15, 18)$ is exactly 3 times row 1.
  - Conclusion: all the rows are linearly dependent (parallel; each is a multiple of the others).
- Look at “Columns”:
  - Column 1 is $\begin{bmatrix} 4 \\ 8 \\ 12 \end{bmatrix}$, which is 4 times $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
  - Column 2 is $\begin{bmatrix} 5 \\ 10 \\ 15 \end{bmatrix}$, which is 5 times $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
  - Conclusion: every column is a multiple of the left-hand vector $u$.
This is why the matrix produced by an outer product always has rank 1: although it looks big ($3\times3$), every row is a scaled copy of $v^\top$ and every column a scaled copy of $u$, so there is only one independent direction.
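The example above can be reproduced, and the rank-1 claim verified, in a couple of NumPy calls:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Outer product: same as u.reshape(-1, 1) @ v.reshape(1, -1)
M = np.outer(u, v)
print(M)
# [[ 4  5  6]
#  [ 8 10 12]
#  [12 15 18]]

# Rank is 1: every row is a multiple of v, every column a multiple of u.
print(np.linalg.matrix_rank(M))  # 1
```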
(b) Writing out the $Y$ matrix and $X$ explicitly
To give you the clearest view, let’s drop all abbreviations and expand all 30 data points (3 columns × 10 rows). We define the symbol $f_{1, i}$ to represent the $i$th observation of the 1st variable (column 1).
Step 1: Disassemble the original matrix $X$
The question says $X$ has 3 columns with 10 entries each. To avoid clutter, $f_{1,1}$ denotes the entry in column 1, row 1, and so on.
$$X = \begin{pmatrix} f_{1,1} & f_{2,1} & f_{3,1} \\ f_{1,2} & f_{2,2} & f_{3,2} \\ \vdots & \vdots & \vdots \\ f_{1,10} & f_{2,10} & f_{3,10} \end{pmatrix}$$

**What does the key property given in the question, orthonormality ($X^\top X = I$), mean in this notation?** It means:
- Each column dotted with itself (its sum of squares) equals 1:
- Sum of squares in column 1: $\sum_{i=1}^{10} (f_{1,i})^2 = 1$
- Sum of squares in column 2: $\sum_{i=1}^{10} (f_{2,i})^2 = 1$
- Sum of squares in column 3: $\sum_{i=1}^{10} (f_{3,i})^2 = 1$
- Each column dotted with any other column (the sum of cross products) equals 0:
- Column 1 $\times$ Column 2: $\sum_{i=1}^{10} f_{1,i} f_{2,i} = 0$
- (Other pairwise combinations are also 0 by analogy)
Step 2: Write the transformed matrix $Y$
The question requires:
- $f_1$ (column 1): meters $\to$ cm ($\times 100$)
- $f_2$ (column 2): stays in cm ($\times 1$)
- $f_3$ (column 3): mm $\to$ cm ($\times 0.1$)
So $Y$ multiplies every element of each column of $X$ by the corresponding factor (equivalently, $Y = XD$ with $D = \operatorname{diag}(100, 1, 0.1)$):
$$Y = \begin{pmatrix} 100 f_{1,1} & 1 f_{2,1} & 0.1 f_{3,1} \\ 100 f_{1,2} & 1 f_{2,2} & 0.1 f_{3,2} \\ \vdots & \vdots & \vdots \\ 100 f_{1,10} & 1 f_{2,10} & 0.1 f_{3,10} \end{pmatrix}$$

Step 3: Compute $Y^\top Y$ (the core of the covariance)
The covariance matrix we need is $S_Y = \frac{1}{10} Y^\top Y$. Start with the hardest part, $Y^\top Y$. Transposing $Y$ lays it down into 3 rows × 10 columns:
$$Y^\top = \begin{pmatrix} 100f_{1,1} & 100f_{1,2} & \dots & 100f_{1,10} \\ 1f_{2,1} & 1f_{2,2} & \dots & 1f_{2,10} \\ 0.1f_{3,1} & 0.1f_{3,2} & \dots & 0.1f_{3,10} \end{pmatrix}$$

Now do the multiplication $Y^\top Y$ ($3 \times 10$ times $10 \times 3$ gives $3 \times 3$):
Let’s look at it element by element:
1. Calculate the element in the upper left corner (row 1 $\times$ column 1)
This is the dot product of row 1 of $Y^\top$ (row of $f_1$) and column 1 of $Y$ (column of $f_1$).
$$\text{Result}_{1,1} = (100 f_{1,1} \times 100 f_{1,1}) + (100 f_{1,2} \times 100 f_{1,2}) + \dots + (100 f_{1,10} \times 100 f_{1,10})$$

Pull out the common factor $100 \times 100 = 10000$:
$$= 10000 \underbrace{(f_{1,1}^2 + f_{1,2}^2 + \dots + f_{1,10}^2)}_{\text{sum of squares of column 1 of } X\text{, given to be } 1} = 10000 \times 1 = \mathbf{10000}$$

2. Calculate the middle element (row 2 $\times$ column 2)
This is the dot product of row 2 of $Y^\top$ and column 2 of $Y$.
$$\text{Result}_{2,2} = (1 f_{2,1} \times 1 f_{2,1}) + \dots = 1 \times \underbrace{\sum f_{2,i}^2}_{1} = \mathbf{1}$$

3. Calculate the element in the lower right corner (row 3 $\times$ column 3)
This is the dot product of row 3 of $Y^\top$ and column 3 of $Y$.
$$\text{Result}_{3,3} = (0.1 f_{3,1} \times 0.1 f_{3,1}) + \dots = 0.01 \times \underbrace{\sum f_{3,i}^2}_{1} = \mathbf{0.01}$$

4. Calculate the off-diagonal elements (e.g. row 1 $\times$ column 2)
This is the dot product of row 1 of $Y^\top$ and column 2 of $Y$.
$$\text{Result}_{1,2} = (100 f_{1,1} \times 1 f_{2,1}) + (100 f_{1,2} \times 1 f_{2,2}) + \dots$$

Pull out the constant $100 \times 1 = 100$:
$$= 100 \underbrace{(f_{1,1}f_{2,1} + f_{1,2}f_{2,2} + \dots + f_{1,10}f_{2,10})}_{\text{the dot product of columns 1 and 2 of } X\text{, given to be } 0 \text{ by orthogonality}} = 100 \times 0 = \mathbf{0}$$

(Likewise, every off-diagonal element involves products of $f$'s from different columns, so each is 0.)
Step 4: Summarize the results
Putting the pieces together, the result of $Y^\top Y$ is:
$$Y^\top Y = \begin{pmatrix} 10000 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.01 \end{pmatrix}$$

Finally, the sample covariance matrix $S_Y$ asked for is this result divided by $n=10$:
$$S_Y = \frac{1}{10} \begin{pmatrix} 10000 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.01 \end{pmatrix} = \begin{pmatrix} 1000 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 0.001 \end{pmatrix}$$

Much clearer this way. The essence: **because the original columns are orthogonal, rescaling them cannot create any new correlation (the off-diagonal entries stay 0); it only changes each variable's own variance (the diagonal entries).**
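The whole of part (b) can be sanity-checked numerically (a sketch: the random $X$ below is only a stand-in satisfying the question's two properties, built by QR-factorizing a matrix whose first column is all ones and dropping that column, so the remaining columns are orthonormal and orthogonal to $\mathbf{1}$, i.e. column-centered):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 10x3 column-centered, column-orthonormal X:
# QR-factorize [1 | random 10x3], then drop the first column.
A = np.column_stack([np.ones(10), rng.standard_normal((10, 3))])
Q, _ = np.linalg.qr(A)
X = Q[:, 1:]                       # X^T X = I_3, 1^T X = 0^T

# Unit conversions as a diagonal scaling: Y = X D
D = np.diag([100.0, 1.0, 0.1])
Y = X @ D

S_Y = (Y.T @ Y) / 10
print(np.round(S_Y, 6))            # diag(1000, 0.1, 0.001), off-diagonals 0
```

Algebraically this is just $S_Y = \frac{1}{10} D X^\top X D = \frac{1}{10} D^2$, which is why only the diagonal changes.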
Working out the matrix product $\mathbf{1}^\top X$
What we want to calculate is $\mathbf{1}^\top X$.
Here $\mathbf{1}$ is the vector of ones. Since $X$ has 10 rows (10 observations), $\mathbf{1}$ must be a $10 \times 1$ column vector for the multiplication to be defined.
So $\mathbf{1}^\top$ is a row vector of $1 \times 10$, which is all 1’s.
Step 1: Write out the matrices and vectors
$$ \mathbf{1}^\top X = \underbrace{\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}}_{1 \times 10 \text{ all-ones row vector}} \times \underbrace{\begin{pmatrix} f_{1,1} & f_{2,1} & f_{3,1} \\ f_{1,2} & f_{2,2} & f_{3,2} \\ f_{1,3} & f_{2,3} & f_{3,3} \\ \vdots & \vdots & \vdots \\ f_{1,10} & f_{2,10} & f_{3,10} \end{pmatrix}}_{10 \times 3 \text{ data matrix } X} $$

Step 2: Perform the matrix multiplication (rows $\times$ columns)
The size of the resulting matrix should be $(1 \times 10) \times (10 \times 3) = \mathbf{1 \times 3}$ (a row vector containing 3 numbers).
We need to take the “whole row” on the left and multiply “each column” on the right.
**1. The first entry of the result (corresponding to column 1 of $X$, $f_1$):** each 1 on the left multiplies the corresponding entry of column 1, and the products are summed.
$$= 1 \cdot f_{1,1} + 1 \cdot f_{1,2} + 1 \cdot f_{1,3} + \dots + 1 \cdot f_{1,10}$$

This is simply the sum of all the entries of column $f_1$.
**2. The second entry of the result (corresponding to column 2 of $X$, $f_2$):**
$$= 1 \cdot f_{2,1} + 1 \cdot f_{2,2} + \dots + 1 \cdot f_{2,10}$$

This is the sum of all the entries of column $f_2$.
**3. The third entry of the result (corresponding to column 3 of $X$, $f_3$):**
$$= 1 \cdot f_{3,1} + \dots + 1 \cdot f_{3,10}$$

This is the sum of all the entries of column $f_3$.
Step 3: Write the intermediate results
After the above calculation, the result of $\mathbf{1}^\top X$ is:
$$ \mathbf{1}^\top X = \begin{pmatrix} \sum_{i=1}^{10} f_{1,i} & \sum_{i=1}^{10} f_{2,i} & \sum_{i=1}^{10} f_{3,i} \end{pmatrix} $$

In other words, $\mathbf{1}^\top$ has a nickname in linear algebra: the "summation operator". Whatever it hits, it adds up all the numbers in that column.
Step 4: Substitute the question conditions (Column-Centered)
Now back to the most critical known conditions of the question: "$X$ is column-centered".
This means that the mean of each column is 0.
$$\text{mean} = \frac{\text{sum}}{n} = 0 \quad \Rightarrow \quad \text{sum} = 0$$

Since the mean of each column is 0, the sum of each column must also be 0. So:
- Sum of column 1 $\sum f_{1,i} = 0$
- Sum of column 2 $\sum f_{2,i} = 0$
- Sum of column 3 $\sum f_{3,i} = 0$
Step 5: Final Result
Substitute 0 into the vector just now:
$$ \mathbf{1}^\top X = \begin{pmatrix} 0 & 0 & 0 \end{pmatrix} = \mathbf{0}^\top $$

This is why the solution writes $\mathbf{1}^\top X = \mathbf{0}^\top$ directly. The physical meaning of this step: **adding up the data in each column gives 0.**
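The summation-operator view is easy to check numerically (a sketch; the raw numbers below are made up purely for illustration and then centered by hand):

```python
import numpy as np

# Made-up raw data: 3 observations of 3 variables.
raw = np.array([[1.70, 62.0, 800.0],
                [1.80, 70.0, 820.0],
                [1.60, 58.0, 790.0]])

# Column-center it, as the question assumes for X.
X = raw - raw.mean(axis=0)

# 1^T acts as the summation operator: each entry is a column sum.
ones = np.ones(X.shape[0])
print(ones @ X)   # all (numerically) zero, i.e. 1^T X = 0^T
```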