STA414 Notation Index#
Standard notation used across all STA414 notes. When in doubt, follow these tables.
Probability and Distributions#
| Symbol | Meaning | Notes |
|---|---|---|
| $p(x)$ | Probability density/mass of $x$ | Lowercase for densities |
| $P(A)$ | Probability of event $A$ | Uppercase for events |
| $p(x \mid y)$ | Conditional density of $x$ given $y$ | |
| $p(x, y)$ | Joint density | |
| $\mathcal{N}(\mu, \Sigma)$ | Multivariate Gaussian | $\mathcal{N}_m$ when emphasizing dimension $m$ |
| $\text{Bernoulli}(p)$ | Bernoulli distribution | |
| $\text{Beta}(\alpha, \beta)$ | Beta distribution | |
| $\text{Uniform}(a,b)$ | Uniform distribution | |
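The distributions above all have direct NumPy counterparts; a minimal sampling sketch (parameter values are illustrative only, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.zeros(2)
Sigma = np.eye(2)
x = rng.multivariate_normal(mu, Sigma)  # x ~ N(mu, Sigma)
b = rng.binomial(1, 0.3)                # b ~ Bernoulli(0.3)
t = rng.beta(2.0, 5.0)                  # t ~ Beta(2, 5)
u = rng.uniform(0.0, 1.0)               # u ~ Uniform(0, 1)
```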
Expectations and Operators#
| Symbol | Meaning | Notes |
|---|---|---|
| $\mathbb{E}[\cdot]$ | Expected value | Always use $\mathbb{E}$, never bare $E$ |
| $\mathbb{E}_q[\cdot]$ | Expectation under distribution $q$ | |
| $\text{Var}(\cdot)$ | Variance | Use $\text{Var}$, not $\text{var}$ |
| $\text{Cov}(\cdot, \cdot)$ | Covariance | |
| $\mathbf{1}_A$ | Indicator function for event $A$ | $I(\cdot)$ also used in lecture |
| $D_{\text{KL}}(q \| p)$ | KL divergence from $q$ to $p$ | Always use $\text{KL}$ in subscript, never $\mathrm{KL}$ |
| $\mathcal{L}(q)$ | Evidence lower bound (ELBO) | Also written $\mathrm{ELBO}(q)$ in some contexts |
| $H(p)$ | Entropy of $p$ | |
Linear Algebra#
| Symbol | Meaning | Notes |
|---|---|---|
| $x^\top$ | Transpose | Always use $\top$, never $T$ |
| $\|x\|$ | Euclidean norm | |
| $\lvert A \rvert$ or $\det(A)$ | Determinant | Use $\lvert \cdot \rvert$ to avoid unescaped pipes in tables |
| $A^{-1}$ | Matrix inverse | |
| $\text{diag}(\cdot)$ | Diagonal matrix | |
| $I$ | Identity matrix | |
| $\odot$ | Element-wise (Hadamard) product | Used in forward algorithm |
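These operations map one-to-one onto NumPy; a short sketch distinguishing the matrix product from the Hadamard product (matrix values are illustrative only):

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 0.5]])
x = np.array([3.0, 4.0])

matmul = A @ B              # ordinary matrix product AB
hadamard = A * B            # A ⊙ B: element-wise (Hadamard) product
det_A = np.linalg.det(A)    # det(A), also written |A|
A_inv = np.linalg.inv(A)    # A^{-1}
norm_x = np.linalg.norm(x)  # Euclidean norm ||x||
```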
Optimization#
| Symbol | Meaning | Notes |
|---|---|---|
| $\nabla f$ | Gradient of $f$ | |
| $\nabla^2 f$ | Hessian of $f$ | |
| $\arg\max$ | Argument of the maximum | Use $\arg\max$, not $\text{argmax}$ |
| $\arg\min$ | Argument of the minimum | |
| $\theta^{(t)}$ | Parameter at iteration $t$ | |
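The iterate notation $\theta^{(t)}$ is easiest to see in a gradient-descent loop; a minimal sketch assuming the toy objective $f(\theta) = \tfrac{1}{2}\|\theta\|^2$, whose gradient is $\nabla f(\theta) = \theta$ (function and step-size choices are illustrative):

```python
import numpy as np

def gradient_descent(grad_f, theta0, eta=0.1, n_steps=100):
    """Iterate theta^{(t+1)} = theta^{(t)} - eta * grad f(theta^{(t)})."""
    theta = theta0
    for t in range(n_steps):
        theta = theta - eta * grad_f(theta)
    return theta

# For f(theta) = ||theta||^2 / 2, grad f(theta) = theta, and the
# iterates shrink geometrically toward the minimizer at the origin.
theta_star = gradient_descent(lambda th: th, np.array([1.0, -2.0]))
```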
Neural Networks#
| Symbol | Meaning | Notes |
|---|---|---|
| $W^{(\ell)}$ | Weight matrix at layer $\ell$ | Backpropagation |
| $b^{(\ell)}$ | Bias vector at layer $\ell$ | Backpropagation |
| $z^{(\ell)}$ | Pre-activation vector at layer $\ell$ | $W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$ |
| $a^{(\ell)}$ | Activation vector at layer $\ell$ | $a^{(0)} = x$ |
| $\delta^{(\ell)}$ | Backpropagated error signal at layer $\ell$ | $\partial \ell / \partial z^{(\ell)}$ |
| $\ell(y,\hat{y})$ | Per-observation supervised loss | Lowercase $\ell$ to avoid collision with ELBO $\mathcal{L}$ |
| $J(\theta)$ | Full training objective | Sum or average of per-observation losses |
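The forward-pass recurrence in the table ($a^{(0)} = x$, $z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$) can be sketched directly in NumPy. This assumes ReLU activations at every layer, including the output, purely for simplicity; the function names are illustrative, not from the course:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """a^{(0)} = x; z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}; a^{(l)} = g(z^{(l)})."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b  # pre-activation z^{(l)}
        a = relu(z)    # activation a^{(l)} (ReLU assumed for every layer)
    return a

# Illustrative 3 -> 4 -> 2 network with random weights and zero biases.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
y_hat = forward(np.ones(3), weights, biases)
```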
Model-Specific Functions#