STA414 Notation Index

Standard notation used across all STA414 notes. When in doubt, follow these tables.

Probability and Distributions

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $p(x)$ | Probability density/mass of $x$ | Lowercase for densities |
| $P(A)$ | Probability of event $A$ | Uppercase for events |
| $p(x \mid y)$ | Conditional density of $x$ given $y$ |  |
| $p(x, y)$ | Joint density |  |
| $\mathcal{N}(\mu, \Sigma)$ | Multivariate Gaussian | $\mathcal{N}_m$ when emphasizing dimension $m$ |
| $\text{Bernoulli}(p)$ | Bernoulli distribution |  |
| $\text{Beta}(\alpha, \beta)$ | Beta distribution |  |
| $\text{Uniform}(a, b)$ | Uniform distribution |  |

Expectations and Operators

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $\mathbb{E}[\cdot]$ | Expected value | Always use $\mathbb{E}$, never bare $E$ |
| $\mathbb{E}_q[\cdot]$ | Expectation under distribution $q$ |  |
| $\text{Var}(\cdot)$ | Variance | Use $\text{Var}$, not $\text{var}$ |
| $\text{Cov}(\cdot, \cdot)$ | Covariance |  |
| $\mathbf{1}_A$ | Indicator function for event $A$ | $I(\cdot)$ also used in lecture |
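As a concrete illustration of $\mathbb{E}_q[\cdot]$, an expectation under a distribution $q$ can be approximated by averaging over samples from $q$. A minimal sketch (the target function $z^2$, the sample size, and the seed are arbitrary choices here, not part of the course notation):

```python
import random

random.seed(0)

# Approximate E_q[z^2] where q = N(0, 1), via a Monte Carlo average.
# For a standard normal, E[z^2] = Var(z) + (E[z])^2 = 1.
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]
mc_estimate = sum(z * z for z in samples) / len(samples)
print(round(mc_estimate, 2))  # close to 1.0
```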

Information Theory

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $D_{\text{KL}}(q \Vert p)$ | KL divergence from $q$ to $p$ | Always use $\text{KL}$ in subscript, never $\mathrm{KL}$ |
| $\mathcal{L}(q)$ | ELBO (evidence lower bound) | Also written $\mathrm{ELBO}(q)$ in some contexts |
| $H(p)$ | Entropy of $p$ |  |
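For discrete distributions, $D_{\text{KL}}(q \Vert p) = \sum_k q_k \log(q_k / p_k)$, which is always nonnegative and zero exactly when $q = p$. A minimal sketch with made-up probability vectors:

```python
import math

def kl(q, p):
    """D_KL(q || p) for discrete distributions given as probability lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Toy distributions over three outcomes (illustrative numbers only)
q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
print(kl(q, p) >= 0)    # nonnegativity (Gibbs' inequality)
print(kl(q, q) == 0.0)  # zero iff the arguments are equal
```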

Linear Algebra

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $x^\top$ | Transpose | Always use $\top$, never $T$ |
| $\lVert x \rVert$ | Euclidean norm |  |
| $\lvert A \rvert$ or $\det(A)$ | Determinant of $A$ |  |
| $A^{-1}$ | Matrix inverse |  |
| $\text{diag}(\cdot)$ | Diagonal matrix |  |
| $I$ | Identity matrix |  |
| $\odot$ | Element-wise (Hadamard) product | Used in forward algorithm |
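The distinction between $\odot$ and the ordinary matrix product trips people up in the forward algorithm, so here is a minimal numpy sketch (the matrices are arbitrary illustrative values):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 20.0], [30.0, 40.0]])

hadamard = A * B  # element-wise: (A ⊙ B)_{ij} = A_{ij} B_{ij}
matmul = A @ B    # ordinary matrix product, for contrast

print(hadamard)   # [[ 10.  40.] [ 90. 160.]]
print(matmul)     # [[ 70. 100.] [150. 220.]]
```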

Optimization

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $\nabla f$ | Gradient of $f$ |  |
| $\nabla^2 f$ | Hessian of $f$ |  |
| $\arg\max$ | Argument of the maximum | Use $\arg\max$, not $\text{argmax}$ |
| $\arg\min$ | Argument of the minimum |  |
| $\theta^{(t)}$ | Parameter at iteration $t$ |  |
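The iterate notation $\theta^{(t)}$ appears throughout gradient descent, where $\theta^{(t+1)} = \theta^{(t)} - \eta \, \nabla f(\theta^{(t)})$. A minimal sketch on a one-dimensional quadratic (the objective and step size are arbitrary illustrative choices):

```python
# Gradient descent on f(θ) = (θ - 3)^2, whose gradient is 2(θ - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0       # θ^{(0)}
step_size = 0.1   # η
for t in range(100):
    # θ^{(t+1)} = θ^{(t)} - η ∇f(θ^{(t)})
    theta = theta - step_size * grad(theta)

print(round(theta, 4))  # converges toward the minimizer θ* = 3
```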

Neural Networks

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $W^{(\ell)}$ | Weight matrix at layer $\ell$ | Backpropagation |
| $b^{(\ell)}$ | Bias vector at layer $\ell$ | Backpropagation |
| $z^{(\ell)}$ | Pre-activation vector at layer $\ell$ | $z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$ |
| $a^{(\ell)}$ | Activation vector at layer $\ell$ | $a^{(0)} = x$ |
| $\delta^{(\ell)}$ | Backpropagated error signal at layer $\ell$ | $\delta^{(\ell)} = \partial \ell / \partial z^{(\ell)}$ |
| $\ell(y, \hat{y})$ | Per-observation supervised loss | Lowercase $\ell$ to avoid collision with ELBO $\mathcal{L}$ |
| $J(\theta)$ | Full training objective | Sum or average of per-observation losses |
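A minimal sketch of this notation on a single layer, assuming a sigmoid activation and squared-error loss (both arbitrary choices for illustration), with a finite-difference check that $\delta^{(1)}$ produces the right weight gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer in the notation above:
#   z^{(1)} = W^{(1)} a^{(0)} + b^{(1)},  a^{(1)} = sigmoid(z^{(1)}),
#   loss ℓ = 0.5 * ||a^{(1)} - y||^2.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(3, 2))
b1 = rng.normal(size=3)
a0 = rng.normal(size=2)   # a^{(0)} = x
y = rng.normal(size=3)

z1 = W1 @ a0 + b1
a1 = sigmoid(z1)

# δ^{(1)} = ∂ℓ/∂z^{(1)} = (a^{(1)} - y) ⊙ σ'(z^{(1)}),  σ' = σ(1 - σ)
delta1 = (a1 - y) * a1 * (1.0 - a1)
grad_W1 = np.outer(delta1, a0)  # ∂ℓ/∂W^{(1)} = δ^{(1)} (a^{(0)})^⊤

# Finite-difference check of one weight's gradient
loss = lambda W: 0.5 * np.sum((sigmoid(W @ a0 + b1) - y) ** 2)
eps = 1e-6
W_pert = W1.copy()
W_pert[0, 0] += eps
fd = (loss(W_pert) - loss(W1)) / eps
print(np.isclose(fd, grad_W1[0, 0], atol=1e-4))  # True
```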

Model-Specific

| Symbol | Meaning | Context |
| --- | --- | --- |
| $z$ | Latent variable | Latent Variable Model |
| $\theta$ | Model parameters |  |
| $\phi$ | Variational parameters |  |
| $q(z \mid \phi)$ | Variational distribution | Variational Inference (VI) |
| $Q(\theta \mid \theta^{(t)})$ | Expected complete-data log-likelihood | Expectation-Maximization (EM) |
| $r_{nk}$ | Responsibility of component $k$ for point $n$ | Gaussian Mixture Model (GMM) |
| $\pi_k$ | Mixing weight / initial distribution | GMM or Hidden Markov Model (HMM) |
| $A_{ij}$ | Transition probability $i \to j$ | HMM |
| $\lambda_t(k)$ | Emission likelihood $p(x_t \mid z_t = k)$ | HMM |
| $\alpha_t(k)$ | Forward message | Forward-Backward Algorithm |
| $\beta_t(k)$ | Backward message | Forward-Backward Algorithm |
| $\delta_t(k)$ | Viterbi max-probability | Viterbi Algorithm |
| $\eta$ | Natural parameters | Exponential Family |
| $T(x)$ | Sufficient statistics | Exponential Family |
| $A(\eta)$ | Log-partition function | Exponential Family |
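The HMM symbols $\pi_k$, $A_{ij}$, $\lambda_t(k)$, and $\alpha_t(k)$ come together in the forward recursion $\alpha_1 = \pi \odot \lambda_1$, $\alpha_t = \lambda_t \odot (A^\top \alpha_{t-1})$, where $\alpha_t(k) = p(x_{1:t}, z_t = k)$, so $\sum_k \alpha_T(k) = p(x_{1:T})$. A minimal sketch on a toy 2-state HMM (all numbers are made up for illustration):

```python
import numpy as np

pi = np.array([0.6, 0.4])               # initial distribution π_k
A = np.array([[0.7, 0.3],               # A_{ij} = p(z_{t+1} = j | z_t = i)
              [0.2, 0.8]])
lam = np.array([[0.9, 0.2],             # λ_t(k) = p(x_t | z_t = k),
                [0.1, 0.8],             # one row per time step t = 1..3
                [0.5, 0.5]])

alpha = pi * lam[0]                     # α_1 = π ⊙ λ_1
for t in range(1, lam.shape[0]):
    alpha = lam[t] * (A.T @ alpha)      # α_t = λ_t ⊙ (A^⊤ α_{t-1})

print(alpha.sum())                      # marginal likelihood p(x_{1:T})
```

In practice the recursion is run in log space or with per-step normalization to avoid underflow on long sequences; the unnormalized form above is just the cleanest match to the notation.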

Functions

| Symbol | Meaning | Notes |
| --- | --- | --- |
| $\Gamma(z)$ | Gamma function |  |
| $S(x)$ | Survival function | $S(x) = 1 - F(x)$ |
| $h(x)$ | Hazard function | $h(x) = f(x)/S(x)$ |
| $\mathcal{I}(\theta)$ | Fisher information |  |
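A minimal sketch of $S(x)$ and $h(x)$ for the exponential distribution with rate $r$ (the choice of distribution and rate are illustrative): there $f(x) = r e^{-rx}$ and $S(x) = e^{-rx}$, so the hazard $h(x) = f(x)/S(x) = r$ is constant, the memoryless property.

```python
import math

r = 2.0  # illustrative rate parameter

def S(x):
    """Survival function of Exponential(r): S(x) = e^{-rx}."""
    return math.exp(-r * x)

def h(x):
    """Hazard function h(x) = f(x)/S(x), with f(x) = r e^{-rx}."""
    f = r * math.exp(-r * x)
    return f / S(x)

print(h(0.1), h(3.0))  # both equal r = 2.0: constant hazard
```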