STA414 Notation Index#
Standard notation used across all STA414 notes. When in doubt, follow these tables.
Probability and Distributions#
| Symbol | Meaning | Notes |
|---|---|---|
| $p(x)$ | Probability density/mass of $x$ | Lowercase for densities |
| $P(A)$ | Probability of event $A$ | Uppercase for events |
| $p(x \mid y)$ | Conditional density of $x$ given $y$ | |
| $p(x, y)$ | Joint density | |
| $\mathcal{N}(\mu, \Sigma)$ | Multivariate Gaussian | $\mathcal{N}_m$ when emphasizing dimension $m$ |
| $\text{Bernoulli}(p)$ | Bernoulli distribution | |
| $\text{Beta}(\alpha, \beta)$ | Beta distribution | |
| $\text{Uniform}(a,b)$ | Uniform distribution | |
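The distributions above all have direct NumPy counterparts; a minimal sampling sketch (parameter values are illustrative only, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.zeros(2)
Sigma = np.eye(2)
x = rng.multivariate_normal(mu, Sigma)  # x ~ N(mu, Sigma)
b = rng.binomial(1, 0.3)                # b ~ Bernoulli(0.3)
t = rng.beta(2.0, 5.0)                  # t ~ Beta(2, 5)
u = rng.uniform(0.0, 1.0)               # u ~ Uniform(0, 1)
```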
Expectations and Operators#
| Symbol | Meaning | Notes |
|---|---|---|
| $\mathbb{E}[\cdot]$ | Expected value | Always use $\mathbb{E}$, never bare $E$ |
| $\mathbb{E}_q[\cdot]$ | Expectation under distribution $q$ | |
| $\text{Var}(\cdot)$ | Variance | Use $\text{Var}$, not $\text{var}$ |
| $\text{Cov}(\cdot, \cdot)$ | Covariance | |
| $\mathbf{1}_A$ | Indicator function for event $A$ | $I(\cdot)$ also used in lecture |
| $D_{\text{KL}}(q \| p)$ | KL divergence from $q$ to $p$ | Always use $\text{KL}$ in subscript, never $\mathrm{KL}$ |
| $\mathcal{L}(q)$ | Evidence lower bound (ELBO) | Also written $\mathrm{ELBO}(q)$ in some contexts |
| $H(p)$ | Entropy of $p$ | |
Linear Algebra#
| Symbol | Meaning | Notes |
|---|---|---|
| $x^\top$ | Transpose | Always use $\top$, never $T$ |
| $\|x\|$ | Euclidean norm | |
| $\lvert A \rvert$ or $\det(A)$ | Determinant | Use $\lvert \cdot \rvert$ to avoid unescaped pipes in tables |
| $A^{-1}$ | Matrix inverse | |
| $\text{diag}(\cdot)$ | Diagonal matrix | |
| $I$ | Identity matrix | |
| $\odot$ | Element-wise (Hadamard) product | Used in forward algorithm |
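These operations map one-to-one onto NumPy; a short sketch distinguishing the matrix product from the Hadamard product (matrix values are illustrative only):

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 0.5]])
x = np.array([3.0, 4.0])

matmul = A @ B              # ordinary matrix product AB
hadamard = A * B            # A ⊙ B: element-wise (Hadamard) product
det_A = np.linalg.det(A)    # det(A), also written |A|
A_inv = np.linalg.inv(A)    # A^{-1}
norm_x = np.linalg.norm(x)  # Euclidean norm ||x||
```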
Optimization#
| Symbol | Meaning | Notes |
|---|---|---|
| $\nabla f$ | Gradient of $f$ | |
| $\nabla^2 f$ | Hessian of $f$ | |
| $\arg\max$ | Argument of the maximum | Use $\arg\max$, not $\text{argmax}$ |
| $\arg\min$ | Argument of the minimum | |
| $\theta^{(t)}$ | Parameter at iteration $t$ | |
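The iterate notation $\theta^{(t)}$ is easiest to see in a gradient-descent loop; a minimal sketch assuming the toy objective $f(\theta) = \tfrac{1}{2}\|\theta\|^2$, whose gradient is $\nabla f(\theta) = \theta$ (function and step-size choices are illustrative):

```python
import numpy as np

def gradient_descent(grad_f, theta0, eta=0.1, n_steps=100):
    """Iterate theta^{(t+1)} = theta^{(t)} - eta * grad f(theta^{(t)})."""
    theta = theta0
    for t in range(n_steps):
        theta = theta - eta * grad_f(theta)
    return theta

# For f(theta) = ||theta||^2 / 2, grad f(theta) = theta, and the
# iterates shrink geometrically toward the minimizer at the origin.
theta_star = gradient_descent(lambda th: th, np.array([1.0, -2.0]))
```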
Neural Networks#
| Symbol | Meaning | Notes |
|---|---|---|
| $W^{(\ell)}$ | Weight matrix at layer $\ell$ | Backpropagation |
| $b^{(\ell)}$ | Bias vector at layer $\ell$ | Backpropagation |
| $z^{(\ell)}$ | Pre-activation vector at layer $\ell$ | $W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$ |
| $a^{(\ell)}$ | Activation vector at layer $\ell$ | $a^{(0)} = x$ |
| $\delta^{(\ell)}$ | Backpropagated error signal at layer $\ell$ | $\partial \ell / \partial z^{(\ell)}$ |
| $\ell(y,\hat{y})$ | Per-observation supervised loss | Lowercase $\ell$ to avoid collision with ELBO $\mathcal{L}$ |
| $J(\theta)$ | Full training objective | Sum or average of per-observation losses |
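The forward-pass recurrence in the table ($a^{(0)} = x$, $z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}$) can be sketched directly in NumPy. This assumes ReLU activations at every layer, including the output, purely for simplicity; the function names are illustrative, not from the course:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """a^{(0)} = x; z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}; a^{(l)} = g(z^{(l)})."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b  # pre-activation z^{(l)}
        a = relu(z)    # activation a^{(l)} (ReLU assumed for every layer)
    return a

# Illustrative 3 -> 4 -> 2 network with random weights and zero biases.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
y_hat = forward(np.ones(3), weights, biases)
```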
Model-Specific Functions#