Exponential Family

1. Standard Definition

A distribution belongs to the exponential family if its probability density function (or mass function) can be expressed as:

$$f(x|\theta) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\eta(\theta))\right)$$

- $h(x)$ (Base Measure): A function of the data $x$ alone; it does not depend on the parameter $\theta$.
- $T(x)$ (Sufficient Statistic): The function of the data through which the parameter enters the density. For $n$ independent observations, $\sum_{i=1}^n T(x_i)$ retains all the information needed to estimate the parameters; every other detail of the data can be discarded. $T(x)$ can be a scalar or a vector.
- $\eta$ (Natural Parameter): The parameter after the transformation $\eta = \eta(\theta)$. If $\eta = \theta$, the distribution is said to be in Canonical Form.
- $A(\eta)$ (Log-partition Function): The logarithm of the normalization constant, $A(\eta) = \log \int h(x) \exp\left(\eta^\top T(x)\right) dx$ (a sum in the discrete case), which ensures that the density integrates (or sums) to 1: $\int f(x|\theta) dx = 1$.
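
To see why the summed statistic is enough, it may help to write out the joint density of $n$ independent observations. This is a short derivation from the definition above, with no assumptions beyond independence and a shared $\theta$:

$$\prod_{i=1}^n f(x_i|\theta) = \left(\prod_{i=1}^n h(x_i)\right) \exp\left(\eta(\theta)^\top \sum_{i=1}^n T(x_i) - n A(\eta(\theta))\right)$$

The parameter meets the data only through $\sum_{i=1}^n T(x_i)$; the remaining factor $\prod_i h(x_i)$ is free of $\theta$.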

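The log-partition function acts as the cumulant generating function for the sufficient statistic: the cumulant generating function of $T(x)$ is $K(t) = A(\eta + t) - A(\eta)$, so differentiating $A(\eta)$ yields the cumulants of $T(x)$. In particular, $\nabla A(\eta) = \mathbb{E}[T(x)]$ and $\nabla^2 A(\eta) = \operatorname{Cov}[T(x)]$.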
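
The cumulant property can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy family on the support $\{0, 1, 2\}$ with $T(x) = x$ and $h(x) = 1$. The family and the helper names `log_partition` and `probabilities` are illustrative choices, not part of the definition above.

```python
import math

def log_partition(eta, support, T, h):
    """A(eta) = log sum_x h(x) exp(eta * T(x)) over a finite support."""
    return math.log(sum(h(x) * math.exp(eta * T(x)) for x in support))

def probabilities(eta, support, T, h):
    """Probabilities h(x) exp(eta * T(x) - A(eta)) for each x in the support."""
    A = log_partition(eta, support, T, h)
    return [h(x) * math.exp(eta * T(x) - A) for x in support]

# Hypothetical toy family (an assumption for illustration only).
support = [0, 1, 2]
T = lambda x: float(x)
h = lambda x: 1.0
eta, eps, t = 0.3, 1e-4, 0.5

A = lambda e: log_partition(e, support, T, h)
probs = probabilities(eta, support, T, h)
mean = sum(p * T(x) for p, x in zip(probs, support))
var = sum(p * (T(x) - mean) ** 2 for p, x in zip(probs, support))

# Central finite differences approximate A'(eta) and A''(eta).
dA = (A(eta + eps) - A(eta - eps)) / (2 * eps)
d2A = (A(eta + eps) - 2 * A(eta) + A(eta - eps)) / eps ** 2

# Cumulant generating function of T(x), computed directly and via A.
cgf_direct = math.log(sum(p * math.exp(t * T(x)) for p, x in zip(probs, support)))
cgf_from_A = A(eta + t) - A(eta)

print(f"A'(eta)  = {dA:.6f}   E[T]   = {mean:.6f}")
print(f"A''(eta) = {d2A:.6f}   Var[T] = {var:.6f}")
print(f"K(t) direct = {cgf_direct:.6f}   A(eta+t) - A(eta) = {cgf_from_A:.6f}")
```

Each printed pair should agree up to finite-difference and rounding error. The step size `eps` is a rough choice and may need tuning for other families.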

To rewrite a given distribution $f(x|\theta)$ into the exponential family form:

- Take Logarithm: Compute $\log f(x|\theta)$.
- Categorize Terms: Separate the resulting expression into pure $x$-terms, pure $\theta$-terms, and cross terms (containing both $x$ and $\theta$).
- Analyze Cross Terms: Format the cross terms into the inner product $\eta^\top T(x)$ to explicitly extract the natural parameter $\eta$ and the sufficient statistic $T(x)$.
- Determine $A(\eta)$: Identify the pure $\theta$-terms as $-A(\eta)$. Express this strictly as a function of $\eta$ (solving for $\theta$ in terms of $\eta$ if necessary).
- Determine $h(x)$: Identify the pure $x$-terms (and constants) as $\log h(x)$. Consequently, $h(x) = \exp(\text{pure } x \text{ terms})$.
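
Once the steps produce a candidate $(h, T, \eta, A)$, a quick sanity check is to confirm that it reproduces the original density. The sketch below does this for the Poisson pmf, chosen here only as an illustration because its base measure is not constant. The helper `exp_family_density` is a hypothetical name and handles only scalar $\eta$ and $T(x)$.

```python
import math

def exp_family_density(x, eta, T, A, h):
    """Evaluate h(x) * exp(eta * T(x) - A(eta)) for scalar eta and T(x)."""
    return h(x) * math.exp(eta * T(x) - A(eta))

def poisson_pmf(x, lam):
    """Poisson pmf in its usual form: lam^x * exp(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Steps applied to log f(x|lam) = x*log(lam) - lam - log(x!):
#   cross term   x*log(lam)  ->  eta = log(lam), T(x) = x
#   theta-term   -lam        ->  A(eta) = exp(eta)
#   x-term       -log(x!)    ->  h(x) = 1 / x!
lam = 2.5
eta = math.log(lam)
T = lambda x: x
A = lambda e: math.exp(e)
h = lambda x: 1.0 / math.factorial(x)

# Largest absolute difference between the two forms over x = 0..19.
max_gap = max(
    abs(poisson_pmf(x, lam) - exp_family_density(x, eta, T, A, h))
    for x in range(20)
)
print(f"max |difference| = {max_gap:.3e}")
```

A gap at the level of floating-point rounding suggests the terms were sorted correctly. A large gap usually points to a pure $\theta$-term left expressed in $\theta$ rather than $\eta$, or to a sign error in $A(\eta)$.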

Example (Bernoulli distribution)

Original: $f(x|p) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
Log form: $\log f(x|p) = x \log p + (1-x) \log(1-p) = x \log\left(\frac{p}{1-p}\right) + \log(1-p)$
Extraction:
- Cross term: $x \log\left(\frac{p}{1-p}\right) \implies \eta = \log\left(\frac{p}{1-p}\right), \quad T(x) = x$
- $\theta$-term: $\log(1-p)$. Substitute $p = \frac{e^\eta}{1+e^\eta}$, so that $1 - p = \frac{1}{1+e^\eta}$, to get $-A(\eta) = -\log(1+e^\eta) \implies A(\eta) = \log(1+e^\eta)$
- $x$-term: None $\implies \log h(x) = 0 \implies h(x) = 1$
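
The Bernoulli extraction can be verified numerically. This is a minimal sketch with hypothetical function names. It compares the original pmf with the exponential family form and confirms that the logistic function inverts $\eta = \log\left(\frac{p}{1-p}\right)$.

```python
import math

def bernoulli_pmf(x, p):
    """Original form: p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_exp_family(x, eta):
    """Exponential family form with h(x) = 1, T(x) = x, A(eta) = log(1 + e^eta)."""
    return math.exp(eta * x - math.log1p(math.exp(eta)))

def sigmoid(eta):
    """Inverse of the log-odds map: p = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

p = 0.3
eta = math.log(p / (1 - p))  # natural parameter (log-odds)

gaps = [abs(bernoulli_pmf(x, p) - bernoulli_exp_family(x, eta)) for x in (0, 1)]
p_back = sigmoid(eta)

print(f"max |difference| = {max(gaps):.3e}")
print(f"p recovered from eta = {p_back:.6f}")
```

Differentiating $A(\eta) = \log(1+e^\eta)$ gives $\frac{e^\eta}{1+e^\eta} = p$, which is the mean of $T(x) = x$.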

Example (Exponential distribution)

Original: $f(x|\lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, with $\lambda > 0$
Log form: $\log f(x|\lambda) = \log \lambda - \lambda x$
Extraction:
- Cross term: $-\lambda x \implies \eta = -\lambda, \quad T(x) = x$
- $\theta$-term: $\log \lambda$. Substitute $\lambda = -\eta$ to get $-A(\eta) = \log(-\eta) \implies A(\eta) = -\log(-\eta)$, defined only for $\eta < 0$
- $x$-term: None $\implies \log h(x) = 0 \implies h(x) = 1$ on the support $x \ge 0$
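
The same kind of check applies to the exponential distribution. The sketch below uses hypothetical function names and an arbitrary rate. It compares the two forms on a few points and uses a finite difference to confirm that the derivative of $A(\eta) = -\log(-\eta)$ matches the mean $1/\lambda$.

```python
import math

def exponential_pdf(x, lam):
    """Original form: lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x)

def exponential_exp_family(x, eta):
    """Exponential family form with h(x) = 1, T(x) = x, A(eta) = -log(-eta)."""
    if eta >= 0:
        raise ValueError("the natural parameter must be negative")
    return math.exp(eta * x + math.log(-eta))

lam = 1.7
eta = -lam  # natural parameter
xs = [0.0, 0.5, 1.0, 3.0]

# Largest absolute difference between the two forms on the test points.
max_gap = max(
    abs(exponential_pdf(x, lam) - exponential_exp_family(x, eta)) for x in xs
)

# A'(eta) = -1/eta should equal the mean 1/lam; approximate it numerically.
A = lambda e: -math.log(-e)
eps = 1e-6
mean_from_A = (A(eta + eps) - A(eta - eps)) / (2 * eps)

print(f"max |difference| = {max_gap:.3e}")
print(f"A'(eta) = {mean_from_A:.6f}   1/lam = {1 / lam:.6f}")
```

The guard on `eta` reflects the restriction $\eta < 0$: for non-negative $\eta$ the integral of $e^{\eta x}$ over $x \ge 0$ diverges and $A(\eta)$ is undefined.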