1. Standard Definition

A distribution belongs to the exponential family if its probability density function (or mass function) can be expressed as:

$$f(x|\theta) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\eta(\theta))\right)$$

Components Breakdown:

  • $h(x)$ (Base Measure): A function dependent solely on the data $x$, independent of the parameter $\theta$.
  • $T(x)$ (Sufficient Statistic): The function of the data that carries all the information about the parameter. For $n$ independent observations, $\sum_{i=1}^n T(x_i)$ retains everything needed to estimate the parameters; all other details of the data can be discarded. $T(x)$ can be a scalar or a vector.
  • $\eta$ (Natural Parameter): The parameter after reparameterization, $\eta = \eta(\theta)$, chosen so that it enters the exponent linearly through $\eta^\top T(x)$. If $\eta = \theta$, the distribution is said to be in Canonical Form.
  • $A(\eta)$ (Log-partition Function): The logarithm of the normalization constant, $A(\eta) = \log \int h(x) \exp\left(\eta^\top T(x)\right) dx$ (a sum in the discrete case), which ensures that the distribution integrates (or sums) to 1: $\int f(x|\theta) dx = 1$.
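
The role of the sufficient statistic can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the Bernoulli distribution (with $T(x) = x$ and $A(\eta) = \log(1+e^\eta)$, as derived in Example A), and the function names and sample values are hypothetical. It compares the log-likelihood computed observation by observation with the one computed from $n$ and $\sum_i T(x_i)$ alone.

```python
import math

def direct_log_likelihood(data, p):
    # Sum of log p^x (1-p)^(1-x) over every observation.
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

def suffstat_log_likelihood(n, sum_t, p):
    # Exponential family form: eta * sum(T(x_i)) - n * A(eta), with h(x) = 1.
    eta = math.log(p / (1 - p))
    return eta * sum_t - n * math.log1p(math.exp(eta))

# Two different samples (hypothetical values) with the same n and the same sum.
sample_a = [1, 1, 0, 0, 1]
sample_b = [0, 1, 1, 1, 0]
p = 0.4

ll_a = direct_log_likelihood(sample_a, p)
ll_b = direct_log_likelihood(sample_b, p)
ll_suff = suffstat_log_likelihood(n=5, sum_t=3, p=p)
print(ll_a, ll_b, ll_suff)
```

All three values agree up to floating-point error, because the order of the observations is not needed once $n$ and $\sum_i T(x_i)$ are known.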

2. Crucial Identities of $A(\eta)$

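The log-partition function acts as the cumulant generating function of the sufficient statistic: its derivatives with respect to $\eta$ give the cumulants of $T(X)$. The two identities below are stated for scalar $\eta$; for vector $\eta$, the gradient and Hessian of $A$ give the mean vector and covariance matrix of $T(X)$.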

  1. First Derivative (Mean): $$A'(\eta) = \mathbb{E}[T(X)]$$
  2. Second Derivative (Variance): $$A''(\eta) = \text{Var}(T(X))$$
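
These two identities can be sanity-checked with finite differences. The following is a minimal sketch, assuming the Bernoulli log-partition function $A(\eta) = \log(1+e^\eta)$ from Example A; the helper names and step sizes are illustrative choices, not part of the notes.

```python
import math

def log_partition(eta):
    # Bernoulli log-partition function A(eta) = log(1 + e^eta).
    return math.log1p(math.exp(eta))

def first_derivative(f, x, step=1e-5):
    # Central difference approximation of f'(x).
    return (f(x + step) - f(x - step)) / (2 * step)

def second_derivative(f, x, step=1e-4):
    # Central difference approximation of f''(x).
    return (f(x + step) - 2 * f(x) + f(x - step)) / step**2

p = 0.3
eta = math.log(p / (1 - p))  # natural parameter for this p

mean_estimate = first_derivative(log_partition, eta)       # should be close to p
variance_estimate = second_derivative(log_partition, eta)  # should be close to p * (1 - p)
print(mean_estimate, variance_estimate)
```

For a Bernoulli variable, $\mathbb{E}[X] = p$ and $\text{Var}(X) = p(1-p)$, so the two estimates should land near $0.3$ and $0.21$ up to discretization error.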

3. Standard Steps for Identification

To rewrite a given distribution $f(x|\theta)$ into the exponential family form:

  1. Take Logarithm: Compute $\log f(x|\theta)$.

  2. Categorize Terms: Separate the resulting expression into pure $x$-terms, pure $\theta$-terms, and cross terms (containing both $x$ and $\theta$).

  3. Analyze Cross Terms: Format the cross terms into the inner product $\eta^\top T(x)$ to explicitly extract the natural parameter $\eta$ and the sufficient statistic $T(x)$.

  4. Determine $A(\eta)$: Identify the pure $\theta$-terms as $-A(\eta)$. Express this strictly as a function of $\eta$ (solving for $\theta$ in terms of $\eta$ if necessary).

  5. Determine $h(x)$: Identify the pure $x$-terms (and constants) as $\log h(x)$. Consequently, $h(x) = \exp(\text{pure } x \text{ terms})$.
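
As a quick illustration of the five steps, here is a worked sketch for the Poisson distribution. It is not one of the examples in these notes; it is chosen here as an assumption-free textbook case because, unlike a distribution with $h(x) = 1$, it has a non-trivial base measure.

$$\begin{aligned}
f(x|\lambda) &= \frac{\lambda^x e^{-\lambda}}{x!}, \quad x \in \{0, 1, 2, \dots\} \\
\log f(x|\lambda) &= \underbrace{x \log \lambda}_{\text{cross term}} \; \underbrace{- \lambda}_{\text{pure } \theta\text{-term}} \; \underbrace{- \log x!}_{\text{pure } x\text{-term}} \\
\eta &= \log \lambda, \quad T(x) = x \\
-A(\eta) &= -\lambda = -e^\eta \implies A(\eta) = e^\eta \\
\log h(x) &= -\log x! \implies h(x) = \frac{1}{x!}
\end{aligned}$$

As a consistency check, $A'(\eta) = e^\eta = \lambda$, which matches the Poisson mean.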

4. Analytical Examples

Example A: Bernoulli Distribution (Discrete)

  • Original: $f(x|p) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
  • Log form: $x \log p + (1-x) \log(1-p) = x \log\left(\frac{p}{1-p}\right) + \log(1-p)$
  • Extraction:
    • Cross term: $x \log(\frac{p}{1-p}) \implies \eta = \log(\frac{p}{1-p}), \quad T(x) = x$
    • $\theta$-term: $\log(1-p)$. Substitute $p = \frac{e^\eta}{1+e^\eta}$ to get $-A(\eta) = -\log(1+e^\eta) \implies A(\eta) = \log(1+e^\eta)$
    • $x$-term: None $\implies \log h(x) = 0 \implies h(x) = 1$
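
The extraction above can be verified numerically. This is a minimal sketch with hypothetical function names: it evaluates the original mass function and the exponential family form side by side, and inverts the log-odds map to recover $p$ from $\eta$.

```python
import math

def natural_parameter(p):
    # eta = log(p / (1 - p)), the log-odds of success.
    return math.log(p / (1 - p))

def bernoulli_pmf_original(x, p):
    # p^x (1 - p)^(1 - x) for x in {0, 1}.
    return p**x * (1 - p)**(1 - x)

def bernoulli_pmf_exp_family(x, eta):
    # h(x) = 1, T(x) = x, A(eta) = log(1 + e^eta).
    return math.exp(eta * x - math.log1p(math.exp(eta)))

p = 0.7
eta = natural_parameter(p)

# Inverse map used in the substitution step: p = e^eta / (1 + e^eta).
recovered_p = math.exp(eta) / (1 + math.exp(eta))

pmf_pairs = [
    (bernoulli_pmf_original(x, p), bernoulli_pmf_exp_family(x, eta))
    for x in (0, 1)
]
print(recovered_p, pmf_pairs)
```

Each pair in `pmf_pairs` should contain two matching values, and `recovered_p` should return the starting $p$.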

Example B: Exponential Distribution (Continuous)

  • Original: $f(x|\lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$
  • Log form: $\log \lambda - \lambda x$
  • Extraction:
    • Cross term: $-\lambda x \implies \eta = -\lambda, \quad T(x) = x$ (so $\eta < 0$, since $\lambda > 0$)
    • $\theta$-term: $\log \lambda \implies -A(\eta) = \log(-\eta) \implies A(\eta) = -\log(-\eta)$
    • $x$-term: None $\implies \log h(x) = 0 \implies h(x) = 1$ on the support $x \ge 0$
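
The same kind of check applies to Example B. The sketch below uses hypothetical function names and an arbitrary rate; it compares the two forms of the density at a few points and evaluates the derivatives of $A(\eta) = -\log(-\eta)$ in closed form, namely $A'(\eta) = -1/\eta$ and $A''(\eta) = 1/\eta^2$.

```python
import math

def exponential_pdf_original(x, rate):
    # lambda * exp(-lambda * x) for x >= 0.
    return rate * math.exp(-rate * x)

def log_partition(eta):
    # A(eta) = -log(-eta), defined only for eta < 0.
    return -math.log(-eta)

def exponential_pdf_exp_family(x, eta):
    # h(x) = 1 on x >= 0, T(x) = x.
    return math.exp(eta * x - log_partition(eta))

rate = 2.5   # arbitrary illustrative value of lambda
eta = -rate  # natural parameter

xs = [0.0, 0.4, 1.0, 3.0]
max_gap = max(
    abs(exponential_pdf_original(x, rate) - exponential_pdf_exp_family(x, eta))
    for x in xs
)

mean_from_A = -1.0 / eta         # A'(eta), equals 1 / lambda
variance_from_A = 1.0 / eta**2   # A''(eta), equals 1 / lambda^2
print(max_gap, mean_from_A, variance_from_A)
```

The derivatives reproduce the familiar mean $1/\lambda$ and variance $1/\lambda^2$ of the exponential distribution, consistent with the identities in Section 2.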