Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) estimates an unknown parameter by choosing the value that makes the observed data most probable under the model.
1. Definition
Given i.i.d. observations $x_1,\dots,x_n$ from a model $p(x|\theta)$, the likelihood function is
$$L(\theta) = \prod_{i=1}^n p(x_i|\theta)$$

The MLE is

$$\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta)$$

Since $\log(\cdot)$ is strictly increasing, this is equivalent to maximizing the log-likelihood

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log p(x_i|\theta)$$

2. Standard Procedure
For differentiable models, MLE is typically obtained by solving
$$\frac{\partial \ell(\theta)}{\partial \theta} = 0$$

and checking that the solution gives a maximum (e.g. via the second derivative).
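As a concrete instance of this procedure, the sketch below fits a Bernoulli parameter by numerically minimizing the negative log-likelihood and compares the result to the closed-form solution $\hat{p} = \bar{x}$ (obtained by setting $\partial\ell/\partial p = 0$). The synthetic data and all variable names are illustrative, assuming NumPy and SciPy are available.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)  # synthetic i.i.d. Bernoulli(0.3) sample

def neg_log_lik(p):
    # negative log-likelihood of i.i.d. Bernoulli data:
    # -ell(p) = -[ (#ones) log p + (#zeros) log(1 - p) ]
    k = x.sum()
    return -(k * np.log(p) + (len(x) - k) * np.log(1.0 - p))

# maximize ell(p) by minimizing -ell(p) over the open interval (0, 1)
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(res.x, x.mean())  # numerical MLE agrees with the closed form p_hat = mean(x)
```

Minimizing the negative log-likelihood (rather than maximizing the likelihood directly) is the standard numerical formulation; it avoids underflow from multiplying many small probabilities.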
3. Interpretation
MLE selects the parameter under which the observed sample is most likely to have been generated.
4. Core Properties
- Consistency: Under regularity conditions, $\hat{\theta}_{\text{MLE}} \to \theta_0$ as $n \to \infty$.
- Asymptotic Normality: For large $n$,
$$\sqrt{n}\,(\hat{\theta}_{\text{MLE}} - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0,\, I(\theta_0)^{-1}\right),$$
where $I(\theta)$ is the Fisher information.
- Invariance: If $\hat{\theta}_{\text{MLE}}$ is the MLE of $\theta$, then the MLE of $g(\theta)$ is $g(\hat{\theta}_{\text{MLE}})$.
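The invariance property can be illustrated with the Gaussian model, where both MLEs have closed forms: the sample mean for $\mu$ and the (biased) sample variance for $\sigma^2$. A minimal sketch, assuming NumPy and a synthetic $\mathcal{N}(5, 2^2)$ sample; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)  # synthetic N(5, 2^2) sample

# Gaussian MLEs (closed form): mu_hat = sample mean,
# var_hat = biased sample variance (divisor n, not n - 1)
mu_hat = x.mean()
var_hat = np.mean((x - mu_hat) ** 2)

# Invariance: the MLE of g(theta) is g(theta_hat),
# so the MLE of sigma = sqrt(variance) is sqrt(var_hat)
sigma_hat = np.sqrt(var_hat)

print(mu_hat, sigma_hat)  # close to the true parameters 5.0 and 2.0
```

With $n = 10{,}000$ draws, both estimates land near the true values, which also illustrates consistency informally.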
5. Limitation
For latent-variable models, direct maximization of $\ell(\theta)$ is often difficult, which motivates methods such as Expectation-Maximization (EM) and Variational Inference (VI).
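To make the difficulty concrete: with a latent variable $z$, the log-likelihood involves a logarithm of a sum (or integral) over $z$,

$$\ell(\theta) = \sum_{i=1}^n \log \sum_{z} p(x_i, z \mid \theta),$$

and the log no longer distributes over the inner sum, so the convenient sum-of-logs form from Section 1 is lost and the stationarity equation generally has no closed-form solution.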