In statistics, maximum likelihood estimation (MLE) is a method for estimating the parameters of an assumed probability distribution from observed data. It does so by maximizing a likelihood function, so that under the assumed statistical model the observed data are most probable. The point in the parameter space at which the likelihood function attains its maximum is called the maximum likelihood estimate. The logic is both intuitive and flexible, and MLE has therefore become a mainstream means of statistical inference.
Maximum likelihood estimation lets the data speak: by adjusting the parameters, it draws out the information hidden in the observations.
The basic principle of maximum likelihood estimation is to regard a set of observations as a random sample from an unknown joint probability distribution. The goal is to determine the parameter values under which the observed data have the highest joint probability.
We represent the parameters governing the joint distribution as a vector θ = [θ1, θ2, ..., θk], so that the distribution belongs to a parametric family {f(⋅; θ) | θ ∈ Θ}, where Θ is the parameter space, a finite-dimensional subset of Euclidean space.
Evaluating the joint density at the observed data sample y = (y1, y2, ..., yn) gives a real-valued function of the parameters, called the likelihood function, Ln(θ) = Ln(θ; y). For independent and identically distributed random variables, the likelihood function is the product of the univariate density functions: Ln(θ; y) = f(y1; θ) · f(y2; θ) · ⋯ · f(yn; θ).
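As a concrete illustration (a minimal sketch, not part of the original text), consider a small i.i.d. Bernoulli(θ) sample: the joint likelihood is simply the product of the individual probability mass values. The data values below are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data: 10 i.i.d. Bernoulli(theta) observations (made-up values).
y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def bernoulli_likelihood(theta, data):
    """Joint likelihood of i.i.d. Bernoulli data: the product of the
    univariate mass values f(y_i; theta) = theta**y_i * (1 - theta)**(1 - y_i)."""
    return np.prod(theta ** data * (1.0 - theta) ** (1 - data))

# The likelihood is a function of theta with the observed data held fixed.
for theta in (0.3, 0.5, 0.7):
    print(f"L_n({theta}) = {bernoulli_likelihood(theta, y):.6f}")
```

Evaluating the function at several candidate values of θ shows how the same fixed data assign different "plausibility" to different parameter values.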
The purpose of maximum likelihood estimation is to find the parameter value that maximizes the likelihood function over the parameter space.
This process can be understood intuitively: maximum likelihood estimation selects the parameter values under which the observed data are most likely to have occurred. Computationally, it is common to work with the natural logarithm of the likelihood function, called the log-likelihood.
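To illustrate why the log-likelihood is preferred in computation (a sketch using the same made-up Bernoulli sample as above, with NumPy assumed), note that the logarithm turns the product of densities into a sum of log-densities, which is numerically more stable and easier to differentiate, while leaving the maximizer unchanged.

```python
import numpy as np

# Same hypothetical Bernoulli sample as before (made-up values).
y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def log_likelihood(theta, data):
    """Log-likelihood: the log turns the product of densities into a sum,
    log L_n(theta) = sum_i log f(y_i; theta)."""
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1.0 - theta))

# Because log is strictly increasing, L_n and log L_n peak at the same theta;
# for this sample the peak is at the sample mean, 0.7.
for theta in (0.3, 0.5, 0.7):
    print(f"log L_n({theta}) = {log_likelihood(theta, y):.4f}")
```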
Setting the partial derivatives of the log-likelihood with respect to the parameters equal to zero yields the likelihood equations, whose solutions are candidates for the maximum. For some models these equations can be solved explicitly, but in general there is no closed-form solution, so one has to rely on numerical optimization to find the maximum likelihood estimate.
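As a sketch of the numerical route (the normal model and simulated data below are illustrative assumptions, not from the original text; NumPy and SciPy are assumed to be available), one typically minimizes the negative log-likelihood with a general-purpose optimizer and, when a closed form happens to exist, can check the answer against it.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=200)  # illustrative simulated sample

def neg_log_likelihood(params, data):
    """Negative log-likelihood of i.i.d. N(mu, sigma^2) data."""
    mu, log_sigma = params          # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                   - 0.5 * ((data - mu) / sigma) ** 2)

# Maximizing the likelihood is the same as minimizing its negative.
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(y,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# For the normal model a closed form exists: the sample mean and the
# (biased, ddof=0) sample standard deviation. The two answers should agree.
print("numerical :", mu_hat, sigma_hat)
print("closed form:", y.mean(), y.std())
```

The reparameterization via log(sigma) is just one convenient way to keep the scale parameter positive during unconstrained optimization; a bounded or constrained optimizer would work as well.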
In data analysis, MLE is not just a mathematical formula, but an art of letting data speak.
Beyond the question of how to optimize, it is important to note that for finite samples the likelihood equations may have multiple solutions. Whether an identified solution is indeed a (local) maximum depends on the matrix of second-order partial derivatives of the log-likelihood, called the Hessian matrix: at a local maximum the Hessian must be negative semi-definite.
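As a hedged sketch of this check (reusing the illustrative normal model from above; NumPy assumed, and the finite-difference Hessian below is a simple hand-rolled approximation rather than a library routine), one can verify that the Hessian of the log-likelihood at the candidate solution has only negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=200)  # illustrative simulated sample

def log_likelihood(params, data):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data, parameterized by (mu, sigma)."""
    mu, sigma = params
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - 0.5 * ((data - mu) / sigma) ** 2)

def numerical_hessian(f, x, eps=1e-4):
    """Central finite-difference approximation to the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = eps, eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps ** 2)
    return H

# Candidate maximizer: the closed-form normal MLE (sample mean, sample sd).
theta_hat = np.array([y.mean(), y.std()])
H = numerical_hessian(lambda p: log_likelihood(p, y), theta_hat)

# A (local) maximum requires the Hessian to be negative (semi-)definite:
# every eigenvalue should be non-positive, here strictly negative.
print(np.linalg.eigvalsh(H))
```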
Maximum likelihood estimation also connects naturally to Bayesian inference. Because the posterior density is proportional to the likelihood multiplied by the prior, a uniform (flat) prior makes the posterior proportional to the likelihood, so the maximum a posteriori (MAP) estimate coincides with the MLE. This connection is especially useful when performing statistical inference and building models.
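To make the connection concrete (a small illustrative sketch, not part of the original text; the Bernoulli model, made-up data, and grid search are assumptions chosen for simplicity, with NumPy assumed), the log-posterior under a uniform prior differs from the log-likelihood only by a constant, so both are maximized at the same parameter value.

```python
import numpy as np

# Hypothetical Bernoulli sample (made-up values) and a grid over theta.
y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
grid = np.linspace(0.01, 0.99, 99)

def log_likelihood(theta, data):
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1.0 - theta))

log_lik = np.array([log_likelihood(t, y) for t in grid])

# Uniform prior on (0, 1): the log-prior is a constant (here, 0), so
# log posterior = log likelihood + const, and the argmax is unchanged.
log_prior = np.zeros_like(grid)
log_post = log_lik + log_prior

print("MLE on grid:", grid[np.argmax(log_lik)])   # 0.7
print("MAP on grid:", grid[np.argmax(log_post)])  # same point
```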
The appeal of maximum likelihood estimation lies in its ability not only to characterize the data itself but also to provide a principled basis for decision making. In economics, medicine, and other fields of scientific research, MLE therefore occupies an indispensable position.
Finally, it is worth reflecting that the power of data lies in the process of understanding it: have we made full use of the data to explain the stories behind it?