M-estimator
In statistics, M-estimators are a broad class of statistics obtained as solutions to the problem of minimizing certain functions of the data.
Some authors define M-estimators to be the root or roots of a system of equations consisting of certain functions of the data. This class is a subset of the class of minimization solutions. Typically these functions are the derivatives of the functions to be minimized in the broader definition.
Many classical statistics can be shown to be M-estimators. Their main utility, however, is as robust alternatives to classical statistical estimators.
Historical Motivation
For a family of probability density functions f parameterized by θ, the maximum likelihood estimate of θ (which could be vector-valued) is computed by maximizing the likelihood function over θ. The estimate is

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i; \theta)$$

or, equivalently,

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \bigl(-\log f(x_i; \theta)\bigr).$$
The performance of maximum likelihood estimators depends heavily on the assumed distribution family of the data being true or close to it. In particular, maximum likelihood estimators can be inefficient and biased when the data are not from the assumed distribution. Of particular concern is the presence of outliers.
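As a small illustration (added here, not from the original article), under a normal model the maximum likelihood estimate of location is the sample mean, and a single outlier can move it arbitrarily far:

```python
import numpy as np

# Under a normal model the MLE of location is the sample mean;
# one gross outlier drags it far away from the bulk of the data.
clean = np.array([2.1, 1.9, 2.0, 2.2, 1.8])
contaminated = np.append(clean, 50.0)

print(clean.mean())         # 2.0
print(contaminated.mean())  # 10.0
```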
Definition
In 1964, Huber proposed generalizing maximum likelihood estimation to the minimization of

$$\sum_{i=1}^{n} \rho(x_i; \theta),$$

where ρ is a function with certain properties (see below). The solutions

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \rho(x_i; \theta)$$

are called M-estimators ("M" for "maximum likelihood-type"; Huber, 1981, page 43). Other types of robust estimator include L-estimators, R-estimators and S-estimators. Maximum likelihood estimators are thus a special case of M-estimators.
The function ρ, or its derivative ψ, can be chosen so as to give the estimator desirable properties (in terms of bias and efficiency) when the data are truly from the assumed distribution, and acceptable behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.
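A standard concrete example (Huber's well-known proposal; the tuning constant k = 1.345 below is a conventional choice for near-normal efficiency, assumed here rather than taken from this article) is a ρ that is quadratic near zero and linear in the tails. A minimal sketch in Python:

```python
import numpy as np

def huber_rho(r, k=1.345):
    """Huber's rho: quadratic for |r| <= k, linear beyond, so that
    large residuals contribute less than under squared error."""
    a = np.abs(r)
    return np.where(a <= k, a**2 / 2, k * a - k**2 / 2)
```

This interpolates between squared error (ρ(r) = r²/2, the Gaussian maximum likelihood case) and absolute error (ρ(r) = |r|, the median case).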
Types of M-estimators
M-estimators are solutions θ which minimize

$$\sum_{i=1}^{n} \rho(x_i; \theta).$$

This minimization can always be done directly. Often it is simpler to differentiate with respect to θ and solve for the root of the derivative. When this differentiation is possible, the M-estimator is said to be of ψ-type and is defined by the estimating equation

$$\sum_{i=1}^{n} \psi(x_i; \hat{\theta}) = 0, \qquad \text{where } \psi(x; \theta) = \frac{\partial \rho(x; \theta)}{\partial \theta}.$$

Otherwise, the M-estimator is said to be of ρ-type.
In most practical cases, the M-estimators are of ψ-type.
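For instance (an illustration added here, not from the original text), the sample median is a ψ-type M-estimator: taking ρ(x, θ) = |x − θ| gives, wherever the derivative exists,

$$\psi(x, \theta) = \operatorname{sign}(\theta - x), \qquad \sum_{i=1}^{n} \operatorname{sign}(\hat{\theta} - x_i) = 0,$$

and the estimating equation is solved by any value with equally many observations on either side, i.e. a median of the sample.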
Computation
For many choices of ρ or ψ, no closed-form solution exists and an iterative approach to computation is required. It is possible to use standard function-optimization algorithms, such as Newton–Raphson. However, an iteratively reweighted algorithm can be constructed for univariate problems, and is usually the preferred method. The iterations are usually begun from robust starting points, such as the median as an estimate of location and the median absolute deviation as an estimate of scale.
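A minimal sketch of such an algorithm for the Huber location estimate (the function name huber_location, the tuning constant k = 1.345, and the stopping rule are assumptions made for illustration, not specifics from this article):

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted means.
    Starts from the median (location) and normalized MAD (scale)."""
    x = np.asarray(x, dtype=float)
    theta = np.median(x)                                   # robust start
    scale = 1.4826 * np.median(np.abs(x - theta))          # normalized MAD
    if scale == 0:
        return theta                                       # degenerate sample
    for _ in range(max_iter):
        r = (x - theta) / scale                            # standardized residuals
        w = np.where(np.abs(r) <= k, 1.0, k / np.abs(r))   # Huber weights
        theta_new = np.sum(w * x) / np.sum(w)              # weighted-mean update
        if abs(theta_new - theta) < tol * scale:
            return theta_new
        theta = theta_new
    return theta

data = [2.1, 1.9, 2.0, 2.2, 1.8, 50.0]
print(huber_location(data))   # close to 2, unlike the mean (10.0)
```

Each iteration replaces the plain mean with a weighted mean in which observations far from the current estimate, measured in robust scale units, receive weight k/|r| < 1.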
For some choices of ψ, specifically, redescending functions, the solution may not be unique. Thus, some care is needed in ensuring that good starting points are chosen. The problem is particularly important in multivariate and regression problems.
Properties
Distribution
It can be shown that M-estimators are asymptotically normally distributed. As such, Wald-type approaches to constructing confidence intervals and hypothesis tests can be used. However, since the theory is asymptotic, it will frequently be sensible to check the distribution, perhaps by examining the permutation or bootstrap distribution.
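As a sketch of such a check (reusing the hypothetical huber_location function from the Computation section above), one can compute a percentile bootstrap interval for the estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.1, 1.9, 2.0, 2.2, 1.8, 50.0])

# Bootstrap distribution of the M-estimate and a 95% percentile interval
boot = np.array([
    huber_location(rng.choice(data, size=data.size, replace=True))
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)
```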
Influence function
The influence function of an M-estimator of ψ-type is proportional to its defining ψ function.
Let T be an M-estimator of ψ-type, and let G be a probability distribution for which T(G) is defined. Its influence function IF is

$$\operatorname{IF}(x; T, G) = \frac{\psi(x, T(G))}{-\int \frac{\partial \psi(y, \theta)}{\partial \theta}\Big|_{\theta = T(G)} \, dG(y)}.$$
A proof of this property of M-estimators can be found in Huber (1981, Section 3.2).
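As a worked check (added for illustration), plugging the mean's ψ(x, θ) = θ − x into this formula gives ∂ψ/∂θ = 1, so

$$\operatorname{IF}(x; T, G) = \frac{T(G) - x}{-1} = x - T(G),$$

which is unbounded in x: a single observation can exert arbitrarily large influence on the mean. A bounded ψ, such as Huber's, yields a bounded influence function, which is the sense in which ψ-type M-estimators can be robust.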
Applications
M-estimators can be constructed for location parameters and scale parameters in univariate and multivariate settings, as well as being used in robust regression.
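For example (a sketch assuming the Python statsmodels library is available; its RLM class fits regression coefficients by M-estimation, and the simulated data here are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)
y[:3] += 30                        # inject a few gross outliers

# Robust regression with a Huber psi function
X = sm.add_constant(x)
fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(fit.params)                  # close to [1.0, 2.0] despite the outliers
```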
Examples
Mean
Let (X1, ... , Xn) be a set of independent, identically distributed random variables, with distribution F.
If we define

$$\rho(x, \theta) = \frac{(x - \theta)^2}{2},$$

we note that $\sum_{i=1}^{n} \rho(X_i, \theta)$ is minimized when θ is the mean of the Xs. Thus the mean is an M-estimator of ρ-type, with this ρ function.

As this ρ function is continuously differentiable in θ, the mean is also an M-estimator of ψ-type, with ψ(x, θ) = ∂ρ(x, θ)/∂θ = θ − x.
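This can be checked numerically (a sketch; scipy.optimize.minimize_scalar is used here only as a generic one-dimensional minimizer):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.0, 4.0, 2.5, 3.5])

# Sum of rho(x_i, theta) = (x_i - theta)^2 / 2 over the sample
objective = lambda theta: np.sum((x - theta) ** 2) / 2

theta_hat = minimize_scalar(objective).x
print(theta_hat, x.mean())    # both equal 2.75
```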
References
Huber, Peter J. (1981). Robust Statistics. New York: Wiley. Republished in paperback, 2004.
Wilcox, R. R. (2003). "Summarizing data". Applying Contemporary Statistical Techniques, pp. 55–79. San Diego, CA: Academic Press.