Statistics 512 Notes 14: Properties of Maximum Likelihood Estimates Continued

Good properties of maximum likelihood estimates:

(1)  Invariance

(2)  Consistency

(3)  Asymptotic Normality

(4)  Efficiency

Asymptotic Normality

Suppose $X_1, \ldots, X_n$ are iid with density $f(x;\theta)$, $\theta \in \Omega$. Under regularity conditions, the large sample distribution of the MLE $\hat{\theta}$ is approximately normal with mean $\theta_0$ and variance $1/(nI(\theta_0))$, where $\theta_0$ is the true value of $\theta$ and $I(\theta)$ is the Fisher information defined below.

Regularity Conditions:

(R0) The pdfs are distinct, i.e., $\theta \neq \theta'$ implies $f(x;\theta) \neq f(x;\theta')$ (the model is identifiable).

(R1) The pdfs have common support for all $\theta \in \Omega$.

(R2) The true value $\theta_0$ is an interior point of $\Omega$.

(R3) The pdf $f(x;\theta)$ is twice differentiable as a function of $\theta$.

(R4) The integral $\int f(x;\theta)\,dx$ can be differentiated twice under the integral sign as a function of $\theta$.

Note that iid uniform on $(0,\theta)$ does not satisfy (R1), since the support $(0,\theta)$ depends on $\theta$.

Fisher information: Define $I(\theta)$ by

$I(\theta) = E_{\theta}\left[\left(\frac{\partial \log f(X;\theta)}{\partial \theta}\right)^2\right] = \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx.$

$I(\theta)$ is called the Fisher information about $\theta$.

The greater the squared value of $\partial \log f(x;\theta)/\partial \theta$ is on average, the more information there is to distinguish between different values of $\theta$, making it easier to estimate $\theta$.

Lemma: Under the regularity conditions,

$I(\theta) = -E_{\theta}\left[\frac{\partial^2 \log f(X;\theta)}{\partial \theta^2}\right].$

Proof: First, we observe that since $\int f(x;\theta)\,dx = 1$,

$\frac{\partial}{\partial \theta} \int f(x;\theta)\,dx = 0.$

Combining this with the identity

$\frac{\partial \log f(x;\theta)}{\partial \theta} = \frac{\partial f(x;\theta)/\partial \theta}{f(x;\theta)},$

we have

$0 = \int \frac{\partial f(x;\theta)}{\partial \theta}\,dx = \int \frac{\partial \log f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx = E_{\theta}\left[\frac{\partial \log f(X;\theta)}{\partial \theta}\right],$

where we have interchanged differentiation and integration using regularity condition (R4). Taking derivatives of the expressions just above, we have

$0 = \frac{\partial}{\partial \theta}\int \frac{\partial \log f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx = \int \frac{\partial^2 \log f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\,dx + \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx,$

so that

$I(\theta) = E_{\theta}\left[\left(\frac{\partial \log f(X;\theta)}{\partial \theta}\right)^2\right] = -E_{\theta}\left[\frac{\partial^2 \log f(X;\theta)}{\partial \theta^2}\right].$
Example: Information for a Bernoulli random variable.

Let X be Bernoulli(p). Then

$\log f(x;p) = x \log p + (1-x)\log(1-p),$

$\frac{\partial \log f(x;p)}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}, \qquad \frac{\partial^2 \log f(x;p)}{\partial p^2} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}.$

Thus,

$I(p) = -E_p\left[\frac{\partial^2 \log f(X;p)}{\partial p^2}\right] = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}.$

There is more information about p when p is closer to zero or one, since $1/(p(1-p))$ is largest near those endpoints.
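To make the formula concrete, here is a short R check; the function name bernoulliinfo and the settings (p = 0.3, $10^5$ draws) are illustrative choices, not part of the notes:

```r
# Fisher information for Bernoulli(p): I(p) = 1/(p*(1-p))
bernoulliinfo <- function(p) 1/(p*(1-p))
bernoulliinfo(c(0.5, 0.2, 0.05))   # 4, 6.25, 21.05...: larger near 0 and 1

# Monte Carlo check that E[(d/dp log f(X;p))^2] equals 1/(p*(1-p)) at p = 0.3
set.seed(1)
p <- 0.3
x <- rbinom(1e5, 1, p)
score <- x/p - (1 - x)/(1 - p)     # the score function evaluated at the data
mean(score^2)                      # close to 1/(0.3*0.7) = 4.76
```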

Additional regularity condition:

(R5) The pdf $f(x;\theta)$ is three times differentiable as a function of $\theta$. Further, for all $\theta_0 \in \Omega$, there exist a constant c and a function M(x) such that

$\left|\frac{\partial^3 \log f(x;\theta)}{\partial \theta^3}\right| \leq M(x), \qquad E_{\theta_0}[M(X)] < \infty,$

for all $\theta_0 - c < \theta < \theta_0 + c$ and all x in the support of X.

Theorem (6.2.2): Assume $X_1, \ldots, X_n$ are iid with pdf $f(x;\theta_0)$ for $\theta_0 \in \Omega$ such that the regularity conditions (R0)-(R5) are satisfied. Suppose further that the Fisher information satisfies $0 < I(\theta_0) < \infty$. Then

$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$

Proof: Sketch of proof. Let $l(\theta) = \sum_{i=1}^n \log f(X_i;\theta)$ denote the log likelihood. From a Taylor series expansion of $l'(\hat{\theta})$ around $\theta_0$, using $l'(\hat{\theta}) = 0$,

$0 = l'(\hat{\theta}) \approx l'(\theta_0) + (\hat{\theta} - \theta_0)\, l''(\theta_0),$

so that

$\sqrt{n}(\hat{\theta} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{-n^{-1}\, l''(\theta_0)}.$

First, we consider the numerator of this last expression. Its expectation is

$E_{\theta_0}\left[n^{-1/2}\, l'(\theta_0)\right] = n^{-1/2} \sum_{i=1}^n E_{\theta_0}\left[\frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right] = 0$

because $E_{\theta_0}\left[\partial \log f(X;\theta_0)/\partial \theta\right] = 0$, as shown in the proof of the lemma. Its variance is

$\mathrm{Var}\left(n^{-1/2}\, l'(\theta_0)\right) = \frac{1}{n} \sum_{i=1}^n \mathrm{Var}\left(\frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right) = I(\theta_0).$

Next we consider the denominator:

$-n^{-1}\, l''(\theta_0) = -\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log f(X_i;\theta_0)}{\partial \theta^2}.$

By the law of large numbers, the latter expression converges in probability to

$-E_{\theta_0}\left[\frac{\partial^2 \log f(X;\theta_0)}{\partial \theta^2}\right] = I(\theta_0).$

We thus have

$\sqrt{n}(\hat{\theta} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{I(\theta_0)}.$

The central limit theorem may be applied to $n^{-1/2}\, l'(\theta_0)$, which is $n^{-1/2}$ times a sum of iid random variables with mean 0 and variance $I(\theta_0)$:

$n^{-1/2}\, l'(\theta_0) \xrightarrow{D} N(0, I(\theta_0)),$

and thus

$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{D} \frac{1}{I(\theta_0)}\, N(0, I(\theta_0)) = N\left(0, \frac{1}{I(\theta_0)}\right).$

Corollary: Under the same assumptions as Theorem 6.2.2,

$\frac{\hat{\theta} - \theta_0}{\sqrt{1/(n I(\hat{\theta}))}} \xrightarrow{D} N(0, 1).$

Informally, Theorem 6.2.2 and its corollary say that the distribution of the MLE $\hat{\theta}$ can be approximated by $N\left(\theta_0, \frac{1}{n I(\hat{\theta})}\right)$.
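A quick simulation illustrates the approximation. For Bernoulli(p) the MLE is the sample mean (see Example 1 below), so the simulated spread of $\hat{p}$ can be compared with $\sqrt{1/(nI(p))} = \sqrt{p(1-p)/n}$; the settings p = 0.3, n = 400, and 10000 replications are illustrative:

```r
# Simulation sketch of the normal approximation for the Bernoulli MLE
set.seed(7)
p <- 0.3; n <- 400
phat <- replicate(10000, mean(rbinom(n, 1, p)))   # 10000 simulated MLEs
mean(phat)   # approximately p = 0.3
sd(phat)     # approximately sqrt(p*(1-p)/n) = 0.0229
```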

From this fact, we can construct an asymptotically correct confidence interval.

Let $z_{\alpha/2}$ denote the point for which $P(Z > z_{\alpha/2}) = \alpha/2$, where $Z \sim N(0,1)$.

Then $P\left(\hat{\theta} - z_{\alpha/2}\sqrt{\frac{1}{n I(\hat{\theta})}} \leq \theta_0 \leq \hat{\theta} + z_{\alpha/2}\sqrt{\frac{1}{n I(\hat{\theta})}}\right) \rightarrow 1 - \alpha$ as $n \rightarrow \infty$.

For $\alpha = 0.05$, $z_{0.025} \approx 1.96 \approx 2$, so $\hat{\theta} \pm 2\sqrt{\frac{1}{n I(\hat{\theta})}}$ is an approximate 95% confidence interval for $\theta_0$.

Example 1: Let $X_1, \ldots, X_n$ be iid Bernoulli(p). The MLE is $\hat{p} = \bar{X}$. We calculated above that $I(p) = \frac{1}{p(1-p)}$. Thus, an approximate 95% confidence interval for p is

$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$

Since $\hat{p}(1-\hat{p}) \leq 1/4$, the margin of error is at most $1/\sqrt{n}$, which equals 0.04 when $n = 625$. This is what the newspapers report when they say "the poll is accurate to within four points, 95 percent of the time."
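As a sketch, the interval can be computed directly in R; the counts n = 625 and y = 300 below are made-up poll numbers:

```r
# Approximate 95% CI for p: phat +/- 2*sqrt(phat*(1-phat)/n)
n <- 625                            # hypothetical number of respondents
y <- 300                            # hypothetical number of "yes" answers
phat <- y/n
margin <- 2*sqrt(phat*(1 - phat)/n)
c(phat - margin, phat + margin)     # roughly 0.44 to 0.52
# Worst case phat = 1/2 gives margin 1/sqrt(625) = 0.04, i.e. "four points"
```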

Computation of maximum likelihood estimates

Example 2: Logistic distribution. Let $X_1, \ldots, X_n$ be iid with density

$f(x;\theta) = \frac{e^{-(x-\theta)}}{\left(1 + e^{-(x-\theta)}\right)^2}, \qquad -\infty < x < \infty, \quad -\infty < \theta < \infty.$

The log of the likelihood simplifies to:

$l(\theta) = n\theta - \sum_{i=1}^n x_i - 2\sum_{i=1}^n \log\left(1 + e^{-(x_i - \theta)}\right).$

Using this, the first derivative is

$l'(\theta) = n - 2\sum_{i=1}^n \frac{e^{-(x_i - \theta)}}{1 + e^{-(x_i - \theta)}}.$

Setting this equal to 0 and rearranging terms results in the equation:

$\sum_{i=1}^n \frac{e^{-(x_i - \theta)}}{1 + e^{-(x_i - \theta)}} = \frac{n}{2}. \qquad (*)$

Although this does not simplify, we can show that equation (*) has a unique solution. The derivative of the left hand side of (*) simplifies to

$\sum_{i=1}^n \frac{e^{-(x_i - \theta)}}{\left(1 + e^{-(x_i - \theta)}\right)^2} > 0.$

Thus, the left hand side of (*) is a strictly increasing function of $\theta$. Finally, the left hand side of (*) approaches 0 as $\theta \rightarrow -\infty$ and approaches n as $\theta \rightarrow \infty$. Thus, equation (*) has a unique solution. Also, the second derivative of $l(\theta)$,

$l''(\theta) = -2\sum_{i=1}^n \frac{e^{-(x_i - \theta)}}{\left(1 + e^{-(x_i - \theta)}\right)^2},$

is strictly negative for all $\theta$; so the solution is a maximum.
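The monotonicity argument is exactly what R's built-in uniroot() exploits, so equation (*) can be solved numerically as a sketch; the simulated data with true $\theta = 1$ are illustrative:

```r
# Solve (*): sum of e^{-(x_i-theta)}/(1+e^{-(x_i-theta)}) = n/2
set.seed(1)
x <- rlogis(100, location = 1)    # simulated logistic data, true theta = 1
lhs <- function(theta) {
  sum(exp(-(x - theta))/(1 + exp(-(x - theta)))) - length(x)/2
}
# lhs is strictly increasing in theta: negative at min(x), positive at max(x)
uniroot(lhs, interval = range(x))$root   # the MLE, close to 1
```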

How do we find the maximum likelihood estimate that is the solution to (*)?

Newton’s method is a numerical method for approximating solutions to equations. The method produces a sequence of values $\hat{\theta}^{(0)}, \hat{\theta}^{(1)}, \hat{\theta}^{(2)}, \ldots$ that, under ideal conditions, converges to the MLE $\hat{\theta}$.

To motivate the method, we expand the derivative of the log likelihood around the current iterate $\hat{\theta}^{(j)}$:

$0 = l'(\hat{\theta}) \approx l'(\hat{\theta}^{(j)}) + (\hat{\theta} - \hat{\theta}^{(j)})\, l''(\hat{\theta}^{(j)}).$

Solving for $\hat{\theta}$ gives

$\hat{\theta} \approx \hat{\theta}^{(j)} - \frac{l'(\hat{\theta}^{(j)})}{l''(\hat{\theta}^{(j)})}.$

This suggests the following iterative scheme:

$\hat{\theta}^{(j+1)} = \hat{\theta}^{(j)} - \frac{l'(\hat{\theta}^{(j)})}{l''(\hat{\theta}^{(j)})}.$

The following is an R function that uses Newton’s method to approximate the maximum likelihood estimate for a logistic distribution:

mlelogisticfunc = function(xvec, toler = .001) {
  startvalue = median(xvec)
  n = length(xvec)
  thetahatcurr = startvalue
  # Compute first derivative of log likelihood
  firstderivll = n - 2*sum(exp(-xvec + thetahatcurr)/(1 + exp(-xvec + thetahatcurr)))
  # Continue Newton's method until the first derivative
  # of the log likelihood is within toler of 0
  while (abs(firstderivll) > toler) {
    # Compute second derivative of log likelihood
    secondderivll = -2*sum(exp(-xvec + thetahatcurr)/(1 + exp(-xvec + thetahatcurr))^2)
    # Newton's method update of estimate of theta
    thetahatnew = thetahatcurr - firstderivll/secondderivll
    thetahatcurr = thetahatnew
    # Compute first derivative of log likelihood at updated estimate
    firstderivll = n - 2*sum(exp(-xvec + thetahatcurr)/(1 + exp(-xvec + thetahatcurr)))
  }
  list(thetahat = thetahatcurr)
}
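As a usage sketch, the function can be checked against R's built-in optimize() applied directly to the log likelihood; the simulated data with true $\theta = 2$ and the sample size 500 are illustrative:

```r
# Simulate logistic data and maximize the log likelihood directly
set.seed(42)
x <- rlogis(500, location = 2)
loglik <- function(theta) sum(log(dlogis(x, location = theta)))
opt <- optimize(loglik, interval = range(x), maximum = TRUE)
opt$maximum   # close to the true value 2
# This should agree with the Newton's method answer, mlelogisticfunc(x)$thetahat
```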