Statistics 512 Notes 14: Properties of Maximum Likelihood Estimates Continued
Good properties of maximum likelihood estimates:
(1) Invariance
(2) Consistency
(3) Asymptotic Normality
(4) Efficiency
Asymptotic Normality
Suppose $X_1, \ldots, X_n$ are iid with density $f(x;\theta)$, $\theta \in \Omega$. Under regularity conditions, the large sample distribution of the maximum likelihood estimate $\hat{\theta}$ is approximately normal with mean $\theta_0$ and variance $\frac{1}{n I(\theta_0)}$, where $\theta_0$ is the true value of $\theta$ and $I(\theta_0)$ is the Fisher information defined below.
Regularity Conditions:
(R0) The pdfs are distinct, i.e., $\theta \ne \theta'$ implies $f(x;\theta) \ne f(x;\theta')$ (the model is identifiable).
(R1) The pdfs have common support for all $\theta$.
(R2) The point $\theta_0$ is an interior point of $\Omega$.
(R3) The pdf $f(x;\theta)$ is twice differentiable as a function of $\theta$.
(R4) The integral $\int f(x;\theta)\,dx$ can be differentiated twice under the integral sign as a function of $\theta$.
Note that $X_1, \ldots, X_n$ iid uniform on $(0,\theta)$ does not satisfy (R1): the support depends on $\theta$.
Fisher information: Define $I(\theta)$ by
$$I(\theta) = E_\theta\left[\left(\frac{\partial \log f(X;\theta)}{\partial \theta}\right)^2\right] = \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx.$$
$I(\theta)$ is called the Fisher information about $\theta$.
The greater the squared value of the score $\frac{\partial \log f(x;\theta)}{\partial \theta}$ is on average, the more information there is to distinguish between different values of $\theta$, making it easier to estimate $\theta$.
Lemma: Under the regularity conditions,
$$I(\theta) = -E_\theta\left[\frac{\partial^2 \log f(X;\theta)}{\partial \theta^2}\right].$$
Proof: First, we observe that since $\int f(x;\theta)\,dx = 1$,
$$\frac{\partial}{\partial \theta}\int f(x;\theta)\,dx = 0.$$
Combining this with the identity
$$\frac{\partial \log f(x;\theta)}{\partial \theta} = \frac{\partial f(x;\theta)/\partial \theta}{f(x;\theta)},$$
we have
$$0 = \int \frac{\partial f(x;\theta)}{\partial \theta}\,dx = \int \frac{\partial \log f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx,$$
where we have interchanged differentiation and integration using regularity condition (R4). Note that this display also shows that the score has mean zero: $E_\theta\left[\frac{\partial \log f(X;\theta)}{\partial \theta}\right] = 0$. Taking derivatives of the expression just above with respect to $\theta$ (and using the identity again), we have
$$0 = \int \frac{\partial^2 \log f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\,dx + \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx,$$
so that
$$I(\theta) = \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx = -E_\theta\left[\frac{\partial^2 \log f(X;\theta)}{\partial \theta^2}\right].$$
Example: Information for a Bernoulli random variable.
Let X be Bernoulli(p). Then
$$f(x;p) = p^x (1-p)^{1-x}, \quad x = 0, 1,$$
$$\log f(x;p) = x\log p + (1-x)\log(1-p),$$
$$\frac{\partial \log f(x;p)}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}, \qquad \frac{\partial^2 \log f(x;p)}{\partial p^2} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}.$$
Thus,
$$I(p) = -E_p\left[\frac{\partial^2 \log f(X;p)}{\partial p^2}\right] = \frac{E_p[X]}{p^2} + \frac{E_p[1-X]}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}.$$
There is more information about p when p is closer to zero or one.
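As a quick numerical check (a simulation sketch, not part of the original notes; p and the number of draws are arbitrary choices), the sample variance of the score for simulated Bernoulli draws should match $1/(p(1-p))$, and by the Lemma the negative mean of the second derivative should as well:
# Monte Carlo check of I(p) = 1/(p(1-p)) for Bernoulli(p)
set.seed(1)
p <- 0.3
x <- rbinom(100000, size = 1, prob = p)
score <- x / p - (1 - x) / (1 - p)     # d/dp log f(x;p)
mean(score)                            # approximately 0
var(score)                             # approximately 1/(p(1-p))
d2 <- -x / p^2 - (1 - x) / (1 - p)^2   # d^2/dp^2 log f(x;p)
-mean(d2)                              # also approximately 1/(p(1-p))
1 / (p * (1 - p))                      # exact value: about 4.76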
Additional regularity condition:
(R5) The pdf $f(x;\theta)$ is three times differentiable as a function of $\theta$. Further, for all $\theta \in \Omega$, there exist a constant c and a function M(x) such that
$$\left|\frac{\partial^3 \log f(x;\theta)}{\partial \theta^3}\right| \le M(x),$$
with $E_{\theta_0}[M(X)] < \infty$, for all $\theta_0 - c < \theta < \theta_0 + c$ and all x in the support of X.
Theorem (6.2.2): Assume $X_1, \ldots, X_n$ are iid with pdf $f(x;\theta_0)$ for $\theta_0 \in \Omega$ such that the regularity conditions (R0)-(R5) are satisfied. Suppose further that the Fisher information satisfies $0 < I(\theta_0) < \infty$. Then
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$$
Proof: Sketch of proof. Since $l'(\hat{\theta}) = 0$, a Taylor series expansion of $l'$ about $\theta_0$ gives
$$0 = l'(\hat{\theta}) \approx l'(\theta_0) + \left(\hat{\theta} - \theta_0\right) l''(\theta_0),$$
so that
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \approx \frac{n^{-1/2}\, l'(\theta_0)}{-n^{-1}\, l''(\theta_0)}.$$
First, we consider the numerator of this last expression. Its expectation is
$$E_{\theta_0}\left[n^{-1/2} \sum_{i=1}^n \frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right] = 0$$
because
$$E_{\theta_0}\left[\frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right] = 0,$$
as shown in the proof of the Lemma. Its variance is
$$\mathrm{Var}\left(n^{-1/2} \sum_{i=1}^n \frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right) = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}\left(\frac{\partial \log f(X_i;\theta_0)}{\partial \theta}\right) = I(\theta_0).$$
Next we consider the denominator:
$$-n^{-1}\, l''(\theta_0) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(X_i;\theta_0)}{\partial \theta^2}.$$
By the law of large numbers, the latter expression converges in probability to
$$-E_{\theta_0}\left[\frac{\partial^2 \log f(X;\theta_0)}{\partial \theta^2}\right] = I(\theta_0).$$
We thus have
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \approx \frac{n^{-1/2}\, l'(\theta_0)}{I(\theta_0)}.$$
Therefore,
$$E\left[\sqrt{n}\left(\hat{\theta} - \theta_0\right)\right] \approx 0.$$
Furthermore,
$$\mathrm{Var}\left(\sqrt{n}\left(\hat{\theta} - \theta_0\right)\right) \approx \frac{I(\theta_0)}{I(\theta_0)^2} = \frac{1}{I(\theta_0)},$$
and thus the approximate mean and variance match the statement of the theorem. The central limit theorem may be applied to $n^{-1/2}\, l'(\theta_0)$, which is $n^{-1/2}$ times a sum of iid random variables:
$$n^{-1/2}\, l'(\theta_0) \xrightarrow{D} N(0, I(\theta_0)), \quad \text{hence} \quad \sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$$
Corollary: Under the same assumptions as Theorem 6.2.2,
$$\sqrt{n I(\theta_0)}\left(\hat{\theta} - \theta_0\right) \xrightarrow{D} N(0, 1).$$
Informally, Theorem 6.2.2 and its corollary say that the distribution of the MLE $\hat{\theta}$ can be approximated by $N\left(\theta_0, \frac{1}{n I(\theta_0)}\right)$.
From this fact, we can construct an asymptotically correct confidence interval.
Let $\hat{\eta} = \sqrt{1 / (n I(\hat{\theta}))}$.
Then $P\left(\hat{\theta} - z_{\alpha/2}\,\hat{\eta} \le \theta_0 \le \hat{\theta} + z_{\alpha/2}\,\hat{\eta}\right) \to 1 - \alpha$ as $n \to \infty$.
For $\alpha = 0.05$, $z_{\alpha/2} = 1.96$, so $\hat{\theta} \pm 1.96\sqrt{1/(n I(\hat{\theta}))}$ is an approximate 95% confidence interval for $\theta_0$.
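To see the asymptotic coverage in action, here is a simulation sketch (not part of the original notes; it uses an exponential model with rate $\lambda$, chosen only for illustration, for which the MLE is $\hat{\lambda} = 1/\bar{X}$ and $I(\lambda) = 1/\lambda^2$):
# Coverage check of the approximate 95% CI: lambdahat +/- 1.96*lambdahat/sqrt(n)
set.seed(512)
lambda0 <- 2
n <- 100
covered <- replicate(5000, {
  x <- rexp(n, rate = lambda0)
  lambdahat <- 1 / mean(x)
  halfwidth <- 1.96 * lambdahat / sqrt(n)
  (lambdahat - halfwidth <= lambda0) && (lambda0 <= lambdahat + halfwidth)
})
mean(covered)   # should be close to 0.95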
Example 1: Let $X_1, \ldots, X_n$ be iid Bernoulli(p). The MLE is $\hat{p} = \bar{X}$. We calculated above that $I(p) = \frac{1}{p(1-p)}$. Thus, an approximate 95% confidence interval for p is
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
This is what the newspapers report when they say “the poll is accurate to within four points, 95 percent of the time.”
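To connect the formula to the quoted margin of error (an illustrative calculation, not from the original notes; the poll size n = 600 and $\hat{p} = 0.5$ are hypothetical):
# Margin of error for a hypothetical poll
n <- 600
phat <- 0.5                          # p(1-p) is largest at p = 1/2
1.96 * sqrt(phat * (1 - phat) / n)   # about 0.04, i.e., "four points"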
Computation of maximum likelihood estimates
Example 2: Logistic distribution. Let $X_1, \ldots, X_n$ be iid with density
$$f(x;\theta) = \frac{e^{-(x-\theta)}}{\left(1 + e^{-(x-\theta)}\right)^2}, \quad -\infty < x < \infty, \; -\infty < \theta < \infty.$$
The log of the likelihood simplifies to
$$l(\theta) = n\theta - \sum_{i=1}^n x_i - 2\sum_{i=1}^n \log\left(1 + e^{-(x_i - \theta)}\right).$$
Using this, the first derivative is
$$l'(\theta) = n - 2\sum_{i=1}^n \frac{e^{-(x_i-\theta)}}{1 + e^{-(x_i-\theta)}}.$$
Setting this equal to 0 and rearranging terms results in the equation
$$\sum_{i=1}^n \frac{e^{-(x_i-\theta)}}{1 + e^{-(x_i-\theta)}} = \frac{n}{2}. \qquad (*)$$
Although this does not simplify, we can show that equation (*) has a unique solution. The derivative of the left hand side of (*) simplifies to
$$\frac{\partial}{\partial \theta} \sum_{i=1}^n \frac{e^{-(x_i-\theta)}}{1 + e^{-(x_i-\theta)}} = \sum_{i=1}^n \frac{e^{-(x_i-\theta)}}{\left(1 + e^{-(x_i-\theta)}\right)^2} > 0.$$
Thus, the left hand side of (*) is a strictly increasing function of $\theta$. Finally, the left hand side of (*) approaches 0 as $\theta \to -\infty$ and approaches n as $\theta \to \infty$. Thus, equation (*) has a unique solution. Also, the second derivative of $l(\theta)$ is strictly negative for all $\theta$, so the solution is a maximum.
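To visualize the uniqueness argument, we can graph the left hand side of (*) together with the level n/2 (a plotting sketch, not part of the original notes; the sample values are arbitrary):
# Plot the left hand side of (*) for a small illustrative sample
xvec <- c(-1.2, 0.4, 0.8, 2.1, 3.0)
n <- length(xvec)
lhs <- function(theta) sum(exp(-(xvec - theta)) / (1 + exp(-(xvec - theta))))
thetagrid <- seq(-10, 10, by = 0.1)
plot(thetagrid, sapply(thetagrid, lhs), type = "l",
     xlab = "theta", ylab = "left hand side of (*)")
abline(h = n / 2, lty = 2)   # the MLE is where the curve crosses n/2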
How do we find the maximum likelihood estimate that is the solution to (*)?
Newton’s method is a numerical method for approximating solutions to equations. The method produces a sequence of values $\hat{\theta}^{(0)}, \hat{\theta}^{(1)}, \hat{\theta}^{(2)}, \ldots$ that, under ideal conditions, converges to the MLE $\hat{\theta}$.
To motivate the method, we expand the derivative of the log likelihood around an initial guess $\hat{\theta}^{(0)}$:
$$0 = l'(\hat{\theta}) \approx l'(\hat{\theta}^{(0)}) + \left(\hat{\theta} - \hat{\theta}^{(0)}\right) l''(\hat{\theta}^{(0)}).$$
Solving for $\hat{\theta}$ gives
$$\hat{\theta} \approx \hat{\theta}^{(0)} - \frac{l'(\hat{\theta}^{(0)})}{l''(\hat{\theta}^{(0)})}.$$
This suggests the following iterative scheme:
$$\hat{\theta}^{(j+1)} = \hat{\theta}^{(j)} - \frac{l'(\hat{\theta}^{(j)})}{l''(\hat{\theta}^{(j)})}.$$
The following is an R function that uses Newton’s method to approximate the maximum likelihood estimate for a logistic distribution:
mlelogisticfunc = function(xvec, toler = .001) {
  # Start at the sample median; the logistic density is symmetric about theta,
  # so the median is a sensible initial estimate
  startvalue = median(xvec)
  n = length(xvec)
  thetahatcurr = startvalue
  # Compute first derivative of the log likelihood
  firstderivll = n - 2 * sum(exp(-xvec + thetahatcurr) / (1 + exp(-xvec + thetahatcurr)))
  # Continue Newton's method until the first derivative
  # of the log likelihood is within toler of 0
  while (abs(firstderivll) > toler) {
    # Compute second derivative of the log likelihood
    secondderivll = -2 * sum(exp(-xvec + thetahatcurr) / (1 + exp(-xvec + thetahatcurr))^2)
    # Newton's method update of the estimate of theta
    thetahatnew = thetahatcurr - firstderivll / secondderivll
    thetahatcurr = thetahatnew
    # Recompute the first derivative at the updated estimate
    firstderivll = n - 2 * sum(exp(-xvec + thetahatcurr) / (1 + exp(-xvec + thetahatcurr)))
  }
  list(thetahat = thetahatcurr)
}
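As a quick check (a minimal usage sketch, not part of the original notes; the true value and sample size are arbitrary choices), we can simulate logistic data with a known location and confirm that the function approximately recovers it:
set.seed(42)
theta0 <- 2
xvec <- rlogis(1000, location = theta0)   # iid logistic sample with true theta = 2
mlelogisticfunc(xvec)$thetahat            # should be close to 2
median(xvec)                              # the starting value, for comparison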