Statistics 550 Notes 14

Reading: Section 2.3-2.4

I. Review from last class (Conditions for Uniqueness and Existence of the MLE).

Lemma 2.3.1: Suppose we are given a function $l: \Theta \rightarrow \mathbb{R}$ where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that

$\lim_{\theta \rightarrow \partial \Theta} l(\theta) = -\infty$,

where $\partial \Theta$ denotes the boundary of $\Theta$. Then there exists $\hat{\theta} \in \Theta$ such that

$l(\hat{\theta}) = \max \{ l(\theta) : \theta \in \Theta \}$.

Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(x \mid \theta)$, $\theta \in \Theta$, and that (i) the log likelihood $l(\theta) = \log p(x \mid \theta)$ is strictly concave; (ii) $l(\theta) \rightarrow -\infty$ as $\theta \rightarrow \partial \Theta$. Then the maximum likelihood estimator $\hat{\theta}$ exists and is unique.

Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l$ is differentiable in $\theta$, then $\hat{\theta}$ is the unique solution to the estimating equation:

$\nabla_{\theta} l(\theta) = 0$.   (1)

Note: It is the strict concavity of $l(\theta)$ that guarantees that (1) has a unique solution.

II. Application to Exponential Families.

1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave.

Consider the exponential family

$p(x \mid \eta) = h(x) \exp \left\{ \sum_{j=1}^{k} \eta_j T_j(x) - A(\eta) \right\}$.

Note that if $A(\eta)$ is convex, then the log likelihood

$l(\eta) = \sum_{j=1}^{k} \eta_j T_j(x) - A(\eta) + \log h(x)$

is concave in $\eta$.

Proof that $A(\eta)$ is convex:

Recall that $A(\eta) = \log \int h(x) \exp \{ \sum_{j=1}^{k} \eta_j T_j(x) \} \, dx$ (with the integral replaced by a sum in the discrete case). To show that $A$ is convex, we want to show that

$A(\alpha \eta_1 + (1-\alpha) \eta_2) \le \alpha A(\eta_1) + (1-\alpha) A(\eta_2)$ for $0 < \alpha < 1$,

or equivalently

$\int h(x) e^{\sum_j [\alpha \eta_{1j} + (1-\alpha) \eta_{2j}] T_j(x)} \, dx \le \left[ \int h(x) e^{\sum_j \eta_{1j} T_j(x)} \, dx \right]^{\alpha} \left[ \int h(x) e^{\sum_j \eta_{2j} T_j(x)} \, dx \right]^{1-\alpha}$.

We use Hölder's Inequality to establish this. Hölder's Inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r, s > 1$ and $\frac{1}{r} + \frac{1}{s} = 1$,

$E|XY| \le (E|X|^r)^{1/r} (E|Y|^s)^{1/s}$.

More generally, Hölder's inequality states

$\int |f(x) g(x)| \, dx \le \left( \int |f(x)|^r \, dx \right)^{1/r} \left( \int |g(x)|^s \, dx \right)^{1/s}$.

We have

$\int h(x) e^{\sum_j [\alpha \eta_{1j} + (1-\alpha) \eta_{2j}] T_j(x)} \, dx = \int \left[ h(x)^{\alpha} e^{\alpha \sum_j \eta_{1j} T_j(x)} \right] \left[ h(x)^{1-\alpha} e^{(1-\alpha) \sum_j \eta_{2j} T_j(x)} \right] dx$,

and applying Hölder's inequality with $r = 1/\alpha$, $s = 1/(1-\alpha)$ yields exactly the required inequality, completing the proof.

For a full exponential family, the log likelihood is strictly concave.

For a curved exponential family, the log likelihood is concave but not strictly concave.
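A supplementary computation (standard, though not spelled out in these notes) makes the concavity claims concrete: the Hessian of $A$ is the covariance matrix of the natural sufficient statistic,

$\nabla^2 A(\eta) = \mathrm{Var}_{\eta}(T(X)) \succeq 0$.

When no nontrivial linear combination of $T_1(X), \ldots, T_k(X)$ is degenerate (the full-rank case), this matrix is positive definite, so $A$ is strictly convex and the log likelihood is strictly concave; when the $T_j$ are linearly dependent, $\mathrm{Var}_{\eta}(T(X))$ is singular and the concavity is not strict.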

2. Theorem 2.3.1 and Corollary 2.3.2 spell out specific conditions under which $l(\theta) \rightarrow -\infty$ as $\theta \rightarrow \partial \Theta$ for exponential families.

Example 1: Gamma distribution

$p(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha - 1} e^{-x/\beta}, \quad x > 0$,

for the parameter space $\Theta = \{ (\alpha, \beta) : \alpha > 0, \beta > 0 \}$.

The gamma distribution is a full two-dimensional exponential family, so the log likelihood function is strictly concave.

The boundary of the parameter space is

$\partial \Theta = \{ (\alpha, \beta) : \alpha \in \{0, \infty\} \text{ or } \beta \in \{0, \infty\} \}$.

Can check that $l(\alpha, \beta) \rightarrow -\infty$ as $(\alpha, \beta) \rightarrow \partial \Theta$.

Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equations.

For a sample $x_1, \ldots, x_n$, the log likelihood is $l(\alpha, \beta) = -n \log \Gamma(\alpha) - n \alpha \log \beta + (\alpha - 1) \sum_{i=1}^{n} \log x_i - \frac{1}{\beta} \sum_{i=1}^{n} x_i$. The partial derivatives of the log likelihood are

$\frac{\partial l}{\partial \alpha} = -n \psi(\alpha) - n \log \beta + \sum_{i=1}^{n} \log x_i$, where $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$,

$\frac{\partial l}{\partial \beta} = -\frac{n \alpha}{\beta} + \frac{1}{\beta^2} \sum_{i=1}^{n} x_i$.

Setting the second partial derivative equal to zero, we find

$\hat{\beta} = \frac{\bar{x}}{\alpha}$.

When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:

$n \log \alpha - n \psi(\alpha) - n \log \bar{x} + \sum_{i=1}^{n} \log x_i = 0$.

This equation cannot be solved in closed form.

III. Numerical Methods for Finding MLEs

The Bisection Method

The bisection method is a method for finding the root of a one-dimensional function $f$ that is continuous on $[a, b]$, for which $f(a) < 0 < f(b)$ and $f$ is increasing (an analogous method can be used for $f$ decreasing).

Note: There is a root $\hat{x} \in (a, b)$ by the intermediate value theorem.

Bisection Algorithm:

Decide on tolerance $\epsilon$ for $|x_{\text{final}} - \hat{x}|$, where $\hat{x}$ is the root of $f$.

Stop algorithm when we find an $x_{\text{final}}$ with $|x_{\text{final}} - \hat{x}| \le \epsilon$; by Lemma 2.4.1 below, this holds once the bracketing interval has length less than $2\epsilon$.

1. Find $x^- < x^+$ such that $f(x^-) < 0 < f(x^+)$.

Initialize $x_0^- = x^-$, $x_0^+ = x^+$.

2. If $|x^+ - x^-| < 2\epsilon$, stop and set $x_{\text{final}} = \frac{1}{2}(x^- + x^+)$.

Else set $x = \frac{1}{2}(x^- + x^+)$.

3. If $f(x) = 0$, stop and set $x_{\text{final}} = x$.

If $f(x) < 0$, set $x^- = x$ and go to step 2.

If $f(x) > 0$, set $x^+ = x$ and go to step 2.
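For concreteness, here is a minimal R sketch of the algorithm above (the function name bisection and the closing example are illustrative, not from the notes; R's built-in uniroot, used later in these notes, is the practical alternative):

bisection=function(f,xminus,xplus,eps=1e-6){
# bisection for increasing f with f(xminus) < 0 < f(xplus);
# stops when the bracketing interval has length < 2*eps
while(abs(xplus-xminus)>=2*eps){
x=(xminus+xplus)/2;
fx=f(x);
if(fx==0) return(x);                  # exact root found
if(fx<0){ xminus=x } else { xplus=x } # keep the sign change bracketed
}
(xminus+xplus)/2;                     # midpoint is within eps of the root
}

For example, bisection(function(x) x^2-2, 0, 2) returns approximately 1.414214, the root of $x^2 - 2$ on $[0, 2]$.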

Lemma 2.4.1: The bisection algorithm stops at a solution $x_{\text{final}}$ such that

$|x_{\text{final}} - \hat{x}| \le \epsilon$.

Proof: If $x_m$ is the $m$th iterate of $x$, the bracketing interval after $m$ steps satisfies

$|x_m^+ - x_m^-| = 2^{-m} |x_0^+ - x_0^-|$.

Moreover, by the intermediate value theorem,

$x_m^- < \hat{x} < x_m^+$ for all $m$.

Therefore,

$|x_m - \hat{x}| \le |x_m^+ - x_m^-| = 2^{-m} |x_0^+ - x_0^-|$.

For $m \ge \log_2 \left( \frac{|x_0^+ - x_0^-|}{\epsilon} \right)$ we have $|x_m - \hat{x}| \le \epsilon$.

Note: Bisection can be much more efficient than the approach of specifying a grid of points between $a$ and $b$ and evaluating $f$ at each grid point, since for finding the root to within $\epsilon$, a grid of size $(b-a)/\epsilon$ is required, while bisection requires only about $\log_2 [(b-a)/\epsilon]$ evaluations of $f$.

Coordinate Ascent Method

The coordinate ascent method is an approach to finding the maximum likelihood estimate in a multidimensional family. Suppose we have a k-dimensional parameter . The coordinate ascent method is:

Choose an initial estimate

0. Set

1. Maximize over using the bisection method by solving (assuming the log likelihood is differentiable). Reset to the that maximizes .

2. Maximize over using the bisection method. Reset to the that maximizes .

....

K. Maximize over using the bisection method. Reset to the that maximizes .

K+1. Stop if the distance between and is less than some tolerance . Otherwise return to step 0.

The coordinate ascent method converges to the maximum likelihood estimate when the log likelihood function is strictly concave on the parameter space.
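As an illustration, here is a minimal R sketch of coordinate ascent for the two-parameter gamma model of Example 1 (the function name gamma_coord_ascent and the search interval for alpha are illustrative assumptions; uniroot stands in for hand-coded bisection, and it is assumed that the interval brackets a sign change of the partial derivative):

gamma_coord_ascent=function(x,alpha0,beta0,tol=1e-8,maxit=100){
n=length(x);
alpha=alpha0; beta=beta0;
for(it in 1:maxit){
old=c(alpha,beta);
# step 1: maximize over alpha given beta by solving dl/dalpha = 0
dalpha=function(a){ -n*digamma(a)-n*log(beta)+sum(log(x)) };
alpha=uniroot(dalpha,c(1e-8,1e3))$root;
# step 2: maximize over beta given alpha; dl/dbeta = 0 gives beta = xbar/alpha
beta=mean(x)/alpha;
# step k+1: stop when successive iterates are within tol
if(sqrt(sum((c(alpha,beta)-old)^2))<tol) break;
}
c(alpha=alpha,beta=beta);
}

For the rainfall data of the numerical example below, this should agree with the solution found there by uniroot.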

Newton’s Method

Newton’s method is a numerical method for approximating solutions to equations. The method produces a sequence of values that, under ideal conditions, converges to the MLE .

To motivate the method, we expand the derivative of the log likelihood around :

Solving for gives

This suggests the following iterative scheme:

.

Newton’s method can be extended to more than one dimension (usually called Newton-Raphson)

where denotes the gradient vector of the likelihood and denote the Hessian.
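A minimal R sketch of the one-dimensional iteration, applied for illustration to the profile likelihood equation for $\alpha$ from Example 1 (gamma_newton is an illustrative name; trigamma is R's $\psi'$, so the derivative of the equation is available in closed form):

gamma_newton=function(x,alpha0,tol=1e-8,maxit=50){
n=length(x);
# g(alpha) = dl/dalpha after profiling out beta = xbar/alpha
g=function(a){ n*log(a)-n*digamma(a)-n*log(mean(x))+sum(log(x)) };
dg=function(a){ n/a-n*trigamma(a) };  # g'(alpha), strictly negative for a > 0
alpha=alpha0;
for(it in 1:maxit){
step=g(alpha)/dg(alpha);
alpha=alpha-step;                     # Newton update
if(abs(step)<tol) break;
}
alpha;
}

A sensible starting value alpha0 is the method of moments estimate (see comment 3 below).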

Comments on methods for finding the MLE:

1. The bisection method is guaranteed to converge if there is a unique root in the interval being searched over, but it is slower than Newton’s method.

2. Newton’s method:

A. The method does not work if $l''(\hat{\theta}^{(j)}) = 0$ (in more than one dimension, if the Hessian $\nabla^2 l(\hat{\theta}^{(j)})$ is singular).

B. The method does not always converge.

See the attached pages from the Numerical Recipes in C book.

3. For the coordinate ascent method and Newton’s method, a good choice of starting values is often the method of moments estimator.

4. When there are multiple roots to the likelihood equation, the solution found by the bisection method, the coordinate ascent method and Newton’s method depends on the starting value. These algorithms might converge to a local maximum (or a saddlepoint) rather than a global maximum.

5. The EM (Expectation/Maximization) algorithm (Section 2.4.4) is another approach to finding the MLE that is particularly suitable when part of the data is missing.

Numerical Examples:

Example 1: MLE for the gamma distribution

In a study of the natural variability of rainfall, the rainfall of summer storms was measured by a network of rain gauges in southern Illinois for the years 1960-1964. 227 measurements were taken.

R program for finding the maximum likelihood estimate using the bisection method.

digamma(x) = function in R that computes the derivative of the log of the gamma function at x, i.e., $\psi(x) = \Gamma'(x)/\Gamma(x)$.

uniroot(f, interval) = function in R that finds an approximate zero of the function f in the given interval using a bisection-type method.

# Left-hand side of the likelihood equation for alpha:
# n*log(alpha) - n*digamma(alpha) - n*log(xbar) + sum(log(x))
alphahatfunc=function(alpha,xvec){

n=length(xvec);

eq=-n*digamma(alpha)-n*log(mean(xvec))+n*log(alpha)+sum(log(xvec));

eq;

}

> alphahatfunc(.3779155,illinoisrainfall)

[1] 65.25308

> alphahatfunc(.5,illinoisrainfall)

[1] -45.27781

alpharoot=uniroot(alphahatfunc,interval=c(.377,.5),xvec=illinoisrainfall)

> alpharoot

$root

[1] 0.4407967

$f.root

[1] -0.004515694

$iter

[1] 4

$estim.prec

[1] 6.103516e-05

betahatmle=mean(illinoisrainfall)/.4407967  # betahat = xbar/alphahat

> betahatmle

[1] 0.5090602

Comparison with method of moments:

For the gamma distribution, $E(X) = \alpha \beta$ and $\mathrm{Var}(X) = \alpha \beta^2$. Substituting $\alpha = E(X)/\beta$ into the expression for $\mathrm{Var}(X)$, we obtain $\beta = \mathrm{Var}(X)/E(X)$ and hence $\alpha = [E(X)]^2 / \mathrm{Var}(X)$.

Thus, with $\hat{\sigma}^2 = \frac{1}{n} \sum x_i^2 - \bar{x}^2$,

$\hat{\beta}_{\text{MOM}} = \frac{\hat{\sigma}^2}{\bar{x}}, \qquad \hat{\alpha}_{\text{MOM}} = \frac{\bar{x}^2}{\hat{\sigma}^2}$.

betahatmom=(mean(illinoisrainfall^2)-(mean(illinoisrainfall))^2)/mean(illinoisrainfall)

> betahatmom

[1] 0.5937626

alphahatmom=(mean(illinoisrainfall))^2/(mean(illinoisrainfall^2)-(mean(illinoisrainfall))^2)

> alphahatmom

[1] 0.3779155

Example 2: MLE for Cauchy Distribution

Cauchy model:

$p(x \mid \theta) = \frac{1}{\pi [1 + (x - \theta)^2]}$.

Suppose $X_1, X_2, X_3$ are iid Cauchy($\theta$) and we observe $x_1 = 0$, $x_2 = 1$, $x_3 = 10$.

The log likelihood $l(\theta) = -\sum_{i=1}^{3} \log [1 + (x_i - \theta)^2] - 3 \log \pi$ is not concave and has two local maxima between 0 and 10. There is also a local minimum.

The likelihood equation is

$l'(\theta) = \sum_{i=1}^{3} \frac{2 (x_i - \theta)}{1 + (x_i - \theta)^2} = 0$.

The local maximum (i.e., the solution to the likelihood equation) that the bisection method finds depends on the interval searched over.

R program to use the bisection method:

# Derivative of the Cauchy log likelihood, l'(theta), for observations x1, x2, x3
derivloglikfunc=function(theta,x1,x2,x3){

dloglikx1=2*(x1-theta)/(1+(x1-theta)^2);

dloglikx2=2*(x2-theta)/(1+(x2-theta)^2);

dloglikx3=2*(x3-theta)/(1+(x3-theta)^2);

dloglikx1+dloglikx2+dloglikx3;

}

When the interval searched over is $(0, 5)$, the bisection method finds the MLE:

uniroot(derivloglikfunc,interval=c(0,5),x1=0,x2=1,x3=10);

$root

[1] 0.6092127

When the interval searched over is $(0, 10)$, the bisection method finds a local maximum but not the MLE:

uniroot(derivloglikfunc,interval=c(0,10),x1=0,x2=1,x3=10);

$root

[1] 9.775498
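To see the two local maxima and the local minimum directly, one can plot the log likelihood over a grid (a quick sketch; the grid limits are arbitrary):

theta=seq(-2,12,by=.01);
# log likelihood up to the additive constant -3*log(pi)
loglik=sapply(theta,function(t){ sum(-log(1+(c(0,1,10)-t)^2)) });
plot(theta,loglik,type="l",xlab="theta",ylab="log likelihood");
abline(v=c(0.6092127,9.775498),lty=2);  # the two roots found by uniroot above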
