Statistics 550 Notes 13

Reading: Sections 3.1-3.2

In Chapter 3, we return to the theme of Section 1.3, which is to select, among point estimators and decision procedures, the “optimal” one for the given sample size, rather than to select a procedure based on asymptotic optimality considerations. In Section 1.3, we studied two optimality criteria, Bayes risk and minimax risk, and found Bayes and minimax estimators by computing the risk functions of all candidate estimators. This is intractable for many statistical models. In Chapter 3, we develop several tractable tools for finding Bayes and minimax estimators.

I. Bayes Procedures

Review of the Bayes criterion:

Suppose a person’s prior distribution about $\theta$ is $\pi(\theta)$ and the probability distribution for the data $X$ given $\theta$ has probability density function (or probability mass function) $p(x \mid \theta)$.

This can be viewed as a model in which $\theta$ is a random variable and the joint pdf of $(X, \theta)$ is $\pi(\theta)\, p(x \mid \theta)$.

The Bayes risk of a decision procedure $\delta$ for a prior distribution $\pi$, denoted by $r(\pi, \delta)$, is the expected value of the loss function over the joint distribution of $(X, \theta)$, which is the expected value of the risk function over the prior distribution of $\theta$:

$$r(\pi, \delta) = E_\pi\big[ E[\, l(\theta, \delta(X)) \mid \theta \,] \big] = E_\pi[\, R(\theta, \delta) \,].$$

For a person with prior distribution $\pi$, the decision procedure $\delta$ which minimizes $r(\pi, \delta)$ minimizes the expected loss and is the best procedure from this person’s point of view. The decision procedure which minimizes the Bayes risk for a prior $\pi$ is called the Bayes rule for the prior $\pi$.

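A nonsubjective interpretation of the Bayes criterion: The Bayes approach leads us to compare procedures on the basis of

$$r(\pi, \delta) = \sum_{\theta} R(\theta, \delta)\, \pi(\theta)$$

if $\theta$ is discrete with frequency function $\pi(\theta)$, or

$$r(\pi, \delta) = \int R(\theta, \delta)\, \pi(\theta)\, d\theta$$

if $\theta$ is continuous with density $\pi(\theta)$.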

Such comparisons make sense even if we do not interpret $\pi$ as a prior density or frequency function, but only as a weight function that reflects the importance we place on doing well at the different possible values of $\theta$.

Computing Bayes estimators: In Section 1.3, we found the Bayes decision procedure by computing the Bayes risk of every decision procedure. This is usually an impossible task. We now provide a constructive method for finding the Bayes decision procedure.

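Recall from Section 1.2 that the posterior distribution is the subjective probability distribution for the parameter $\theta$ after seeing the data $x$:

$$\pi(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{\int p(x \mid \theta')\, \pi(\theta')\, d\theta'}$$

(with the integral replaced by a sum when $\theta$ is discrete).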

The posterior risk of an action $a$ is the expected loss from taking action $a$ under the posterior distribution $\pi(\theta \mid x)$:

$$r(a \mid x) = E[\, l(\theta, a) \mid X = x \,].$$

The following proposition shows that a decision procedure which chooses the action that minimizes the posterior risk for each sample x is a Bayes decision procedure.

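Proposition 3.2.1: Suppose there exists a function $\delta^*(x)$ such that for all $x$,

$$r(\delta^*(x) \mid x) = \min_{a \in \mathcal{A}} r(a \mid x), \qquad (1.1)$$

where $\mathcal{A}$ denotes the action space. Then $\delta^*$ is a Bayes rule.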

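Proof: For any decision rule $\delta$, the Bayes risk can be written as

$$r(\pi, \delta) = E[\, l(\theta, \delta(X)) \,] = E\big[ E[\, l(\theta, \delta(X)) \mid X \,] \big] = E[\, r(\delta(X) \mid X) \,]. \qquad (1.2)$$

By (1.1), for all $x$,

$$r(\delta^*(x) \mid x) \le r(\delta(x) \mid x).$$

Therefore, for all $\delta$,

$$E[\, r(\delta^*(X) \mid X) \,] \le E[\, r(\delta(X) \mid X) \,],$$

and the result follows from (1.2).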

The value of Proposition 3.2.1 is that it lets us compute the action of the Bayes rule for the observed data $x$ simply by minimizing the posterior risk $r(a \mid x)$ over $a$; we do not need to find the entire Bayes procedure.
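As a numerical sanity check on Proposition 3.2.1 (a sketch, not part of the original notes), the Python below builds small hypothetical problems with finitely many states, outcomes, and actions. It finds the minimum Bayes risk by brute force over every nonrandomized rule (the Section 1.3 approach) and compares it with the Bayes risk of the rule that minimizes the posterior risk separately for each $x$. The random problem generator and all function names are illustrative assumptions.

```python
import itertools
import random

def bayes_risk(rule, prior, lik, loss):
    """r(pi, delta) = sum_theta pi(theta) R(theta, delta), where
    R(theta, delta) = sum_x p(x | theta) l(theta, delta(x)).
    `rule` is a tuple whose x-th entry is the action index delta(x)."""
    return sum(
        prior[t] * sum(lik[t][x] * loss[t][rule[x]] for x in range(len(rule)))
        for t in range(len(prior))
    )

def posterior_rule(prior, lik, loss):
    """For each x, choose the action minimizing the posterior risk r(a | x).
    Dividing by p(x) does not change the minimizer, so we minimize the
    unnormalized sum_theta pi(theta) p(x | theta) l(theta, a)."""
    n_theta, n_x, n_a = len(prior), len(lik[0]), len(loss[0])
    rule = []
    for x in range(n_x):
        unnorm = [sum(prior[t] * lik[t][x] * loss[t][a] for t in range(n_theta))
                  for a in range(n_a)]
        rule.append(min(range(n_a), key=unnorm.__getitem__))
    return tuple(rule)

def brute_force_min(prior, lik, loss):
    """Minimum Bayes risk over all n_a ** n_x nonrandomized rules."""
    n_x, n_a = len(lik[0]), len(loss[0])
    return min(bayes_risk(rule, prior, lik, loss)
               for rule in itertools.product(range(n_a), repeat=n_x))

def random_problem(seed, n_theta=3, n_x=4, n_a=3):
    """Hypothetical finite problem: prior, likelihood rows p(. | theta), loss table."""
    rng = random.Random(seed)
    w = [rng.random() + 0.1 for _ in range(n_theta)]
    prior = [v / sum(w) for v in w]
    lik = []
    for _ in range(n_theta):
        row = [rng.random() + 0.1 for _ in range(n_x)]
        lik.append([v / sum(row) for v in row])
    loss = [[rng.randint(0, 10) for _ in range(n_a)] for _ in range(n_theta)]
    return prior, lik, loss

for seed in range(5):
    prior, lik, loss = random_problem(seed)
    best = brute_force_min(prior, lik, loss)
    via_posterior = bayes_risk(posterior_rule(prior, lik, loss), prior, lik, loss)
    print(seed, round(best, 6), round(via_posterior, 6))
```

The enumeration grows like (number of actions)^(number of outcomes), while the posterior-risk rule needs only one minimization per $x$; that is the saving the proposition buys.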

Consider the oil-drilling example (Example 1.3.5) again.

Example 1.3.5 (continued): We are trying to decide whether to drill a location for oil. There are two possible states of nature,

$\theta_1$ = location contains oil and $\theta_2$ = location doesn’t contain oil. We are considering three actions, $a_1$ = drill for oil, $a_2$ = sell the location, or $a_3$ = sell partial rights to the location.

The following loss function $l(\theta, a)$ is decided on:

State / $a_1$ (Drill) / $a_2$ (Sell) / $a_3$ (Partial rights)
$\theta_1$ (Oil) / 0 / 10 / 5
$\theta_2$ (No oil) / 12 / 1 / 6

An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values 0, 1 and frequency function $p(x \mid \theta)$ given by the following table:

Rock formation $X$
State / $x = 0$ / $x = 1$
$\theta_1$ (Oil) / 0.3 / 0.7
$\theta_2$ (No oil) / 0.6 / 0.4

Application of Prop. 3.2.1 to Example 1.3.5:

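Consider the prior $\pi(\theta_1) = 0.2$, $\pi(\theta_2) = 0.8$. Suppose we observe $x = 0$. Then the posterior distribution of $\theta$ is

$$\pi(\theta_1 \mid X = 0) = \frac{\pi(\theta_1)\, p(0 \mid \theta_1)}{\pi(\theta_1)\, p(0 \mid \theta_1) + \pi(\theta_2)\, p(0 \mid \theta_2)} = \frac{0.2 \times 0.3}{0.2 \times 0.3 + 0.8 \times 0.6},$$

which equals $1/9$, so that $\pi(\theta_2 \mid X = 0) = 8/9$.

Thus, the posterior risks of the actions are

$$r(a_1 \mid 0) = 0 \cdot \tfrac{1}{9} + 12 \cdot \tfrac{8}{9} = 10.67, \quad r(a_2 \mid 0) = 10 \cdot \tfrac{1}{9} + 1 \cdot \tfrac{8}{9} = 2.00, \quad r(a_3 \mid 0) = 5 \cdot \tfrac{1}{9} + 6 \cdot \tfrac{8}{9} = 5.89.$$

Therefore, $a_2$ has the smallest posterior risk, and for the Bayes rule $\delta^*$, $\delta^*(0) = a_2$.

Similarly, $\pi(\theta_1 \mid X = 1) = 0.14/0.46 = 7/23$ and

$$r(a_1 \mid 1) = 8.35, \quad r(a_2 \mid 1) = 3.74, \quad r(a_3 \mid 1) = 5.70,$$

and we conclude that $\delta^*(1) = a_2$.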
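The posterior risks in this example can be reproduced exactly with Python's `fractions` module. This is a sketch that assumes the prior $\pi(\theta_1) = 0.2$, $\pi(\theta_2) = 0.8$; the table layout and helper names are our own.

```python
from fractions import Fraction as F

# Loss l(theta, a): rows theta_1 (oil), theta_2 (no oil);
# columns a_1 (drill), a_2 (sell), a_3 (partial rights).
LOSS = [[0, 10, 5],
        [12, 1, 6]]

# Frequency function p(x | theta): rows theta_1, theta_2; columns x = 0, 1.
LIK = [[F(3, 10), F(7, 10)],
       [F(6, 10), F(4, 10)]]

# Assumed prior: pi(theta_1) = 0.2, pi(theta_2) = 0.8.
PRIOR = [F(2, 10), F(8, 10)]

def posterior(x):
    """Posterior [pi(theta_1 | x), pi(theta_2 | x)] by Bayes' theorem."""
    joint = [PRIOR[t] * LIK[t][x] for t in range(2)]
    total = sum(joint)
    return [j / total for j in joint]

def posterior_risks(x):
    """Posterior risks [r(a_1 | x), r(a_2 | x), r(a_3 | x)]."""
    post = posterior(x)
    return [sum(post[t] * LOSS[t][a] for t in range(2)) for a in range(3)]

def bayes_action(x):
    """0-based index of the action with the smallest posterior risk."""
    risks = posterior_risks(x)
    return min(range(3), key=risks.__getitem__)

for x in (0, 1):
    risks = ", ".join(f"{float(r):.2f}" for r in posterior_risks(x))
    print(f"x = {x}: posterior risks {risks}; Bayes action a_{bayes_action(x) + 1}")
```

Under this prior the script reports posterior risks 10.67, 2.00, 5.89 at $x = 0$ and 8.35, 3.74, 5.70 at $x = 1$, with $a_2$ chosen in both cases.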

Bayes procedures for common loss functions:

Bayes decision procedures for point estimation of $q(\theta)$ for some common loss functions, using Proposition 3.2.1:

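(1) Squared Error Loss ($l(\theta, a) = (q(\theta) - a)^2$): The action (point estimate) taken by the Bayes rule is the action that minimizes the posterior expected squared loss:

$$\min_a E[\, (q(\theta) - a)^2 \mid X = x \,].$$

By Lemma 1.4.1, the minimizer $\delta^*(x) = E[\, q(\theta) \mid X = x \,]$ is the mean of $q(\theta)$ under the posterior distribution $\pi(\theta \mid x)$.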
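The step behind Lemma 1.4.1 can be written out in this posterior setting (assuming the posterior variance of $q(\theta)$ is finite): with $m(x) = E[q(\theta) \mid X = x]$, add and subtract $m(x)$ inside the square; the cross term vanishes because $E[q(\theta) - m(x) \mid X = x] = 0$.

```latex
\begin{aligned}
E\big[(q(\theta)-a)^2 \mid X=x\big]
  &= E\big[(q(\theta)-m(x) + m(x)-a)^2 \mid X=x\big] \\
  &= \operatorname{Var}\big(q(\theta)\mid X=x\big)
     + 2\,(m(x)-a)\,E\big[q(\theta)-m(x)\mid X=x\big] + (m(x)-a)^2 \\
  &= \operatorname{Var}\big(q(\theta)\mid X=x\big) + (m(x)-a)^2 .
\end{aligned}
```

The first term does not depend on $a$, so the posterior risk is minimized exactly at $a = m(x)$.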

(2) Absolute Error Loss ($l(\theta, a) = |q(\theta) - a|$): The action (point estimate) taken by the Bayes rule is the action that minimizes the posterior expected absolute loss:

$$\min_a E[\, |q(\theta) - a| \mid X = x \,]. \qquad (1.3)$$

The minimizer of (1.3) is any median of the posterior distribution of $q(\theta)$, so a Bayes rule is to use any median of the posterior distribution.

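Proof that the minimizer of $E|X - c|$ is a median of $X$: Let $X$ be a random variable and let the interval $[m_0, m_1]$ be the set of medians of $X$, i.e., $m \in [m_0, m_1]$ if and only if $P(X \le m) \ge 1/2$ and $P(X \ge m) \ge 1/2$. For $c > m_1$ and a continuous random variable with density $f$,

$$E|X - c| - E|X - m_1| = (c - m_1)\, P(X \le m_1) + \int_{m_1}^{c} (c + m_1 - 2x)\, f(x)\, dx - (c - m_1)\, P(X \ge c) \;\ge\; (c - m_1)\big[ P(X \le m_1) - P(X > m_1) \big] = 0,$$

since $c + m_1 - 2x \ge -(c - m_1)$ for $x \le c$ and $P(X \le m_1) = 1/2$. A similar result holds for $c < m_0$. A similar argument holds for discrete random variables.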
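A quick numerical illustration (a sketch with made-up distributions, not from the notes): for a discrete $X$ we evaluate $E|X - a|$ on a grid of $a$ values and list the grid points that attain the minimum. The function names, the grid, and both example distributions are illustrative assumptions.

```python
def expected_abs_loss(a, values, probs):
    """E|X - a| for a discrete X taking `values` with probabilities `probs`."""
    return sum(p * abs(v - a) for v, p in zip(values, probs))

def grid_minimizers(values, probs, grid, tol=1e-12):
    """All grid points whose expected absolute loss is within tol of the grid minimum."""
    losses = [expected_abs_loss(a, values, probs) for a in grid]
    best = min(losses)
    return [a for a, loss in zip(grid, losses) if loss <= best + tol]

grid = [i / 100 for i in range(-100, 1101)]   # -1.00, -0.99, ..., 11.00

# X uniform on {0, 1, 2, 3}: every m in [1, 2] is a median.
flat = grid_minimizers([0, 1, 2, 3], [0.25] * 4, grid)
print(min(flat), max(flat))

# Skewed X on {0, 1, 10}: median 1, mean 3.4.
skew = grid_minimizers([0, 1, 10], [0.3, 0.4, 0.3], grid)
print(skew)
```

The first call reports the endpoints 1.0 and 2.0, so the whole interval of medians is minimizing; the second reports only the median 1.0 (at the mean, $E|X - 3.4| = 3.96$, compared with $3.0$ at the median).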

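(3) Zero-one loss: $l(\theta, a) = 0$ if $|q(\theta) - a| \le \epsilon$ and $l(\theta, a) = 1$ if $|q(\theta) - a| > \epsilon$.

Since the posterior risk of $a$ is $r(a \mid x) = 1 - P(|q(\theta) - a| \le \epsilon \mid X = x)$, the Bayes rule is the midpoint of the interval of length $2\epsilon$ that maximizes the posterior probability that $q(\theta)$ belongs to the interval.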
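The zero-one Bayes action can be found by a search over interval centers. The sketch below does this for a Beta($\alpha, \beta$) posterior, assuming integer $\alpha, \beta$ so that the Beta CDF can be computed exactly from the identity $P(\text{Beta}(\alpha, \beta) \le x) = P(\text{Binomial}(\alpha + \beta - 1, x) \ge \alpha)$. The grid resolution and function names are illustrative choices.

```python
from math import comb

def beta_cdf_int(x, alpha, beta):
    """P(Beta(alpha, beta) <= x) for positive integers alpha, beta, via
    P(Beta(alpha, beta) <= x) = P(Binomial(alpha + beta - 1, x) >= alpha)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(alpha, n + 1))

def zero_one_bayes(alpha, beta, eps, steps=1000):
    """Grid search over centers c in [0, 1] for the interval [c - eps, c + eps]
    with the largest Beta(alpha, beta) probability; returns (c, probability)."""
    best_c, best_prob = 0.0, -1.0
    for i in range(steps + 1):
        c = i / steps
        prob = beta_cdf_int(c + eps, alpha, beta) - beta_cdf_int(c - eps, alpha, beta)
        if prob > best_prob:
            best_c, best_prob = c, prob
    return best_c, best_prob

# Posterior Beta(3, 9): uniform prior, n = 10, sum of x_i = 2.
for eps in (0.1, 0.05, 0.01):
    c, prob = zero_one_bayes(3, 9, eps)
    print(f"eps = {eps}: center {c:.3f}, posterior probability {prob:.4f}")
```

For small $\epsilon$ the center lands near the posterior mode $(3 - 1)/(3 + 9 - 2) = 0.2$ rather than at the posterior mean $0.25$; as $\epsilon$ shrinks, the zero-one Bayes estimate approaches the posterior mode.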

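Example 1: Recall from Notes 2 (Section 1.2) that for $X_1, \ldots, X_n$ iid Bernoulli($p$) and a Beta($r, s$) prior for $p$, the posterior distribution for $p$ is Beta($r + \sum_{i=1}^n x_i,\ s + n - \sum_{i=1}^n x_i$).

Thus, for squared error loss, the Bayes estimate of $p$ is the mean of Beta($r + \sum_{i=1}^n x_i,\ s + n - \sum_{i=1}^n x_i$), which equals

$$\frac{r + \sum_{i=1}^n x_i}{r + s + n}.$$

For absolute error loss, the Bayes estimate of $p$ is the median of the Beta($r + \sum_{i=1}^n x_i,\ s + n - \sum_{i=1}^n x_i$) distribution, which does not have a closed form.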

For $n = 10$, here are the Bayes estimates and the MLE for the Beta(1, 1) = uniform prior:

$\sum x_i$ / MLE / Bayes, absolute error loss / Bayes, squared error loss
0 / .0000 / .0611 / .0833
1 / .1000 / .1480 / .1667
2 / .2000 / .2358 / .2500
3 / .3000 / .3238 / .3333
4 / .4000 / .4119 / .4167
5 / .5000 / .5000 / .5000
6 / .6000 / .5881 / .5833
7 / .7000 / .6762 / .6667
8 / .8000 / .7642 / .7500
9 / .9000 / .8520 / .8333
10 / 1.0000 / .9389 / .9167
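The table can be reproduced with a short script. This sketch assumes integer prior parameters $r$ and $s$ (true for the uniform prior $r = s = 1$), so that the Beta CDF has the exact binomial form used below, and finds the posterior median by bisection; the function names are ours.

```python
from math import comb

def beta_cdf_int(x, alpha, beta):
    """P(Beta(alpha, beta) <= x) for positive integers alpha, beta, via
    P(Beta(alpha, beta) <= x) = P(Binomial(alpha + beta - 1, x) >= alpha)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    n = alpha + beta - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(alpha, n + 1))

def beta_median_int(alpha, beta, tol=1e-12):
    """Median of Beta(alpha, beta) (integer parameters) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, alpha, beta) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def bayes_table(n=10, r=1, s=1):
    """Rows (sum_x, MLE, posterior median, posterior mean) under a Beta(r, s) prior."""
    rows = []
    for k in range(n + 1):
        alpha, beta = r + k, s + n - k      # posterior Beta(r + k, s + n - k)
        rows.append((k, k / n, beta_median_int(alpha, beta), alpha / (alpha + beta)))
    return rows

print("sum x_i   MLE     Bayes abs   Bayes sq")
for k, mle, med, mean in bayes_table():
    print(f"{k:7d}   {mle:.4f}  {med:.4f}      {mean:.4f}")
```

The last column is $(1 + \sum x_i)/12$, so the final entry is $11/12 \approx .9167$.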

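Example 2: Suppose $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$, $\sigma^2$ known, and our prior on $\theta$ is $N(\eta_0, \tau^2)$.

We showed in Notes 3 that the posterior distribution for $\theta$ is

$$N\left( \frac{n\bar{x}/\sigma^2 + \eta_0/\tau^2}{n/\sigma^2 + 1/\tau^2},\ \frac{1}{n/\sigma^2 + 1/\tau^2} \right).$$

The mean and median of the posterior distribution are both $\dfrac{n\bar{x}/\sigma^2 + \eta_0/\tau^2}{n/\sigma^2 + 1/\tau^2}$, so the Bayes estimator for both squared error loss and absolute error loss is

$$\hat{\theta}_B = \frac{n\bar{X}/\sigma^2 + \eta_0/\tau^2}{n/\sigma^2 + 1/\tau^2}.$$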
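A sketch of this posterior computation in Python, using the hypothetical parameter names `eta0` and `tau2` for the prior mean and variance. Writing the estimate as $w\bar{x} + (1 - w)\eta_0$ with $w = (n/\sigma^2)/(n/\sigma^2 + 1/\tau^2)$ shows how it shrinks $\bar{x}$ toward the prior mean; `statistics.NormalDist` is used only to confirm that the posterior mean and median coincide.

```python
from math import sqrt
from statistics import NormalDist

def normal_posterior(xbar, n, sigma2, eta0, tau2):
    """Posterior N(mean, var) for theta, with X_i ~ N(theta, sigma2) iid
    (sigma2 known) and prior theta ~ N(eta0, tau2)."""
    data_prec = n / sigma2          # precision contributed by the data
    prior_prec = 1 / tau2           # precision of the prior
    var = 1 / (data_prec + prior_prec)
    mean = (data_prec * xbar + prior_prec * eta0) * var
    return mean, var

def shrinkage_weight(n, sigma2, tau2):
    """Weight w on xbar in the Bayes estimate w * xbar + (1 - w) * eta0."""
    return (n / sigma2) / (n / sigma2 + 1 / tau2)

mean, var = normal_posterior(xbar=2.0, n=4, sigma2=1.0, eta0=0.0, tau2=1.0)
post = NormalDist(mu=mean, sigma=sqrt(var))
w = shrinkage_weight(n=4, sigma2=1.0, tau2=1.0)
print(mean, var, w, post.median == post.mean)
```

With these illustrative inputs the posterior is approximately $N(1.6, 0.2)$ and $w = 0.8$, so the estimate moves $\bar{x} = 2$ one fifth of the way toward $\eta_0 = 0$.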

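Note on Bayes procedures and sufficiency: Suppose the prior distribution has support on $\Theta$ and the family of distributions $\{p(x \mid \theta) : \theta \in \Theta\}$ for the data has sufficient statistic $T(X)$. Then to find the Bayes procedure, we can reduce the data to $T(X)$.

This is because the posterior distribution of $\theta$ given $X = x$ is the same as the posterior distribution of $\theta$ given $T(X) = t$, where $t = T(x)$, since

$$\pi(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{\int p(x \mid \theta')\, \pi(\theta')\, d\theta'} = \frac{p(x \mid T = t)\, p_T(t \mid \theta)\, \pi(\theta)}{\int p(x \mid T = t)\, p_T(t \mid \theta')\, \pi(\theta')\, d\theta'} = \frac{p_T(t \mid \theta)\, \pi(\theta)}{\int p_T(t \mid \theta')\, \pi(\theta')\, d\theta'} = \pi(\theta \mid T = t),$$

where the second equality uses the sufficiency of $T$ (the conditional distribution of $X$ given $T(X) = t$ does not depend on $\theta$), which lets the factor $p(x \mid T = t)$ cancel in the third.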
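A small exact check of this fact for Bernoulli data (a sketch; the three-point prior on $p$ below is hypothetical): for every 0/1 sequence of length $n$, the posterior computed from the full likelihood equals the posterior computed from $T = \sum X_i \sim$ Binomial($n, p$) alone. Exact rational arithmetic via `fractions.Fraction` lets us test equality rather than closeness.

```python
from fractions import Fraction as F
from itertools import product
from math import comb

# Hypothetical discrete prior on the Bernoulli success probability p.
THETAS = [F(1, 5), F(1, 2), F(4, 5)]
PRIOR = [F(1, 4), F(1, 2), F(1, 4)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def bernoulli_lik(xs, p):
    """Full-data likelihood prod_i p^{x_i} (1 - p)^{1 - x_i}."""
    out = F(1)
    for x in xs:
        out *= p if x == 1 else 1 - p
    return out

def posterior_full(xs):
    """pi(p | x_1, ..., x_n) from the full likelihood."""
    return normalize([w * bernoulli_lik(xs, p) for p, w in zip(THETAS, PRIOR)])

def posterior_from_T(t, n):
    """pi(p | T = t), using only T = sum x_i ~ Binomial(n, p)."""
    return normalize([w * comb(n, t) * p**t * (1 - p)**(n - t)
                      for p, w in zip(THETAS, PRIOR)])

def check_sufficiency(n):
    """True if every 0/1 sequence of length n gives the same posterior as its T."""
    return all(posterior_full(xs) == posterior_from_T(sum(xs), n)
               for xs in product((0, 1), repeat=n))

print(check_sufficiency(4))
```

The binomial coefficient $\binom{n}{t}$ plays the role of $p(x \mid T = t)^{-1}$ here: it is free of $p$, so it cancels in the normalization.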