Statistics 550 Notes 4

Reading: Section 1.3

I. Review from last class

Decision theory framework:

We observe data $X$ from a probability distribution $P$, where $P$ is unknown but we know $P$ belongs to the model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$.

We want to choose an action $a$ (e.g., a point estimate) based on the data $X$.

Loss function: $l(\theta, a)$ = loss incurred by taking the action $a$ when the true parameter is $\theta$ (i.e., the true $P$ is $P_\theta$).

Decision procedure (rule) $\delta$: Rule for choosing what action to take based on the data $X$; the action taken is $\delta(X)$.

Risk: $R(\theta, \delta) = E_\theta[l(\theta, \delta(X))]$. The average loss incurred by using decision procedure $\delta$ when the true parameter is $\theta$.

We would like to choose a decision procedure $\delta$ which has small risk at the true parameter $\theta$.

II. Example of comparing risk of point estimators under squared error loss

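Suppose that an iid sample $X_1,\ldots,X_n$ is drawn from the uniform distribution on $[0,\theta]$, where $\theta > 0$ is an unknown parameter and the distribution of $X_i$ is

$$f(x;\theta) = \begin{cases} 1/\theta & 0 \le x \le \theta \\ 0 & \text{otherwise.} \end{cases}$$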

Several point estimators:

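1. $W_1 = \max_{1 \le i \le n} X_i$. Note: $W_1$ is biased, $E_\theta(W_1) = \frac{n}{n+1}\theta$.

2. $W_2 = \frac{n+1}{n} \max_{1 \le i \le n} X_i$. Note: Unlike $W_1$, $W_2$ is unbiased because $E_\theta(W_2) = \frac{n+1}{n} E_\theta(W_1) = \theta$.

3. $W_3 = 2\bar{X}$. Note: $W_3$ is unbiased, $E_\theta(W_3) = 2E_\theta(\bar{X}) = 2 \cdot \frac{\theta}{2} = \theta$.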

Comparison of three estimators under squared error loss

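1. $W_1 = \max_{1 \le i \le n} X_i$

The sampling distribution for $W_1$ is

$$P_\theta(W_1 \le w) = P_\theta(X_1 \le w, \ldots, X_n \le w) = \left(\frac{w}{\theta}\right)^n, \quad 0 \le w \le \theta,$$

so $W_1$ has density $f_{W_1}(w) = n w^{n-1}/\theta^n$ on $[0,\theta]$,

and

$$E_\theta(W_1) = \int_0^\theta w \, \frac{n w^{n-1}}{\theta^n} \, dw = \frac{n}{n+1}\theta.$$

To calculate $\mathrm{Var}_\theta(W_1)$, we calculate $E_\theta(W_1^2)$ and use the formula $\mathrm{Var}_\theta(W_1) = E_\theta(W_1^2) - [E_\theta(W_1)]^2$.

$$E_\theta(W_1^2) = \int_0^\theta w^2 \, \frac{n w^{n-1}}{\theta^n} \, dw = \frac{n}{n+2}\theta^2, \qquad \mathrm{Var}_\theta(W_1) = \frac{n}{n+2}\theta^2 - \left(\frac{n}{n+1}\theta\right)^2 = \frac{n\theta^2}{(n+2)(n+1)^2}.$$

Thus,

$$\mathrm{MSE}_\theta(W_1) = \mathrm{Var}_\theta(W_1) + [E_\theta(W_1) - \theta]^2 = \frac{n\theta^2}{(n+2)(n+1)^2} + \frac{\theta^2}{(n+1)^2} = \frac{2\theta^2}{(n+1)(n+2)}.$$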

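2. $W_2 = \frac{n+1}{n} \max_{1 \le i \le n} X_i$

Note $W_2 = \frac{n+1}{n} W_1$.

Thus, $E_\theta(W_2) = \frac{n+1}{n} E_\theta(W_1) = \frac{n+1}{n} \cdot \frac{n}{n+1}\theta = \theta$,

and

$$\mathrm{Var}_\theta(W_2) = \left(\frac{n+1}{n}\right)^2 \mathrm{Var}_\theta(W_1) = \left(\frac{n+1}{n}\right)^2 \frac{n\theta^2}{(n+2)(n+1)^2} = \frac{\theta^2}{n(n+2)}.$$

Because $W_2$ is unbiased, $\mathrm{MSE}_\theta(W_2) = \mathrm{Var}_\theta(W_2) = \frac{\theta^2}{n(n+2)}$.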

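3. $W_3 = 2\bar{X}$

To find the mean square error, we use the fact that if $X_1,\ldots,X_n$ are iid with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ has mean $\mu$ and variance $\sigma^2/n$.

We have

$$E_\theta(X_i) = \int_0^\theta \frac{x}{\theta} \, dx = \frac{\theta}{2}, \qquad E_\theta(X_i^2) = \int_0^\theta \frac{x^2}{\theta} \, dx = \frac{\theta^2}{3}.$$

Thus, $\mathrm{Var}_\theta(X_i) = \frac{\theta^2}{3} - \frac{\theta^2}{4} = \frac{\theta^2}{12}$, and

$$E_\theta(\bar{X}) = \frac{\theta}{2}, \qquad \mathrm{Var}_\theta(\bar{X}) = \frac{\theta^2}{12n},$$

and $E_\theta(W_3) = \theta$, $\mathrm{Var}_\theta(W_3) = 4\,\mathrm{Var}_\theta(\bar{X}) = \frac{\theta^2}{3n}$.

$W_3$ is unbiased and has mean square error $\frac{\theta^2}{3n}$.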

The mean square errors of the three estimators are the following:

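Estimator / MSE
$W_1$ / $\frac{2\theta^2}{(n+1)(n+2)}$
$W_2$ / $\frac{\theta^2}{n(n+2)}$
$W_3$ / $\frac{\theta^2}{3n}$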

For n=1, the three estimators have the same MSE.

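For n>1,

$$\frac{\theta^2}{n(n+2)} < \frac{2\theta^2}{(n+1)(n+2)} \le \frac{\theta^2}{3n},$$

with equality on the right only when n=2, since $(n+1)(n+2) - 6n = (n-1)(n-2)$.

So W2 is best, W1 is second best and W3 is the worst (W1 and W3 tie when n=2).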
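The ranking can be checked numerically. The following Python sketch assumes the three estimators are $W_1 = \max_i X_i$, $W_2 = \frac{n+1}{n}\max_i X_i$ and $W_3 = 2\bar{X}$, with MSEs $\frac{2\theta^2}{(n+1)(n+2)}$, $\frac{\theta^2}{n(n+2)}$ and $\frac{\theta^2}{3n}$. The function names `exact_mse` and `simulated_mse` are illustrative choices, not part of the notes. It evaluates the closed-form MSEs and compares them with a Monte Carlo estimate.

```python
import random

def exact_mse(n, theta):
    """Closed-form MSEs of (W1, W2, W3) for a Uniform[0, theta] sample of size n."""
    w1 = 2 * theta**2 / ((n + 1) * (n + 2))   # W1 = max X_i (assumed form)
    w2 = theta**2 / (n * (n + 2))             # W2 = (n+1)/n * max X_i (assumed form)
    w3 = theta**2 / (3 * n)                   # W3 = 2 * sample mean (assumed form)
    return w1, w2, w3

def simulated_mse(n, theta, reps=20000, seed=0):
    """Monte Carlo estimate of the same three MSEs."""
    rng = random.Random(seed)
    sq_err = [0.0, 0.0, 0.0]
    for _ in range(reps):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        m = max(xs)
        estimates = (m, (n + 1) / n * m, 2 * sum(xs) / n)
        for j, w in enumerate(estimates):
            sq_err[j] += (w - theta) ** 2
    return tuple(s / reps for s in sq_err)

if __name__ == "__main__":
    for n in (1, 2, 5, 20):
        print(n, exact_mse(n, 1.0), simulated_mse(n, 1.0))
```

The simulated values should track the closed-form ones up to Monte Carlo error; the closed forms coincide for all three estimators at n=1 and for W1 and W3 at n=2.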

III. Admissibility/Inadmissibility of Decision Procedures

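A decision procedure $\delta$ is inadmissible if there exists another decision procedure $\delta'$ such that $R(\theta, \delta') \le R(\theta, \delta)$ for all $\theta \in \Theta$ and $R(\theta, \delta') < R(\theta, \delta)$ for at least one $\theta \in \Theta$. The decision procedure $\delta'$ is said to dominate $\delta$; there is no justification for using $\delta$ rather than $\delta'$.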

In Example 1 (the uniform example above), W1 and W3 are inadmissible point estimators under squared error loss for n>1, because W2 dominates both.

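A decision procedure $\delta$ is admissible if it is not inadmissible, i.e., if there does not exist a decision procedure $\delta'$ such that $R(\theta, \delta') \le R(\theta, \delta)$ for all $\theta \in \Theta$ and $R(\theta, \delta') < R(\theta, \delta)$ for at least one $\theta \in \Theta$.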
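As a rough illustration of the definition, the Python sketch below checks dominance between risk functions on a finite grid of parameter values. It takes the MSE formulas $\frac{2\theta^2}{(n+1)(n+2)}$, $\frac{\theta^2}{n(n+2)}$ and $\frac{\theta^2}{3n}$ for W1, W2 and W3 as given; the helper `dominates`, the grid and the choice n=5 are hypothetical. A grid check can only suggest dominance, it does not prove it for all $\theta$.

```python
def dominates(risk_a, risk_b, thetas, tol=1e-12):
    """True if risk_a <= risk_b at every grid point, with strict inequality somewhere."""
    never_worse = all(risk_a(t) <= risk_b(t) + tol for t in thetas)
    sometimes_better = any(risk_a(t) < risk_b(t) - tol for t in thetas)
    return never_worse and sometimes_better

n = 5  # illustrative sample size

def mse_w1(t):
    return 2 * t**2 / ((n + 1) * (n + 2))

def mse_w2(t):
    return t**2 / (n * (n + 2))

def mse_w3(t):
    return t**2 / (3 * n)

grid = [0.5 * k for k in range(1, 21)]  # theta = 0.5, 1.0, ..., 10.0

print(dominates(mse_w2, mse_w1, grid))  # W2 versus W1
print(dominates(mse_w2, mse_w3, grid))  # W2 versus W3
print(dominates(mse_w1, mse_w2, grid))  # W1 versus W2
```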

IV. Selection of a decision procedure:

We would like to choose a decision procedure which has a “good” risk function.

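Ideal: We’d like to construct a decision procedure $\delta^*$ that is at least as good as all other decision procedures for all $\theta \in \Theta$, i.e., $\delta^*$ such that $R(\theta, \delta^*) \le R(\theta, \delta)$ for all $\theta \in \Theta$ and all other decision procedures $\delta$.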

This is generally impossible!

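Example 2: For $X_1,\ldots,X_n$ iid $N(\theta, 1)$, the constant estimator $\delta(X) \equiv c$, where $c$ is a fixed constant, is an admissible point estimator of $\theta$ for squared error loss.

Proof: Suppose $\delta \equiv c$ is inadmissible. Then there exists a decision procedure $\delta'$ that dominates $\delta$. This implies that $R(c, \delta') \le R(c, \delta) = 0$. Hence,

$$0 = R(c, \delta') = E_c[(\delta'(X) - c)^2].$$

The above equation implies $\delta'(X) = c$ with probability 1 when $\theta = c$. Because the $N(\theta, 1)$ distributions all have the same sets of probability zero, $\delta'(X) = c$ with probability 1 for all $\theta$, which means that $R(\theta, \delta') = R(\theta, \delta)$ for all $\theta$. This contradicts the assumption that $\delta'$ dominates $\delta$. Thus $\delta \equiv c$ is admissible.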

[Figure: Comparison of risk under squared error loss for the constant estimator $\delta(X) \equiv c$ and $\bar{X}$.]

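Although the constant estimator $\delta(X) \equiv c$ is admissible, it does not have good risk properties for many values of $\theta$: its risk $(\theta - c)^2$ grows without bound as $\theta$ moves away from $c$.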
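A small numerical comparison makes the point concrete. The Python sketch below assumes the two estimators being compared are a constant estimator $\delta(X) \equiv c$ and the sample mean $\bar{X}$ for $N(\theta, 1)$ data, whose squared error risks are $(\theta - c)^2$ and $1/n$ respectively. The values c=0 and n=10 and the function names are illustrative assumptions.

```python
def risk_constant(theta, c):
    """Squared error risk of the constant estimator delta(X) = c."""
    return (theta - c) ** 2

def risk_sample_mean(theta, n):
    """Squared error risk of the sample mean of n iid N(theta, 1) draws (free of theta)."""
    return 1.0 / n

c, n = 0.0, 10  # illustrative choices
for theta in (-2.0, -0.5, 0.0, 0.2, 0.5, 2.0):
    print(theta, risk_constant(theta, c), risk_sample_mean(theta, n))
```

Under these assumptions the constant estimator has the smaller risk only when $|\theta - c| < 1/\sqrt{n}$; outside that interval the sample mean is better.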

Approaches to choosing a decision procedure with good risk properties:

(1) Restrict the class of decision procedures and try to choose an optimal procedure within this class, e.g., for point estimation, we might only consider unbiased estimators $\delta$ of $\theta$, i.e., $\delta$ such that $E_\theta[\delta(X)] = \theta$ for all $\theta \in \Theta$.

(2) Compare risk functions by a global criterion. We shall discuss Bayes and minimax criteria.

Example 2: We are trying to decide whether to drill a location for oil. There are two possible states of nature, $\theta_1$ = location contains oil and $\theta_2$ = location doesn’t contain oil. We are considering three actions, $a_1$ = drill for oil, $a_2$ = sell the location or $a_3$ = sell partial rights to the location.

The following loss function $l(\theta, a)$ is decided on:

State / $a_1$ (Drill) / $a_2$ (Sell) / $a_3$ (Partial rights)
$\theta_1$ (Oil) / 0 / 10 / 5
$\theta_2$ (No oil) / 12 / 1 / 6

An experiment is conducted to obtain information about $\theta$, resulting in the random variable X with possible values 0,1 and frequency function $p(x \mid \theta)$ given by the following table:

Rock formation X / x=0 / x=1
$\theta_1$ (Oil) / 0.3 / 0.7
$\theta_2$ (No oil) / 0.6 / 0.4

X represents the presence of a certain geological formation that is more likely to be present when there is oil.

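The possible nonrandomized decision procedures $\delta_i(x)$ are

Rule $i$ / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
x=0 / $a_1$ / $a_1$ / $a_1$ / $a_2$ / $a_2$ / $a_2$ / $a_3$ / $a_3$ / $a_3$
x=1 / $a_1$ / $a_2$ / $a_3$ / $a_1$ / $a_2$ / $a_3$ / $a_1$ / $a_2$ / $a_3$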

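The risk of $\delta$ at $\theta$ is

$$R(\theta, \delta) = E_\theta[l(\theta, \delta(X))] = l(\theta, \delta(0))\, p(0 \mid \theta) + l(\theta, \delta(1))\, p(1 \mid \theta).$$

For example, rule 2 takes action $a_1$ when x=0 and $a_2$ when x=1, so $R(\theta_1, \delta_2) = 0(0.3) + 10(0.7) = 7$ and $R(\theta_2, \delta_2) = 12(0.6) + 1(0.4) = 7.6$.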

The risk functions are

Rule $i$ / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$R(\theta_1, \delta_i)$ / 0 / 7 / 3.5 / 3 / 10 / 6.5 / 1.5 / 8.5 / 5
$R(\theta_2, \delta_i)$ / 12 / 7.6 / 9.6 / 5.4 / 1 / 3 / 8.4 / 4 / 6
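The risk table can be reproduced mechanically from the loss table and the frequency function. The following Python sketch does this, assuming rules 1 through 9 are ordered so that the action for x=0 changes slowest (rule 1 always drills, rule 5 always sells, rule 9 always sells partial rights); the dictionary layout and the names `loss`, `p`, `rules` and `risk` are illustrative.

```python
from itertools import product

# loss[theta][action]: theta1 = oil, theta2 = no oil
loss = {
    "theta1": {"a1": 0, "a2": 10, "a3": 5},
    "theta2": {"a1": 12, "a2": 1, "a3": 6},
}
# p[theta][x] = probability that X = x when the state is theta
p = {
    "theta1": {0: 0.3, 1: 0.7},
    "theta2": {0: 0.6, 1: 0.4},
}

actions = ["a1", "a2", "a3"]
# Each rule maps x in {0, 1} to an action; product() varies the x=1 action fastest.
rules = [dict(zip((0, 1), pair)) for pair in product(actions, repeat=2)]

def risk(theta, rule):
    """Expected loss of a nonrandomized rule when the state of nature is theta."""
    return sum(loss[theta][rule[x]] * p[theta][x] for x in (0, 1))

risk_table = {theta: [round(risk(theta, r), 2) for r in rules] for theta in loss}

for theta, row in risk_table.items():
    print(theta, row)
```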

V. Bayes Criterion

The Bayesian point of view leads to a natural global criterion.

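Suppose a person’s prior distribution about $\theta$ is $\pi(\theta)$ and the model is that $X \mid \theta$ has probability density function (or probability mass function) $p(x \mid \theta)$. Then the joint (subjective) pdf (or pmf) of $(X, \theta)$ is $\pi(\theta)\, p(x \mid \theta)$.

The Bayes risk of a decision procedure $\delta$ for a prior distribution $\pi(\theta)$, denoted by $r(\pi, \delta)$, is the expected value of the risk function over the prior distribution of $\theta$, or equivalently the expected value of the loss over the joint distribution of $(X, \theta)$:

$$r(\pi, \delta) = E_\pi[R(\theta, \delta)] = E_\pi\big[E[l(\theta, \delta(X)) \mid \theta]\big].$$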

For a person with subjective probability distribution $\pi(\theta)$, the decision procedure $\delta$ which minimizes $r(\pi, \delta)$ minimizes the expected loss and is the best procedure from this person’s point of view. The decision procedure which minimizes the Bayes risk for a prior $\pi(\theta)$ is called the Bayes rule for the prior $\pi(\theta)$.

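Example 2 continued: For the prior $\pi(\theta_1) = 0.2$ and $\pi(\theta_2) = 0.8$, the Bayes risks $0.2\,R(\theta_1, \delta_i) + 0.8\,R(\theta_2, \delta_i)$ are

Rule $i$ / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
Bayes risk / 9.6 / 7.48 / 8.38 / 4.92 / 2.8 / 3.7 / 7.02 / 4.9 / 5.8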

Thus, rule 5 is the Bayes rule for this prior distribution.
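The Bayes risks are weighted averages of the two rows of the risk table, so they are easy to recompute. The Python sketch below copies the risk values for rules 1 through 9 and uses prior weights 0.2 on the oil state and 0.8 on the no-oil state, which are the weights that reproduce the Bayes risks listed above; the function names `bayes_risks` and `bayes_rule` are illustrative.

```python
# Risk values for rules 1..9, copied from the risk table of the oil example.
risk_theta1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]    # state: oil
risk_theta2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]    # state: no oil

def bayes_risks(prior1, prior2):
    """Prior-weighted average risk of each rule (rounded to 2 decimals for display)."""
    return [round(prior1 * r1 + prior2 * r2, 2)
            for r1, r2 in zip(risk_theta1, risk_theta2)]

def bayes_rule(prior1, prior2):
    """1-based index of the rule with the smallest Bayes risk among the nine."""
    risks = bayes_risks(prior1, prior2)
    return risks.index(min(risks)) + 1

print(bayes_risks(0.2, 0.8))
print(bayes_rule(0.2, 0.8))
```

Changing the weights changes the selected rule; with equal weights of 0.5 on each state, the same computation picks rule 4.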

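A non-subjective interpretation of Bayes rules: The Bayes approach leads us to compare procedures on the basis of

$$r(\pi, \delta) = \sum_\theta R(\theta, \delta)\, \pi(\theta)$$

if $\theta$ is discrete with frequency function $\pi(\theta)$ or

$$r(\pi, \delta) = \int R(\theta, \delta)\, \pi(\theta)\, d\theta$$

if $\theta$ is continuous with density $\pi(\theta)$.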

Such comparisons make sense even if we do not interpret $\pi(\theta)$ as a prior density or frequency, but only as a weight function that reflects the importance we place on doing well at the different possible values of $\theta$.

For example, in Example 2, if we felt that doing well at $\theta_1$ and doing well at $\theta_2$ are equally important, we would set $\pi(\theta_1) = \pi(\theta_2) = 1/2$.

VI. Minimax Criterion

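The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if

$$\sup_{\theta \in \Theta} R(\theta, \delta) < \sup_{\theta \in \Theta} R(\theta, \delta').$$

A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies

$$\sup_{\theta \in \Theta} R(\theta, \delta^*) = \inf_{\delta} \sup_{\theta \in \Theta} R(\theta, \delta).$$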

Among the nine decision rules considered for Example 2, rule 4 is the minimax rule.

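Rule $i$ / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$R(\theta_1, \delta_i)$ / 0 / 7 / 3.5 / 3 / 10 / 6.5 / 1.5 / 8.5 / 5
$R(\theta_2, \delta_i)$ / 12 / 7.6 / 9.6 / 5.4 / 1 / 3 / 8.4 / 4 / 6
$\max\{R(\theta_1, \delta_i), R(\theta_2, \delta_i)\}$ / 12 / 7.6 / 9.6 / 5.4 / 10 / 6.5 / 8.4 / 8.5 / 6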
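The selection of the minimax rule among the nine rules amounts to taking a maximum within each column and then a minimum across columns. A minimal Python sketch, with the risk values copied from the table and illustrative variable names:

```python
# Risk values for rules 1..9, copied from the risk table of the oil example.
risk_theta1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]    # state: oil
risk_theta2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]    # state: no oil

# Worst-case risk of each rule over the two states of nature.
max_risk = [max(r1, r2) for r1, r2 in zip(risk_theta1, risk_theta2)]

# Minimax rule among these nine: smallest worst-case risk (1-based index).
minimax_rule = max_risk.index(min(max_risk)) + 1

print(max_risk)
print(minimax_rule, min(max_risk))
```

This search covers only the nine nonrandomized rules considered here.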

Game theory motivation for minimax criterion: Suppose we play a two-person zero-sum game against Nature. Then the minimax decision procedure is the minimax strategy for the game.

Comments on the minimax criterion: The minimax criterion is very conservative. It aims to give maximum protection against the worst that can happen. The principle would be compelling if the statistician believed that Nature was a malevolent “opponent”, but in fact Nature is just the inanimate state of the world.

Although the minimax criterion is conservative, in many cases the principle does lead to reasonable procedures.