Statistics 550 Notes 4
Reading: Section 1.3
I. Review from last class
Decision theory framework:
We observe data $X$ from a probability distribution $P_\theta$, where $\theta$ is unknown but we know $P_\theta$ belongs to the model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$.
We want to choose an action $a$ (e.g., a point estimate) based on the data $X$.
Loss function: $l(\theta, a)$ = loss incurred by taking the action $a$ when the true parameter is $\theta$ (i.e., the true distribution is $P_\theta$).
Decision procedure (rule) $\delta$: Rule for choosing what action to take based on the data $X$; the action taken is $\delta(X)$.
Risk: $R(\theta,\delta) = E_\theta[l(\theta,\delta(X))]$. The average loss incurred by using decision procedure $\delta$ when the true parameter is $\theta$.
We would like to choose a decision procedure which has small risk at the true parameter $\theta$.
II. Example of comparing risk of point estimators under squared error loss
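Suppose that an iid sample $X_1,\dots,X_n$ is drawn from the uniform distribution on $[0,\theta]$, where $\theta > 0$ is an unknown parameter and the density of each $X_i$ is
$f(x;\theta) = 1/\theta$ for $0 \le x \le \theta$, and $0$ otherwise.
Several point estimators:
1. $W_1 = \max_i X_i$.
2. $W_2 = \frac{n+1}{n}\max_i X_i$. Note: Unlike $W_1$, $W_2$ is unbiased because $E_\theta[W_1] = \frac{n}{n+1}\theta$.
3. $W_3 = 2\bar{X}$. Note: $W_3$ is unbiased, since $E_\theta[2\bar{X}] = 2\cdot\frac{\theta}{2} = \theta$.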
Comparison of three estimators under squared error loss
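1. $W_1 = \max_i X_i$
The sampling distribution for $W_1$ is
$P_\theta(W_1 \le w) = (w/\theta)^n$ for $0 \le w \le \theta$, with density $f_{W_1}(w) = n w^{n-1}/\theta^n$,
and
$E_\theta[W_1] = \int_0^\theta w\,\frac{n w^{n-1}}{\theta^n}\,dw = \frac{n}{n+1}\theta$, so that $\mathrm{Bias}_\theta(W_1) = -\frac{\theta}{n+1}$.
To calculate $\mathrm{Var}_\theta(W_1)$, we calculate $E_\theta[W_1^2]$ and use the formula $\mathrm{Var}(W) = E[W^2] - (E[W])^2$:
$E_\theta[W_1^2] = \int_0^\theta w^2\,\frac{n w^{n-1}}{\theta^n}\,dw = \frac{n}{n+2}\theta^2$, and $\mathrm{Var}_\theta(W_1) = \frac{n}{n+2}\theta^2 - \frac{n^2}{(n+1)^2}\theta^2 = \frac{n\theta^2}{(n+1)^2(n+2)}$.
Thus,
$\mathrm{MSE}_\theta(W_1) = \mathrm{Var}_\theta(W_1) + \mathrm{Bias}_\theta(W_1)^2 = \frac{n\theta^2}{(n+1)^2(n+2)} + \frac{\theta^2}{(n+1)^2} = \frac{2\theta^2}{(n+1)(n+2)}$.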
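2. $W_2 = \frac{n+1}{n}\max_i X_i$
Note $W_2 = \frac{n+1}{n}W_1$.
Thus, $E_\theta[W_2] = \frac{n+1}{n}E_\theta[W_1] = \frac{n+1}{n}\cdot\frac{n}{n+1}\theta = \theta$,
and
$\mathrm{Var}_\theta(W_2) = \left(\frac{n+1}{n}\right)^2\mathrm{Var}_\theta(W_1) = \frac{(n+1)^2}{n^2}\cdot\frac{n\theta^2}{(n+1)^2(n+2)} = \frac{\theta^2}{n(n+2)}$.
Because $W_2$ is unbiased, $\mathrm{MSE}_\theta(W_2) = \mathrm{Var}_\theta(W_2) = \frac{\theta^2}{n(n+2)}$.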
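3. $W_3 = 2\bar{X}$
To find the mean square error, we use the fact that if $X_1,\dots,X_n$ are iid with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ has mean $\mu$ and variance $\sigma^2/n$.
We have $E_\theta[X_i] = \int_0^\theta \frac{x}{\theta}\,dx = \frac{\theta}{2}$ and $E_\theta[X_i^2] = \int_0^\theta \frac{x^2}{\theta}\,dx = \frac{\theta^2}{3}$.
Thus, $\mathrm{Var}_\theta(X_i) = \frac{\theta^2}{3} - \frac{\theta^2}{4} = \frac{\theta^2}{12}$, and
$E_\theta[\bar{X}] = \frac{\theta}{2}$ and $\mathrm{Var}_\theta(\bar{X}) = \frac{\theta^2}{12n}$.
$W_3$ is unbiased and has mean square error $\mathrm{MSE}_\theta(W_3) = \mathrm{Var}_\theta(2\bar{X}) = 4\cdot\frac{\theta^2}{12n} = \frac{\theta^2}{3n}$.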
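The mean square errors of the three estimators are the following:
Estimator / MSE
$W_1$ / $\frac{2\theta^2}{(n+1)(n+2)}$
$W_2$ / $\frac{\theta^2}{n(n+2)}$
$W_3$ / $\frac{\theta^2}{3n}$
For $n=1$, the three estimators have the same MSE, $\theta^2/3$.
For $n>1$, $\mathrm{MSE}_\theta(W_2) < \mathrm{MSE}_\theta(W_1) \le \mathrm{MSE}_\theta(W_3)$ for every $\theta$, with equality between $W_1$ and $W_3$ only when $n=2$ (both equal $\theta^2/6$).
So $W_2$ is best; $W_1$ is second best and $W_3$ is the worst for $n>2$, while $W_1$ and $W_3$ tie for $n=2$.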
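The ranking can be checked numerically. The sketch below compares the closed-form MSEs $2\theta^2/((n+1)(n+2))$, $\theta^2/(n(n+2))$ and $\theta^2/(3n)$ with Monte Carlo estimates. The function names, the values $\theta = 3$ and $n = 5$, the number of replications and the seed are illustrative assumptions, not part of the notes.

```python
import numpy as np

def closed_form_mse(theta, n):
    """Closed-form MSEs of W1 = max X_i, W2 = (n+1)/n * max X_i, W3 = 2 * mean."""
    mse_w1 = 2 * theta**2 / ((n + 1) * (n + 2))
    mse_w2 = theta**2 / (n * (n + 2))
    mse_w3 = theta**2 / (3 * n)
    return np.array([mse_w1, mse_w2, mse_w3])

def simulated_mse(theta, n, reps=200_000, seed=0):
    """Monte Carlo estimate of the same three MSEs (reps and seed are arbitrary)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, theta, size=(reps, n))  # each row is one sample of size n
    w1 = x.max(axis=1)
    w2 = (n + 1) / n * w1
    w3 = 2 * x.mean(axis=1)
    estimates = np.stack([w1, w2, w3])
    # Average squared error over replications, one value per estimator.
    return ((estimates - theta) ** 2).mean(axis=1)

theta, n = 3.0, 5  # illustrative values only
exact = closed_form_mse(theta, n)
approx = simulated_mse(theta, n)
for name, e, a in zip(("W1", "W2", "W3"), exact, approx):
    print(f"{name}: closed form {e:.4f}, simulated {a:.4f}")
```

Calling `closed_form_mse(theta, 2)` for any `theta` returns equal first and third entries, which is the tie between $W_1$ and $W_3$ at $n = 2$.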
III. Admissibility/Inadmissibility of Decision Procedures
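A decision procedure $\delta$ is inadmissible if there exists another decision procedure $\delta'$ such that $R(\theta,\delta') \le R(\theta,\delta)$ for all $\theta \in \Theta$ and $R(\theta,\delta') < R(\theta,\delta)$ for at least one $\theta \in \Theta$. The decision procedure $\delta'$ is said to dominate $\delta$; there is no justification for using $\delta$ rather than $\delta'$.
In Example 1 (the uniform example of Section II), $W_1$ and $W_3$ are inadmissible point estimators under squared error loss for $n>1$, because $W_2$ dominates both.
A decision procedure $\delta$ is admissible if it is not inadmissible, i.e., if there does not exist a decision procedure $\delta'$ such that $R(\theta,\delta') \le R(\theta,\delta)$ for all $\theta \in \Theta$ and $R(\theta,\delta') < R(\theta,\delta)$ for at least one $\theta \in \Theta$.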
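The definition of domination translates directly into a check on risk functions evaluated over a grid of parameter values. The sketch below applies it to the closed-form MSEs of the uniform example, $2\theta^2/((n+1)(n+2))$ for $W_1$, $\theta^2/(n(n+2))$ for $W_2$ and $\theta^2/(3n)$ for $W_3$. The helper name `dominates`, the grid and the choice $n = 5$ are assumptions made for illustration; a grid check only suggests domination and does not replace the algebraic argument.

```python
import numpy as np

def dominates(risk_a, risk_b, tol=1e-12):
    """True if A is never worse than B on the grid and strictly better somewhere."""
    risk_a = np.asarray(risk_a, dtype=float)
    risk_b = np.asarray(risk_b, dtype=float)
    never_worse = np.all(risk_a <= risk_b + tol)
    somewhere_better = np.any(risk_a < risk_b - tol)
    return bool(never_worse and somewhere_better)

n = 5                                  # illustrative sample size
thetas = np.linspace(0.1, 10.0, 100)   # illustrative grid over the parameter space
risk_w1 = 2 * thetas**2 / ((n + 1) * (n + 2))
risk_w2 = thetas**2 / (n * (n + 2))
risk_w3 = thetas**2 / (3 * n)

print(dominates(risk_w2, risk_w1))  # True
print(dominates(risk_w2, risk_w3))  # True
print(dominates(risk_w1, risk_w2))  # False
```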
IV. Selection of a decision procedure:
We would like to choose a decision procedure which has a “good” risk function.
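Ideal: We'd like to construct a decision procedure $\delta^*$ that is at least as good as all other decision procedures for all $\theta \in \Theta$, i.e., such that $R(\theta,\delta^*) \le R(\theta,\delta)$ for all $\theta \in \Theta$ and all other decision procedures $\delta$.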
This is generally impossible!
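Example (an admissible estimator with poor risk): For $X_1,\dots,X_n$ iid $N(\theta,\sigma^2)$ with $\sigma^2$ known, the constant estimator $\delta_c(X) \equiv c$, which ignores the data and always reports a fixed value $c$, is an admissible point estimator of $\theta$ for squared error loss.
Proof: Suppose $\delta_c$ is inadmissible. Then there exists a decision procedure $\delta$ that dominates $\delta_c$. This implies that $R(c,\delta) \le R(c,\delta_c) = 0$. Hence, $0 = R(c,\delta) = E_c[(\delta(X)-c)^2]$. The above equation implies $\delta(X) = c$ with probability 1 when $\theta = c$. Because the normal distributions for different $\theta$ assign probability zero to the same sets, $\delta(X) = c$ with probability 1 for all $\theta$, which means that $R(\theta,\delta) = R(\theta,\delta_c)$ for all $\theta$. This contradicts the assumption that $\delta$ dominates $\delta_c$. Thus $\delta_c$ is admissible.
[Figure: comparison of risk under squared error loss for $\delta_c$ and $\bar{X}$, where $R(\theta,\delta_c) = (\theta-c)^2$ and $R(\theta,\bar{X}) = \sigma^2/n$.]
Although $\delta_c$ is admissible, it does not have good risk properties for many values of $\theta$: its risk is below that of $\bar{X}$ only when $|\theta - c| < \sigma/\sqrt{n}$, and it grows without bound as $\theta$ moves away from $c$.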
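To make the risk comparison concrete, the sketch below evaluates the two risk functions on a grid. It assumes a constant estimator that always reports a fixed value $c$, with squared error risk $(\theta - c)^2$, and the sample mean of $n$ normal observations with variance $\sigma^2$, with risk $\sigma^2/n$. The values $c = 1$, $\sigma^2 = 1$, $n = 10$ and the grid are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def risk_constant(theta, c):
    """Squared error risk of the estimator that always reports c."""
    return (np.asarray(theta, dtype=float) - c) ** 2

def risk_sample_mean(theta, sigma2, n):
    """Squared error risk of the sample mean: sigma^2 / n, constant in theta."""
    return np.full(np.shape(theta), sigma2 / n)

c, sigma2, n = 1.0, 1.0, 10          # assumed values for illustration
thetas = np.linspace(-2.0, 4.0, 13)  # grid with spacing 0.5

r_const = risk_constant(thetas, c)
r_mean = risk_sample_mean(thetas, sigma2, n)

# Grid points where the constant estimator has strictly smaller risk.
better = thetas[r_const < r_mean]
print(better)          # [1.]
print(r_const.max())   # 9.0, attained at both ends of the grid
```

On this grid the constant estimator wins only at $\theta = c$, which is why no other procedure can dominate it even though its risk is large elsewhere.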
Approaches to choosing a decision procedure with good risk properties:
(1) Restrict the class of decision procedures and try to choose an optimal procedure within this class, e.g., for point estimation, we might only consider unbiased estimators $\delta(X)$ of $\theta$, i.e., those such that $E_\theta[\delta(X)] = \theta$ for all $\theta \in \Theta$.
(2) Compare risk functions by global criterion. We shall discuss Bayes and minimax criteria.
Example 2: We are trying to decide whether to drill a location for oil. There are two possible states of nature, $\theta_1$ = location contains oil and $\theta_2$ = location doesn't contain oil. We are considering three actions, $a_1$ = drill for oil, $a_2$ = sell the location or $a_3$ = sell partial rights to the location.
The following loss function $l(\theta,a)$ is decided on:
State of nature / $a_1$ (Drill) / $a_2$ (Sell) / $a_3$ (Partial rights)
$\theta_1$ (Oil) / 0 / 10 / 5
$\theta_2$ (No oil) / 12 / 1 / 6
An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values 0, 1 and frequency function $p(x,\theta)$ given by the following table:
Rock formation $x$ / 0 / 1
$\theta_1$ (Oil) / 0.3 / 0.7
$\theta_2$ (No oil) / 0.6 / 0.4
X represents the presence of a certain geological formation that is more likely to be present when there is oil.
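The possible nonrandomized decision procedures $\delta_1,\dots,\delta_9$ are
Rule / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$x=0$ / $a_1$ / $a_1$ / $a_1$ / $a_2$ / $a_2$ / $a_2$ / $a_3$ / $a_3$ / $a_3$
$x=1$ / $a_1$ / $a_2$ / $a_3$ / $a_1$ / $a_2$ / $a_3$ / $a_1$ / $a_2$ / $a_3$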
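The risk of $\delta$ at $\theta$ is
$R(\theta,\delta) = E_\theta[l(\theta,\delta(X))] = l(\theta,a_1)P_\theta(\delta(X)=a_1) + l(\theta,a_2)P_\theta(\delta(X)=a_2) + l(\theta,a_3)P_\theta(\delta(X)=a_3)$.
The risk functions are
Rule / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$R(\theta_1,\delta_i)$ / 0 / 7 / 3.5 / 3 / 10 / 6.5 / 1.5 / 8.5 / 5
$R(\theta_2,\delta_i)$ / 12 / 7.6 / 9.6 / 5.4 / 1 / 3 / 8.4 / 4 / 6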
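The risk table can be reproduced by enumerating the nine rules and averaging the loss over the distribution of $X$ under each state. The sketch below is one way to do this; the array layout and the names `loss`, `pmf`, `rules` and `risk` are illustrative choices, while the numbers are the loss and frequency tables of the oil-drilling example.

```python
from itertools import product
import numpy as np

# loss[i, j]: loss of action j (0 = drill, 1 = sell, 2 = partial rights)
# under state i (0 = oil, 1 = no oil).
loss = np.array([[0.0, 10.0, 5.0],
                 [12.0, 1.0, 6.0]])

# pmf[i, x]: probability that X = x under state i.
pmf = np.array([[0.3, 0.7],
                [0.6, 0.4]])

# A nonrandomized rule is a pair (action taken if x = 0, action taken if x = 1).
# product(...) lists them in the same order as rules 1 to 9 in the table.
rules = list(product(range(3), repeat=2))

def risk(rule):
    """Risk of a rule under each state: expected loss over the distribution of X."""
    return np.array([
        sum(loss[state, rule[x]] * pmf[state, x] for x in range(2))
        for state in range(2)
    ])

risks = np.array([risk(rule) for rule in rules])  # shape (9, 2)
for number, (r_oil, r_no_oil) in enumerate(risks, start=1):
    print(f"rule {number}: R(oil) = {r_oil:.1f}, R(no oil) = {r_no_oil:.1f}")
```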
V. Bayes Criterion
The Bayesian point of view leads to a natural global criterion.
Suppose a person's prior distribution about $\theta$ is $\pi(\theta)$ and the model is that $X \mid \theta$ has probability density function (or probability mass function) $p(x \mid \theta)$. Then the joint (subjective) pdf (or pmf) of $(X,\theta)$ is $\pi(\theta)p(x \mid \theta)$.
The Bayes risk of a decision procedure $\delta$ for a prior distribution $\pi$, denoted by $r_\pi(\delta)$, is the expected value of the loss over the joint distribution of $(X,\theta)$, or equivalently the risk function averaged over the prior:
$r_\pi(\delta) = E[l(\theta,\delta(X))] = E_\pi[R(\theta,\delta)]$.
For a person with subjective probability distribution $\pi$, the decision procedure $\delta$ which minimizes $r_\pi(\delta)$ minimizes the expected loss and is the best procedure from this person's point of view. The decision procedure which minimizes the Bayes risk for a prior $\pi$ is called the Bayes rule for the prior $\pi$.
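Example 2 continued: For the prior $\pi(\theta_1) = 0.2$ and $\pi(\theta_2) = 0.8$, the Bayes risks $r_\pi(\delta_i) = 0.2\,R(\theta_1,\delta_i) + 0.8\,R(\theta_2,\delta_i)$ are
Rule / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$r_\pi(\delta_i)$ / 9.6 / 7.48 / 8.38 / 4.92 / 2.8 / 3.7 / 7.02 / 4.9 / 5.8
Thus, rule 5 is the Bayes rule for this prior distribution.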
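The Bayes risks are a weighted average of the two rows of the risk table. The sketch below recomputes them from the tabulated risks; the prior weights 0.2 (oil) and 0.8 (no oil) are the values that reproduce the Bayes risks listed above, and the variable names are illustrative.

```python
import numpy as np

# Risks of rules 1..9 under each state, copied from the risk table.
risk_oil = np.array([0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5])
risk_no_oil = np.array([12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6])

# Prior weights on the two states.
prior_oil, prior_no_oil = 0.2, 0.8

# Bayes risk: prior-weighted average of the risk function.
bayes_risk = prior_oil * risk_oil + prior_no_oil * risk_no_oil

# Rules are numbered from 1, array positions from 0.
bayes_rule = int(np.argmin(bayes_risk)) + 1
print(np.round(bayes_risk, 2))
print("Bayes rule:", bayes_rule)  # Bayes rule: 5
```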
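A non-subjective interpretation of Bayes rules: The Bayes approach leads us to compare procedures on the basis of
$r_\pi(\delta) = \sum_\theta R(\theta,\delta)\,\pi(\theta)$
if $\theta$ is discrete with frequency function $\pi(\theta)$, or
$r_\pi(\delta) = \int R(\theta,\delta)\,\pi(\theta)\,d\theta$
if $\theta$ is continuous with density $\pi(\theta)$.
Such comparisons make sense even if we do not interpret $\pi$ as a prior density or frequency function, but only as a weight function that reflects the importance we place on doing well at the different possible values of $\theta$.
For example, in Example 2, if we felt that doing well at $\theta_1$ and doing well at $\theta_2$ are equally important, we would set $\pi(\theta_1) = \pi(\theta_2) = 1/2$.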
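Treating the prior as a weight function, one can see how the preferred rule changes with the weight placed on the oil state. The sketch below uses the tabulated risks of the nine rules; the function name `best_rule` and the particular weights tried are illustrative assumptions.

```python
import numpy as np

# Risks of rules 1..9 under each state, copied from the risk table.
risk_oil = np.array([0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5])
risk_no_oil = np.array([12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6])

def best_rule(weight_oil):
    """Rule number (1..9) minimizing the weighted risk, and the minimum value."""
    weighted = weight_oil * risk_oil + (1 - weight_oil) * risk_no_oil
    return int(np.argmin(weighted)) + 1, float(weighted.min())

for weight in (0.2, 0.5, 0.8):
    rule, value = best_rule(weight)
    print(f"weight on oil = {weight}: rule {rule}, weighted risk {value:.2f}")
```

With equal weights the minimizer is rule 4 (weighted risk 4.2), whereas weight 0.2 on oil gives rule 5 and weight 0.8 gives rule 1, so the choice of weights matters.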
VI. Minimax Criterion
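The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if
$\sup_{\theta \in \Theta} R(\theta,\delta) < \sup_{\theta \in \Theta} R(\theta,\delta')$.
A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies
$\sup_{\theta \in \Theta} R(\theta,\delta^*) = \inf_{\delta} \sup_{\theta \in \Theta} R(\theta,\delta)$.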
Among the nine decision rules considered for Example 2, rule 4 is the minimax rule.
Rule / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
$R(\theta_1,\delta_i)$ / 0 / 7 / 3.5 / 3 / 10 / 6.5 / 1.5 / 8.5 / 5
$R(\theta_2,\delta_i)$ / 12 / 7.6 / 9.6 / 5.4 / 1 / 3 / 8.4 / 4 / 6
$\max\{R(\theta_1,\delta_i), R(\theta_2,\delta_i)\}$ / 12 / 7.6 / 9.6 / 5.4 / 10 / 6.5 / 8.4 / 8.5 / 6
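The minimax rule among the nine can be found by taking the larger of the two risks for each rule and then picking the rule with the smallest maximum. A minimal sketch using the tabulated risks (variable names are illustrative):

```python
import numpy as np

# Rows: states (oil, no oil). Columns: rules 1..9, copied from the risk table.
risks = np.array([
    [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5],
    [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6],
])

# Worst-case risk of each rule over the two states.
max_risk = risks.max(axis=0)

# The minimax rule has the smallest worst-case risk (rules are numbered from 1).
minimax_rule = int(np.argmin(max_risk)) + 1
print(max_risk)
print("minimax rule:", minimax_rule)  # minimax rule: 4
```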
Game theory motivation for the minimax criterion: Suppose we play a two-person zero-sum game against Nature, in which Nature chooses $\theta$, we choose $\delta$, and our loss is $R(\theta,\delta)$. Then the minimax decision procedure is the minimax strategy for the game.
Comments on the minimax criterion: The minimax criterion is very conservative. It aims to give maximum protection against the worst that can happen. The principle would be compelling if the statistician believed that Nature was a malevolent "opponent", but in fact Nature is just the inanimate state of the world.
Although the minimax criterion is conservative, in many cases the principle does lead to reasonable procedures.