Statistics 512 Notes 27: Bayesian Statistics
Views of Probability
“The probability that this coin will land heads up is $\frac{1}{2}$.”
Frequentist (sometimes called objectivist) viewpoint: This statement means that if the experiment were repeated many, many times, the long-run average proportion of heads would tend to $\frac{1}{2}$.
Bayesian (sometimes called subjectivist or personal) viewpoint: This statement means that the person making the statement has a prior opinion about the coin toss such that he or she would just as soon guess heads as tails if the rewards are equal.
In the frequentist viewpoint, the probability of an event $A$, $P(A)$, represents the long-run frequency of event $A$ in repeated experiments.
In the Bayesian viewpoint, the probability of an event $A$, $P(A)$, has the following meaning: For a game in which the Bayesian will be paid $1 if $A$ occurs, $P(A)$ is the amount of money the Bayesian would be willing to pay to buy into the game. Thus, if the Bayesian is willing to pay 50 cents to buy in, $P(A) = .5$. Note that this concept of probability is personal: $P(A)$ may vary from person to person depending on their opinions.
In the Bayesian viewpoint, we can make probability statements about lots of things, not just data which are subject to random variation. For example, I might say that “the probability that Franklin D. Roosevelt had a cup of coffee on February 21, 1935” is .68. This does not refer to any limiting frequency. It reflects my strength of belief that the proposition is true.
Rules for Manipulating Subjective Probabilities
All the usual rules for manipulating probabilities apply to subjective probabilities. For example,
Theorem 11.1: If $A_1$ and $A_2$ are mutually exclusive, then $P(A_1 \cup A_2) = P(A_1) + P(A_2)$.
Proof: Suppose a person thinks a fair price for $A_1$ is $p_1 = P(A_1)$ and that for $A_2$ is $p_2 = P(A_2)$. However, that person believes that the fair price for $A_1 \cup A_2$ is $p_3$, which differs from $p_1 + p_2$. Say $p_3 < p_1 + p_2$ and let the difference be $d = (p_1 + p_2) - p_3$. A gambler offers this person the price $p_3 + d/4$ for the bet on $A_1 \cup A_2$. The person takes the offer because it is better than $p_3$. The gambler then sells the bet on $A_1$ at a discount price of $p_1 - d/4$ and sells the bet on $A_2$ at a discount price of $p_2 - d/4$ to the person. Being a rational person with the given fair prices $p_1$, $p_2$, and $p_3$, all three of these deals seem very satisfactory to the person. At this point, the person has received $p_3 + d/4$ and paid $(p_1 - d/4) + (p_2 - d/4)$. Thus before any bets are paid off, the person has
$(p_3 + d/4) - (p_1 - d/4) - (p_2 - d/4) = p_3 - p_1 - p_2 + \frac{3d}{4} = -d + \frac{3d}{4} = -\frac{d}{4}$.
That is, the person is down $d/4$ before any bets are settled. We now show that no matter what event happens, the person will pay and receive the same amount in settling the bets:
- Suppose $A_1$ happens: the gambler holds the bet on $A_1 \cup A_2$ and the person holds the bet on $A_1$, so they exchange $1 payments and the person is still down $d/4$. The same thing occurs if $A_2$ happens.
- Suppose neither $A_1$ nor $A_2$ happens: then the gambler and the person each receive zero, and the person is still down $d/4$.
- $A_1$ and $A_2$ cannot occur together since they are mutually exclusive.
Thus, we see that it is bad for the person to assign $p_3 < p_1 + p_2$, because the gambler can put the person in a position to lose $d/4$ no matter what happens. This is sometimes referred to as a Dutch book.
The argument when $p_3 > p_1 + p_2$ is similar and can also lead to a Dutch book. Thus $p_3$ must equal $p_1 + p_2$ to avoid a Dutch book; that is, $P(A_1 \cup A_2) = P(A_1) + P(A_2)$.
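To make the Dutch book arithmetic concrete, here is a small Python sketch (not from the original notes) that checks the person's net position under every possible outcome; the prices $p_1 = .3$, $p_2 = .4$, $p_3 = .5$ are illustrative assumptions chosen so that $p_3 < p_1 + p_2$.

```python
# Sketch: verify the Dutch book arithmetic for incoherent prices.
p1, p2 = 0.3, 0.4   # person's fair prices for A1 and A2 (assumed)
p3 = 0.5            # person's incoherent fair price for A1 union A2
d = (p1 + p2) - p3  # the discrepancy; here d = 0.2

# Cash flows before any bet is settled: the person sells the bet on
# A1 union A2 to the gambler for p3 + d/4, and buys the A1 and A2
# bets at the discount prices p1 - d/4 and p2 - d/4.
net = (p3 + d / 4) - (p1 - d / 4) - (p2 - d / 4)
assert abs(net - (-d / 4)) < 1e-12  # person is down d/4 up front

# Settlement under each possible outcome (A1 and A2 are mutually
# exclusive, so they cannot both occur):
for outcome in ["A1", "A2", "neither"]:
    person_receives = 1.0 if outcome in ("A1", "A2") else 0.0  # A1 or A2 bet pays
    person_pays = 1.0 if outcome in ("A1", "A2") else 0.0      # union bet pays gambler
    final = net + person_receives - person_pays
    print(f"{outcome:8s}: final position = {final:+.3f}")
# Every outcome prints -0.050 = -d/4: a sure loss for the person.
```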
The Bayesian can consider subjective conditional probabilities, such as $P(A_2|A_1)$, which is the fair price of the bet on $A_2$ that is in effect only if $A_1$ is true. If $A_1$ is not true, the bet is off. Of course, $P(A_2|A_1)$ could differ from $P(A_2)$. To illustrate, say $A_1$ is the event that “it will rain today” and $A_2$ is the event that “a certain person who will be outside on that day will catch a cold.” Most of us would probably assign the fair prices so that
$P(A_2|A_1) > P(A_2)$.
Consequently, a person has a better chance of getting a cold on a rainy day.
Frequentist vs. Bayesian statistics
The frequentist point of view towards statistics is based on the following postulates:
- F1: Probability refers to limiting relative frequencies. Probabilities are objective properties of the real world.
- F2: Parameters are fixed, unknown constants. Because they are not fluctuating, no useful probability statements can be made about parameters.
- F3: Statistical procedures should be designed to have well-defined long run frequency properties. For example, a 95 percent confidence interval should trap the true value of the parameter with limiting frequency at least 95 percent.
The Bayesian approach to statistics is based on the following postulates:
- B1: Probability describes a person’s degree of belief, not limiting frequency.
- B2: We can make probability statements about parameters that reflect our degree of belief about the parameters, even though the parameters are fixed constants.
- B3: We can make inferences about a parameter $\theta$ by producing a probability distribution for $\theta$. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution.
Bayesian inference
Bayesian inference about a parameter $\theta$ is usually carried out in the following way:
1. We choose a probability density $\pi(\theta)$ -- called the prior distribution -- that expresses our beliefs about a parameter $\theta$ before we see any data.
2. We choose a probability model $f(x|\theta)$ that reflects our beliefs about the data $x$ given $\theta$.
3. After observing data $x$, we update our beliefs and calculate the posterior distribution $\pi(\theta|x)$.
As in our discussion of the Bayesian approach in decision theory, the posterior distribution is calculated using Bayes rule:
$\pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int f(x|\theta')\pi(\theta')\,d\theta'}$.
Note that the denominator does not change as $\theta$ varies, so that the posterior distribution is proportional to the likelihood times the prior:
$\pi(\theta|x) \propto f(x|\theta)\pi(\theta)$.
Based on the posterior distribution, we can get a point estimate, an interval estimate and carry out hypothesis tests as we shall discuss below.
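To make the three steps concrete, here is a minimal Python sketch that carries them out numerically on a grid; the coin-flip model, the flat prior, and the data (7 heads in 10 tosses) are all assumptions made for illustration.

```python
import numpy as np

# Sketch: posterior = likelihood x prior, normalized, computed on a grid.
theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter
d_theta = theta[1] - theta[0]            # grid spacing

prior = np.ones_like(theta)              # step 1: flat prior (assumed)
likelihood = theta**7 * (1 - theta)**3   # step 2: binomial model, 7 heads
                                         # in 10 tosses (constant omitted)
unnormalized = likelihood * prior        # step 3: Bayes rule numerator
posterior = unnormalized / (unnormalized.sum() * d_theta)  # normalize

post_mean = (theta * posterior).sum() * d_theta
print(f"posterior mean = {post_mean:.3f}")  # about (7+1)/(10+2) = 0.667
```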
Bayesian inference for the normal distribution
Suppose that we observe a single observation $x$ from a normal distribution with unknown mean $\theta$ and known variance $\sigma^2$, i.e., $x|\theta \sim N(\theta, \sigma^2)$. Suppose that our prior distribution for $\theta$ is $N(\theta_0, \sigma_0^2)$.
The posterior distribution of $\theta$ is
$\pi(\theta|x) \propto f(x|\theta)\pi(\theta)$.
Now
$f(x|\theta)\pi(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\theta)^2}{2\sigma^2}\right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{(\theta-\theta_0)^2}{2\sigma_0^2}\right] \propto \exp\left[-\frac{1}{2}\left(a\theta^2 - 2b\theta + c\right)\right]$.
Let
$a = \frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}, \quad b = \frac{x}{\sigma^2} + \frac{\theta_0}{\sigma_0^2}, \quad c = \frac{x^2}{\sigma^2} + \frac{\theta_0^2}{\sigma_0^2}$
be the coefficients in the quadratic polynomial in $\theta$ in the exponent of the last expression. The last expression may then be written as
$\exp\left[-\frac{a}{2}\left(\theta^2 - \frac{2b}{a}\theta\right) - \frac{c}{2}\right]$.
To simplify this further, we use the technique of completing the square and rewrite the expression as
$\exp\left[-\frac{a}{2}\left(\theta - \frac{b}{a}\right)^2\right]\exp\left[\frac{b^2}{2a} - \frac{c}{2}\right]$.
The second factor does not depend on $\theta$ and we thus have that
$\pi(\theta|x) \propto \exp\left[-\frac{a}{2}\left(\theta - \frac{b}{a}\right)^2\right]$.
This is the density of a normal random variable with mean $b/a$ and variance $1/a$.
Thus, the posterior distribution of $\theta$ is normal with mean
$\theta_1 = \frac{b}{a} = \frac{x/\sigma^2 + \theta_0/\sigma_0^2}{1/\sigma^2 + 1/\sigma_0^2}$
and variance
$\sigma_1^2 = \frac{1}{a} = \left(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1}$.
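As a numerical check of these formulas, the following Python sketch compares the closed-form posterior mean and variance with a grid approximation of likelihood times prior; the values $x = 3$, $\sigma = 1$, $\theta_0 = 0$, $\sigma_0 = 2$ are assumed for illustration.

```python
import numpy as np

# Assumed illustrative values for the observation and the prior.
x, sigma = 3.0, 1.0          # observation and known sampling sd
theta0, sigma0 = 0.0, 2.0    # prior mean and prior sd

# Closed form from the notes: precision-weighted combination.
a = 1 / sigma**2 + 1 / sigma0**2
b = x / sigma**2 + theta0 / sigma0**2
post_mean, post_var = b / a, 1 / a

# Grid approximation of posterior (likelihood times prior) as a check.
theta = np.linspace(-10, 10, 20001)
log_post = -(x - theta)**2 / (2 * sigma**2) - (theta - theta0)**2 / (2 * sigma0**2)
w = np.exp(log_post - log_post.max())
w /= w.sum()                              # normalize to probability weights
grid_mean = (theta * w).sum()
grid_var = ((theta - grid_mean)**2 * w).sum()

print(f"closed form: mean={post_mean:.4f}, var={post_var:.4f}")  # 2.4, 0.8
print(f"grid check : mean={grid_mean:.4f}, var={grid_var:.4f}")
```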
Comments about the role of the prior in the posterior distribution: The posterior mean is a weighted average of the prior mean and the data, with weights proportional to the respective precisions of the prior and the data, where the precision is equal to 1/variance. If we assume that the experiment (the observation of $x$) is much more informative than the prior distribution in the sense that $\sigma_0^2 \gg \sigma^2$, then
$\theta_1 \approx x \quad \text{and} \quad \sigma_1^2 \approx \sigma^2$.
Thus, the posterior distribution of $\theta$ is nearly normal with mean $x$ and variance $\sigma^2$. This result illustrates that if the prior distribution is quite flat relative to the likelihood, then
- the prior distribution has little influence on the posterior
- the posterior distribution is approximately proportional to the likelihood function.
On a heuristic level, the first point says that if one does not have strong prior opinions, one’s posterior opinion is mainly determined by the data one observes. Such a prior distribution is often called a vague or noninformative prior.
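A quick numerical illustration of the vague-prior point, using the same assumed values $x = 3$, $\sigma = 1$, $\theta_0 = 0$ as in the sketch above: as the prior variance $\sigma_0^2$ grows, the posterior mean and variance approach $x$ and $\sigma^2$.

```python
# Sketch: the vague-prior limit. As the prior variance grows, the
# posterior mean tends to x and the posterior variance to sigma^2.
x, sigma, theta0 = 3.0, 1.0, 0.0   # assumed illustrative values
for sigma0 in [1.0, 10.0, 100.0, 1000.0]:
    a = 1 / sigma**2 + 1 / sigma0**2
    b = x / sigma**2 + theta0 / sigma0**2
    print(f"sigma0={sigma0:7.1f}: post mean={b/a:.6f}, post var={1/a:.6f}")
# The printed values approach mean = 3.0 and variance = 1.0.
```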
Bayesian decision making:
When faced with a decision, the Bayesian wants to minimize the expected loss (i.e., maximize the expected utility) of a decision rule under the prior distribution $\pi(\theta)$ for $\theta$. In other words, the Bayesian chooses the decision rule $\delta$ that minimizes the Bayes risk
$r(\pi, \delta) = \int R(\theta, \delta)\,\pi(\theta)\,d\theta$,
where $R(\theta, \delta)$ is the risk function; i.e., the Bayesian chooses to use the Bayes rule for the Bayesian's prior distribution $\pi$.
As we showed in the last class, for point estimation with squared error loss, the Bayes rule is to use the posterior mean as the estimate.
Thus, for the above normal distribution setup, the Bayesian's estimate of $\theta$ is
$\hat{\theta} = E(\theta|x) = \theta_1 = \frac{x/\sigma^2 + \theta_0/\sigma_0^2}{1/\sigma^2 + 1/\sigma_0^2}$.
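As a sanity check (not part of the original notes), the following sketch draws samples from the normal posterior derived above and confirms numerically that the posterior mean minimizes the posterior expected squared error loss over a grid of candidate estimates; the parameter values are the same illustrative assumptions as before.

```python
import numpy as np

# Assumed illustrative values, matching the earlier sketches.
x, sigma, theta0, sigma0 = 3.0, 1.0, 0.0, 2.0
post_mean = (x / sigma**2 + theta0 / sigma0**2) / (1 / sigma**2 + 1 / sigma0**2)
post_sd = (1 / sigma**2 + 1 / sigma0**2) ** -0.5

rng = np.random.default_rng(0)
draws = rng.normal(post_mean, post_sd, size=200_000)  # posterior samples

# Estimate E[(theta - d)^2 | x] for each candidate estimate d.
candidates = np.linspace(post_mean - 1, post_mean + 1, 201)
losses = [np.mean((draws - d)**2) for d in candidates]
best = candidates[int(np.argmin(losses))]
print(f"posterior mean = {post_mean:.3f}, loss-minimizing d = {best:.3f}")
# Both values agree (up to grid resolution and Monte Carlo error).
```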