Statistics 510: Notes 2

Reading: Sections 2.3, 2.4, 2.7.

I. Wrap-up of Section 2.2

Example 6 from last class: A fashionable country club has 100 members, 30 of whom are lawyers. Rumor has it that 25 of the club members are liars and that 55 are neither lawyers nor liars. What proportion of the lawyers are liars?

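Let $A$ = set of lawyers, $B$ = set of liars and $S$ = set of all members of the country club.

Let the number of members in any set $C$ be denoted by $N(C)$. The proportion of the lawyers who are liars is equal to $N(A \cap B)/N(A)$. We are given that

$N(S) = 100$, $N(A) = 30$, $N(B) = 25$, $N(A^c \cap B^c) = 55$.

The last statement implies that $N(A \cup B) = N(S) - N(A^c \cap B^c) = 100 - 55 = 45$, since the members who are neither lawyers nor liars are exactly those outside $A \cup B$.

To use this information to calculate $N(A \cap B)$, we verify using a Venn diagram that $N(A \cup B) = N(A) + N(B) - N(A \cap B)$. Thus, $N(A \cap B) = 30 + 25 - 45 = 10$ and the proportion of lawyers who are liars is $10/30 = 1/3$.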
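The counting argument above can be replayed as a few lines of Python arithmetic. This is only a sketch of the bookkeeping with the given counts; the Venn-diagram identity $N(A \cup B) = N(A) + N(B) - N(A \cap B)$ is taken as given rather than proved by the code, and the variable names are illustrative.

```python
from fractions import Fraction

# Counts given in the example (A = lawyers, B = liars, S = all members)
n_S = 100        # N(S)
n_A = 30         # N(A)
n_B = 25         # N(B)
n_neither = 55   # N(A^c ∩ B^c)

# Members outside A ∪ B are exactly those who are neither lawyers nor liars
n_A_or_B = n_S - n_neither

# Counting identity from the Venn diagram: N(A ∪ B) = N(A) + N(B) - N(A ∩ B)
n_A_and_B = n_A + n_B - n_A_or_B

# Proportion of lawyers who are liars: N(A ∩ B) / N(A)
proportion = Fraction(n_A_and_B, n_A)
print(n_A_or_B, n_A_and_B, proportion)  # 45 10 1/3
```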

DeMorgan’s Laws:

Let $E$ and $F$ denote any two events. Use Venn diagrams to show that

(a) the complement of their intersection is the union of their complements:

$(E \cap F)^c = E^c \cup F^c$

(b) the complement of their union is the intersection of their complements:

$(E \cup F)^c = E^c \cap F^c$
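A computer check is not a proof, but it can make the two laws concrete. The sketch below assumes a hypothetical four-outcome sample space $S = \{1, 2, 3, 4\}$ and tests (a) and (b) for every pair of events $E, F \subseteq S$, using Python's built-in set operators (`&` for intersection, `|` for union, `-` for set difference). The helper names `all_events` and `complement` are illustrative only.

```python
from itertools import chain, combinations

# Hypothetical sample space with four outcomes (chosen only for illustration)
S = frozenset({1, 2, 3, 4})

def all_events(space):
    """Return every subset of the sample space, i.e. every event."""
    items = sorted(space)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def complement(event):
    """Complement of an event relative to S."""
    return S - event

events = all_events(S)
# (a) (E ∩ F)^c = E^c ∪ F^c  and  (b) (E ∪ F)^c = E^c ∩ F^c, for all pairs E, F
law_a = all(complement(E & F) == complement(E) | complement(F) for E in events for F in events)
law_b = all(complement(E | F) == complement(E) & complement(F) for E in events for F in events)
print(len(events), law_a, law_b)  # 16 True True
```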

Review of Section 2.2: We have defined the key concepts of an experiment, the sample space for an experiment and events in the sample space. We have discussed relations between events and introduced the Venn diagram as a tool for examining the relations between events.

The relations between events will be useful for manipulating probabilities. We now introduce the concept of the probability of an event.

II. Frequency interpretation of probability (Section 2.3)

The relative frequency of an event is a proportion measuring how often, or how frequently, the event occurs in a sequence of experiments.

Example 1: Experiment: Toss a coin. The sample space is $S = \{H, T\}$, where $H$ denotes heads and $T$ denotes tails.

If the experiment is repeated many times, the relative frequency of heads will usually be close to ½:

  • The French naturalist Count Buffon (1707-1788) tossed a coin 4040 times. Result: 2048 heads, or relative frequency 2048/4040=0.5069 for heads.
  • Around 1900, the English statistician Karl Pearson heroically tossed a coin 24,000 times. Result: 12,012 heads, a relative frequency of 0.5005.
  • While imprisoned by the Germans during World War II, the South African mathematician John Kerrich tossed a coin 10,000 times. Result: 5067 heads, a relative frequency of 0.5067.

In the frequency interpretation of probability, the probability of an event $A$ is the expected relative frequency of $A$ in a large number of trials. In symbols, the proportion of times $A$ occurs in $n$ trials, call it $P_n(A)$, is expected to be roughly equal to the theoretical probability $P(A)$ if $n$ is large:

$P_n(A) \approx P(A)$ for large $n$.
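A quick simulation illustrates the idea. The sketch below assumes that Python's pseudo-random generator (`random.Random`) is an acceptable stand-in for a fair coin, so that $P(H) = 1/2$; the seed 510 and the function name `relative_frequency_of_heads` are arbitrary illustrative choices. As $n$ grows, the printed values of $P_n(H)$ should tend to settle near 0.5, although any single run can wander.

```python
import random

def relative_frequency_of_heads(n, seed=None):
    """Toss a simulated fair coin n times and return P_n(H), the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))  # each toss is heads with probability 1/2
    return heads / n

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, seed=510))
```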

Example 2: Experiment: Observation of the sex of a child. The sample space is $S = \{\text{boy}, \text{girl}\}$. The following table shows the proportion of boys among live births to residents of the U.S.A. over the 20 years from 1983 to 2002 (Source: Information Please Almanac).

Year / Number of births / Proportion of boys
1983 / 3,638,933 / 0.5126648
1984 / 3,669,141 / 0.5122425
1985 / 3,760,561 / 0.5126849
1986 / 3,756,547 / 0.5124035
1987 / 3,809,394 / 0.5121951
1988 / 3,909,510 / 0.5121931
1989 / 4,040,958 / 0.5121286
1990 / 4,158,212 / 0.5121179
1991 / 4,110,907 / 0.5112054
1992 / 4,065,014 / 0.5121992
1993 / 4,000,240 / 0.5121845
1994 / 3,952,767 / 0.5116894
1995 / 3,926,589 / 0.5084196
1996 / 3,891,494 / 0.5114951
1997 / 3,880,894 / 0.5116337
1998 / 3,941,553 / 0.5115255
1999 / 3,959,417 / 0.5119072
2000 / 4,058,814 / 0.5117182
2001 / 4,025,933 / 0.5111665
2002 / 4,021,726 / 0.5117154

The relative frequency of boys among newborn children in the U.S.A. appears to be stable at around 0.512. This suggests that a reasonable model for the outcome of a single birth is $P(\text{boy}) = 0.512$ and $P(\text{girl}) = 0.488$.
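The yearly proportions can also be pooled into one overall relative frequency by weighting each year by its number of births. The sketch below simply re-keys the table's figures into Python (the name `births_table` is ours); because each proportion is rounded to seven decimal places, the implied counts of boys are approximate.

```python
# (year, number of births, proportion of boys), re-keyed from the table above
births_table = [
    (1983, 3638933, 0.5126648), (1984, 3669141, 0.5122425),
    (1985, 3760561, 0.5126849), (1986, 3756547, 0.5124035),
    (1987, 3809394, 0.5121951), (1988, 3909510, 0.5121931),
    (1989, 4040958, 0.5121286), (1990, 4158212, 0.5121179),
    (1991, 4110907, 0.5112054), (1992, 4065014, 0.5121992),
    (1993, 4000240, 0.5121845), (1994, 3952767, 0.5116894),
    (1995, 3926589, 0.5084196), (1996, 3891494, 0.5114951),
    (1997, 3880894, 0.5116337), (1998, 3941553, 0.5115255),
    (1999, 3959417, 0.5119072), (2000, 4058814, 0.5117182),
    (2001, 4025933, 0.5111665), (2002, 4021726, 0.5117154),
]

total_births = sum(n for _, n, _ in births_table)
approx_boys = sum(n * p for _, n, p in births_table)  # approximate, since p is rounded
pooled = approx_boys / total_births                    # overall relative frequency of boys

yearly = [p for _, _, p in births_table]
print(f"pooled proportion of boys: {pooled:.4f}")
print(f"yearly range: {min(yearly):.4f} to {max(yearly):.4f}")
```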

This model for births is equivalent to the sex of a child being determined by drawing at random with replacement from a box of 1000 tickets, containing 512 tickets marked “boy” and 488 tickets marked “girl”.
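The box model can be simulated directly. This is a sketch under the assumption that Python's `random.Random.choices`, which samples with replacement, behaves like drawing tickets at random from the box; the number of draws and the seed are arbitrary. The simulated proportion of boys should come out close to 0.512.

```python
import random

# Box of 1000 tickets: 512 marked "boy", 488 marked "girl"
box = ["boy"] * 512 + ["girl"] * 488

def simulate_births(n, seed=None):
    """Draw n tickets at random WITH replacement and return the proportion marked "boy"."""
    rng = random.Random(seed)
    draws = rng.choices(box, k=n)  # choices() samples with replacement
    return draws.count("boy") / n

print(simulate_births(1_000_000, seed=2002))
```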

III. Axioms of Probability (Section 2.3)

The frequency interpretation of probability is the way that many scientists think about what probability represents, but it is hard to make it into a rigorous mathematical definition of probability.

Kolmogorov (1933) developed an axiomatic definition of probability which he then showed can be interpreted, in a certain sense, as the limit of the relative frequency in a large number of experiments.

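A probability function (measure) $P$ on the events in a sample space $S$ is a function on the events that satisfies the following three axioms:

Axiom 1: $0 \le P(E) \le 1$ for all events $E$.

Axiom 2: $P(S) = 1$, where $S$ is the sample space.

Axiom 3: For any sequence of mutually exclusive events $E_1, E_2, \ldots$ (that is, events for which $E_i \cap E_j = \emptyset$ when $i \ne j$),

$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$.

We refer to $P(E)$ as the probability of an event $E$.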

Using these axioms, we shall be able to prove that if an experiment is repeated over and over again, then with probability 1, the proportion of times that a specific event $E$ occurs converges to $P(E)$, which is essentially the frequency interpretation of probability. This is called the strong law of large numbers, and we shall prove it in Chapter 8.

Consequences of axioms:

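1. $P(\emptyset) = 0$.

Proof: Consider the sequence of events $E_1, E_2, \ldots$, where $E_1 = S$ and $E_i = \emptyset$ for $i > 1$. Then, as the events are mutually exclusive and as $S = \bigcup_{i=1}^{\infty} E_i$, we have from Axiom 3 that

$P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)$,

implying that $P(\emptyset) = 0$.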

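2. For any finite sequence of mutually exclusive events $E_1, E_2, \ldots, E_n$,

$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$.

Proof: Let $E_i = \emptyset$ for $i > n$. The result follows from Axiom 3 combined with the fact established above that $P(\emptyset) = 0$.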

IV. Examples of probability functions

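Example 3: If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have $P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = 1/6$.

The probability of rolling an even number would equal, from Axiom 3,

$P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = 1/2$.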
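The calculation amounts to adding the probabilities of the individual outcomes that make up the event. A minimal sketch using exact fractions from Python's `fractions` module; the dictionary `p` and the helper `prob` are illustrative names, not part of the notes.

```python
from fractions import Fraction

# Equally likely model for a fair die: P({k}) = 1/6 for k = 1, ..., 6
p = {k: Fraction(1, 6) for k in range(1, 7)}

def prob(event):
    """P(event) for a set of faces: add the probabilities of its (mutually exclusive) outcomes."""
    return sum(p[k] for k in event)

print(prob({2, 4, 6}))  # 1/2
```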

Example 4: A die is loaded in such a way that the probability of any particular face’s showing is directly proportional to the number on that face. What is the probability that an even number appears?

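To solve this requires that we make use of Axiom 2, that $P(S) = 1$. The experiment (tossing a die) generates a sample space containing six outcomes. But the six are not equally likely: by assumption,

$P(\{k\}) = ck$, $k = 1, 2, \ldots, 6$,

where $c$ is a constant. From Axiom 2,

$P(S) = \sum_{k=1}^{6} P(\{k\}) = c(1 + 2 + 3 + 4 + 5 + 6) = 21c = 1$,

which implies that $c = 1/21$ and $P(\{k\}) = k/21$. It follows then from Axiom 3 that the probability that an even number appears is

$P(\{2, 4, 6\}) = 2/21 + 4/21 + 6/21 = 12/21 = 4/7$.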
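The same steps for the loaded die, as a short sketch with exact fractions: solve for $c$ from Axiom 2, then add the probabilities of the even faces. The variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
# P({k}) = c * k; Axiom 2 forces c * (1 + 2 + ... + 6) = 1
c = Fraction(1, sum(faces))             # 1/21
p = {k: c * k for k in faces}

assert sum(p.values()) == 1             # Axiom 2: P(S) = 1
p_even = sum(p[k] for k in (2, 4, 6))   # Axiom 3 (finite additivity)
print(c, p_even)  # 1/21 4/7
```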

V. Probability as a Measure of Belief (Section 2.7)

Another interpretation of probability, besides the frequency interpretation, is that probability measures an individual’s belief in the statement that he or she is making. This is called subjective or personal probability. Consider the question,

“What is the probability that the Philadelphia Eagles will win the Super Bowl this year?”

It is hard to interpret such a probability using the frequency interpretation because the football season can only be played once. The subjective interpretation of a statement that the Eagles have a probability of 0.1 of winning the Super Bowl is that:

  • If the person making the statement were offered a chance to play a game in which the person was required to pay less than 10 cents to buy into the game and would win $1 if the Eagles win the Super Bowl, then the person would buy into the game.
  • By contrast, if the person making the statement were offered a chance to play a game in which the person was required to pay more than 10 cents to buy into the game and would win $1 if the Eagles win the Super Bowl, then the person would not buy into the game.

More generally, if $A$ is an event, a person’s subjective probability $P(A)$ of $A$ has the following interpretation: for a game in which the person will be paid one dollar if $A$ occurs, $P(A)$ is the largest amount of money (in dollars) the person would be willing to pay to buy into the game. Thus, if the person is willing to pay 50 cents to buy in, $P(A) = 0.5$.

Note that this concept of probability is personal: $P(A)$ may vary from person to person depending on their opinions.

A rational person has a “coherent” system of personal probabilities: a system is said to be “incoherent” if there exists some structure of bets such that the bettor will lose no matter what happens. It can be shown that a coherent system of personal probabilities requires that the personal probabilities satisfy Axioms 1, 2 and 3 (for details on this, see Hogg, McKean and Craig, Introduction to Mathematical Statistics, Chapter 11.1).
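To see why incoherence is costly, consider a hypothetical person whose personal probabilities are $P(A) = 0.6$ and $P(A^c) = 0.6$. These sum to 1.2, whereas Axioms 2 and 3 together require $P(A) + P(A^c) = P(S) = 1$. The sketch below (all numbers are made up for illustration) sells this person two one-dollar bets at 55 cents each, a price below each stated probability, so by the betting interpretation the person buys both. Exactly one bet pays off, so the person loses 10 cents whatever happens.

```python
from fractions import Fraction

# Hypothetical incoherent beliefs: P(A) = 0.6 and P(A^c) = 0.6 (sum 1.2, not 1)
belief_A = Fraction(6, 10)
belief_not_A = Fraction(6, 10)

# A bookie offers two $1 bets at 55 cents each; both prices are below the
# person's probabilities, so under the betting interpretation the person buys both.
price = Fraction(55, 100)
assert price < belief_A and price < belief_not_A

def net_gain(a_occurs):
    """Bettor's net winnings after buying both bets, given whether A occurs."""
    payout_bet_on_A = 1 if a_occurs else 0
    payout_bet_on_not_A = 0 if a_occurs else 1
    return payout_bet_on_A + payout_bet_on_not_A - 2 * price

for a_occurs in (True, False):
    print("A occurs:" if a_occurs else "A does not occur:", net_gain(a_occurs))  # -1/10 either way
```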

Thus, whether the probability function is interpreted as a measure of belief or as a long-run relative frequency, its mathematical properties remain unchanged.

I personally think of probability in terms of the frequentist interpretation, but it is equally valid to view probability as a measure of belief; all results in the course are equally applicable to both interpretations.

VI. Propositions about Probability Function Based on Axioms (Section 2.4)

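Proposition 4.1: $P(E^c) = 1 - P(E)$.

Proof: Because $S = E \cup E^c$, by Axiom 2 we have

$1 = P(S) = P(E \cup E^c)$.

Because $E$ and $E^c$ are mutually exclusive, it follows from Axiom 3 that

$1 = P(E \cup E^c) = P(E) + P(E^c)$.

Thus, $P(E^c) = 1 - P(E)$.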

Example 5: In a certain population, 10% of the people are rich, 5% are famous and 3% are rich and famous. For a person picked at random from this population (meaning that each person has an equal probability of being picked), what is the chance that the person is not rich?
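Because each person is equally likely to be picked, the probability of an event is the fraction of the population in it. The sketch below answers the question with Proposition 4.1 and, as a cross-check, counts directly in a hypothetical population of 100 people (the population size and the labels 0 to 99 are assumptions; any population that is 10% rich would do).

```python
from fractions import Fraction

# Hypothetical population of 100 people, 10 of whom are rich (people 0-9)
population = set(range(100))
rich = set(range(10))

def P(event):
    """Each person is equally likely to be picked, so P(event) = |event| / |population|."""
    return Fraction(len(event), len(population))

# Proposition 4.1: P(R^c) = 1 - P(R)
p_not_rich = 1 - P(rich)
assert p_not_rich == P(population - rich)  # agrees with direct counting
print(p_not_rich)  # 9/10
```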

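Proposition 4.2: If $E \subset F$ (meaning that every outcome in $E$ is contained in $F$), then $P(E) \le P(F)$.

Proof: Note that the event $F$ may be written in the form

$F = E \cup (E^c \cap F)$,

where $E$ and $E^c \cap F$ are mutually exclusive. Therefore, by Axiom 3,

$P(F) = P(E) + P(E^c \cap F)$. By Axiom 1, $P(E^c \cap F) \ge 0$, so that $P(F) \ge P(E)$.

Furthermore, from the proof of Proposition 4.2, we have the difference rule that if $E \subset F$,

$P(F \cap E^c) = P(F) - P(E)$.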

Example 5 continued: For a person picked at random from the population, what is the chance that the person is rich but not famous?
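One way to answer this uses the difference rule: writing $R$ for rich and $F$ for famous (our labels), the event $R \cap F$ is contained in $R$, and $R \cap (R \cap F)^c = R \cap F^c$, so $P(R \cap F^c) = P(R) - P(R \cap F)$. In the sketch below, the 100-person population used for the counting check is hypothetical.

```python
from fractions import Fraction

# Given probabilities (R = rich, F = famous)
P_R = Fraction(10, 100)
P_RF = Fraction(3, 100)  # P(R ∩ F)

# R ∩ F is a subset of R, so the difference rule gives P(R ∩ F^c) = P(R) - P(R ∩ F)
P_rich_not_famous = P_R - P_RF

# Check by counting in a hypothetical population of 100 with the same percentages
population = set(range(100))
rich = set(range(10))        # 10 rich
famous = set(range(7, 12))   # 5 famous; 3 of them (7, 8, 9) are also rich
assert Fraction(len(rich - famous), len(population)) == P_rich_not_famous
print(P_rich_not_famous)  # 7/100
```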

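Proposition 4.3: $P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

Proof: The Venn diagram suggests that the statement of the proposition is true. More formally, since $E$, $F$ and $E \cup F$ can each be written as a union of mutually exclusive pieces, we have from Axiom 3 that

$P(E) = P(E \cap F^c) + P(E \cap F)$,
$P(F) = P(E^c \cap F) + P(E \cap F)$,
$P(E \cup F) = P(E \cap F^c) + P(E \cap F) + P(E^c \cap F)$.

From the first two equations, we have that

$P(E \cap F^c) = P(E) - P(E \cap F)$,
$P(E^c \cap F) = P(F) - P(E \cap F)$.

Substituting these expressions in the expression for $P(E \cup F)$, we conclude that

$P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

Note: Proposition 4.3 can be extended to provide an expression for $P(E_1 \cup E_2 \cup \cdots \cup E_n)$; see Proposition 4.4, the inclusion-exclusion identity.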

Example 5 continued: What is the chance that the randomly selected person is either rich or famous?
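Proposition 4.3 answers this directly. A sketch with exact fractions, cross-checked by counting in a hypothetical 100-person population with 10 rich, 5 famous and 3 both (the population and the labels $R$, $F$ are our assumptions).

```python
from fractions import Fraction

# Given probabilities: P(R), P(F), P(R ∩ F)
P_R, P_F, P_RF = Fraction(10, 100), Fraction(5, 100), Fraction(3, 100)

# Proposition 4.3: P(R ∪ F) = P(R) + P(F) - P(R ∩ F)
P_rich_or_famous = P_R + P_F - P_RF

# Counting check in a hypothetical population of 100 people
population = set(range(100))
rich = set(range(10))        # 10 rich
famous = set(range(7, 12))   # 5 famous, 3 of whom are also rich
assert Fraction(len(rich | famous), len(population)) == P_rich_or_famous
print(P_rich_or_famous)  # 3/25
```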

Example 6: Winthrop, a premed student, has been summarily rejected by all 126 U.S. medical schools. Desperate, he sends his transcripts and MCATs to the two least selective campuses he can think of, the two branch campuses ($A$ and $B$) of Swampwater Tech. Based on the success his friends have had there, he estimates that his probability of being accepted at $A$ is 0.7, and at $B$, 0.4. He also suspects that there is a 75% chance that at least one of his applications will be rejected. What is the probability that at least one of the schools will accept him?
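A sketch of one route to the answer, assuming that “at least one of his applications will be rejected” is the event $A^c \cup B^c$, where $A$ and $B$ now denote acceptance at campuses $A$ and $B$. By DeMorgan's law, $A^c \cup B^c = (A \cap B)^c$, so Proposition 4.1 gives $P(A \cap B) = 1 - 0.75 = 0.25$, and Proposition 4.3 then gives $P(A \cup B)$. Variable names are illustrative.

```python
from fractions import Fraction

P_A = Fraction(7, 10)                   # P(accepted at A)
P_B = Fraction(4, 10)                   # P(accepted at B)
P_at_least_one_reject = Fraction(3, 4)  # P(A^c ∪ B^c)

# DeMorgan: A^c ∪ B^c = (A ∩ B)^c, so by Proposition 4.1
P_both = 1 - P_at_least_one_reject      # P(A ∩ B)

# Proposition 4.3: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
P_at_least_one_accept = P_A + P_B - P_both
print(P_both, P_at_least_one_accept)    # 1/4 17/20
```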