Statistics 510: Notes 12

Reading: Sections 4.8-4.9

Schedule:

I will e-mail review problems for midterm by Thursday night.

Friday, 10/21, 5 pm: Homework 5 due. Office hours by appointment.

Monday, 10/24, class: Chapters 5.1-5.2

Monday, 10/24, evening (time, location TBA): question and answer session for midterm

Tuesday, 10/25: Office hours: 1-2, 4:45-6:45, by appointment.

Wednesday, 10/26: Midterm.

Midterm info:

Covers lectures 1-12 on Chapters 1-4.

The best way to review is to work through the homework problems and class notes.

The exam is closed book but you are allowed two 8.5 x 11 sheets of notes, front and back. Bring a calculator.

I. Review: Poisson Distribution

Arises in two settings:

(1) The Poisson distribution provides an approximation to the binomial distribution when n is large, p is small and $\lambda = np$ is moderate.

(2) The Poisson distribution is used to model the number of events that occur in a time period of length $t$ when, for some constant $\lambda > 0$,

(a) the probability of an event occurring in a given small time period of length $h$ is approximately proportional to $h$, namely $\lambda h$;

(b) the probability of two or more events occurring in a given small time period of length $h$ is much smaller than $h$;

(c) the numbers of events occurring in two non-overlapping time periods are independent.

When (a), (b) and (c) are satisfied, the number of events occurring in a time period of length $t$ has a Poisson($\lambda t$) distribution. The parameter $\lambda$ is called the rate of the Poisson distribution. The mean number of events that occur is $\lambda t$, and the variance of the number of events is also $\lambda t$.

Sketch of proof for Poisson distribution under (a)-(c):

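For a large value of $n$, we can divide the time period of length $t$ into $n$ nonoverlapping intervals of length $t/n$. By (a) and (b), each interval contains one event with probability approximately $\lambda t/n$ and more than one event with negligible probability; by (c), the intervals are independent. The number of events occurring in the time period is then approximately Binomial$(n, \lambda t/n)$. Using the Poisson approximation to the binomial, the number of events occurring in the time period is approximately Poisson$(n \cdot \lambda t/n)$ = Poisson$(\lambda t)$. Taking the limit as $n \to \infty$ yields the result.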
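The limiting argument in the sketch can be checked numerically. The following is a minimal illustration, not part of the proof: it compares the Binomial$(n, \lambda t/n)$ pmf with the Poisson$(\lambda t)$ pmf for increasing $n$. The values of `rate` and `t` and the helper names are hypothetical, chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

# Hypothetical values, chosen only for illustration.
rate, t = 2.0, 1.5
mu = rate * t  # Poisson mean, rate * t

def max_abs_gap(n, k_max=15):
    """Largest |Binomial(n, mu/n) pmf - Poisson(mu) pmf| over k = 0..k_max."""
    return max(abs(binomial_pmf(k, n, mu / n) - poisson_pmf(k, mu))
               for k in range(k_max + 1))

# The gap should shrink as the number of subintervals n grows.
for n in (10, 100, 1000, 10000):
    print(n, max_abs_gap(n))
```

The printed gaps should decrease as $n$ increases, which is the numerical counterpart of taking the limit in the sketch.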

Number of events occurring in space: The Poisson distribution also applies to the number of events occurring in space. Instead of intervals of length t, we have domains of area or volume t. Assumptions (a)-(c) become:

(a’) the probability of an event occurring in a given small region of area or volume $h$ is approximately proportional to $h$, namely $\lambda h$;

(b’) the probability of two or more events occurring in a given small region of area or volume $h$ is much smaller than $h$;

(c’) the numbers of events occurring in two non-overlapping regions are independent.

The parameter $\lambda$ for a Poisson distribution for the number of events occurring in space is called the intensity.

Example 1: Bacteria are distributed throughout a volume of liquid according to assumptions (a’), (b’) and (c’) with an intensity of $\lambda$ organisms per mm$^3$. A measuring device counts the number of bacteria in a 10 mm$^3$ volume of the liquid. What is the probability that more than two bacteria are in this measured volume?
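A possible way to set up Example 1 in code is sketched below. Under (a’)-(c’), the count in a 10 mm$^3$ volume is Poisson with mean (intensity) × 10, so the answer is $1 - P(X \le 2)$. The numerical intensity is not stated in the text above, so the value `0.5` used here is purely hypothetical, as is the function name `poisson_tail`.

```python
from math import exp, factorial

def poisson_tail(mu, k):
    """P(X > k) for X ~ Poisson(mu), computed as 1 - P(X <= k)."""
    return 1.0 - sum(exp(-mu) * mu**j / factorial(j) for j in range(k + 1))

intensity = 0.5   # HYPOTHETICAL organisms per mm^3; not given in the notes
volume = 10.0     # mm^3, as stated in the example
mu = intensity * volume  # Poisson mean for the measured volume

prob_more_than_two = poisson_tail(mu, 2)
print(prob_more_than_two)
```

Substituting the actual intensity for the hypothetical one gives the answer to the example.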

II. Geometric Random Variable (Section 4.8.1)

Suppose that independent trials, each having a probability $p$, $0 < p < 1$, of being a success, are performed until a success occurs. Let X be the random variable that denotes the number of trials required. The probability mass function of X is

$P(X = n) = (1-p)^{n-1} p, \quad n = 1, 2, \ldots$ (1.1)

The pmf follows because in order for X to equal n, it is necessary and sufficient that the first n-1 trials are failures and the nth trial is a success.

A random variable that has the pmf (1.1) is called a geometric random variable with parameter p.

The expected value and variance of a geometric (p) random variable are

$E(X) = \frac{1}{p}, \quad Var(X) = \frac{1-p}{p^2}$.

Example 2: A fair die is tossed. What is the probability that the first six occurs on the fourth roll? What is the expected number of tosses needed to toss the first six?
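Example 2 can be worked with exact fractions. This is a minimal sketch, assuming the geometric pmf $P(X = n) = (1-p)^{n-1} p$ with $p = 1/6$ for a fair die; the function name `geometric_pmf` is only illustrative.

```python
from fractions import Fraction

def geometric_pmf(n, p):
    """P(X = n): first n-1 trials fail, the nth trial succeeds."""
    return (1 - p) ** (n - 1) * p

p = Fraction(1, 6)  # probability of rolling a six with a fair die

prob_first_six_on_fourth = geometric_pmf(4, p)  # (5/6)^3 * (1/6)
expected_tosses = 1 / p                         # mean of a geometric(p) variable

print(prob_first_six_on_fourth, float(prob_first_six_on_fourth))
print(expected_tosses)
```

Using `Fraction` keeps the answer exact, so it can be compared directly with a hand calculation.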

III. Negative Binomial Distribution (Section 4.8.2)

Suppose that independent trials, each having a probability $p$, $0 < p < 1$, of being a success, are performed until $r$ successes occur. Let X be the random variable that denotes the number of trials required. The probability mass function of X is

$P(X = n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}, \quad n = r, r+1, \ldots$ (1.2)

A random variable whose pmf is given by (1.2) is called a negative binomial random variable with parameters $(r, p)$.

Note that the geometric random variable is a negative binomial random variable with parameters $(1, p)$.

The expected value and variance of a negative binomial random variable with parameters $(r, p)$ are

$E(X) = \frac{r}{p}, \quad Var(X) = \frac{r(1-p)}{p^2}$.

The pmf (1.2) follows because in order for X to equal n, it is necessary and sufficient that there are exactly r-1 successes in the first n-1 trials and the nth trial is a success.

Example 3: Suppose that an underground military installation is fortified to the extent that it can withstand up to four direct hits from air-to-surface missiles and still function. Enemy aircraft can score direct hits with these particular missiles with probability 0.7. Assume all firings are independent. What is the probability that a plane will require fewer than 8 shots to destroy the installation? What is the expected number of shots required to destroy the installation?
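One way to set up Example 3 is sketched below. It assumes that "withstand up to four direct hits" means the installation is destroyed by the fifth hit, so the number of shots required is negative binomial with $r = 5$ and $p = 0.7$; that reading, and the function name, are assumptions of this sketch.

```python
from math import comb

def negative_binomial_pmf(n, r, p):
    """P(X = n): exactly r-1 successes in the first n-1 trials, then a success."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r = 5    # ASSUMED: destroyed on the fifth direct hit
p = 0.7  # probability of a direct hit, from the example

# "Fewer than 8 shots" means X is 5, 6 or 7.
prob_fewer_than_8 = sum(negative_binomial_pmf(n, r, p) for n in range(r, 8))
expected_shots = r / p

print(prob_fewer_than_8)
print(expected_shots)
```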

IV. Hypergeometric Random Variables (Section 4.8.3)

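Suppose that a sample of size $n$ is to be chosen randomly (without replacement) from an urn containing $N$ balls, of which $m$ are white and $N - m$ are black. If we let X be the random variable that denotes the number of white balls selected, then

$P(X = i) = \frac{\binom{m}{i} \binom{N-m}{n-i}}{\binom{N}{n}}, \quad i = 0, 1, \ldots, n$ (1.4)

A random variable X whose pmf is given by (1.4) is said to be a hypergeometric random variable with parameters $(n, N, m)$.

The expected value and variance of a hypergeometric random variable with parameters $(n, N, m)$ are

$E(X) = \frac{nm}{N}, \quad Var(X) = n \cdot \frac{m}{N} \left(1 - \frac{m}{N}\right) \left(1 - \frac{n-1}{N-1}\right)$.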

Example 4: A Scrabble set consists of 54 consonants and 44 vowels. What is the probability that your initial draw (of seven letters) will be all consonants? six consonants and one vowel? five consonants and two vowels?
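Example 4 can be set up as a hypergeometric calculation with the consonants playing the role of the white balls. The sketch below assumes a population of 54 + 44 = 98 letters and a draw of 7; the function and variable names are illustrative only.

```python
from math import comb

def hypergeometric_pmf(i, n, N, m):
    """P(X = i): i of the n sampled items come from the m marked items among N."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

N = 54 + 44  # total letters (consonants + vowels)
m = 54       # consonants, treated as the "white balls"
n = 7        # letters in the initial draw

# Probabilities of 7, 6 and 5 consonants in the draw.
for consonants in (7, 6, 5):
    print(consonants, hypergeometric_pmf(consonants, n, N, m))
```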

V. Zeta (or Zipf) distribution

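A random variable X is said to have a zeta (sometimes called the Zipf) distribution with parameter $\alpha$ if its probability mass function is given by

$P(X = k) = \frac{C}{k^{\alpha+1}}, \quad k = 1, 2, \ldots$ for some value of $\alpha > 0$.

Since the sum of the foregoing probabilities must equal 1, it follows that

$C = \left[ \sum_{k=1}^{\infty} \left( \frac{1}{k} \right)^{\alpha+1} \right]^{-1}$.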
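Assuming the pmf has the form $P(X = k) = C / k^{\alpha+1}$ for $k = 1, 2, \ldots$, the normalizing constant is the reciprocal of $\sum_{k \ge 1} k^{-(\alpha+1)}$ and can be approximated by truncating the series. The sketch below is only an illustration: the choice $\alpha = 1$ and the truncation point are hypothetical, picked because the series then sums to the known value $\pi^2/6$.

```python
from math import pi

def zeta_constant(alpha, terms=100_000):
    """Approximate C = 1 / sum_{k>=1} k^-(alpha+1) by a truncated sum."""
    partial_sum = sum(k ** -(alpha + 1) for k in range(1, terms + 1))
    return 1.0 / partial_sum

alpha = 1  # HYPOTHETICAL parameter value, for illustration only
approx_c = zeta_constant(alpha)
exact_c = 6 / pi**2  # since sum of 1/k^2 equals pi^2 / 6

print(approx_c, exact_c)
```

The truncation slightly underestimates the series, so the approximation of C is slightly too large; more terms tighten it.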

Consider a population of objects that are grouped into categories, such as all words in a book (grouped into distinct words) or people living in urban areas in a country (grouped into cities). Let $\{X = k\}$ denote the event that a randomly chosen object belongs to the kth largest group.

The Zipf distribution has been found to accurately describe the distribution of X for populations such as words in a book and the cities people live in.

Rank n / City / Population (1990) / Expected population under Zipf’s distribution (10,000,000/n)

1 / New York / 7,322,564 / 10,000,000
7 / Detroit / 1,027,974 / 1,428,571
13 / Baltimore / 736,014 / 769,231
19 / Washington, D.C. / 606,900 / 526,316
25 / New Orleans / 496,938 / 400,000
31 / Kansas City, Mo. / 434,829 / 322,581
37 / Virginia Beach, Va. / 393,089 / 270,270
49 / Toledo / 332,943 / 204,082
61 / Arlington, Texas / 261,721 / 163,934
73 / Baton Rouge, La. / 219,531 / 136,986
85 / Hialeah, Fla. / 188,008 / 117,647
97 / Bakersfield, Calif. / 174,820 / 103,093
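The expected-population column of the table is consistent with dividing 10,000,000 by the rank n. The short sketch below reproduces that column and compares it with the observed 1990 populations from the table; the rule "10,000,000 / n" is inferred from the table entries, and the names used are illustrative.

```python
# (rank, city, observed 1990 population) copied from the table above.
cities = [
    (1, "New York", 7_322_564),
    (7, "Detroit", 1_027_974),
    (13, "Baltimore", 736_014),
    (19, "Washington, D.C.", 606_900),
    (25, "New Orleans", 496_938),
    (31, "Kansas City, Mo.", 434_829),
    (37, "Virginia Beach, Va.", 393_089),
    (49, "Toledo", 332_943),
    (61, "Arlington, Texas", 261_721),
    (73, "Baton Rouge, La.", 219_531),
    (85, "Hialeah, Fla.", 188_008),
    (97, "Bakersfield, Calif.", 174_820),
]

def zipf_expected(rank, scale=10_000_000):
    """Expected population of the city of a given rank: scale / rank (inferred rule)."""
    return round(scale / rank)

for rank, city, observed in cities:
    expected = zipf_expected(rank)
    print(rank, city, observed, expected, round(observed / expected, 2))
```

The last printed column is the ratio of observed to expected population, which gives a quick sense of how closely each city follows the rule.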

VI. Properties of the Cumulative Distribution Function (Section 4.9)

Recall that the cumulative distribution function (CDF) of a random variable X is the function $F(b) = P(X \le b)$.

All probability questions about X can be answered in terms of the cdf F. For example,

$P(a < X \le b) = F(b) - F(a)$ for all $a < b$.
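As an illustration of answering a probability question through the CDF, the sketch below uses the identity $P(a < X \le b) = F(b) - F(a)$ for a geometric random variable, whose CDF at a nonnegative integer $b$ is $1 - (1-p)^b$, and checks the result against a direct sum of the pmf. The values of `p`, `a` and `b` are hypothetical.

```python
def geometric_pmf(n, p):
    """P(X = n) for a geometric(p) random variable, n = 1, 2, ..."""
    return (1 - p) ** (n - 1) * p

def geometric_cdf(b, p):
    """F(b) = P(X <= b) = 1 - (1 - p)^b for a nonnegative integer b."""
    return 1.0 - (1.0 - p) ** b

# Hypothetical values, chosen only for illustration.
p, a, b = 1 / 6, 2, 5

via_cdf = geometric_cdf(b, p) - geometric_cdf(a, p)             # F(b) - F(a)
via_pmf = sum(geometric_pmf(n, p) for n in range(a + 1, b + 1))  # P(X = 3, 4 or 5)

print(via_cdf, via_pmf)
```

The two printed values should agree up to floating-point rounding.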