EMGT 269 - Elements of Problem Solving and Decision Making

10. USING DATA

1. Constructing Probability Distributions with Data

  1. Discrete Case

The Empirical Probability Mass function is constructed using relative frequencies of events

Estimation of Discrete Empirical Probability Mass Function

# Accidents / # Occurrences / Pr(# Accidents)
0 / N0 / N0/M
1 / N1 / N1/M
2 / N2 / N2/M
3 / N3 / N3/M
4 / N4 / N4/M
Total / M = N0 + N1 + N2 + N3 + N4 / 1.0
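The relative-frequency construction above can be sketched in a few lines; the daily accident counts below are illustrative placeholders, not the course's data.

```python
# Empirical PMF of daily accident counts: Pr(k) = N_k / M,
# where N_k is the number of days with k accidents and M is the total days.
from collections import Counter

observed = [0, 0, 1, 0, 2, 1, 0, 0, 1, 3, 0, 2, 0, 1, 0, 0, 4, 1, 0, 0]
counts = Counter(observed)          # N_k for each category k
M = len(observed)                   # total number of observed days
pmf = {k: counts[k] / M for k in sorted(counts)}

# relative frequencies always sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```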

Maintenance Example

You own a manufacturing plant and are trying to design a maintenance policy. You would like to choose the maintenance interval so that it balances the failure time of a machine against the interval length.

  • If the maintenance interval is too short, maintenance is frequent and costly, and machines rarely fail before it.
  • If the maintenance interval is too long, machines fail first, interrupting production and resulting in high costs.

Suppose you suggest an interval of 260 days. You want to estimate the probability distribution of the number of machine failures per day over this period and use it in your interval selection.

Also important in selecting a maintenance interval is whether these "two failure days" happen towards the end of the 260-day period.

Graph of Probability Mass Function

Notes:

  • Make sure you have enough data for accuracy (at least 5 observations in each category).
  • Always ask: does past data represent future uncertainty?

  2. Continuous Case

Estimation of Empirical Continuous Distribution Function

Y = Failure Time of Machine

  1. Given data: y_i, i = 1,…,n.
  2. Order the data such that y_(1) ≤ y_(2) ≤ … ≤ y_(n).
  3. Estimate p_i = Pr(Y ≤ y_(i)). In this case, we may set: p_i = i/(n+1).
  4. Plot the points (y_(i), p_i) in a graph.
  5. Connect these points by straight lines.

The above procedure may be referred to as

STRAIGHT LINE APPROXIMATION
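The straight-line approximation can be sketched as follows, with illustrative failure times standing in for real data.

```python
# Straight-line empirical CDF: order the data, assign p_i = i/(n+1),
# and interpolate linearly between the plotted points (y_(i), p_i).
failure_times = [212, 155, 301, 260, 188, 275, 240, 199, 330]

ys = sorted(failure_times)
n = len(ys)
ps = [i / (n + 1) for i in range(1, n + 1)]   # Pr(Y <= y_(i)) estimates

def empirical_cdf(y):
    """Linear interpolation through the points (y_(i), p_i)."""
    if y <= ys[0]:
        return ps[0]
    if y >= ys[-1]:
        return ps[-1]
    for (y0, p0), (y1, p1) in zip(zip(ys, ps), zip(ys[1:], ps[1:])):
        if y0 <= y <= y1:
            return p0 + (p1 - p0) * (y - y0) / (y1 - y0)
```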

EXAMPLE: HALFWAY HOUSE

  • What if we observe ties in the data?

How do we use Empirical CDF in Decision Trees?

As before, use discrete approximation e.g.

  • Extended Pearson-Tukey Method
  • The Bracket Median Method

  2. Using Data to Fit Theoretical Probability Models

Method of Moments

Let Y be a random variable, e.g. the failure time of a machine.

1. Given data: y_i, i = 1,…,n.

2. Calculate the Sample Mean (= First Moment): ȳ = (1/n) Σ y_i

3. Calculate the Sample Variance (= Second Moment): s² = (1/(n−1)) Σ (y_i − ȳ)²

4. Select a Theoretical Probability Model with CDF F(y|θ1,…,θk), where θ1,…,θk are the parameters.

5. Calculate the theoretical expressions for E[Y] and Var(Y).

6. Solve for the parameters by setting E[Y] = ȳ and Var(Y) = s².
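The six steps can be sketched for a Gamma model (shape α, scale β, so E[Y] = αβ and Var(Y) = αβ²); the failure-time data below are illustrative.

```python
# Method-of-moments sketch: match the sample mean and variance
# to a Gamma(alpha, beta) model with E[Y] = alpha*beta, Var(Y) = alpha*beta^2.
data = [212.0, 155.0, 301.0, 260.0, 188.0, 275.0, 240.0, 199.0, 330.0]

n = len(data)
ybar = sum(data) / n                                  # sample mean
s2 = sum((y - ybar) ** 2 for y in data) / (n - 1)     # sample variance

beta_hat = s2 / ybar           # from Var/E = beta
alpha_hat = ybar / beta_hat    # equivalently ybar**2 / s2
```

By construction the fitted model reproduces the sample moments exactly, which is the defining property of the method.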

HALFWAY HOUSE EXAMPLE CONTINUED

Divide the sum of the observations by n = 35 to obtain the sample mean ȳ; divide the sum of squared deviations by (n − 1) = 34 to obtain the sample variance s².

Random Variable Y = Yearly Bed Cost in Halfway House

  1. Propose Normal Probability Model: i.e. Y ~ N(μ, σ²)
  2. Set E[Y] = μ = ȳ and Var(Y) = σ² = s²

EXAMPLES "METHOD OF MOMENTS"

FOR CONTINUOUS DISTRIBUTIONS

Theoretical
Distribution / Theoretical Expressions / Parameter Solutions
Normal(μ, σ²): / E[Y] = μ, Var(Y) = σ² / μ = ȳ, σ² = s²
Gamma(α, β) (shape α, scale β): / E[Y] = αβ, Var(Y) = αβ² / β = s²/ȳ, α = ȳ²/s²
Exponential(λ): / E[Y] = 1/λ, Var(Y) = 1/λ² / λ = 1/ȳ
Beta(α, β): / E[Y] = α/(α+β), Var(Y) = αβ/[(α+β)²(α+β+1)] / α = ȳ·t, β = (1−ȳ)·t, where t = ȳ(1−ȳ)/s² − 1
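As a sanity check on the Beta parameter solutions, solving for (α, β) from an assumed sample mean and variance and plugging back into the theoretical expressions recovers the moments; the numbers below are illustrative.

```python
# Round-trip check of the Beta(alpha, beta) method-of-moments solutions.
m, v = 0.3, 0.01              # illustrative sample mean and variance

t = m * (1 - m) / v - 1       # common factor in both solutions
alpha = m * t
beta = (1 - m) * t

mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
```

Note the solutions only make sense when v < m(1 − m), the maximum variance a Beta random variable with mean m can have.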

EXAMPLES "METHOD OF MOMENTS"

FOR DISCRETE DISTRIBUTIONS

Theoretical
Distribution / Theoretical Expressions / Parameter Solutions
Binomial(N, p): / E[Y] = Np, Var(Y) = Np(1−p) / p = ȳ/N if N is known; otherwise p = 1 − s²/ȳ, N = ȳ/p
Poisson(λ): / E[Y] = λ, Var(Y) = λ / λ = ȳ
Geometric(p): / E[Y] = 1/p, Var(Y) = (1−p)/p² / p = 1/ȳ

Fitting Theoretical Distributions Using Quantile Estimates

Y = Yearly Bed Cost in Halfway House

1. Given data: y_i, i = 1,…,n.

2. Order the data such that y_(1) ≤ y_(2) ≤ … ≤ y_(n).

3. Set: p_i = i/(n+1).

4. Fit a Theoretical Probability Model with CDF F(y|θ) by selecting the parameters θ such that

Σ [F(y_(i)|θ) − p_i]²

is minimized.

Note:

  • The above procedure requires the use of numerical algorithms to calculate the parameters θ.
  • The software BESTFIT not only determines the optimal parameters but also tests multiple theoretical distributions.
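A minimal stand-in for BESTFIT's optimization: fit an Exponential(λ) model by grid-searching for the rate that minimizes the sum of squared differences between F(y_(i)|λ) and p_i. The data and the grid bounds are illustrative assumptions.

```python
# Quantile fit of an exponential CDF F(y|lam) = 1 - exp(-lam*y)
# by minimizing sum_i (F(y_(i)|lam) - p_i)^2 over a crude grid.
from math import exp

data = sorted([212.0, 155.0, 301.0, 260.0, 188.0, 275.0, 240.0, 199.0, 330.0])
n = len(data)
ps = [i / (n + 1) for i in range(1, n + 1)]   # quantile estimates

def sse(lam):
    return sum((1 - exp(-lam * y) - p) ** 2 for y, p in zip(data, ps))

# one-dimensional grid search over plausible rates
grid = [k * 1e-5 for k in range(1, 2001)]
lam_hat = min(grid, key=sse)
```

Real software replaces the grid search with a proper numerical optimizer and repeats the fit for each candidate family.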

HALFWAY HOUSE EXAMPLE USING BESTFIT

Uncertainty about Parameters and Bayesian Updating

  1. Discrete Case

B = {Killer in a Murder Case}

B  {B1, B2, B3}, where; B1 = Hunter, B2 = Near Sighted Man, B3 = Sharp Shooter

After interrogations, interviews with witnesses, we are able to establish the following prior distribution.

Pr(B= B1)=0.2, Pr(B= B2)=0.7, Pr(B= B3)=0.1.

Evidence A becomes available: the victim was shot from 2,000 ft. We establish the following probability model.

Pr(A|B1)=0.7, Pr(A|B2)=0.1, Pr(A|B3)=0.9.

We update our prior distribution into a posterior distribution using the evidence and Bayes' Theorem.

Pr(A) = Pr(A|B1)Pr(B1) + Pr(A|B2)Pr(B2) + Pr(A|B3)Pr(B3)

= 0.7×0.2 + 0.1×0.7 + 0.9×0.1 = 0.30

Pr(B1|A) = 0.14/0.30 ≈ 0.47, Pr(B2|A) = 0.07/0.30 ≈ 0.23, Pr(B3|A) = 0.09/0.30 = 0.30.

Conclusion:

Refocus investigation on Hunter and Sharp shooter.
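The murder-case update can be reproduced directly:

```python
# Discrete Bayes update: posterior is proportional to likelihood times prior.
prior = {"Hunter": 0.2, "Near-Sighted Man": 0.7, "Sharp Shooter": 0.1}
likelihood = {"Hunter": 0.7, "Near-Sighted Man": 0.1, "Sharp Shooter": 0.9}  # Pr(A|Bi)

pr_A = sum(likelihood[b] * prior[b] for b in prior)        # law of total probability
posterior = {b: likelihood[b] * prior[b] / pr_A for b in prior}
```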

  2. Continuous Case

Two calculations in the above diagram have not been specified:

  1. Calculating the Predictive Distribution

Probability Model: f(y|θ), e.g. Y ~ Bin(N, p).

Prior distribution on θ: e.g. f(p) = Beta(n0, r0)

To calculate the predictive distribution, apply the Law of Total Probability for the continuous case:

f(y) = ∫ f(y|θ) f(θ) dθ

SOFT PRETZEL EXAMPLE CONTINUED

Y = # Customers out of N that buy your pretzel, Y ~ Bin(N, p),

where p is your market percentage. You are uncertain about p and you decide to model your uncertainty using a Beta distribution: p ~ Beta(n0, r0), with density

f(p) = [Γ(n0)/(Γ(r0)Γ(n0−r0))] p^(r0−1) (1−p)^(n0−r0−1).

Applying the Law of Total Probability:

Pr(Y = k|N) = C(N, k) ∫ p^k (1−p)^(N−k) f(p) dp

But:

the integrand is proportional to p^(r0+k−1) (1−p)^(n0−r0+N−k−1), which looks like a Beta(n0+N, r0+k) distribution without the normalizing term.

Thus:

∫ p^(r0+k−1) (1−p)^(n0−r0+N−k−1) dp = Γ(r0+k)Γ(n0−r0+N−k)/Γ(n0+N)

Finally:

Pr(Y = k|N) = C(N, k) · [Γ(n0)/(Γ(r0)Γ(n0−r0))] · [Γ(r0+k)Γ(n0−r0+N−k)/Γ(n0+N)]

Note:

In the Soft Pretzel Example you decide to set f(p) = Beta(4, 1), or in other words n0 = 4, r0 = 1. Substituting these values into the predictive distribution above gives Pr(Y = k|N) for k = 0, 1, …, N.

Use an Excel spreadsheet to perform the calculations.

Conclusion:

E.g. Pr(Y > 10|N) = 1 − 0.8758 = 0.1242: you believe there is approximately a 12.5% chance that you will sell more than 10 pretzels.
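The 0.8758 figure can be reproduced from the predictive formula derived above. In the notes' Beta(n0, r0) parameterization with mean r0/n0, Beta(4, 1) corresponds to a standard-form Beta(a = r0, b = n0 − r0) = Beta(1, 3) density, and the predictive distribution is then beta-binomial.

```python
# Prior predictive for the soft-pretzel example: beta-binomial with N = 20
# and standard-form Beta(a=1, b=3), equivalent to the notes' Beta(n0=4, r0=1).
from math import comb, lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, N, a, b):
    """Pr(Y = k | N) = C(N, k) * B(a + k, b + N - k) / B(a, b)."""
    return comb(N, k) * exp(log_beta(a + k, b + N - k) - log_beta(a, b))

N, a, b = 20, 1.0, 3.0
p_more_than_10 = 1 - sum(betabinom_pmf(k, N, a, b) for k in range(11))
```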

  2. Calculating the Posterior Distribution

Probability Model: f(y|θ), e.g. Y ~ Bin(N, p).

Prior distribution on θ: e.g. f(p) = Beta(n0, r0)

Observed data: D

To calculate the posterior distribution, apply Bayes' Theorem for the continuous case:

f(θ|D) = f(D|θ) f(θ) / ∫ f(D|θ) f(θ) dθ

SOFT PRETZEL EXAMPLE CONTINUED

Y = # Customers out of N that buy your pretzel, Y ~ Bin(N, p),

where p is your market percentage. You are uncertain about p and you decide to model your uncertainty using a Beta distribution: p ~ Beta(n0, r0).

Suppose you observe the following data: D = (N, k), i.e. k out of N customers bought your pretzel. Then

f(p|D) = Pr(D|p) f(p) / Pr(D)

  • Pr(D) = Pr(Y = k|N), the predictive distribution that we just calculated.
  • The numerator is proportional to p^(r0+k−1) (1−p)^(n0−r0+N−k−1), the kernel of a Beta(n0+N, r0+k) density.

Conclusion:

The posterior distribution is ALSO a beta distribution but with parameters (n0+N,r0+k).

Definition:

When the prior distribution and the theoretical probability model are such that the prior distribution and the posterior distribution belong to the same family of distributions, the

Prior Distribution and Theoretical Probability Model are called Conjugate Distributions.

SOFT PRETZLE EXAMPLE CONTINUED:

In the Soft Pretzel Example you decide to set f(p) = Beta(4, 1) as your prior distribution on the market percentage p, or in other words n0 = 4, r0 = 1. You observed that 7 out of 20 potential customers bought your pretzel, thus D = (20, 7). The posterior distribution of the market percentage p is therefore

f(p|D) = Beta(4+20, 1+7) = Beta(24, 8)

Conclusion:

The good news is that with the observed data D = (20, 7) I am becoming more certain about my market percentage p. However, what does this mean with respect to my investment in a soft pretzel stand? Is there some bad news too?
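The conjugate update amounts to simple counting: add the number of trials to n0 and the number of successes to r0. Note how the posterior mean r/n lands between the prior mean and the observed fraction.

```python
# Conjugate Beta-binomial update in the notes' (n, r) parameterization:
# prior Beta(n0, r0), data D = (N, k) -> posterior Beta(n0 + N, r0 + k).
n0, r0 = 4, 1          # prior Beta(4, 1): prior mean r0/n0 = 0.25
N, k = 20, 7           # 7 of 20 customers bought a pretzel

n_post, r_post = n0 + N, r0 + k          # posterior Beta(24, 8)
prior_mean = r0 / n0
data_frac = k / N
post_mean = r_post / n_post              # 8/24 = 1/3
```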

What about the posterior predictive distribution?

Recall:

In the Soft Pretzel Example you set f(p) = Beta(4, 1), i.e. n0 = 4, r0 = 1, which gave Pr(Y ≤ 10|N = 20) = 0.8758.

After observing data D:

f(p|D) = Beta(24, 8), or in other words the predictive formula now applies with n0 = 24, r0 = 8.

Conclusion:

After observing the data, Pr(Y > 10|N, D) = 1 − 0.9065 = 0.0935. Thus your updated belief says there is approximately a 9.3% chance that you will sell more than 10 pretzels. In addition, you observe that your posterior has less uncertainty than the prior. Hence you are becoming more certain that selling soft pretzels may not be a good investment.
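The 0.9065 figure can be checked the same way as the prior predictive, now with the posterior Beta(24, 8), i.e. standard-form Beta(a = r = 8, b = n − r = 16).

```python
# Posterior predictive for the next N = 20 customers: beta-binomial
# with the posterior Beta(24, 8) in standard form Beta(a=8, b=16).
from math import comb, lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, N, a, b):
    return comb(N, k) * exp(log_beta(a + k, b + N - k) - log_beta(a, b))

N, a, b = 20, 8.0, 16.0
p_more_than_10 = 1 - sum(betabinom_pmf(k, N, a, b) for k in range(11))
```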

Conjugate Analysis for Normal Distributions

Predictive Analysis Table

Probability Model / Prior / Predictive
Y|μ ~ N(μ, σ²), σ known / μ ~ N(m0, σ0²) / Y ~ N(m0, σ² + σ0²)

Posterior Analysis Table

Probability Model / Prior / Posterior given
D = (y1, …, yn) with sample mean ȳ
Y|μ ~ N(μ, σ²), σ known / μ ~ N(m0, σ0²) / μ|D ~ N(m*, σ*²)

Where:

m* = (m0/σ0² + n·ȳ/σ²) / (1/σ0² + n/σ²) and σ*² = 1 / (1/σ0² + n/σ²)

Assignment:

Study conjugate analysis for Halfway House Example on pages 392-396.
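The normal posterior formulas can be checked numerically; the inputs below are the comptroller's numbers from Question 10.19, part b.

```python
# Normal-normal conjugate update with sigma known: the posterior precision
# is the sum of the prior precision and the data precision.
m0, s0 = 10_000.0, 800.0      # prior N(m0, s0^2) on mu
sigma = 1_500.0               # known sampling standard deviation
n, ybar = 9, 11_003.0         # sample size and sample mean

post_prec = 1 / s0**2 + n / sigma**2
s_star = (1 / post_prec) ** 0.5
m_star = (m0 / s0**2 + n * ybar / sigma**2) / post_prec
```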

Question 10.19

A comptroller was preparing to analyze the distribution of balances in the various accounts receivable for her firm. She knew from studies in previous years that the distribution would be normal with a standard deviation of $1,500, but she was unsure of the mean value μ. She thought carefully about her uncertainty about this parameter and assessed a normal distribution for μ with mean m0 = $10,000 and standard deviation σ0 = $800.

Over lunch, she discussed this problem with her friend, who also worked in the accounting division. Her friend commented that she too was unsure of μ but would have placed it somewhat higher. The friend said that "better" estimates for m0 and σ0 would have been $12,000 and $750, respectively.

Define:

Y := Balance in Accounts Receivable

Then: Y ~ N(μ, σ²), with σ = $1,500 and prior μ ~ N(m0, σ0²).

  1. Find P($11000) for both prior distributions.

For the Comptroller:

PN( > 11,000 | m0 = 10,000, 0 = 800)

= P(Z > (11,000-10,000)/800) = P(Z > 1.25) = 0.1056.

For her friend,

PN( > 11,000 | m0 = 12,000, 0 = 750)

= P(Z > (11,000-12,000)/750) = P(Z > -1.33) = 0.9082.

  b. That afternoon, the comptroller randomly chose nine accounts and calculated ȳ = $11,003. Find the posterior distributions of μ. Then find Pr(μ > $11,000) for both posterior distributions.

For the Comptroller:

m* = (10,000/800² + 9 × 11,003/1,500²) / (1/800² + 9/1,500²) = 10,721 and σ* = [1/(1/800² + 9/1,500²)]^(1/2) = 424.

Thus, Pr(μ > 11,000 | m* = 10,721, σ* = 424)

= Pr(Z > (11,000 − 10,721)/424) = Pr(Z > 0.66) = 0.2546.

For the friend:

m* = (12,000/750² + 9 × 11,003/1,500²) / (1/750² + 9/1,500²) = 11,310 and σ* = [1/(1/750² + 9/1,500²)]^(1/2) = 416.

Thus, Pr(μ > 11,000 | m* = 11,310, σ* = 416)

= Pr(Z > (11,000 − 11,310)/416) = Pr(Z > −0.75) = 0.7734.

  c. A week later the analysis had been completed. Of a total of 144 accounts (including the nine reported in part b), the average balance was ȳ = $11,254. Find the posterior distributions of μ. Then find Pr(μ > $11,000) for both posterior distributions.

For the Comptroller:

Thus, Pr(μ > 11,000 | m** = 11,224, σ** = 123.5)

= Pr(Z > (11,000 − 11,224)/123.5) = Pr(Z > −1.81) = 0.9649.

For the friend:

Thus, Pr(μ > 11,000 | m** = 11,274, σ** = 123.3)

= Pr(Z > (11,000 − 11,274)/123.3) = Pr(Z > −2.22) = 0.9868.

  d. Discuss your answers to parts a, b, and c. What can you conclude?

Eventually the data overwhelm any prior information. In the limit, as more data are collected, the comptroller and her friend will end up with the same posterior distribution.

Lecture notes by: Dr. J. Rene van Dorp, Session 9

Source: Making Hard Decisions, An Introduction to Decision Analysis by R.T. Clemen