September 1, 2009

Biostatistics 2063, Bioinformatics 2059, Sept 1 - Dec 17, 2009

Bayes and Empirical Bayes Methods

Tuesdays and Thursdays, 11:30am-12:55pm, Parkvale Bank Building

Instructor: Roger Day, 412-609-3918, (note upper-case B)

Office hours: by arrangement.

Why take this course?

  • You need a good foundation to understand and apply the EM algorithm, Monte Carlo Markov Chain methods, multiple imputation, empirical Bayes methods, mixed effects models, Bayesian networks, causal analysis, and decision trees.
  • Actual application of Bayesian methods is rapidly growing in Biostats and related fields.
  • Bayesian methods are “coherent” and suitable for decision-making support.
  • Bayesian methods are cleaner for experimental design, multiple comparisons, etc.
  • Bayesian methods make you more honest about your subjectivity.

Learning Objectives

  • Identifying strengths and weaknesses of Bayes methods vs frequentist methods.
  • Relating decision theory concepts to the foundations of Bayesian and Frequentist theory.
  • Applying Bayesian methods to decision problems.
  • Developing and critiquing subjective priors.
  • Developing and critiquing objective/reference priors.
  • Calculating using conjugate priors.
  • Applying the likelihood principle in study design, sequential analysis, and multiple testing.
  • Analyzing hierarchical models using Bayesian methods.
  • Analyzing hierarchical models using empirical Bayes methods.

Books – Recommended, not Required:

  • Spiegelhalter, Abrams, Myles. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, 2004.
  • Gelman, Carlin, Stern, Rubin. Bayesian Data Analysis.
  • Berger. Statistical Decision Theory and Bayesian Analysis.
  • P.D. Lewis. R Programming for Medicine and Biology (Jones and Bartlett Series in Biomedical Informatics).
  • Gelman and Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models

Student responsibilities:

Asking questions when things aren’t clear.

Honesty.

Frequent quizzes, weekly homeworks, mid-term, final, reading assignments.


Some key developments of probability & statistics

Blaise Pascal, Pierre Fermat / 1654 / Fair price for gambles
James Bernoulli / 1713 / Probability = degree of certainty; law of large numbers: probabilities can be estimated from long-run frequencies
Thomas Bayes / 1763 / "Inverse probability", based on "expectation"; probability is updated by data
Marquis de Laplace / 1812 / Probability = degree of belief
J.S. Mill, Richard Ellis, Jakob Fries / 1842 / Probability = frequency in the long run
James Clerk Maxwell / 1850s-80s / Entropy; the arrow of time
K. Pearson, R.A. Fisher / 1890s-1940s / "Classical statistics"; likelihood theory; significance testing
David Hilbert, Andrei Kolmogoroff / 1900s-1930s / Probability = abstract mathematical concept from an axiom system
Schroedinger, Heisenberg / 1920s-30s / Quantum mechanics: probability = squared modulus of a projection of a complex-valued wave function!! Probability at the heart of the universe??
E.S. Pearson, J. Neyman / 1930s-1950s / Decision theory basis for frequentism
Harold Jeffreys / 1930s / Inverse probability; "voice in the wilderness"; applications in astronomy, geology, physics
L.J. "Jimmy" Savage, Frank Ramsey, Bruno de Finetti / 1950s / Revival of "subjective" probability, based on axiom systems
E.T. Jaynes / 1960s-80s / Entropy as a measure of ignorance (for developing Bayesian priors)
Herbert Robbins / 1950s-60s / Empirical Bayes, hierarchical models
Brad Efron, Carl Morris / 1970s / Popularization of empirical Bayes
Art Dempster, Nan Laird, Don Rubin / 1970s-80s / EM algorithm for hierarchical models & other "missing data" problems
Nicholas Metropolis; Geman & Geman, A.F.M. Smith, A. Gelfand, M. Tanner, many others / 1953; 1980s-90s / Monte Carlo Markov Chain computational methods


Overview of Bayesian calculation:

A) Define the problem

Goal: Get the joint distribution of everything random & relevant.

Method: Define notation, factor joint distribution, calculate each part.

Notation:

Role / What is known / Unknown, we don’t care about it / Unknown, we do care about it
Symbol / X / φ / θ
Interpretation / Observed data; anything else that is known / Nuisance parameter; choice of model; choice of prior / Parameter; future observable; asymptotic observable

Joint distribution: [X, φ, θ] = [θ] [φ | θ] [X | θ, φ]

B) Apply Bayes “theorem”: Calculate the (posterior) distribution of the quantity of interest:

Role / What is known / Unknown, we don’t care about it / Unknown, we do care about it
Symbol / X / φ / θ
What to do with it / Condition on it! / Integrate it out! / Learn from the posterior distribution! Make a decision!

[A note on the joint distribution of everything: we seem to be combining two different kinds of probability. (What are they?) Whether that’s legitimate is subject to philosophical critique and defense. If you become especially interested, see the work of Abner Shimony, Lane and Sudderth (but not yet)]
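As a toy illustration of steps A and B, here is a minimal Python sketch. The probability tables, variable names, and numbers below are made up for illustration; it simply builds the joint distribution [X, φ, θ], conditions on the observed X, and sums out the nuisance φ to obtain [θ | X].

    # Toy illustration of steps A and B: build the joint [X, phi, theta],
    # condition on the observed X, integrate (sum) out the nuisance phi,
    # and read off the posterior [theta | X].  All numbers are made up.

    from itertools import product

    thetas = ["S", "H"]     # unknown of interest (theta)
    phis   = [0, 1]         # nuisance unknown (phi)

    prior     = {"S": 0.001, "H": 0.999}                    # [theta]
    phi_given = {("S", 0): 0.5, ("S", 1): 0.5,              # [phi | theta]
                 ("H", 0): 0.5, ("H", 1): 0.5}
    p_pos     = {("S", 0): 0.95, ("S", 1): 0.99,            # Pr(X = P | theta, phi)
                 ("H", 0): 0.05, ("H", 1): 0.01}

    def joint(theta, phi, x):
        """[X, phi, theta] = [theta] [phi | theta] [X | theta, phi]."""
        px = p_pos[(theta, phi)] if x == "P" else 1.0 - p_pos[(theta, phi)]
        return prior[theta] * phi_given[(theta, phi)] * px

    x_obs = "P"                                             # condition on what is known
    marginal_x = sum(joint(t, f, x_obs) for t, f in product(thetas, phis))
    posterior = {t: sum(joint(t, f, x_obs) for f in phis) / marginal_x
                 for t in thetas}                           # integrate out phi
    print(posterior)                                        # [theta | X = P]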

C) Decision-making:

Goal: Given a posterior distribution [θ | X], choose the best action to take.

Method: compare the Bayes expected losses of the various actions:

C.1. Specify the loss function Loss(θ, a), so that for θ in Θ and a in A (actions a, action space A), Loss(θ, a) represents the loss ensuing if you take action a when the true state of nature is θ.

C.2. Calculate the Bayes expected loss of each action: E[ Loss(θ, a) | X ] = ∫ Loss(θ, a) [θ | X] dθ.

C.3. Choose the Bayes action: the a in A that minimizes E[ Loss(θ, a) | X ].

All this is done after taking into account all current knowledge (both the current data and the prior).
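A minimal Python sketch of steps C.1-C.3 for a discrete θ; the posterior probabilities and loss values below are placeholders chosen only to make the code run.

    # C.1-C.3: specify Loss(theta, a), compute the Bayes expected loss
    # E[ Loss(theta, a) | X ] for each action, and choose the minimizer.
    # Posterior and loss values are placeholders.

    posterior = {"S": 0.09, "H": 0.91}                # [theta | X] from step B

    loss_table = {("H", "T"): 1.0, ("H", "W"): 0.0,   # C.1: Loss(theta, a)
                  ("S", "T"): 0.0, ("S", "W"): 1.0}

    def bayes_expected_loss(a):
        """C.2: E[ Loss(theta, a) | X ] = sum over theta of Loss(theta, a) * [theta | X]."""
        return sum(loss_table[(t, a)] * p for t, p in posterior.items())

    actions = ["T", "W"]
    bayes_action = min(actions, key=bayes_expected_loss)    # C.3: minimize expected loss
    print({a: bayes_expected_loss(a) for a in actions}, "-> Bayes action:", bayes_action)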


Example: the unknown θ is in Θ = {S, H}, the data X is in {P, N}, and the action a is in A = {T, W}.

Interpretation:

θ = H: “patient is healthy”; θ = S: “patient is sick”

X = P: “test is positive”; X = N: “test is negative”

a = T: “decide to treat”; a = W: “decide to wait”

Prior = prevalence = Pr(θ = S)

Model = [X | θ], i.e. sensitivity = Pr(P | S) and specificity = Pr(N | H)

Posterior = [θ | X], e.g. Pr(S | P), obtained from Bayes theorem:

[X,] = [] [X | ] = [X] [| X]

joint = marg’l cond’l = marg’l cond’l

joint = prior model = marg’l posterior
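As a quick numeric check of this factorization, the short Python snippet below (with made-up prevalence, sensitivity, and specificity) verifies that prior × model equals marginal × posterior.

    # Check [X, theta] = [theta][X | theta] = [X][theta | X] for X = P, theta = S.
    # Prevalence, sensitivity, and specificity here are made-up values.
    pr_S = 0.001                          # prior: Pr(theta = S)
    pr_P_given = {"S": 0.99, "H": 0.01}   # model: Pr(X = P | theta); 0.01 = 1 - specificity

    pr_P = pr_P_given["S"] * pr_S + pr_P_given["H"] * (1 - pr_S)   # marginal Pr(X = P)
    pr_S_given_P = pr_P_given["S"] * pr_S / pr_P                   # posterior Pr(S | P)

    prior_times_model   = pr_S * pr_P_given["S"]     # [theta] [X | theta]
    marginal_times_post = pr_P * pr_S_given_P        # [X] [theta | X]
    assert abs(prior_times_model - marginal_times_post) < 1e-12
    print(prior_times_model, marginal_times_post, pr_S_given_P)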

Loss table, for actions a in A = {T, W}:

θ / a = T (treat the patient) / a = W (wait)
θ = H, healthy (“null hypothesis”) / Loss(T, H) = LTH / Loss(W, H) = 0
θ = S, sick (“alternate hypothesis”) / Loss(T, S) = 0 / Loss(W, S) = LWS

Bayes expected loss:

E(Loss | P, T ) = LTH Pr(H | P) = LTH Pr(P | H) Pr(H) / Pr(P)

E(Loss | P, W ) = LWS Pr( S | P) = LWS Pr(P | S) Pr(S) / Pr(P)

Choose T over W (“the Bayes action is T”) if

E(Loss | P, W ) > E(Loss | P, T ), i.e.

LWS Pr(P | S) Pr(S) / Pr(P) > LTH Pr(P | H) Pr(H) / Pr(P)

or equivalently, LWS Pr(P | S) Pr(S) > LTH Pr(P | H) Pr(H), i.e.

[LWS / LTH] × [Pr(P | S) / Pr(P | H)] × [Pr(S) / Pr(H)] > 1,

that is, {loss ratio} × {likelihood ratio} × {prior odds} > 1.

Example:

1) If prevalence = 0.001, sensitivity = 0.99, specificity = 0.99, and loss ratio = 1,

and a “P” test result is observed,

2) then prior odds ≈ 1000 to 1 against disease, likelihood ratio ≈ 100,

3) so the Bayes action is W (“watchful waiting”; do nothing for now).

[What are the posterior odds?]
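A few lines of Python can check the arithmetic of this example (prevalence 0.001, sensitivity 0.99, specificity 0.99, loss ratio 1, positive test observed), including the posterior odds.

    # Check the numbers: prevalence 0.001, sensitivity 0.99, specificity 0.99,
    # loss ratio LWS/LTH = 1, and an observed positive test (X = P).
    prev, sens, spec = 0.001, 0.99, 0.99
    loss_ratio = 1.0                                   # LWS / LTH

    prior_odds_sick  = prev / (1 - prev)               # ~ 1/1000, i.e. ~1000 to 1 against
    likelihood_ratio = sens / (1 - spec)               # Pr(P | S) / Pr(P | H) = 99 ~ 100
    posterior_odds_sick = likelihood_ratio * prior_odds_sick   # ~ 0.099, about 1 to 10

    # Bayes action: treat (T) iff loss ratio * likelihood ratio * prior odds > 1
    treat = loss_ratio * likelihood_ratio * prior_odds_sick > 1
    print(prior_odds_sick, likelihood_ratio, posterior_odds_sick)
    print("Bayes action:", "T" if treat else "W")      # W for these numbers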

For Thursday:

1. Know the “Chart of Unknown Quantities” (the notation chart in sections A and B above) cold!!  (Yes, there will be a quiz.)

2. Suppose we know that the test has different sensitivity and specificity in two different kinds of patients, depending on a characteristic φ of the patient, for example the result of some other test (P = positive or N = negative). We know all four values (sensitivity & specificity for φ = 0 and φ = 1), but we don’t know what φ is for this patient. Describe, in words, how Steps A-C above should be applied. Write your thoughts in words, & be ready to discuss them. Hint: use the Chart of Unknown Quantities.