
COT DEATHS, BAYES’ THEOREM AND PLAIN THINKING.

By Mike Disney

Many British women are serving life imprisonment for murdering their own infants, some having been condemned solely on the basis of an entirely fallacious statistical argument. Readers will want to understand that fallacy before they join in the chorus of disapproval directed at the courts and their expert witness, the distinguished paediatrician Sir Roy Meadow. Many will be mortified to find that they cannot spot the fallacy, and so might have gone along with the jurors who actually found the mothers guilty. For reasons I will come to later, we are never explicitly taught to think straight, with the result that even the most careful of us can be lured into drawing false and sometimes fateful conclusions.

Basically speaking there are two kinds of thinking: Logical Deduction argues from the general to the particular; Inference attempts to argue in the opposite direction. Deduction, which was analysed by Aristotle long ago, carries one forward from agreed premises to specific conclusions which may or may not be obvious, but which were implicit in those premises all along. It is the mathematician's yes/no kind of logic, which we understand so well that, even if we are not very good at it as individuals, we have built it into our digital computers.

The problem with Deductive Logic is that it only works in situations where all the relevant information, and I mean all, is known with certainty beforehand. That may be very well for Pure Mathematicians who must agree to their premises, and define them rigorously, before they can set out. But in most situations of human interest, including Science, that cannot generally be the case. Indeed in Pure Science it can literally never be the case because the scientist attempts to establish those general principles or laws by arguing backwards from specific instances – which are all he has to go by. Where some information is either lacking or uncertain, we cannot use Logical Deduction and must rely on the much more interesting but less well understood method of thinking called ‘Plausible Inference’ or sometimes Common Sense.

Plausible Inference relies not on certainty but on probability, where probability is defined as "the degree of rational assent, on a scale from zero to one, that we can give to a proposition." If I think that a proposition is definitely untrue I assign it a probability of zero; if definitely true a probability of one. To a fifty-fifty proposition I assign a probability of 0.5, and so on. Note the "I think"s and the "I"s. As it is here defined, and as it is widely used in everyday life, probability is a subjective concept. You and I might not agree exactly, or in some cases even approximately, on the degree of assent we can give to a particular proposition – say that "Wave-hunter" is going to win the Derby. That is all right so long as we can agree on rational principles for updating our respective probabilities in the face of further evidence. The hope must be that, given enough evidence, our probabilities will eventually converge sufficiently for us to reach the same conclusion. For instance if we were jurors in the same case we might hope to agree that guilt had either been proven "beyond any reasonable doubt", or not so proven, whatever our prior prejudices, or probabilities, might have been.

Before we get to the cot-death fallacy we need to spell out the rational procedure for updating a probability in the light of new evidence. We want to estimate P(H|E), that is to say "the probability of some hypothesis H, in the light of new evidence E", when P(H) was our probability for the same H before the new evidence came in. While most of us happily update P(H) to P(H|E) in an intuitive, indeed instinctual manner, there is in fact a formal way to do so using Bayes' simple formula:

P(H|E) = P(E|H) × P(H) / [ P(E|H) × P(H) + P(E|A) × P(A) ]

Here A means "the alternative to hypothesis H". Since any hypothesis must be either true or not true, the sum of the probabilities P(H) and P(A) must equal certainty, i.e. 1.

Never mind where Bayes' Theorem comes from for now: suffice to say that it is a modest extension of the simplest laws of probability that could, and probably should, be taught to all 15-year-olds at school. Note that to calculate the probability we want, i.e. P(H|E), we first need to estimate P(E|H), that is to say "the probability of the evidence E, supposing that the hypothesis H were indeed true".
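The update step just described is easy to mechanise. Here is a minimal Python sketch of Bayes' formula (the function name bayes_update is my own, purely for illustration):

```python
def bayes_update(p_h, p_e_given_h, p_e_given_a):
    """Bayes' formula: return P(H|E) given the prior P(H) and the
    likelihoods P(E|H) and P(E|A), where A is the alternative to H."""
    p_a = 1.0 - p_h                      # H and A exhaust the possibilities
    numerator = p_e_given_h * p_h
    return numerator / (numerator + p_e_given_a * p_a)

# If the evidence is equally likely under H and A it tells us nothing,
# and the posterior simply equals the prior:
print(bayes_update(0.3, 0.5, 0.5))       # → 0.3
```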

Let us now remove to an imaginary courtroom where a mother is on trial for the serial murder of her two infants. The sole evidence against her is the sheer statistical improbability of two such cot-deaths in one family occurring through natural causes, when only about one baby in a thousand is known to suffer such a natural death. The hypothesis H on trial is that the mother is innocent, while the evidence E is that she has two dead infants. There are 650,000 live births a year in Britain, and fewer than thirty children are known for sure to have been murdered by their mothers. We thus have all the information we need to make a plausible inference P(H|E) about the innocence of the dead infants' mother. For instance, in the absence of evidence, the prior probability of her guilt, i.e. P(G) = P(A), would be no more than 30 in 650,000, or roughly 1 in 20,000. It follows immediately that the probability of her innocence is 1 minus this, or 19,999 in 20,000, or 0.99995.

Ah, but there is the evidence E, and the expert paediatrician EP is called to the stand to testify as to the value of P(E|H), the one number still missing from Bayes' formula.

EP argues that the chance of an innocent mother having a single cot-death is roughly one in a thousand, but that her second infant should likewise die is less likely again by another factor of a thousand, so the combined probability of one natural cot-death after another is one in a thousand multiplied by itself, or one in a million. In other words, EP argues that P(E|H) is one in a million.

All the factors needed for Bayes' formula are now available, except the rather obvious factor P(E|A), i.e. the probability that the mother will have two dead infants if she is guilty, which must be 1 exactly. Readers will now get out their calculators and find from Bayes' formula, with a little multiplication, addition and division, that P(H|E) is less than 2 per cent. Therefore the chance is better than 98 per cent that the mother is guilty. If the jury feel that 98 per cent certainty amounts to "beyond any reasonable doubt" then they will convict her.
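Readers without a calculator can check the arithmetic in a few lines of Python, using the figures quoted above (a sketch; the variable names are mine):

```python
p_h = 0.99995            # prior probability of innocence: 19,999 in 20,000
p_a = 1.0 - p_h          # prior probability of guilt: 1 in 20,000
p_e_h = 1e-6             # EP's P(E|H): two natural cot-deaths, 1 in a million
p_e_a = 1.0              # P(E|A): a guilty mother certainly has two dead infants

p_h_e = (p_e_h * p_h) / (p_e_h * p_h + p_e_a * p_a)
print(round(p_h_e, 4))   # → 0.0196, i.e. just under 2 per cent
```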

At this point I urge readers to pause for some time to consider their own verdict – they are after all in possession of exactly the same evidence as the fictional jury.

After Consideration

Some of you will have spotted one possible weakness in the fictional EP's testimony. He argued that a second baby's innocent death in a family is as improbable as the first.

But this might not be so. Genetic or environmental considerations might plausibly predispose some mothers to more blameless cot-deaths than others, and indeed there are figures which suggest that a mother who has already suffered one innocent cot-death is roughly ten times as likely to suffer a second. The EP should have admitted as much and assigned to P(E|H) a value ten times greater, i.e. one in 100,000 instead of one in a million. A recalculation of Bayes' formula now yields a P(H|E) of 0.17. In other words there is a 17 per cent probability that she is innocent, and therefore only an 83 per cent chance that she is guilty. The jury would probably refuse to convict in this case and the mother would be discharged – but with a nasty suspicion hanging over her head.
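The revised figure can be checked the same way (a self-contained sketch; the numbers are those quoted in the text):

```python
p_h = 0.99995            # prior probability of innocence
p_a = 1.0 - p_h          # prior probability of guilt: 1 in 20,000
p_e_h = 1e-5             # revised P(E|H): 1 in 100,000 for two innocent deaths
p_e_a = 1.0              # a guilty mother certainly has two dead infants

p_h_e = (p_e_h * p_h) / (p_e_h * p_h + p_e_a * p_a)
print(round(p_h_e, 2))   # → 0.17, a 17 per cent probability of innocence
```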

Alternatively the court might ask, in an intermediate case like this, for further clarification. For instance in the notorious case of Sally Clark, who was sentenced to life imprisonment in 1999, Sir Roy Meadow testified that in an affluent non-smoking family like hers the chance of an innocent cot-death was not 1 in a thousand but 1 in 8,543, and hence that the chance of two babies dying innocently was 1 in 8,543 squared, or 1 in 73 million. A recalculation using Bayes' Theorem, even allowing for the second cot-death to be ten times more likely than the first, will yield a P(H|E) of roughly 0.003, or less than one-third of one per cent. The odds were better than 99.7 per cent that Sally Clark was guilty, and the jury, advised by the judge, chose to convict. Once again the reader is invited to consider their own verdict most carefully.
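Sally Clark's figures can be run through the same formula (a sketch; I take P(E|H) to be ten times 1 in 73 million, allowing for the likelier second death):

```python
p_h = 0.99995            # prior probability of innocence
p_a = 1.0 - p_h          # prior probability of guilt: 1 in 20,000
p_e_h = 10 / 73e6        # 1 in 73 million, times ten for the likelier second death
p_e_a = 1.0              # a guilty mother certainly has two dead infants

p_h_e = (p_e_h * p_h) / (p_e_h * p_h + p_e_a * p_a)
print(round(p_h_e, 4))   # → 0.0027, i.e. roughly 0.003
```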

On Reflection

The fact is that the Eminent Paediatrician’s testimony, or rather the court’s use of it, was based on a notorious statistical fallacy, a fallacy that has fooled the smartest people throughout history. Readers therefore shouldn’t feel ashamed if they would have gone along with the jury in the Sally Clark case. Indeed Sally Clark is reported to have said that she would have convicted herself if she had been sitting on the jury instead of in the dock.

So what is the fallacy? The real fallacy lies in the EP's estimate of P(E|H). He posed himself the question, call it question A, "What is the chance of an innocent mother having two dead infants?", and he reached the right conclusion: that it is very improbable indeed, especially in the case of a non-smoking, affluent mother like Sally. But that is not, repeat not, the question the court should have wanted an answer to. What it wanted was an estimate of P(E|H), the only unknown in the calculation. The question that should have been put, call it question B, is "What is the probability that the defendant has two dead babies, even if she is innocent?" The answer of course is one, exactly one, because there is no dispute about that: innocent or guilty, she does have two dead babies. But if you insert P(E|H) = 1 into Bayes' formula the results are dramatic. You find P(H|E) = 0.99995, a better than 99.99 per cent probability of innocence! All the quibbles about higher probabilities for second deaths, about non-smoking, about affluence, scarcely matter.

The villain has been a notorious and hard-to-spot fallacy in thinking called "the use of a posteriori statistics". With the best of intentions the court tried to calculate the probability of innocence for a mother who doesn't already have two dead babies. But the unfortunate defendant in the dock does have two dead infants, so the calculation in her case must be quite different, and as it happens much simpler – indeed one scarcely needing an expert at all. After (a posteriori) the evidence is gathered in, the probability of that same evidence is clearly 1, and not the different and possibly very small probability of finding that evidence in a specific case if you were to start from scratch (a priori). The conflation of an a priori calculation with an a posteriori situation is an all too easy mistake to make, and one that has sent many a poor innocent witch to the stake.
Questions A and B seem superficially the same when actually they are entirely different.
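The dramatic effect of answering question B instead of question A can be seen by re-running the formula with P(E|H) = 1 (a sketch using the priors quoted earlier in the text):

```python
p_h = 0.99995            # prior probability of innocence
p_a = 1.0 - p_h          # prior probability of guilt
p_e_h = 1.0              # question B: she certainly has two dead babies if innocent
p_e_a = 1.0              # ...and certainly has them if guilty too

p_h_e = (p_e_h * p_h) / (p_e_h * p_h + p_e_a * p_a)
print(round(p_h_e, 5))   # → 0.99995: the evidence, so read, leaves the prior unchanged
```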

The fallacy in the court's thinking is exposed if you consider the population as a whole. If the chance of a second innocent cot-death is truly 1 in 100,000 then among 650,000 annual births you expect roughly 6 cases of two deaths in the same family. If you apply the aforementioned reasoning to each of them, then 6 innocent mothers must go to prison every year. Since you cannot pick the innocents from amongst the guilty purely on the basis of statistics, you should not apply a priori probabilities to any of them. But if you apply the a posteriori P(E|H) = 1 you will, as you should, find them innocent.
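This population-level check is a single line of arithmetic, using the figures from the text:

```python
births = 650_000          # annual live births in Britain (figure from the text)
one_in = 100_000          # chance of two innocent cot-deaths in one family
print(births / one_in)    # → 6.5, i.e. roughly six such families every year
```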

Dangerous Thinking

My point is not to criticise the court, nor even the paediatric witnesses, but to criticise an educational system that doesn't train us to think more clearly, or to spot plausible fallacies like this. You don't have to be a philosopher, or even an A-level mathematician, to go through the steps in Bayes' simple theorem. You probably won't want to use it every day, but you will certainly need to use it whenever a tricky or fateful decision, based on imperfect evidence, has to be made. If nothing else it will help to identify the precise probabilities, or prejudices if you like, which go towards making that decision.

So why don't we learn about Plausible Inference at school, or even at university, or even in those books on "How to think better"? As a professional scientist I had to teach myself PI in my late fifties. We scientists use PI constantly – indeed it is our main thinking tool. But like the general public we do so intuitively, and as a result we occasionally make the same all too plausible mistakes. Indeed 'Thinking about Thinking' is considered a bit infra dig among scientists. As the zoologist Peter Medawar put it: "If you ask a scientist what The Scientific Method is he is likely to adopt a pose which is both solemn and shifty-eyed; solemn because he thinks he ought to declare an opinion, shifty-eyed because he has got nothing to declare."

The main reasons for not teaching Plausible Inference, as opposed to the much less useful subject of Deductive Logic (also scarcely taught), are, I believe, largely historical. In the eighteenth century the Scottish philosopher David Hume taught us that Inductive Logic, that is to say the reaching of indisputable general conclusions from specific instances, is impossible. No matter how many white swans you count, you cannot safely conclude that all swans are white. Since Hume, philosophers from Kant to Popper have struggled to explain how scientists nevertheless seem able to uncover very general Laws of Nature by studying specific instances. Scientists, who pay little attention to philosophers, seem to do so using Plausible Inference based on subjective probabilities.

Unfortunately probability itself is the subject of a vicious dispute among statisticians. The purer mathematicians amongst them can claim, with some justice, that they invented Probability first, and that their Probability has nothing to do with the subjective definition I have given above – which is the one usually used by scientists. And unfortunately their champion R. A. Fisher conceived an irrational dislike for Bayesian Inference, claiming that he had developed a more palatable alternative, an alternative that is alas too complicated for the rest of us amateurs to really understand. Fortunately he was mistaken, as all but the most stubborn professional statisticians steeped in his school will now concede.

So the way is clear again to teach straight thinking. Heroes such as Bruno de Finetti, Harold Jeffreys, Richard Cox, George Polya and Edwin Jaynes have wrested Plausible Inference back from the clutches of philosophers and statisticians. It works and it is not mathematically sophisticated. But it needs a clear head, a trained understanding of the rules, and an informed awareness of the most notorious pitfalls.

Personally I don't think we should be too hard on anybody concerned with the cot-death trials. Like most scientists I have made too many mistakes to trust my own reasoning very far before submitting it to the critical attention of others. We scientists put our papers through an anonymous refereeing process – and thank goodness we do so. Readers who have found it as difficult as I have to understand the cot-death problem will be more tolerant towards the courts and their expert witnesses. Paediatricians, who are no better thinkers than the rest of us, have to face tragic cases of babies with innumerable unexplained injuries that can have been caused by none other than violent adults. Probably. We need their fearless testimony, just as we all need the understanding to weigh it properly.