1

Boolean algebra, Boolean nodes

Encyclopedia of Medical Decision Making, 2009

contribution by J. Hilden,

Boolean algebra, Boolean nodes

A Boolean, or logical, variable is one that can take the values T (true) or F (false), and the Boolean, or logical, algebra, pioneered by George Boole (1815-1864), holds the formal machinery that allows such truth values to be logically combined.Boolean principles offer a framework for handling questionnaire and symptom data of the common binary kind (Yes vs. No, normal (‘negative’) vs. abnormal (‘positive’), etc.). For clinical decisions, even measurements, probabilities, etc.,often have to be dichotomized (binarized).Library searches exploit Boolean AND, OR and NOT; and the digital computer is essentially a huge number of electronic switches (On vs. Off) connected in a Boolean manner, marching to the beat of a clock. Boolean principles also underlie logical checking of rule-based decision support systemsfor inconsistencies, incompleteness and redundancy.

The algebra

For example, let A, B, C, … be diagnostic tests or, more precisely, the Boolean variables that hold the answers to ‘Did test Acome out positive?’, etc.Boolean negation, alias NOT,swaps T and F, indicating, in our example, whether a test came out negative:

¬A= (not A) = (false if A is true; true if A is false) = (F if A, otherwise T).

Note that¬(¬A) = A. Other basic operations are AND and OR:

AND (‘Did both A and B come out positive?’):

AB = (A and B) = (T if bothA and B, otherwise F).

OR (‘Did A or B, or both, come out positive?’):

AB = (A or B) = (F if neither A nor B, otherwise T) = ¬(¬A ¬B).

The mirror image of the right-mostidentity also works:

AB = ¬(¬A¬B) = (F if one or both of A and B are false, otherwise T).

Set theory involves set intersection (), union () and complementing, which are the precise analogues of ,  and ¬, respectively.

OR and AND are associative operations: more than two terms can be ORed (ANDed), in arbitrary order, to reflect ‘at least one true’ (‘all true’). Distributive properties include:

(AB)(AC) = A (BC), (AB)(AC) = A(BC).

The former may be read: to be a ‘female diabetic or female with hypertension’ means to be a ‘female with diabetes or hypertension.’ Self-combination:

(AA) = (AA) = A .

Finally, test A must be either positive or negative, but cannot be both:

(A¬A) = T, (A¬A) = F .

The former expression is a tautology, i.e. a necessarily true proposition.

The OR described so far is the inclusive OR, cf. “or both”above. Informatics (checksums, cryptography) makes frequent use of the exclusive OR, abbr. EXOR. Equivalence(≡),in the sense of having the same truth value, and EXOR are each other’s negations:

(A ≡ B) = (A and B are both true or both false),

(A EXOR B) = ¬(A ≡ B) = (one of A and Bis true, not both) .

Now, ((A EXOR B) EXOR C) is true if just one or all three terms are true. Extending this rule to repeated EXORs of multiple Boolean terms, one finds that the result is true if the number of true terms is odd, and false if it is even.

The implication symbol – a warning

The implication symbol (→)is a treacherousabbreviation:

(A →U) = (A ‘implies’ U) = (U ¬A) = (if A, then U; otherwise T) .

Spelling it out:if A, then the expressionreproduces the truth value of U; if not A, then the result isT – regardless of U!

This is different from ‘A causes U,’however construed.It is also different from the language of decision recommendations and deductions in rule-based decision support systems. When you come acrossa statement like ‘If symptom A is present, diagnosis U is applicable,’ you take it to tell you nothing about patients without symptom A. If told that the statement is untrue, you take that to mean that Aalone should not triggerlabel U; and this is exactly how a rule-based system would react to cancellation of ‘If A, then U.’ The Boolean negation¬(A →U) = (¬UA), on the other hand, would claim that, despite symptom A, diagnosisUwasnot made (in a particular case), or: every patient has symptom A and diagnosisU is never made (when read as a general rule).

Physical analogues

Aand B may be gates. When they must be passed one after the other, entry is possible if and only if both gates are open. Symbolically,E= AB. If, on the other hand, A and B are alternative routes of entry,E = AB (is at least one gate open?).Electrical switches connected ‘in series’ or ‘in parallel’ are analogous.Diagnostic tests in series and in parallel, see below.

Boolean data in programming – Boolean nodes

Most programming languages make available a Boolean, or Logical, data type. It may look thus:

Boolean L; #declares L to be a Boolean variable#

L := (n < 20); #L becomes true or false depending on the value of n#

if( L )print(“n small”); #to be printed only if n is below 20#

Programming languages have different ways of writing AND and OR.

Often, however, it is convenient to let T and F be represented by 1 and 0, leading to:

¬A = (1 – A),

ABC … = ABC… [an ordinary product of zeros and ones]

= min(A, B, C, …) ,

ABC… = max(A, B, C, …).

Conversely, mathematicians often write  and for min and max; soxy = min(x, y), the smaller of x and y.Checksum calculations, involving EXORing of multiple 0-or-1 digits, produce the result 1 precisely when the number of component 1’s is odd, as stated above, i.e. when their ordinary sum is odd:

EXOR(A, B, C, …) = (1 if A + B + C + … is odd; otherwise 0) .

Boolean nodes

A standard decision tree has a stem (clinical problem) and two types of splits: chance nodes and decision nodes.Between the stem and the leaves, each of which represents a possible clinical diary, the tree gets increasingly bushy because there is no way branches can rejoin. In practice, many subtrees are nearly identical and can be drawn, and programmed, just once. They can then be entered from various points as subroutines, with arguments that let them inherit patient features from the calling point.

Numerical arguments allow probabilities and utilities to be context-dependent (e.g., age-dependent); Boolean argumentsallow the subtree to be structurally modified. They govern specialBoolean nodes, which act as switches. The Boolean argument ‘Has the patient already had a hemicolectomy?’maygovern a switch that blocks a decision branch involving hemicolectomy. Likewise, in modeling an annual screening program, the same annual subtree may have to be entered recursively: a Boolean switchmay serve to prevent endless recursion.

Boolean nodes allow compact, minimally redundant, trees to be drawn – sometimes at the expense of intelligibility. (Some authors have used the term for other kinds of two-way nodes.)

Boolean matrices

Consider a graph of N interconnected nodes (vertices). AnN×NmatrixC = {Cij} of Boolean elements may represent the interconnections (arcs, edges),Cij being T if and only if node i is directly connected to node j. Both directed and non-directed graphs may be represented in this way; in the latter case, the matrix is symmetric (Cij= Cji). The diagonal elements, Cii, are set to Funless self-referring nodes make sense, as they do in state-progression models, includingMarkov chains.Matrix C is called the adjacency matrix of the graph.

The matrix product D = CC, with addition and multiplication replaced with OR and AND, now answers the question: can one pass in precisely two steps from nodei to node k? Readers familiar with matrices will see that

Dik = j=1,…,N{CijCjk } = (Ci1C1k) (Ci2C2k) . . .

= (is there at least one node j that can be used as a stepping stone?) .

This idea can be elaborated to answer many types of connectedness questions.Example: In this directed graph, the arcs are numbered for convenience, and the elements in matrix C that go into the calculation of DA,Dare shaded.

Adjacency matrix C (edge absent marked with a dot; edge numbers shown for clarity)

To
From / A / B / C / D / E
A / . / T (1) / . / . / .
B / . / . / T (2) / T (3) / .
C / . / . / . / . / .
D / . / . / . / . / T (4)
E / . / . / . / . / T (5)

Matrix D = CC (by aligning the ‘From A’ row with the ‘To D’column in matrix C, both shaded, one sees that the only match is in the second position, implying that node B is the only stepping stone from A to D, i.e. route 1-3)

To
From / A / B / C / D / E
A / . / . / T(12) / T (13) / .
B / . / . / . / . / T (34)
C / . / . / . / . / .
D / . / . / . / . / T (45)
E / . / . / . / . / T (55)

The Boolean approach to diagnostic tests

The 2k possible outcomes ofk binary tests, and thek! possible sequences of execution, may render ordinary decision trees unmanageable: one ends up with2kk! leaves, or48 when k = 3 (80 when testsmay be executed simultaneously). A shortcut is offered by the following, much simpler, Boolean procedure, supplemented, if necessary, by an ad hoc analysis to find the least costly or riskyexecution scheme (flow-chart).

After tabulating the 2k outcomes and the associated clinical actions found by utility maximization,supposeit turns out that intervention U is recommended when, and only when, tests A, B and Chave the result patterns: (+ + +), (+ + –), (– + +) or (– – +). The Boolean representation U = (ABC)(AB¬C)(¬ABC)(¬A¬BC) reduces to U = (AB)(¬AC), suggesting that test A should be performed first and, depending on the result, B or C will decide. However, it also reduces to U = (BC) ((B EXOR C)(A ≡ B)), suggesting that one should begin with Band C and, if they disagree, check whether Asides with B.

The former execution schemeis attractive when A is inexpensive and risk-free, the latter when A is costly or risky. Unless the choice is obvious, one must go through all contingencies and calculate expected money and utility costs. These concerns, as well as the constraints that arisewhen tests are technically intertwined,can be handled in a decision tree butthe answer found by the Boolean procedure is otherwise exactly the one a decision tree would give.

In other words, the two procedures lead to the same test interpretation scheme but the Boolean procedure may require ad hoc supplementary calculations to find the optimal test execution scheme.Warning: Speaking of testsA and B being applied in parallel (in series) may refer to the tests being executed simultaneously (vs. one after the other), but some authors are thinking of the gate analogy above and therefore use the parallel vs. series terminology to characterize two interpretation schemes, viz. U = (AB) vs. U = (AB).

As a by-product, the Boolean procedure reveals whether any tests aremandatory (needed in all cases) or redundant (dispensable). In the small artificial example this did not happen.

The 1986 paper that popularized these techniques also illustrated some geometric features that the ROC diagram will possess when the lattice of all Boolean combinations of several binary tests is plotted.

Jørgen Hilden

See also Diagnostic process; Decision support systems; Subtrees (use in constructing decision trees); Causal inferences and diagrams; Markov models; ROC.

Further references

Boole, G. (1854). An investigation of the laws of thought. London, Macmillan.

Garcia-Remesal, M., Maojo, V., Laita, L., Roanes-Lozano, E., & Crespo, J. (2007). An algebraic approach to detect logical inconsistencies in medical appropriateness criteria.Conf Proc IEEE Eng Med Biol Soc, 1, 5148-5151.

Glasziou, P., & Hilden, J. (1986). Decision tables and logic in decision analysis. Medical Decision Making, 6, 154-160.

Lau, J., Kassirer, J.P., & Pauker, S.G. (1983). Decision Maker 3.0: improved decision analysis by personal computer. Medical Decision Making, 3, 39-43.