Graphical models
Svend Kreiner
Dept. of Biostatistics, University of Copenhagen
Graphical models for symmetrical relationships
Definition 1.
A graphical model is defined by a set of assumptions stating that pairs of variables are conditionally independent given the remaining variables of the model.
The properties of graphical models are encapsulated in a mathematical graph, a set of nodes and edges between nodes, as shown in Figure 1.
Independence graph defined by the following four assumptions:
A╨C |B,D,E,F
A╨F |B,C,D,E
B╨D |A,C,E,F
D╨F |A,B,C,E
Independence graphs are second-order mathematical models.
A model is an intentionally simplified representation of some kind of system.
Certain properties of models correspond to properties of the system.
Certain properties of the system are not represented in the model.
Certain properties of the model do not represent anything in the system.
A statistical model is a mathematical (probabilistic) model of reality.
An independence graph is a mathematical model of a probabilistic model.
Certain graph theoretical properties of independence graphs correspond to properties of the statistical model.
Certain properties of statistical models have no counterpart in independence graphs.
Certain properties of independence graphs cannot be interpreted as properties of statistical models.
Graphical models for discrete data are loglinear.
Generators are defined by the cliques of the independence graph.
Cliques/generators: ABE, ADE, CDE, BCEF.
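The generators can be read off the graph mechanically. The sketch below is a minimal illustration, not the author's procedure: it assumes the edge set implied by the clique list above (every pair joined except A-C, A-F, B-D and D-F) and enumerates the maximal cliques with the basic Bron-Kerbosch recursion. The names NODES, MISSING, ADJ and maximal_cliques are introduced here for illustration only.

```python
from itertools import combinations

NODES = "ABCDEF"
# Assumed missing edges: the pairs taken to be conditionally independent given the rest.
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}

# Adjacency sets of the independence graph: every pair is joined unless it is missing.
ADJ = {v: set() for v in NODES}
for u, v in combinations(NODES, 2):
    if frozenset((u, v)) not in MISSING:
        ADJ[u].add(v)
        ADJ[v].add(u)


def maximal_cliques(adj):
    """Enumerate the maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: nodes already handled
        if not p and not x:
            found.append("".join(sorted(r)))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)


generators = maximal_cliques(ADJ)
print(generators)  # ['ABE', 'ADE', 'BCEF', 'CDE']
```

Each generator corresponds to one highest-order interaction term of the loglinear model.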
Models collapse onto graphical marginal models.
Collapse over D: P(A,B,C,E,F)
The conditional distribution of a subset of the variables given the remaining variables will be a graphical model.
P(A,B,C,E,F|D)
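The two operations act differently on the graph. The sketch below assumes the same edge set as the clique list (all pairs except A-C, A-F, B-D, D-F) and handles a single variable only: collapsing over D joins the neighbours of D pairwise, while conditioning on D simply deletes D and its edges. The helper names marginal_graph, conditional_graph and label are hypothetical.

```python
from itertools import combinations

NODES = "ABCDEF"
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}
EDGES = {frozenset(p) for p in combinations(NODES, 2)} - MISSING


def marginal_graph(edges, removed):
    """Graph after summing out one variable: its neighbours become pairwise joined."""
    neighbours = {next(iter(e - {removed})) for e in edges if removed in e}
    kept = {e for e in edges if removed not in e}
    fill_in = {frozenset(p) for p in combinations(sorted(neighbours), 2)}
    return kept | fill_in


def conditional_graph(edges, fixed):
    """Graph of the conditional distribution: the fixed variable and its edges are deleted."""
    return {e for e in edges if fixed not in e}


def label(edges):
    """Readable, sorted edge labels such as 'AB'."""
    return sorted("".join(sorted(e)) for e in edges)


marg = marginal_graph(EDGES, "D")
cond = conditional_graph(EDGES, "D")
new_edges = label(marg - EDGES)
print(new_edges)    # ['AC']
print(label(cond))  # ['AB', 'AE', 'BC', 'BE', 'BF', 'CE', 'CF', 'EF']
```

Under these assumptions the marginal graph gains the A-C edge, because A and C are both neighbours of D, whereas the conditional graph adds nothing.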
The separation theorem: Separation implies conditional independence.
Paths without detours from D to F:
D-E-F
D-C-F
D-A-B-F
D and F are separated by
A, E, and C: D ╨ F | A,E,C
or
B, C, and E: D ╨ F | B,E,C
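Both the list of paths and the separation claims can be checked by a short search. The sketch below is an illustration under stated assumptions: it uses the edge set implied by the clique list (all pairs except A-C, A-F, B-D, D-F) and reads "paths without detours" as chordless paths, that is, paths with no edge between non-consecutive nodes. The function names are hypothetical.

```python
NODES = "ABCDEF"
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}
ADJ = {v: {u for u in NODES if u != v and frozenset((u, v)) not in MISSING}
       for v in NODES}


def chordless_paths(adj, start, goal):
    """All simple paths from start to goal with no edge between non-consecutive nodes."""
    paths = []

    def extend(path):
        last = path[-1]
        if last == goal:
            paths.append("-".join(path))
            return
        for nxt in sorted(adj[last]):
            # nxt may be adjacent only to the current end of the path;
            # an edge to any earlier node would be a shortcut.
            if nxt not in path and not (adj[nxt] & set(path[:-1])):
                extend(path + [nxt])

    extend([start])
    return paths


def separates(adj, sep, a, b):
    """True if every path from a to b passes through a node in sep."""
    seen, stack = {a}, [a]
    while stack:
        for nxt in adj[stack.pop()] - seen - set(sep):
            seen.add(nxt)
            stack.append(nxt)
    return b not in seen


paths = chordless_paths(ADJ, "D", "F")
print(paths)                            # ['D-A-B-F', 'D-C-F', 'D-E-F']
print(separates(ADJ, "AEC", "D", "F"))  # True
print(separates(ADJ, "BCE", "D", "F"))  # True
print(separates(ADJ, "CE", "D", "F"))   # False: D-A-B-F stays open
```

A set separates D and F exactly when it blocks every one of the three chordless paths, which is why C and E alone are not enough.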
Separation implies parametric collapsibility in loglinear models.
All indirect paths from A to D go through (B,E) and (C,E).
Parameters relating to A and D are the same in P(A,B,C,D,E,F) and in the marginal models P(A,D,B,E) and P(A,D,C,E).
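A small numerical check can make the collapsibility statement concrete. The sketch below is not the general graphical model: it assumes binary variables, a 0/1 (corner point) parametrisation and two-factor interactions only, with made-up parameter values in THETA, one per edge of the graph implied by the clique list. Under these assumptions the conditional log odds ratio between A and D equals the A-D interaction parameter, so the parameter can be compared across the full and the marginal tables.

```python
import itertools
import math

NODES = "ABCDEF"
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}
EDGES = [p for p in itertools.combinations(NODES, 2) if frozenset(p) not in MISSING]

# Hypothetical two-factor interaction parameters, one per edge (illustration only).
THETA = {edge: 0.2 * (i + 1) * (-1) ** i for i, edge in enumerate(EDGES)}


def joint():
    """Joint distribution with log p proportional to the sum of theta_uv * x_u * x_v."""
    cells = list(itertools.product((0, 1), repeat=len(NODES)))
    weights = []
    for cell in cells:
        x = dict(zip(NODES, cell))
        weights.append(math.exp(sum(t * x[u] * x[v] for (u, v), t in THETA.items())))
    total = sum(weights)
    return {cell: w / total for cell, w in zip(cells, weights)}


def marginal(p, keep):
    """Sum the joint table over all variables not listed in keep."""
    idx = [NODES.index(v) for v in keep]
    table = {}
    for cell, prob in p.items():
        key = tuple(cell[i] for i in idx)
        table[key] = table.get(key, 0.0) + prob
    return table


def log_odds_ratio_ad(table, names, fixed):
    """Log odds ratio between A and D with the other variables held at fixed values."""
    def prob(a, d):
        values = dict(fixed, A=a, D=d)
        return table[tuple(values[n] for n in names)]
    return math.log(prob(1, 1) * prob(0, 0) / (prob(1, 0) * prob(0, 1)))


p = joint()
theta_ad = THETA[("A", "D")]
lor_full = log_odds_ratio_ad(p, NODES, {"B": 1, "C": 0, "E": 1, "F": 0})
lor_abde = log_odds_ratio_ad(marginal(p, "ABDE"), "ABDE", {"B": 1, "E": 1})
lor_acde = log_odds_ratio_ad(marginal(p, "ACDE"), "ACDE", {"C": 0, "E": 1})
lor_ade = log_odds_ratio_ad(marginal(p, "ADE"), "ADE", {"E": 1})
print(theta_ad, lor_full, lor_abde, lor_acde, lor_ade)
```

In this sketch lor_full, lor_abde and lor_acde agree with theta_ad up to rounding. The last value, from P(A,D,E), is included for contrast: neither (B,E) nor (C,E) is retained in full there, so it need not reproduce the parameter.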
Marginal models sometimes have a simpler parametric structure than implied by the marginal graphical model.
Decomposition by separation of complete subsets leads to decompositions of statistical models implying collapsibility in terms of likelihood inference for certain types of models.
The BCE clique separates A and D from F
(A,D)╨ F|(B,C,E)
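Whether a proposed split is a decomposition in this sense comes down to two graph checks: the separating subset must be complete, and it must separate the two remaining parts. The sketch below illustrates this on the edge set implied by the clique list (all pairs except A-C, A-F, B-D, D-F); the function names are hypothetical.

```python
from itertools import combinations

NODES = "ABCDEF"
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}
ADJ = {v: {u for u in NODES if u != v and frozenset((u, v)) not in MISSING}
       for v in NODES}


def is_complete(adj, subset):
    """True if every pair of nodes in subset is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(subset, 2))


def separates_sets(adj, sep, left, right):
    """True if no path leads from left to right without passing through sep."""
    seen, stack = set(left), list(left)
    while stack:
        for nxt in adj[stack.pop()] - seen - set(sep):
            seen.add(nxt)
            stack.append(nxt)
    return not (seen & set(right))


def is_decomposition(adj, left, sep, right):
    """Check that (left + sep, sep + right) splits the graph along a complete separator."""
    covers = set(left) | set(sep) | set(right) == set(adj)
    return covers and is_complete(adj, sep) and separates_sets(adj, sep, left, right)


print(is_decomposition(ADJ, "AD", "BCE", "F"))  # True
print(is_decomposition(ADJ, "D", "ACE", "BF"))  # False: A and C are not joined
```

The second call shows that separation alone is not enough: A, C and E do separate D from B and F, but the subset is not complete.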
The topography of marginal models.
Marginal model: P(A,B,C,E)
exterior – all variables and edges not in the marginal model. The exterior of P(A,B,C,E) consists of variables D and F and edges AD, ED, CD, BF, EF, and CF.
boundary – all variables of the marginal model connected to a disconnected subset of variables in the exterior. Two boundaries: AEC and BEC.
Boundaries are always complete and fixed in a marginal model.
border – the set of all boundaries.
interior – variables and edges of a marginal model that are not included in any boundary. The interior of P(A,B,C,E) is the AB-edge.
problem core – the smallest irreducible component onto which the model may be collapsed for an analysis of a given problem.
P(A,B,C,E) is not the core of the problem of estimating the AB association.
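The exterior, boundaries and interior of a marginal model can be computed directly from the graph. The sketch below is one reading of the definitions above, assuming the edge set implied by the clique list (all pairs except A-C, A-F, B-D, D-F) and treating each connected component of the exterior as one "disconnected subset". The function topography and its return layout are hypothetical.

```python
from itertools import combinations

NODES = "ABCDEF"
MISSING = {frozenset("AC"), frozenset("AF"), frozenset("BD"), frozenset("DF")}
EDGES = {frozenset(p) for p in combinations(NODES, 2)} - MISSING
ADJ = {v: {u for u in NODES if frozenset((u, v)) in EDGES} for v in NODES}


def topography(marginal):
    """Return exterior edges, boundaries, interior nodes and interior edges."""
    inside = set(marginal)
    outside = set(NODES) - inside
    exterior_edges = {e for e in EDGES if e & outside}

    # One boundary per connected component of the exterior.
    boundaries, todo = [], set(outside)
    while todo:
        comp, stack = set(), [todo.pop()]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(ADJ[v] & outside)
        todo -= comp
        boundaries.append(set().union(*(ADJ[v] for v in comp)) & inside)

    interior_nodes = inside - set().union(*boundaries)
    interior_edges = {e for e in EDGES
                      if e <= inside and not any(e <= b for b in boundaries)}
    return exterior_edges, boundaries, interior_nodes, interior_edges


def label(sets):
    """Readable, sorted labels such as 'ACE'."""
    return sorted("".join(sorted(s)) for s in sets)


ext_edges, bounds, int_nodes, int_edges = topography("ABCE")
print(label(ext_edges))   # ['AD', 'BF', 'CD', 'CF', 'DE', 'EF']
print(label(bounds))      # ['ACE', 'BCE']
print(sorted(int_nodes))  # []
print(label(int_edges))   # ['AB']
```

For P(A,B,C,E) this reproduces the two boundaries and the single interior edge AB named above.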
Graphical regression models
Dependent variables: Y = (Y1,...,Yr),
Independent variables: X = (X1,...,Xs).
Definition 2.
A graphical regression model is a graphical model of P(Y|X).
Conditional independence assumptions are restricted to
Yi ╨ Yj | Y1,..,Yi-1,Yi+1,..,Yj-1,Yj+1,..,Yr,X1,..,Xs
Yi ╨ Xj | Y1,..,Yi-1,Yi+1,..,Yr,X1,..,Xj-1,Xj+1,..,Xs
Independence graph for a graphical regression model of P(a,c,h|b,d,e,f,g).
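The restriction in Definition 2 can be enforced when the graph is built. The sketch below is an illustration only: the independence assumptions in the example are invented (the edges of the figure are not given in the text), and drawing the independent variables as a complete subset is one common convention assumed here. The function regression_graph is hypothetical.

```python
from itertools import combinations


def regression_graph(ys, xs, independencies):
    """Edge set of a graphical regression model of P(Y | X).

    independencies lists the pairs assumed conditionally independent given all
    remaining variables. Pairs of two independent variables are rejected, since
    the model makes no assumptions about the distribution of X.
    """
    ys, xs = set(ys), set(xs)
    missing = set()
    for u, v in independencies:
        if u in xs and v in xs:
            raise ValueError(f"{u} and {v} are both independent variables")
        missing.add(frozenset((u, v)))
    # Every remaining pair is joined, so X stays complete.
    return {frozenset(p) for p in combinations(sorted(ys | xs), 2)} - missing


Y = ["a", "c", "h"]
X = ["b", "d", "e", "f", "g"]
# Hypothetical assumptions, for illustration only.
edges = regression_graph(Y, X, [("a", "h"), ("c", "f"), ("h", "b")])
print(len(edges))                 # 25 of the 28 possible pairs remain
print(frozenset("ah") in edges)   # False
print(frozenset("bd") in edges)   # True
```

Passing a pair such as ("b", "d") raises an error, which mirrors the rule that only Y-Y and Y-X independencies may be assumed.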
Chain graph models
Recursive models:
Variables V = (V1,…,Vk)
Disjoint subsets of variables (U1,…,Ur) such that P(V) = P(U1|U2,…,Ur) P(U2|U3,…,Ur) ⋯ P(Ur)
Definition 3.
A chain graph model is a block recursive model where P(Ur) is a graphical model and each of the components P(Ui | Ui+1,..,Ur) is a graphical regression model.
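The block recursive structure can be spelled out by listing the components in order. The sketch below simply generates the factors P(Ui | Ui+1,..,Ur) for a given ordering of blocks. The three blocks in the example are hypothetical, chosen so that the first factor matches the regression model P(a,c,h|b,d,e,f,g) mentioned earlier.

```python
def recursive_factors(blocks):
    """List the components P(U_i | U_{i+1},..,U_r); the last block is unconditioned."""
    factors = []
    for i, block in enumerate(blocks):
        later = [v for b in blocks[i + 1:] for v in b]
        head = ",".join(block)
        if later:
            factors.append(f"P({head}|{','.join(later)})")
        else:
            factors.append(f"P({head})")
    return factors


# Hypothetical blocks U1, U2, U3 (illustration only).
blocks = [["a", "c", "h"], ["b", "d"], ["e", "f", "g"]]
factors = recursive_factors(blocks)
print(" ".join(factors))  # P(a,c,h|b,d,e,f,g) P(b,d|e,f,g) P(e,f,g)
```

Each conditional factor would be specified as a graphical regression model and the last factor as an ordinary graphical model.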
Directed independence graph of a chain graph model
Regression graphs implied by a chain graph model