CS B553 Homework 7: Parameter and Structure Learning in Chow-Liu Tree Networks
Due date: 4/10/2012
In this assignment you will be looking at a very simple type of Bayesian network model called a Chow-Liu tree. Chow-Liu trees are useful because they provide a class network structures that are easy to learn. (In fact, they predate Bayesian networks.) In this model, the joint distribution over X1,…,Xn is decomposed into first-order conditionals as follows:
where R is the subset of root nodes, and for non-root nodes pai gives the index of i’s only parent. So, we can encode the graph structure through the encoding G=(pa1,…,pan) with pai{0,1,…,n}, where we take the convention that any node with pai=0 is taken to be a root. (Note that we’re allowing the encoding to produce forests rather than constraining the network to be a tree)
We’ll assume all variables are binary. You may wish to use the notation xi0 to indicate the event that Xi=0 and xi1 to indicate the event that Xi=1.
- List all of the parameters in the CPTs of a Chow-Liu tree network with structure G=(pa1,…,pan), with node Xr being the root. Gather these into a single parameter vector .
- Given a dataset D=( x[1],…,x[M]) of fully observed samples, write down the expression l(D; ,G) giving the log-likelihood of D given under the structure model G. Use the textbook’s notation M[‘event’] to indicate the number of times ‘event’ occurs in the dataset D.
- Give a formula for the maximum-likelihood estimate * given the data D. What is the log-likelihood l(D; *, G)?
- Consider the model G0=(0,…,0), i.e., the entirely disconnected network. Now consider adding a new edge j->i by setting pai = j to derive the model G =(0,…,0,j,0,…,0) (j is in the i’th index). What is the change in log-likelihood of the maximum-likelihood parameter estimates when changing from G0 to G? That is, compute l(D; , G) – l(D; 0, G0) where is the MLE for model G and 0 is the MLE for model G0.
- Now consider an arbitrary graph G0 in which Xi is a root node, and suppose that you could add the edge j->i to obtain a graph G that is also a forest. What expression do you get for l(D; , G) – l(D; 0, G0), with the MLE for G and 0 the MLE for G0?
- Consider an algorithm that starts with a completely disconnected network and greedily starts to add edges j->i, ensuring that each new edge does not induce a cycle in the graph. Edges are added in order of largest to smallest improvement in likelihood. Demonstrate that it produces a tree rather than a forest. Give its running time in big-O notation. Furthermore, prove or argue that it produces the Chow-Liu tree with maximum likelihood over all trees (hint: consider how this relates to Prim’s algorithm for minimum spanning trees).