**Active Learning for Structure in Bayesian Networks**

Simon Tong

Daphne Koller

Stanford University

Scientific understanding → causal model

Understand the process of cancer formation & progression

How do we choose experiments wisely to gain the most understanding from them?

Limited number of experiments on laboratory mice

Which experiments should we perform?

Given:

Set of domain variables

Experiment:

Can control some query variables Q by setting Q := q

Obtain a response X = x: a complete assignment to the other variables

Goal:

Learn a causal model: the structure of a causal Bayesian network

A model

Our current hypothesis

A measure of quality

How good our model is

Bayesian Estimation

Maintain a distribution over graphs and parameters P(G, θ_G)

Given a new data instance (Q := q, x):

Update P(G, θ_G) to get the posterior P(G, θ_G | Q := q, x)

P(G, θ_G | Q := q, x) is our new model

A distribution over graphs and parameters

Given a model P(G, θ_G), in principle we can obtain P(X → Y)
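A minimal sketch of this Bayesian update, assuming a toy two-variable domain with three candidate structures and hand-picked point-estimate parameters (the slides maintain a full joint over (G, θ_G); everything numeric here is made up for illustration):

```python
import numpy as np

# Toy sketch (all numbers assumed): maintain a distribution over three
# candidate structures for a two-variable domain and update it by
# Bayes' rule on each complete observation (a, b).

structures = ["A->B", "B->A", "no edge"]
log_post = np.log(np.full(3, 1.0 / 3.0))    # uniform prior over structures

def log_lik(g, a, b):
    """Log-likelihood of one observation under structure index g."""
    if g == 0:                               # A -> B
        pa, pb = 0.5, (0.9 if a == 1 else 0.2)
    elif g == 1:                             # B -> A
        pb, pa = 0.5, (0.9 if b == 1 else 0.2)
    else:                                    # A and B independent
        pa, pb = 0.5, 0.5
    return (np.log(pa if a == 1 else 1.0 - pa)
            + np.log(pb if b == 1 else 1.0 - pb))

for a, b in [(1, 1), (1, 1), (0, 0)]:        # strongly correlated observations
    log_post += np.array([log_lik(g, a, b) for g in range(3)])

post = np.exp(log_post - log_post.max())
post /= post.sum()
# Correlated data shifts mass away from the "no edge" structure.
```

The updated `post` is the "new model" in the sense of the slide: a (here parameter-free) distribution over structures that the next observation updates again.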

Want to be as certain about the structure as possible

Use the loss function (entropy over the three edge states of a pair):

H(A ↔ B) = −Σ_e P(e) log P(e),  e ∈ {A→B, B→A, no edge}

In general:

Want to be as certain about all edges as possible: Loss = Σ_{i<j} H(X_i ↔ X_j)

Choose the query that will improve the expected posterior loss as much as possible:

ExPLoss(Q := q) = Σ_x P(x | Q := q) · Loss(P(G, θ_G | Q := q, x))
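These two quantities can be sketched numerically; the helper names and all probabilities below are illustrative, not from the paper:

```python
import math

# Edge-entropy loss: each pair (A, B) has three mutually exclusive edge
# states (A->B, B->A, no edge); the pair's loss is the entropy of that
# three-way distribution.  ExPLoss averages the post-query loss over
# possible responses.  All numbers below are made up.

def edge_entropy(p_ab, p_ba, p_none):
    """H(A <-> B): entropy over the three edge states of one pair."""
    return -sum(p * math.log(p) for p in (p_ab, p_ba, p_none) if p > 0)

def expected_posterior_loss(response_probs, losses_after):
    """Sum over responses x of P(x | Q:=q) * Loss(model updated on x)."""
    return sum(p * l for p, l in zip(response_probs, losses_after))

# A fully certain pair has zero loss; maximal uncertainty gives log 3:
assert edge_entropy(1.0, 0.0, 0.0) == 0.0
assert abs(edge_entropy(1/3, 1/3, 1/3) - math.log(3)) < 1e-9

# A query whose likely responses leave us less uncertain scores better:
exploss_q1 = expected_posterior_loss([0.5, 0.5], [0.2, 0.4])   # avg 0.3
exploss_q2 = expected_posterior_loss([0.5, 0.5], [0.9, 1.0])   # avg 0.95
```

The active learner would prefer the first query, since its expected posterior loss is lower.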

Restrict to networks consistent with a fixed total ordering π of the nodes*

A → B ⇒ A precedes B in π

Each node X_i has a set W_i of at most k possible candidate parents**

Number of allowed structures is still exponential

Choice of parents for node X is now independent of the choice of parents for other nodes

Intuitively, because cycles are no longer an issue

Can manipulate P(G | D, π) in closed form

Can compute H_π(A ↔ B) efficiently
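Why the fixed ordering helps can be sketched as follows: with π fixed, the posterior factorizes over "families" (a node and its parent set), so each family's posterior is a small normalized sum. The ordering, k, and the per-family log scores here are all assumed for illustration:

```python
import itertools
import math

# With a fixed ordering pi, each node's parents come only from its
# predecessors, and parent choices are independent across nodes, so the
# posterior over a node's parent set is computable in closed form.

pi = ["A", "B", "C"]                 # assumed fixed total ordering
k = 2                                # assumed max number of parents

def candidate_parent_sets(node):
    preds = pi[: pi.index(node)]     # only earlier nodes may be parents
    sets = []
    for r in range(min(k, len(preds)) + 1):
        sets.extend(itertools.combinations(preds, r))
    return sets

# Hypothetical per-family log marginal-likelihood scores for node C:
log_score = {(): -10.0, ("A",): -8.0, ("B",): -9.0, ("A", "B"): -7.5}

weights = [math.exp(log_score[ps]) for ps in candidate_parent_sets("C")]
Z = sum(weights)
post = {ps: w / Z for ps, w in zip(candidate_parent_sets("C"), weights)}

# The marginal edge probability P(A -> C | D, pi) is then a simple sum
# over the parent sets of C that contain A:
p_edge_AC = sum(p for ps, p in post.items() if "A" in ps)
```

From such edge marginals, the edge entropy H_π(A ↔ B) follows directly, with no enumeration over whole graphs.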

Theorem: the expected posterior loss decomposes into a sum of terms ψ and φ,

where both ψ and φ have efficiently computable closed-form expressions

Each term ψ, φ is a potential function over a small subset of variables

Desired result is a sum over all variables except Q

Can use standard inference techniques

We can compute this efficiently in closed form
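The "standard inference techniques" step can be illustrated on a chain of small potentials: summing their product over all variables one variable at a time (as in variable elimination) matches brute-force enumeration without the exponential cost. The potentials below are illustrative stand-ins for ψ and φ:

```python
import itertools

# Chain-structured pairwise potentials psi_i(x_i, x_{i+1}) over five
# binary variables X0..X4 (values assumed for illustration).
psi = [{(a, b): 1.0 + 0.5 * (a == b) for a in (0, 1) for b in (0, 1)}
       for _ in range(4)]

def brute_force():
    """Enumerate all 2^5 assignments: exponential in the chain length."""
    total = 0.0
    for xs in itertools.product((0, 1), repeat=5):
        p = 1.0
        for i in range(4):
            p *= psi[i][(xs[i], xs[i + 1])]
        total += p
    return total

def eliminate():
    """Sum out one variable at a time: linear in the chain length."""
    m = {a: 1.0 for a in (0, 1)}     # message carrying the partial sum
    for i in range(4):
        m = {b: sum(m[a] * psi[i][(a, b)] for a in (0, 1)) for b in (0, 1)}
    return sum(m.values())
```

Both routines return the same value; only the elimination version stays tractable as the number of variables grows.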

Sample orderings*

Use the MCMC Metropolis algorithm

Suppose we want to compute the expected posterior loss of asking query Q := q

For each sampled ordering, compute the expected quality

Average over all sampled orderings
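A minimal Metropolis sketch for sampling orderings: propose swapping two positions and accept with probability min(1, P(π′)/P(π)). The scoring function `log_p` is an assumed stand-in; a real implementation would use the log-posterior of the ordering given the data:

```python
import math
import random

random.seed(0)

def log_p(pi):
    # Illustrative stand-in score: favor orderings that place "A" early.
    return -float(pi.index("A"))

def metropolis_orderings(nodes, steps=2000):
    pi = list(nodes)
    samples = []
    for _ in range(steps):
        i, j = random.sample(range(len(pi)), 2)   # propose a transposition
        pi_new = pi[:]
        pi_new[i], pi_new[j] = pi_new[j], pi_new[i]
        delta = log_p(pi_new) - log_p(pi)
        if delta >= 0 or random.random() < math.exp(delta):
            pi = pi_new                            # accept the proposal
        samples.append(tuple(pi))
    return samples

samples = metropolis_orderings(["A", "B", "C"])
frac_a_first = sum(s[0] == "A" for s in samples) / len(samples)
# Orderings placing "A" first should dominate the sample.
```

Quantities computed per ordering (such as the expected posterior loss of a query) are then averaged over these samples.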

Sample orderings

Compute ExPLoss for each query

Choose the query that minimizes ExPLoss
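The three steps above can be sketched as one selection routine; the query names, orderings, and loss values are all hypothetical:

```python
# Average each query's expected posterior loss over the sampled
# orderings and pick the minimizer.  Everything below is illustrative.

def choose_query(queries, orderings, exp_loss):
    """Return the query minimizing ExPLoss averaged over orderings."""
    def avg_loss(q):
        return sum(exp_loss(q, pi) for pi in orderings) / len(orderings)
    return min(queries, key=avg_loss)

orderings = [("A", "B"), ("B", "A")]
losses = {("Q1", ("A", "B")): 0.9, ("Q1", ("B", "A")): 0.8,
          ("Q2", ("A", "B")): 0.3, ("Q2", ("B", "A")): 0.5}
best = choose_query(["Q1", "Q2"], orderings, lambda q, pi: losses[(q, pi)])
# "Q2" wins with average loss 0.4 vs 0.85 for "Q1".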

Cancer network

5 nodes

Candidate parents of X:

k ≤ 5 candidate parents for each node X

nodes with highest individual mutual information with X
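The candidate-parent heuristic can be sketched as below; the data, variable names, and k are made up:

```python
import math
from collections import Counter

# Rank the other variables by empirical mutual information with the
# target X and keep the top k as candidate parents.

def mutual_information(xs, ys):
    """Empirical MI (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def top_k_candidates(data, target, k):
    """data: dict var_name -> list of values; best k candidates for target."""
    others = [v for v in data if v != target]
    return sorted(others,
                  key=lambda v: -mutual_information(data[target], data[v]))[:k]

data = {"X": [0, 0, 1, 1, 1, 0],
        "A": [0, 0, 1, 1, 1, 0],      # perfectly correlated with X
        "B": [0, 1, 0, 1, 0, 1]}      # only weakly related to X
# "A" outranks "B" as a candidate parent of "X".
```

This is only a per-pair screening step; the Bayesian machinery then reasons about which of the surviving candidates are actual parents.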

MCMC for sampling orderings

Maintain 50–75 chains

Compared Active with Random and Uniform

Framework for active learning of causal structure

Theoretical analysis supported by experimental results

Efficient algorithm

Closed-form distribution for a fixed ordering

MCMC samples of orderings

Use of graphical-model inference to perform an otherwise exponential computation efficiently

Future work: missing data, continuous variables