Active Learning for Structure in Bayesian Networks
Simon Tong
Daphne Koller
Stanford University

Motivation
Scientific understanding → causal model
Understand the process of cancer formation & progression
How do we choose experiments wisely to gain the most understanding from them?
Limited number of experiments on laboratory mice
Which experiments should we perform?

Bayesian Networks*

Formal Problem Statement
Given:
Set of domain variables
Experiment:
Can control some query variables Q by setting Q := q
Obtain a response X=x, a complete assignment to the other variables (see the example below)
Goal:
Learn a causal model — structure of causal Bayesian network
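For concreteness, a tiny hypothetical instance (the variable names are invented for illustration and are not from the talk):

```latex
% Hypothetical 4-variable domain; one experiment (query) and its response.
\[
  Q := q \;\equiv\; \{\textit{Smoking} := \text{true}\},
  \qquad
  x \;=\; \bigl(\textit{Cancer}=\text{true},\ \textit{Cough}=\text{true},\ \textit{Fatigue}=\text{false}\bigr)
\]
```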

How Can Interventions Help?

Active Learning

Active Learning: Our method
A model: our current hypothesis
A measure of quality: how good our model is

Structure Estimation
Bayesian Estimation
Maintain a distribution over graphs and parameters P(G, θ_G)

Updating the Model
Given a new data instance (Q:=q, x):
Update P(G, θ_G) to get the posterior P(G, θ_G | Q:=q, x)
P(G, θ_G | Q:=q, x) is our new model
A distribution over graphs and parameters
Given a model P(G, θ_G), in principle we can obtain P(X → Y)
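A sketch of the update, assuming the standard intervention (mutilated-network) semantics for causal Bayesian networks: controlled variables contribute no likelihood term, since we set their values rather than observed them.

```latex
% Posterior update after observing the response x to the intervention Q := q.
% The likelihood product ranges only over variables NOT in Q: controlled
% variables contribute no term, because we fixed their values ourselves.
\[
  P(G, \theta_G \mid Q\!:=\!q,\, x)
  \;\propto\;
  P(G, \theta_G)\, \prod_{X_i \notin Q} P\bigl(x_i \mid \mathrm{pa}_G(x_i),\, \theta_G\bigr)
\]
```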

Model Quality I
Want to be as certain about the structure as possible
Use the edge entropy as the loss function:
H(A ↔ B) = − Σ_e P(e) log P(e),   where e ranges over { A → B,  B → A,  no edge }

Model Quality II
In general:
Want to be as certain about edges as possible
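One natural way to write this, consistent with the per-edge loss on the previous slide (notation assumed, not taken verbatim from the talk):

```latex
% Total loss of a model: sum of edge entropies over all unordered pairs,
% using the per-edge entropy H(A <-> B) defined on the previous slide.
\[
  \mathrm{Loss}(P) \;=\; \sum_{i < j} H(X_i \leftrightarrow X_j),
  \qquad
  H(X_i \leftrightarrow X_j) \;=\;
  -\sum_{e \in \{X_i \to X_j,\; X_j \to X_i,\; \text{no edge}\}} P(e)\,\log P(e)
\]
```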

Expected posterior loss
Choose the query that reduces the expected posterior loss as much as possible:
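Spelled out, with Loss(·) as above, this is the usual expected-posterior-loss criterion (a sketch of the definition, not copied from the slides):

```latex
% Expected posterior loss of the query Q := q, and the query-selection rule.
\[
  \mathrm{ExPLoss}(Q\!:=\!q) \;=\;
  \sum_{x} P(x \mid Q\!:=\!q)\,
  \mathrm{Loss}\bigl(P(G, \theta_G \mid Q\!:=\!q,\, x)\bigr),
  \qquad
  q^{*} \;=\; \arg\min_{q} \mathrm{ExPLoss}(Q\!:=\!q)
\]
```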

Fixed Ordering
Restrict to networks consistent with a fixed total ordering π of the nodes*
A → B  ⇒  A precedes B in π
Each node X_i has a set W_i of at most k possible candidate parents**
Number of allowed structures still exponential

Fixed Ordering
Choice of parents for node X now independent of choice of parents for other nodes
Intuitively, because cycles no longer an issue
Can manipulate P(G | D, π) in closed form
Can compute H_π(A ↔ B) efficiently
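A minimal sketch of the decomposition this slide exploits: with the ordering fixed, each node's parent-set posterior can be normalized locally, and edge probabilities follow by summation. It assumes discrete data and a log family score; helper names such as family_score are illustrative, not the authors' code.

```python
import itertools

import numpy as np

def parent_set_posterior(node, candidates, k, family_score):
    """Posterior over parent sets of `node` under a fixed ordering.

    `candidates` are the allowed parents W_i (nodes preceding `node` in the
    ordering); `family_score(node, parents)` returns the log marginal
    likelihood of that family given the data D.  Under a fixed ordering the
    choice of parents for each node is independent of the other nodes, so we
    can normalize per node.
    """
    parent_sets = [frozenset(s)
                   for r in range(k + 1)
                   for s in itertools.combinations(candidates, r)]
    log_scores = np.array([family_score(node, ps) for ps in parent_sets])
    log_scores -= log_scores.max()          # subtract max for numerical stability
    probs = np.exp(log_scores)
    probs /= probs.sum()
    return dict(zip(parent_sets, probs))

def edge_probability(parent, child_posterior):
    """P(parent -> child | D, ordering): mass of the child's parent sets containing `parent`."""
    return sum(p for ps, p in child_posterior.items() if parent in ps)
```

Edge probabilities of this form are exactly the quantities needed to evaluate H_π(A ↔ B).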

Expected Posterior Loss
Theorem: for a fixed ordering π, the expected posterior loss decomposes into potential terms ψ and φ,
    where both ψ and φ have efficiently computable closed-form expressions

Inference in Graphical Model
Each term ψ, φ is a potential function over a small subset of variables
Desired result is sum over all variables except Q
Can use standard inference techniques
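Read concretely, and only as a sketch: because each ψ, φ term touches just a few variables, linearity of expectation reduces the exponential sum over complete assignments x to one small marginal per term, which standard graphical-model inference can supply. The names terms and marginal below are assumptions for illustration.

```python
def expected_term_sum(terms, marginal, query):
    """Sum over all x of P(x | Q:=q) times a sum of small-scope terms.

    Each element of `terms` is a pair (variables, f) where f maps a joint
    assignment of those few variables to a number (playing the role of the
    psi / phi potentials).  `marginal(variables, query)` is assumed to return
    a dict {assignment: P(assignment | Q:=query)} computed by standard
    graphical-model inference, so no explicit sum over all x is ever needed.
    """
    total = 0.0
    for variables, f in terms:
        for assignment, prob in marginal(variables, query).items():
            total += prob * f(assignment)
    return total
```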

Expected Posterior Loss
We can compute this efficiently in closed form

Unrestricted Orderings
Sample orderings*
Use the Metropolis MCMC algorithm
Suppose we want to compute the expected posterior loss of asking query Q:=q
For each ordering, compute expected quality
Average over all sampled orderings
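A minimal sketch of such a Metropolis chain over orderings, assuming the marginal likelihood P(D | ordering) is computable in closed form for a fixed ordering (as on the earlier slides); the proposal and tuning details here are illustrative, not the authors' exact chain.

```python
import math
import random

def sample_orderings(variables, log_score, n_samples, burn_in=100):
    """Metropolis sampler over total orderings of the nodes.

    Proposal: swap two random positions (symmetric, so no Hastings correction).
    Acceptance: min(1, P(D | proposed ordering) / P(D | current ordering)),
    where `log_score(ordering)` = log P(D | ordering).
    """
    order = list(variables)
    random.shuffle(order)
    current = log_score(order)
    samples = []
    for step in range(burn_in + n_samples):
        i, j = random.sample(range(len(order)), 2)
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        proposed = log_score(proposal)
        if proposed >= current or random.random() < math.exp(proposed - current):
            order, current = proposal, proposed        # accept the move
        if step >= burn_in:
            samples.append(list(order))
    return samples
```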

The Algorithm
Sample orderings
Compute ExPLoss for each query
Choose query that minimizes ExPLoss
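Putting the three steps together as a sketch (hypothetical function names; the per-ordering expected posterior loss is assumed to come from the closed-form computation above):

```python
def choose_query(candidate_queries, sampled_orderings, exp_loss):
    """Pick the intervention with the smallest estimated expected posterior loss.

    `exp_loss(query, ordering)` is assumed to return the closed-form expected
    posterior loss for that single ordering; averaging over the sampled
    orderings approximates the loss under the full posterior over structures.
    """
    def average_loss(query):
        return sum(exp_loss(query, order) for order in sampled_orderings) \
               / len(sampled_orderings)
    return min(candidate_queries, key=average_loss)
```

After performing the chosen experiment, the model is updated with the new instance (as on the earlier slide) and the process repeats.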

Experiments
Cancer network
5 nodes
Candidate parents of X:
k ≤ 5 parents for each node X
nodes with the highest individual mutual information with X (see the sketch below)
MCMC for sampling orderings
Maintain 50–75 chains
Compared Active with Random and Uniform
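A sketch of the candidate-parent heuristic mentioned above: rank the other variables by empirical mutual information with X and keep the top k. The data layout and helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def candidate_parents(data, target, k=5):
    """Choose candidate parents of `target`: the k other variables with the
    highest empirical mutual information with it.

    `data` maps each variable name to a 1-D array of discrete observations.
    """
    def mutual_information(x, y):
        # Empirical MI from the joint contingency table of two discrete arrays.
        _, xi = np.unique(x, return_inverse=True)
        _, yi = np.unique(y, return_inverse=True)
        joint = np.zeros((xi.max() + 1, yi.max() + 1))
        for a, b in zip(xi, yi):
            joint[a, b] += 1
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    scores = {v: mutual_information(data[v], data[target])
              for v in data if v != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```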

Cancer Network: Root Control

Cancer Network: Interventions

Car Network: Interventions

Learned structures compared: Original Cancer Network, Random Samples, Uniform Intervention, Active Learning

Conclusions
Framework for active learning for causal structure
Theoretical analysis supported by experimental results
Efficient algorithm
Closed form distribution for fixed ordering
MCMC samples of orderings
Use of graphical-model inference to perform an exponential-size summation efficiently
Future work: missing data, continuous variables