Active Learning for Structure in Bayesian Networks

Simon Tong
Daphne Koller

Stanford University
Motivation

- Scientific understanding → causal model
  - Understand the process of cancer formation and progression
- How do we choose experiments wisely to gain the most understanding from them?
  - Limited number of experiments on laboratory mice
  - Which experiments should we perform?
Bayesian Networks*
Formal Problem Statement

- Given:
  - Set of domain variables
- Experiment:
  - Can control some query variables Q by setting Q := q
  - Obtain a response X = x, a complete assignment to the other variables
- Goal:
  - Learn a causal model: the structure of a causal Bayesian network
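The experiment interface in the problem statement can be sketched in code. Everything below (the three-variable structure, the CPDs, the variable names) is an illustrative assumption, not part of the talk; the point is only the shape of a query/response pair (Q := q, x):

```python
import random

# Toy causal network used to simulate the experiment interface above:
# set query variables Q := q (an intervention), then observe a complete
# assignment to the remaining variables. The structure, CPDs, and
# variable names are illustrative assumptions, not from the talk.
PARENTS = {"A": (), "B": ("A",), "C": ("A", "B")}
ORDER = ["A", "B", "C"]          # a topological order of the toy graph

RNG = random.Random(0)

def cpd(var, assignment):
    """Toy conditional probability P(var = 1 | parents); illustrative only."""
    if not PARENTS[var]:
        return 0.3
    return 0.2 + 0.6 * sum(assignment[p] for p in PARENTS[var]) / len(PARENTS[var])

def experiment(intervention):
    """Perform Q := q and return a response x: a complete assignment."""
    x = {}
    for var in ORDER:
        if var in intervention:
            x[var] = intervention[var]   # intervened node: its CPD is bypassed
        else:
            x[var] = int(RNG.random() < cpd(var, x))
    return x

print(experiment({"B": 1}))
```

Note the causal semantics of the intervention: a node in Q keeps its assigned value regardless of its CPD, which is what distinguishes an experiment from a passive observation.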
How Can Interventions Help?
Active Learning
Active Learning: Our Method

- A model
  - Our current hypothesis
- A measure of quality
  - How good our model is
Structure Estimation

- Bayesian estimation
  - Maintain a distribution over graphs and parameters, P(G, θ_G)
Updating the Model

- Given a new data instance (Q:=q, x):
  - Update P(G, θ_G) to get the posterior P(G, θ_G | Q:=q, x)
  - P(G, θ_G | Q:=q, x) is our new model
    - A distribution over graphs and parameters
- Given a model P(G, θ_G), in principle we can obtain P(X→Y)
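A minimal sketch of this update for a two-variable domain {A, B}. The three candidate structures, the uniform structure prior, and the Dirichlet(1, 1) parameter priors are assumptions for illustration; the one causal ingredient taken from the talk is that an intervened variable tells us nothing about its own mechanism, so its local likelihood term is skipped:

```python
import math
from collections import defaultdict

# Posterior over structures for a toy two-variable domain {A, B}.
# Structures, uniform prior, and Dirichlet(1, 1) priors are illustrative
# assumptions. An intervened variable contributes nothing to the
# likelihood of its own CPD (the causal semantics of setting Q := q).
STRUCTURES = {"A->B": {"A": (), "B": ("A",)},
              "B->A": {"A": ("B",), "B": ()},
              "none": {"A": (), "B": ()}}

log_weight = {g: 0.0 for g in STRUCTURES}                        # uniform prior
counts = {g: defaultdict(lambda: [1.0, 1.0]) for g in STRUCTURES}

def update(intervention, x):
    """Fold one instance (Q := q, x) into the posterior over structures."""
    for g, parents in STRUCTURES.items():
        for var, pa in parents.items():
            if var in intervention:
                continue                   # mechanism of an intervened node
            key = (var, tuple(x[p] for p in pa))
            c = counts[g][key]
            log_weight[g] += math.log(c[x[var]] / (c[0] + c[1]))
            c[x[var]] += 1.0               # Dirichlet posterior update

def posterior():
    m = max(log_weight.values())
    w = {g: math.exp(v - m) for g, v in log_weight.items()}
    z = sum(w.values())
    return {g: w[g] / z for g in w}

update({"A": 1}, {"A": 1, "B": 1})   # intervene A := 1, observe B = 1
update({"A": 0}, {"A": 0, "B": 0})   # intervene A := 0, observe B = 0
print(posterior())                   # A->B is now the most probable structure
```

After two interventions on A in which B tracks A, the structure A→B overtakes the alternatives, because the rival structures must explain B's behavior without reference to A.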
Model Quality I

- Want to be as certain about the structure as possible
- For two variables A and B, use the loss function:

  H(A↔B) = -Σ_r P(r) log P(r),   r ∈ {A→B, B→A, no edge}
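This edge-entropy loss is easy to state in code: given a posterior over the three possible relations between A and B, the loss is the Shannon entropy of that distribution, which is zero exactly when we are certain of the structure. The posterior values below are made up for illustration:

```python
import math

# Edge-entropy loss: entropy of the posterior over the three possible
# relations between A and B (A->B, B->A, no edge). The posterior values
# in the example call are illustrative, not from the talk.
def edge_entropy(p_relations):
    return -sum(p * math.log2(p) for p in p_relations.values() if p > 0)

print(edge_entropy({"A->B": 0.7, "B->A": 0.2, "none": 0.1}))
```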
Model Quality II

- In general:
  - Want to be as certain about the edges as possible
  - Sum the edge entropies H(X↔Y) over all pairs of variables X, Y
Expected Posterior Loss

- Choose the query that improves the expected posterior loss as much as possible:

  ExPLoss(Q:=q) = Σ_x P(x | Q:=q) · Loss( P(G, θ_G | Q:=q, x) )
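The formula translates directly: average the loss of the updated model over possible responses x, weighted by their predicted probability under the current model. In the sketch below, `predict`, `update`, and `loss` are hypothetical hooks, and the Beta-coin instantiation is purely illustrative:

```python
# Expected posterior loss of a candidate query Q := q:
# ExPLoss(q) = sum_x P(x | Q:=q) * Loss(posterior after observing (q, x)).
# `predict`, `update`, and `loss` are hypothetical hooks standing in for
# the closed-form quantities derived in the talk.
def expected_posterior_loss(model, q, responses, predict, update, loss):
    return sum(predict(model, q, x) * loss(update(model, q, x))
               for x in responses)

# Toy instantiation: the "model" is a Beta(a, b) posterior over a coin
# bias, the query is ignored, and the loss is the posterior variance.
def predict(model, q, x):
    a, b = model
    return (a if x == 1 else b) / (a + b)

def update(model, q, x):
    a, b = model
    return (a + 1, b) if x == 1 else (a, b + 1)

def loss(model):
    a, b = model
    return a * b / ((a + b) ** 2 * (a + b + 1))

print(expected_posterior_loss((1, 1), None, [0, 1], predict, update, loss))
```

In this toy case either outcome shrinks the posterior variance from 1/12 to 1/18, so the expected posterior loss is strictly below the current loss and the query is worth asking.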
Fixed Ordering

- Restrict to networks consistent with a fixed total ordering π of the nodes*
  - A→B ⇒ A precedes B in π
- Each node X_i has a set W_i of at most k candidate parents**
- The number of allowed structures is still exponential
Fixed Ordering

- The choice of parents for node X is now independent of the choices for other nodes
  - Intuitively, because cycles are no longer an issue
- Can manipulate P(G | D, π) in closed form
- Can compute H_π(A↔B) efficiently
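The per-node independence is what makes the closed form tractable: under a fixed ordering, P(Pa(X) = U | D, π) is just the local score of U normalized over the candidate subsets. In the sketch, `local_score` is a hypothetical stand-in for a Bayesian (e.g. BDe) family score:

```python
import math
from itertools import combinations

# Under a fixed ordering, acyclicity is guaranteed, so the posterior over
# structures factorizes into an independent choice of parent set per node:
# P(Pa(X) = U | D, pi) is the local score of U normalized over candidate
# subsets of size at most k. `local_score` is a hypothetical stand-in for
# the Bayesian family score used for structure learning.
def parent_set_posterior(candidates, k, local_score):
    subsets = [frozenset(s) for r in range(k + 1)
               for s in combinations(sorted(candidates), r)]
    logs = {s: local_score(s) for s in subsets}
    m = max(logs.values())                      # log-sum-exp normalization
    w = {s: math.exp(v - m) for s, v in logs.items()}
    z = sum(w.values())
    return {s: w[s] / z for s in subsets}

# Toy log-score that rewards including "A" and penalizes larger sets.
post = parent_set_posterior({"A", "B"}, 2,
                            lambda s: (2.0 if "A" in s else 0.0) - len(s))
print(max(post, key=post.get))       # frozenset({'A'}) is the MAP parent set
```

Because each node's normalization runs over only its own candidate subsets, the cost is per-node rather than over the exponential space of whole graphs.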
Expected Posterior Loss

- Theorem: the expected posterior loss decomposes into a sum of terms ψ and φ, where both ψ and φ have efficiently computable closed-form expressions
Inference in Graphical Model

- Each term ψ, φ is a potential function over a small subset of variables
- The desired result is a sum over all variables except Q
- Can use standard inference techniques
Expected Posterior Loss

- We can compute this efficiently in closed form
Unrestricted Orderings

- Sample orderings*
  - Use the MCMC Metropolis algorithm
- Suppose we want the expected posterior loss of asking query Q:=q:
  - For each sampled ordering, compute the expected quality
  - Average over all sampled orderings
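A Metropolis sampler over orderings can be sketched as follows: propose swapping two positions and accept with probability min(1, P(π')/P(π)). Here `log_score` is a hypothetical stand-in for the closed-form marginal likelihood of the data given an ordering, and the toy score in the usage example is made up:

```python
import math
import random

# Metropolis sampling over node orderings: propose a swap of two
# positions, accept with probability min(1, P(pi') / P(pi)).
# `log_score` is a hypothetical stand-in for the closed-form marginal
# likelihood of the data given an ordering.
def sample_orderings(variables, log_score, n_samples, seed=0):
    rng = random.Random(seed)
    pi = list(variables)
    samples = []
    for _ in range(n_samples):
        i, j = rng.sample(range(len(pi)), 2)
        proposal = pi[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        if math.log(rng.random()) < log_score(proposal) - log_score(pi):
            pi = proposal                          # accept the swap
        samples.append(tuple(pi))
    return samples

# Toy score that prefers orderings placing "A" early (illustrative only).
chain = sample_orderings(["B", "C", "A"],
                         lambda pi: -2.0 * pi.index("A"), 500)
frac_a_first = sum(s[0] == "A" for s in chain) / len(chain)
print(frac_a_first)
```

Because the swap proposal is symmetric, the simple Metropolis ratio suffices; the chain spends most of its time on high-scoring orderings, which is what the averaging step relies on.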
The Algorithm

- Sample orderings
- Compute ExPLoss for each candidate query
- Choose the query that minimizes ExPLoss
Experiments

- Cancer network
  - 5 nodes
- Candidate parents of X:
  - k ≤ 5 parents for each node X
  - The nodes with the highest individual mutual information with X
- MCMC for sampling orderings
  - Maintain 50-75 chains
- Compared Active with Random and Uniform
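The candidate-parent heuristic above (keep, for each node X, the k other nodes with the highest empirical mutual information with X) can be sketched directly. The toy dataset, in which B copies A and C is independent of A, is an assumption for illustration:

```python
import math
from collections import Counter

# Candidate-parent heuristic: for each node X, keep the k other nodes
# with the highest empirical mutual information with X. Data is a list
# of dicts mapping variable name -> discrete value; the toy dataset
# below is illustrative only.
def mutual_information(data, x, y):
    n = len(data)
    pxy = Counter((d[x], d[y]) for d in data)
    px = Counter(d[x] for d in data)
    py = Counter(d[y] for d in data)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def candidate_parents(data, x, variables, k):
    others = [v for v in variables if v != x]
    return sorted(others, key=lambda y: -mutual_information(data, x, y))[:k]

data = [{"A": a, "B": a, "C": i % 2} for i, a in enumerate([0, 1, 1, 0])]
print(candidate_parents(data, "A", ["A", "B", "C"], 1))   # ['B']
```

Pairwise mutual information is only a filter, not a structure estimate: it bounds the search space so the exponential parent-set posterior stays over at most k candidates per node.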
Cancer Network: Root Control

Cancer Network: Interventions

Car Network: Interventions
[Figure: the Original Cancer Network compared with networks learned from Random Samples, Uniform Intervention, and Active Learning]
Conclusions

- Framework for active learning of causal structure
- Theoretical analysis supported by experimental results
- Efficient algorithm
  - Closed-form distribution for a fixed ordering
  - MCMC samples of orderings
  - Graphical-model inference to perform the exponential computation
- Future work: missing data, continuous variables