Active Learning for Structure in Bayesian Networks
Simon Tong
Daphne Koller
Stanford University

Motivation
Scientific understanding → causal model
Understand the process of cancer formation & progression
How do we choose experiments wisely to gain the most understanding from them?
Limited number of experiments on laboratory mice
Which experiments should we perform?

Bayesian Networks*

Formal Problem Statement
Given:
Set of domain variables
Experiment:
Can control some query variables Q by setting Q := q
Obtain a response X=x, a complete assignment to the other variables (see the example below)
Goal:
Learn a causal model — structure of causal Bayesian network
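For concreteness, a tiny hypothetical instance (the variable names are invented for illustration and are not from the talk):

```latex
% Hypothetical 4-variable domain; one experiment (query) and its response.
\[
  Q := q \;\equiv\; \{\textit{Smoking} := \text{true}\},
  \qquad
  x \;=\; \bigl(\textit{Cancer}=\text{true},\ \textit{Cough}=\text{true},\ \textit{Fatigue}=\text{false}\bigr)
\]
```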

How Can Interventions Help?

Active Learning

Active Learning: Our method
A model: our current hypothesis
A measure of quality: how good our model is

Structure Estimation
Bayesian Estimation
Maintain a distribution over graphs and parameters P(G, θ_G)

Updating the Model
Given a new data instance (Q:=q, x):
Update P(G, θ_G) to get the posterior P(G, θ_G | Q:=q, x)
P(G, θ_G | Q:=q, x) is our new model
A distribution over graphs and parameters
Given a model P(G, θ_G), in principle we can obtain P(X → Y)
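A sketch of the update, assuming the standard intervention (mutilated-network) semantics for causal Bayesian networks: controlled variables contribute no likelihood term, since we set their values rather than observed them.

```latex
% Posterior update after observing the response x to the intervention Q := q.
% The likelihood product ranges only over variables NOT in Q: controlled
% variables contribute no term, because we fixed their values ourselves.
\[
  P(G, \theta_G \mid Q\!:=\!q,\, x)
  \;\propto\;
  P(G, \theta_G)\, \prod_{X_i \notin Q} P\bigl(x_i \mid \mathrm{pa}_G(x_i),\, \theta_G\bigr)
\]
```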

Model Quality I
Want to be as certain about the structure as possible
Use the edge entropy as the loss function:
H(A ↔ B) = − Σ_e P(e) log P(e),   where e ranges over { A → B,  B → A,  no edge }

Model Quality II
In general:
Want to be as certain about edges as possible
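One natural way to write this, consistent with the per-edge loss on the previous slide (notation assumed, not taken verbatim from the talk):

```latex
% Total loss of a model: sum of edge entropies over all unordered pairs,
% using the per-edge entropy H(A <-> B) defined on the previous slide.
\[
  \mathrm{Loss}(P) \;=\; \sum_{i < j} H(X_i \leftrightarrow X_j),
  \qquad
  H(X_i \leftrightarrow X_j) \;=\;
  -\sum_{e \in \{X_i \to X_j,\; X_j \to X_i,\; \text{no edge}\}} P(e)\,\log P(e)
\]
```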

Expected posterior loss
Choose the query that reduces the expected posterior loss as much as possible:
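Spelled out, with Loss(·) as above, this is the usual expected-posterior-loss criterion (a sketch of the definition, not copied from the slides):

```latex
% Expected posterior loss of the query Q := q, and the query-selection rule.
\[
  \mathrm{ExPLoss}(Q\!:=\!q) \;=\;
  \sum_{x} P(x \mid Q\!:=\!q)\,
  \mathrm{Loss}\bigl(P(G, \theta_G \mid Q\!:=\!q,\, x)\bigr),
  \qquad
  q^{*} \;=\; \arg\min_{q} \mathrm{ExPLoss}(Q\!:=\!q)
\]
```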

Fixed Ordering
Restrict to networks consistent with a fixed total ordering π of the nodes*
A → B  ⇒  A precedes B in π
Each node X_i has a set W_i of at most k possible candidate parents**
Number of allowed structures still exponential

Fixed Ordering
Choice of parents for node X now independent of choice of parents for other nodes
Intuitively, because cycles no longer an issue
Can manipulate P(G | D, π) in closed form
Can compute H_π(A ↔ B) efficiently
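A minimal sketch of the decomposition this slide exploits: with the ordering fixed, each node's parent-set posterior can be normalized locally, and edge probabilities follow by summation. It assumes discrete data and a log family score; helper names such as family_score are illustrative, not the authors' code.

```python
import itertools

import numpy as np

def parent_set_posterior(node, candidates, k, family_score):
    """Posterior over parent sets of `node` under a fixed ordering.

    `candidates` are the allowed parents W_i (nodes preceding `node` in the
    ordering); `family_score(node, parents)` returns the log marginal
    likelihood of that family given the data D.  Under a fixed ordering the
    choice of parents for each node is independent of the other nodes, so we
    can normalize per node.
    """
    parent_sets = [frozenset(s)
                   for r in range(k + 1)
                   for s in itertools.combinations(candidates, r)]
    log_scores = np.array([family_score(node, ps) for ps in parent_sets])
    log_scores -= log_scores.max()          # subtract max for numerical stability
    probs = np.exp(log_scores)
    probs /= probs.sum()
    return dict(zip(parent_sets, probs))

def edge_probability(parent, child_posterior):
    """P(parent -> child | D, ordering): mass of the child's parent sets containing `parent`."""
    return sum(p for ps, p in child_posterior.items() if parent in ps)
```

Edge probabilities of this form are exactly the quantities needed to evaluate H_π(A ↔ B).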

Expected Posterior Loss
Theorem: for a fixed ordering π, the expected posterior loss decomposes into potential terms ψ and φ,
    where both ψ and φ have efficiently computable closed-form expressions

Inference in Graphical Model
Each term ψ, φ is a potential function over a small subset of variables
Desired result is sum over all variables except Q
Can use standard inference techniques
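Read concretely, and only as a sketch: because each ψ, φ term touches just a few variables, linearity of expectation reduces the exponential sum over complete assignments x to one small marginal per term, which standard graphical-model inference can supply. The names terms and marginal below are assumptions for illustration.

```python
def expected_term_sum(terms, marginal, query):
    """Sum over all x of P(x | Q:=q) times a sum of small-scope terms.

    Each element of `terms` is a pair (variables, f) where f maps a joint
    assignment of those few variables to a number (playing the role of the
    psi / phi potentials).  `marginal(variables, query)` is assumed to return
    a dict {assignment: P(assignment | Q:=query)} computed by standard
    graphical-model inference, so no explicit sum over all x is ever needed.
    """
    total = 0.0
    for variables, f in terms:
        for assignment, prob in marginal(variables, query).items():
            total += prob * f(assignment)
    return total
```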

Expected Posterior Loss
We can compute this efficiently in closed form

Unrestricted Orderings
Sample orderings*
Use the Metropolis MCMC algorithm
Suppose we want to compute the expected posterior loss of asking query Q:=q
For each ordering, compute expected quality
Average over all sampled orderings
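A minimal sketch of such a Metropolis chain over orderings, assuming the marginal likelihood P(D | ordering) is computable in closed form for a fixed ordering (as on the earlier slides); the proposal and tuning details here are illustrative, not the authors' exact chain.

```python
import math
import random

def sample_orderings(variables, log_score, n_samples, burn_in=100):
    """Metropolis sampler over total orderings of the nodes.

    Proposal: swap two random positions (symmetric, so no Hastings correction).
    Acceptance: min(1, P(D | proposed ordering) / P(D | current ordering)),
    where `log_score(ordering)` = log P(D | ordering).
    """
    order = list(variables)
    random.shuffle(order)
    current = log_score(order)
    samples = []
    for step in range(burn_in + n_samples):
        i, j = random.sample(range(len(order)), 2)
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        proposed = log_score(proposal)
        if proposed >= current or random.random() < math.exp(proposed - current):
            order, current = proposal, proposed        # accept the move
        if step >= burn_in:
            samples.append(list(order))
    return samples
```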

The Algorithm
Sample orderings
Compute ExPLoss for each query
Choose query that minimizes ExPLoss
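Putting the three steps together as a sketch (hypothetical function names; the per-ordering expected posterior loss is assumed to come from the closed-form computation above):

```python
def choose_query(candidate_queries, sampled_orderings, exp_loss):
    """Pick the intervention with the smallest estimated expected posterior loss.

    `exp_loss(query, ordering)` is assumed to return the closed-form expected
    posterior loss for that single ordering; averaging over the sampled
    orderings approximates the loss under the full posterior over structures.
    """
    def average_loss(query):
        return sum(exp_loss(query, order) for order in sampled_orderings) \
               / len(sampled_orderings)
    return min(candidate_queries, key=average_loss)
```

After performing the chosen experiment, the model is updated with the new instance (as on the earlier slide) and the process repeats.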

Experiments
Cancer network
5 nodes
Candidate parents of X:
k ≤ 5 parents for each node X
nodes with the highest individual mutual information with X (see the sketch below)
MCMC for sampling orderings
Maintain 50–75 chains
Compared Active with Random and Uniform
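A sketch of the candidate-parent heuristic mentioned above: rank the other variables by empirical mutual information with X and keep the top k. The data layout and helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def candidate_parents(data, target, k=5):
    """Choose candidate parents of `target`: the k other variables with the
    highest empirical mutual information with it.

    `data` maps each variable name to a 1-D array of discrete observations.
    """
    def mutual_information(x, y):
        # Empirical MI from the joint contingency table of two discrete arrays.
        _, xi = np.unique(x, return_inverse=True)
        _, yi = np.unique(y, return_inverse=True)
        joint = np.zeros((xi.max() + 1, yi.max() + 1))
        for a, b in zip(xi, yi):
            joint[a, b] += 1
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    scores = {v: mutual_information(data[v], data[target])
              for v in data if v != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```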

Cancer Network: Root Control

Cancer Network: Interventions

Car Network: Interventions

Learned structures compared: Original Cancer Network, Random Samples, Uniform Intervention, Active Learning

Conclusions
Framework for active learning for causal structure
Theoretical analysis supported by experimental results
Efficient algorithm
Closed form distribution for fixed ordering
MCMC samples of orderings
Use of graphical-model inference to perform an exponential-size summation efficiently
Future work: missing data, continuous variables