Scaling up
• KL divergence decomposes over the network structure (a sum of per-node terms)
• Consequently, ΔRisk also decomposes over the network
• Requires a small approximation*: assume $P_{\theta}(u_i \mid q) = P_{\theta_x}(u_i \mid q)$
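As a minimal numerical check of the first bullet, the sketch below assumes both distributions share the same network structure. It uses a hypothetical two-node network A → B with made-up CPDs and compares the KL divergence of the full joint against the per-node sum: the root's marginal KL plus the child's CPD-row KLs, each weighted by the parent probability P(A = a) under the first argument. Those parent-configuration weights play the role of the $P(u_i)$ terms that the approximation on this slide concerns.

```python
import math
from itertools import product


def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical two-node network A -> B, both variables binary.
# Each model is given by P(A) and the CPD rows P(B | A = a) for a in {0, 1}.
P_A = [0.3, 0.7]
P_B_given_A = [[0.9, 0.1], [0.2, 0.8]]
Q_A = [0.5, 0.5]
Q_B_given_A = [[0.6, 0.4], [0.4, 0.6]]


def joint(pa, pb_a):
    """Full joint P(a, b) = P(a) * P(b | a) as a dict keyed by (a, b)."""
    return {(a, b): pa[a] * pb_a[a][b] for a, b in product(range(2), range(2))}


def kl_joint(pa, pb_a, qa, qb_a):
    """KL divergence computed directly over the full joint distribution."""
    p = joint(pa, pb_a)
    q = joint(qa, qb_a)
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


def kl_decomposed(pa, pb_a, qa, qb_a):
    """Same KL, computed as a sum of per-node terms."""
    # Root node A has no parents: plain KL between the two marginals.
    root_term = kl(pa, qa)
    # Child node B: KL of each CPD row, weighted by the parent
    # probability P(A = a) under the first argument.
    child_term = sum(pa[a] * kl(pb_a[a], qb_a[a]) for a in range(2))
    return root_term + child_term


full = kl_joint(P_A, P_B_given_A, Q_A, Q_B_given_A)
parts = kl_decomposed(P_A, P_B_given_A, Q_A, Q_B_given_A)
print(f"joint KL = {full:.6f}, per-node sum = {parts:.6f}")
```

The two values agree because the log-ratio of the joints splits into a log-ratio per node (the chain rule for KL divergence), so each node's term can be computed locally from its own CPD and its parents.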