Kullback–Leibler (KL) divergence of two Gaussians
In variational inference, we usually assume the variational approximation of the posterior distribution to be Gaussian. As a result, in many contexts we end up computing the KL divergence between two Gaussians (for example, while maximizing the variational lower bound), and fortunately this computation is tractable. Recall that the KL divergence measures how much one distribution differs from another. In this article, we show that computing the KL divergence between two Gaussians is indeed tractable.
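For reference, for two densities $p$ and $q$ the KL divergence is defined as

$$
D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx,
$$

which is nonnegative and equals zero exactly when $p = q$ almost everywhere.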
Assume two random variables $x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ that follow different Gaussian distributions, with densities $p$ and $q$ respectively.
Note that the assumption can be relaxed to random vectors with multivariate Gaussian distributions. The KL divergence of $p$ and $q$ can then be computed by the following procedure.
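The following is a sketch of the standard univariate derivation, using the notation introduced above. Writing out the log-densities,

$$
\log p(x) - \log q(x)
= \log\frac{\sigma_2}{\sigma_1}
- \frac{(x - \mu_1)^2}{2\sigma_1^2}
+ \frac{(x - \mu_2)^2}{2\sigma_2^2}.
$$

Taking the expectation under $p$ and using $\mathbb{E}_p\!\left[(x - \mu_1)^2\right] = \sigma_1^2$ and $\mathbb{E}_p\!\left[(x - \mu_2)^2\right] = \sigma_1^2 + (\mu_1 - \mu_2)^2$, we obtain

$$
D_{\mathrm{KL}}(p \,\|\, q)
= \log\frac{\sigma_2}{\sigma_1}
+ \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
- \frac{1}{2}.
$$

Every term is an elementary function of the means and variances, so the divergence is tractable. The standard multivariate analogue, with $p = \mathcal{N}(\mu_1, \Sigma_1)$ and $q = \mathcal{N}(\mu_2, \Sigma_2)$ in $k$ dimensions, is

$$
D_{\mathrm{KL}}(p \,\|\, q)
= \frac{1}{2}\left[
\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
+ (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1)
- k
+ \log\frac{\det \Sigma_2}{\det \Sigma_1}
\right].
$$

As a quick sanity check, here is a small Python sketch (the parameter values are hypothetical, chosen only for illustration) that compares the closed form against numerical integration of the definition:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical parameter values for illustration only.
mu1, s1 = 0.0, 1.0
mu2, s2 = 1.0, 2.0

# Closed-form KL(p || q) for univariate Gaussians.
closed_form = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Numerical check: integrate p(x) * (log p(x) - log q(x)) over the real line.
p, q = norm(mu1, s1), norm(mu2, s2)
numerical, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)),
                    -np.inf, np.inf)

print(closed_form, numerical)  # the two values should agree closely
```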