KL divergence of two Gaussian distributions
For a multivariate Gaussian random variable $x \sim N(\mu, \Sigma)$, the probability density function is
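$$
p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)
$$

- $\mu$ is the $n$-dimensional mean vector
- $\Sigma$ is the $n \times n$ covariance matrix, $\Sigma = E[(x-\mu)(x-\mu)^T]$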
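As a quick numerical sanity check of the density and of the definition of $\Sigma$, here is a minimal NumPy sketch. The function name `gaussian_log_pdf` and the particular values of `mu` and `sigma` are illustrative assumptions, not part of the original notes. It evaluates the log of the density above and compares a sample estimate of $E[(x-\mu)(x-\mu)^T]$ with $\Sigma$.

```python
import numpy as np


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at a single point x (1-D array of length n)."""
    n = mu.shape[0]
    diff = x - mu
    # slogdet returns (sign, log|det|); this avoids overflow in det(sigma).
    _, log_det = np.linalg.slogdet(sigma)
    # Solve sigma @ z = diff instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + maha)


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])          # illustrative mean
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])      # illustrative covariance (symmetric, positive definite)

# At x = mu the exponent vanishes, so p(mu) is just the normalising constant.
p_at_mean = np.exp(gaussian_log_pdf(mu, mu, sigma))
expected = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))
print(p_at_mean, expected)

# Sample estimate of Sigma = E[(x - mu)(x - mu)^T].
samples = rng.multivariate_normal(mu, sigma, size=200_000)
centered = samples - mu
sigma_hat = centered.T @ centered / len(samples)
print(sigma_hat)
```

Working with the log-density and `np.linalg.solve` rather than an explicit inverse is a common choice for numerical stability; for small, well-conditioned $\Sigma$ the direct formula would give the same result.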
Consider two multivariate Gaussian distributions $p(x) = N(\mu_1, \Sigma_1)$ and $q(x) = N(\mu_2, \Sigma_2)$. Their KL divergence is
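$$
\begin{aligned}
D_{KL}(p(x) \,\|\, q(x))
&= E_{p(x)}\left[\log \frac{p(x)}{q(x)}\right] \\
&= E_{p(x)}\left[\log p(x) - \log q(x)\right] \\
&= E_{p(x)}\Big[-\log (2\pi)^{n/2} - \frac{1}{2}\log|\Sigma_1| - \frac{1}{2}(x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1) \\
&\qquad\qquad + \log (2\pi)^{n/2} + \frac{1}{2}\log|\Sigma_2| + \frac{1}{2}(x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2)\Big] \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - E_{p(x)}\left[(x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1)\right] + E_{p(x)}\left[(x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2)\right]\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - E_{p(x)}\left[\mathrm{tr}\left((x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1)\right)\right] + E_{p(x)}\left[\mathrm{tr}\left((x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2)\right)\right]\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - E_{p(x)}\left[\mathrm{tr}\left(\Sigma_1^{-1} (x-\mu_1)(x-\mu_1)^T\right)\right] + E_{p(x)}\left[\mathrm{tr}\left(\Sigma_2^{-1} (x-\mu_2)(x-\mu_2)^T\right)\right]\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - \mathrm{tr}\left(\Sigma_1^{-1} E_{p(x)}\left[(x-\mu_1)(x-\mu_1)^T\right]\right) + \mathrm{tr}\left(\Sigma_2^{-1} E_{p(x)}\left[(x-\mu_2)(x-\mu_2)^T\right]\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - \mathrm{tr}\left(\Sigma_1^{-1}\Sigma_1\right) + \mathrm{tr}\left(\Sigma_2^{-1} E_{p(x)}\left[x x^T - 2x\mu_2^T + \mu_2\mu_2^T\right]\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1} E_{p(x)}\left[\Sigma_1 + 2x\mu_1^T - \mu_1\mu_1^T - 2x\mu_2^T + \mu_2\mu_2^T\right]\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1}\left(\Sigma_1 + 2\mu_1\mu_1^T - \mu_1\mu_1^T - 2\mu_1\mu_2^T + \mu_2\mu_2^T\right)\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + \mathrm{tr}\left(\Sigma_2^{-1}\left(\mu_1\mu_1^T - 2\mu_1\mu_2^T + \mu_2\mu_2^T\right)\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + \mathrm{tr}\left(\Sigma_2^{-1}(\mu_1-\mu_2)(\mu_1-\mu_2)^T\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + \mathrm{tr}\left((\mu_1-\mu_2)^T \Sigma_2^{-1} (\mu_1-\mu_2)\right)\right) \\
&= \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} - n + \mathrm{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_1-\mu_2)^T \Sigma_2^{-1} (\mu_1-\mu_2)\right)
\end{aligned}
$$

The derivation relies on three facts. A scalar equals its own trace, and the trace is invariant under cyclic permutation, so $a^T B a = \mathrm{tr}(B a a^T)$. Under $p(x)$ we have $E_{p(x)}[x] = \mu_1$ and $E_{p(x)}[(x-\mu_1)(x-\mu_1)^T] = \Sigma_1$, and $\mathrm{tr}(\Sigma_1^{-1}\Sigma_1) = \mathrm{tr}(I_n) = n$. Finally, writing $(x-\mu)(x-\mu)^T$ as $x x^T - 2x\mu^T + \mu\mu^T$ is valid only inside the trace against a symmetric matrix: because $\Sigma_2^{-1}$ is symmetric, $\mathrm{tr}(\Sigma_2^{-1} x \mu^T) = \mathrm{tr}(\Sigma_2^{-1} \mu x^T)$, so the two cross terms can be merged.

From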