one-to-many mapping.
Given input variables $x$ and output variables $y$, assume a latent variable $z$ generates $y$ from $x$; the probabilistic graphical model is $x \to z \to y$.
If we use $q_\phi(z\mid x)$ to approximate the true posterior $p_\theta(z\mid x)$, we want to maximize the log-likelihood:
$$
\begin{aligned}
\log p_\theta(y\mid x) &= \log \int p(y,z\mid x)\,dz = \log \int p(y\mid z,x)\,p(z\mid x)\,dz \\
&= \log \int q(z\mid x)\,\frac{p(y\mid z,x)\,p(z\mid x)}{q(z\mid x)}\,dz \\
&\ge \int q(z\mid x)\,\log \frac{p(y\mid z,x)\,p(z\mid x)}{q(z\mid x)}\,dz \quad \text{(Jensen's inequality)} \\
&= \mathbb{E}_{q(z\mid x)}\!\left[\log p(y\mid z,x) - \log \frac{q(z\mid x)}{p(z\mid x)}\right] \\
&= \mathbb{E}_{q(z\mid x)}\!\left[\log p(y\mid z,x)\right] - \mathrm{KL}\!\left(q(z\mid x)\,\|\,p(z\mid x)\right)
\end{aligned}
$$
Since $y$ is conditionally independent of $x$ given $z$ (by the graph $x \to z \to y$), $p(y\mid z,x) = p(y\mid z)$, so the bound becomes
$$
\log p(y\mid x) \ge \mathbb{E}_{q(z\mid x)}\!\left[\log p(y\mid z)\right] - \mathrm{KL}\!\left(q(z\mid x)\,\|\,p(z\mid x)\right)
$$
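The bound above can be estimated numerically: the KL term has a closed form when $q(z\mid x)$ and $p(z\mid x)$ are diagonal Gaussians, and the reconstruction term is approximated by Monte Carlo with reparameterized samples $z = \mu_q + \sigma_q \epsilon$. Below is a minimal numpy sketch; the `encoder`, `prior`, and `decoder_loglik` callables and the toy linear maps at the bottom are hypothetical placeholders for illustration, not any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def elbo(x, y, encoder, prior, decoder_loglik, n_samples=64):
    # ELBO = E_{q(z|x)}[ log p(y|z) ] - KL( q(z|x) || p(z|x) )
    mu_q, logvar_q = encoder(x)   # parameters of q(z|x)
    mu_p, logvar_p = prior(x)     # parameters of p(z|x)
    # Monte Carlo estimate of the reconstruction term via reparameterization
    eps = rng.standard_normal((n_samples, mu_q.size))
    z = mu_q + np.exp(0.5 * logvar_q) * eps
    recon = np.mean([decoder_loglik(y, zi) for zi in z])
    return recon - kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)

# Toy 1-D example with hypothetical linear maps (illustration only)
x_obs, y_obs = np.array([0.5]), np.array([1.0])
enc = lambda x: (0.8 * x, np.array([-1.0]))            # q(z|x): mean, log-variance
pri = lambda x: (np.zeros_like(x), np.zeros_like(x))   # p(z|x) = N(0, 1)
loglik = lambda y, z: -0.5 * np.sum((y - z) ** 2) - 0.5 * np.log(2 * np.pi)
print(elbo(x_obs, y_obs, enc, pri, loglik))
```

Maximizing this quantity jointly over the encoder, prior, and decoder parameters is exactly maximizing the lower bound on $\log p(y\mid x)$ derived above.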