
head2head

One-to-many mapping. Given input variables $\bold{x}$ and output variables $\bold{y}$, assume that latent variables $\bold{z}$ generate $\bold{y}$ from $\bold{x}$. The probabilistic graphical model is the chain $\bold{x} \rightarrow \bold{z} \rightarrow \bold{y}$.
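The chain structure can be illustrated with ancestral sampling: draw $\bold{z}$ given $\bold{x}$, then draw $\bold{y}$ given $\bold{z}$. The sketch below is only an illustration. The Gaussian forms, the factor `2.0`, and the function names `sample_z`, `sample_y`, and `generate` are assumptions made for the example, not part of the model above.

```python
import random

# Hypothetical toy distributions, chosen only for illustration.
def sample_z(x, rng):
    """Draw z ~ p(z|x): a Gaussian centred on x with unit variance."""
    return rng.gauss(x, 1.0)

def sample_y(z, rng):
    """Draw y ~ p(y|z): a Gaussian centred on 2*z with small noise."""
    return rng.gauss(2.0 * z, 0.1)

def generate(x, n, seed=0):
    """Ancestral sampling along x -> z -> y; y never looks at x directly."""
    rng = random.Random(seed)
    return [sample_y(sample_z(x, rng), rng) for _ in range(n)]

samples = generate(x=1.0, n=5)
print(samples)  # several different y values for the same x
```

Because $\bold{z}$ is resampled on every call, a single input $\bold{x}$ yields many distinct outputs $\bold{y}$, which is the one-to-many behaviour the latent variable is meant to capture.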

We use $q_{\boldsymbol{\phi}}(\bold{z}|\bold{x})$ to approximate $p_{\boldsymbol{\theta}}(\bold{z}|\bold{x})$, and we want to maximize the log-likelihood:

$$
\begin{aligned}
\log p_{\boldsymbol{\theta}}(\bold{y}|\bold{x}) &= \log \int p(\bold{y}, \bold{z}|\bold{x}) \, d\bold{z} \\
&= \log \int p(\bold{y}|\bold{z}, \bold{x}) \, p(\bold{z}|\bold{x}) \, d\bold{z} \\
&= \log \int q(\bold{z}|\bold{x}) \frac{p(\bold{y}|\bold{z}, \bold{x}) \, p(\bold{z}|\bold{x})}{q(\bold{z}|\bold{x})} \, d\bold{z} \\
&\geq \int q(\bold{z}|\bold{x}) \log \frac{p(\bold{y}|\bold{z}, \bold{x}) \, p(\bold{z}|\bold{x})}{q(\bold{z}|\bold{x})} \, d\bold{z} \\
&= \mathbb{E}_{q(\bold{z}|\bold{x})}\Bigg[\log p(\bold{y}|\bold{z}, \bold{x}) - \log \frac{q(\bold{z}|\bold{x})}{p(\bold{z}|\bold{x})}\Bigg] \\
&= \mathbb{E}_{q(\bold{z}|\bold{x})}[\log p(\bold{y}|\bold{z}, \bold{x})] - KL(q(\bold{z}|\bold{x}) \,||\, p(\bold{z}|\bold{x}))
\end{aligned}
$$

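Because $\bold{y}$ depends on $\bold{x}$ only through $\bold{z}$ in the chain $\bold{x} \rightarrow \bold{z} \rightarrow \bold{y}$, $\bold{y}$ is conditionally independent of $\bold{x}$ given $\bold{z}$, so $p(\bold{y}|\bold{z}, \bold{x}) = p(\bold{y}|\bold{z})$. The result is a lower bound on the log-likelihood, not an equality:

$$
\log p(\bold{y}|\bold{x}) \geq \mathbb{E}_{q(\bold{z}|\bold{x})}[\log p(\bold{y}|\bold{z})] - KL(q(\bold{z}|\bold{x}) \,||\, p(\bold{z}|\bold{x}))
$$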
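The relation between $\log p(\bold{y}|\bold{x})$ and the right-hand side above can be checked numerically on a small discrete example. The sketch below is a minimal illustration: it assumes one fixed $\bold{x}$, one observed $\bold{y}$, and a latent $\bold{z}$ with three values. The probability tables and the helper names `log_likelihood`, `elbo`, and `posterior` are hypothetical. The right-hand side never exceeds the log-likelihood, and the gap closes when $q$ equals the exact posterior $p(\bold{z}|\bold{x}, \bold{y})$.

```python
import math

# Hypothetical toy numbers for one fixed x and one observed y; z takes 3 values.
prior = [0.5, 0.3, 0.2]        # p(z|x)
likelihood = [0.9, 0.4, 0.1]   # p(y|z) evaluated at the observed y

def log_likelihood(prior, likelihood):
    """log p(y|x) = log sum_z p(y|z) p(z|x)."""
    return math.log(sum(l * p for l, p in zip(likelihood, prior)))

def elbo(q, prior, likelihood):
    """E_q[log p(y|z)] - KL(q(z|x) || p(z|x)), skipping terms with q(z) = 0."""
    expected_ll = sum(qz * math.log(l) for qz, l in zip(q, likelihood) if qz > 0)
    kl = sum(qz * math.log(qz / p) for qz, p in zip(q, prior) if qz > 0)
    return expected_ll - kl

def posterior(prior, likelihood):
    """Exact p(z|x,y) by Bayes' rule."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

log_px = log_likelihood(prior, likelihood)
q_arbitrary = [0.2, 0.3, 0.5]            # any valid distribution over z
q_exact = posterior(prior, likelihood)   # the choice that makes the bound tight

print("log p(y|x):        ", log_px)
print("bound, arbitrary q:", elbo(q_arbitrary, prior, likelihood))
print("bound, exact q:    ", elbo(q_exact, prior, likelihood))
```

In this sketch $q$ is allowed to match the posterior for the single observed $\bold{y}$. A $q(\bold{z}|\bold{x})$ that does not see $\bold{y}$ generally cannot do this for every $\bold{y}$ at once, so in practice the bound is usually strict.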