Diffusion Probabilistic Models: Theory and Applications
Fan Bao, Tsinghua University

Diffusion Probabilistic Models (DPMs)
• Ho et al. Denoising Diffusion Probabilistic Models (DDPM), NeurIPS 2020.
• Song et al. Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021.
• Bao et al. Analytic-DPM: An Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models, ICLR 2022.
• Bao et al. Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models, ICML 2022.

Diffusion process: x_0 → x_1 → x_2 → … → x_N, with x_N ≈ N(0, I)
• The diffusion process gradually injects noise into the data.
• It is described by a Markov chain: q(x_0, …, x_N) = q(x_0) q(x_1 | x_0) ⋯ q(x_N | x_{N−1}).
• Transition of diffusion: q(x_n | x_{n−1}) = N(√α_n x_{n−1}, β_n I), where α_n = 1 − β_n.
Demo images from Song et al., Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021.

• The diffusion process in the reverse direction ⇔ the denoising process: x_N → … → x_1 → x_0.
• Reverse factorization: q(x_0, …, x_N) = q(x_0 | x_1) ⋯ q(x_{N−1} | x_N) q(x_N).
• Transition of denoising: q(x_{n−1} | x_n) = ?

• Approximate the diffusion process in the reverse direction.
• The model: p(x_0, …, x_N) = p(x_0 | x_1) ⋯ p(x_{N−1} | x_N) p(x_N).
• Model transition: p(x_{n−1} | x_n) = N(μ_n(x_n), Σ_n(x_n)), which approximates q(x_{n−1} | x_n).

• We hope q(x_0, …, x_N) ≈ p(x_0, …, x_N), with p(x_{n−1} | x_n) = N(μ_n(x_n), Σ_n(x_n)).
• This is achieved by minimizing their KL divergence, i.e., maximizing the ELBO:
  min_{μ_n, Σ_n} KL(q(x_{0:N}) ‖ p(x_{0:N})) ⇔ max_{μ_n, Σ_n} E_q[log (p(x_{0:N}) / q(x_{1:N} | x_0))]
  (min KL ⇔ max ELBO).

What is the optimal solution?

Theorem (the optimal solution under a scalar variance, i.e., Σ_n(x_n) = σ_n² I)
The optimal solution to min_{μ_n(·), σ_n²} KL(q(x_{0:N}) ‖ p(x_{0:N})) is
  μ_n*(x_n) = (1/√α_n) (x_n + β_n ∇ log q_n(x_n)),
  σ_n*² = (β_n / α_n) (1 − β_n E_{q_n(x_n)}[‖∇ log q_n(x_n)‖²] / d),
where d is the data dimension.
Three key steps in the proof:
➢ Moment matching
➢ Law of total variance
➢ Score representation of the moments of q(x_0 | x_n)
Noise-prediction form: ∇ log q_n(x_n) = −(1/√β̄_n) E_{q(x_0 | x_n)}[ε_n], where β̄_n = 1 − ᾱ_n and ᾱ_n = ∏_{i=1}^n α_i, so the score can be estimated by predicting the noise.
Parameterization of μ_n(·): μ_n(x_n) = (1/√α_n) (x_n − (β_n / √β̄_n) ε̂_n(x_n)).
Bao et al. Analytic-DPM: An Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models, ICLR 2022.

Theorem (...
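To make the forward transition above concrete, here is a minimal numerical sketch in NumPy. The linear β schedule, the number of steps, and the toy 2-D data are illustrative assumptions not specified in the slides; the sketch only shows that repeatedly applying q(x_n | x_{n−1}) drives x_N toward N(0, I), and that the one-shot marginal q(x_n | x_0) = N(√ᾱ_n x_0, β̄_n I) is equivalent.

```python
# Minimal sketch of the forward (diffusion) process q(x_n | x_{n-1}) = N(sqrt(alpha_n) x_{n-1}, beta_n I).
# The linear beta schedule and the toy 2-D data below are illustrative assumptions, not from the slides.
import numpy as np

rng = np.random.default_rng(0)

N = 1000                                # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, N)      # beta_n, assumed DDPM-style linear schedule
alphas = 1.0 - betas                    # alpha_n = 1 - beta_n
alpha_bars = np.cumprod(alphas)         # cumulative product ᾱ_n, so β̄_n = 1 - ᾱ_n

def diffuse(x0, n):
    """Run the Markov chain x_0 -> x_1 -> ... -> x_n step by step."""
    x = x0
    for i in range(n):
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(alphas[i]) * x + np.sqrt(betas[i]) * noise
    return x

def diffuse_marginal(x0, n):
    """Equivalent one-shot sample from q(x_n | x_0) = N(sqrt(ᾱ_n) x_0, (1 - ᾱ_n) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[n - 1]) * x0 + np.sqrt(1.0 - alpha_bars[n - 1]) * noise

x0 = rng.standard_normal((10000, 2)) * 3.0 + 5.0   # toy "data"
xN = diffuse(x0, N)
print(xN.mean(), xN.std())                          # ~0 and ~1: x_N is close to N(0, I)
```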
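Similarly, a sketch of one reverse (denoising) step that uses the noise-prediction parameterization of the optimal mean together with the scalar-variance formula from the theorem above. Here `eps_hat` is a hypothetical placeholder for a trained noise-prediction network, and the expectation in σ_n*² is replaced by a crude single-sample estimate; in Analytic-DPM that expectation is a dataset-level quantity precomputed by Monte Carlo.

```python
# Sketch of one reverse step p(x_{n-1} | x_n) = N(mu_n(x_n), sigma_n^2 I), assuming a single
# (non-batched) sample x_n and a hypothetical noise predictor eps_hat(x_n, n).
import numpy as np

def reverse_step(x_n, n, eps_hat, alphas, alpha_bars, rng):
    """Sample x_{n-1} given x_n (n is 1-indexed) with the optimal mean and analytic variance."""
    beta_n = 1.0 - alphas[n - 1]
    beta_bar_n = 1.0 - alpha_bars[n - 1]          # β̄_n = 1 - ᾱ_n
    d = x_n.size                                  # data dimension

    eps = eps_hat(x_n, n)                         # predicted noise ε̂_n(x_n)
    score = -eps / np.sqrt(beta_bar_n)            # ∇ log q_n(x_n) ≈ -ε̂_n(x_n) / sqrt(β̄_n)

    # Optimal mean: μ_n(x_n) = (x_n + β_n ∇ log q_n(x_n)) / sqrt(α_n)
    mu = (x_n + beta_n * score) / np.sqrt(alphas[n - 1])

    # Analytic variance: σ_n² = (β_n / α_n) (1 - β_n E||∇ log q_n||² / d); the expectation is
    # approximated here by the squared norm of the current score estimate (a simplification).
    sigma2 = (beta_n / alphas[n - 1]) * (1.0 - beta_n * np.sum(score ** 2) / d)

    return mu + np.sqrt(max(sigma2, 0.0)) * rng.standard_normal(x_n.shape)
```

The mean is the standard noise-prediction mean of DDPM; what the theorem changes is the variance, replacing a handcrafted choice such as β_n with the analytic estimate above.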