Flow matching, Score-based Generative Model, Schrödinger Bridge and Optimal Transport

Reference

Denoising Diffusion Probabilistic Models

Diffusion Schrödinger Bridge Matching

Simplified Diffusion Schrödinger Bridge

Speed-accuracy relations for diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport

Adversarial Schrödinger Bridge Matching

Introduction

Generative models resonate with two deep principles: the thermodynamics of entropy and the mathematics of optimal transport.

Symbols and Preliminaries

Probability Space

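  • $(\Omega,\mathcal F,\mathbb P)$ is a probability space.
  • $(\mathbb R^d,\mathcal B(\mathbb R^d))$ is the measurable space, with $\mathcal B(\mathbb R^d)$ the Borel $\sigma$-algebra.
  • A random variable is a measurable map $X:\Omega\to\mathbb R^d$, i.e. $X^{-1}(A)\in\mathcal F$ for every $A\in\mathcal B(\mathbb R^d)$.
  • The distribution (pushforward measure) of $X$ is $\mu_X := \mathbb P\circ X^{-1}$, i.e. $\mu_X(A) = \mathbb P(X\in A)$.
  • If $\mu_X$ is absolutely continuous w.r.t. the Lebesgue measure $\lambda$, then there exists a density function $p_X = \frac{d\mu_X}{d\lambda}$ such that $\mu_X(A) = \int_A p_X(x)\,dx$.
  • Notation: we write $X\sim\mu_X$; if $\mu_X$ admits a density, we often abbreviate $X\sim p_X$.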

Stochastic Process

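  • A stochastic process is a time-parametrized family of random variables $\{X_t\}_{t\in[0,T]}$, with each $X_t:\Omega\to\mathbb R^d$.
  • The marginal distribution at time $t$ is $p_t := \mu_{X_t}$, i.e. $p_t(A) = \mathbb P(X_t\in A)$.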

Standard Brownian Motion (Wiener Process)

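A $d$-dimensional standard Brownian motion $W_t$ with respect to a filtration $\{\mathcal F_t\}_{t\ge0}$ satisfies:
  1. $W_0 = 0$ almost surely.
  2. For $0\le s<t$, the increment $W_t - W_s\sim\mathcal N\left(0,(t-s)I_d\right)$.
  3. The increments $W_t - W_s$ are independent of $\mathcal F_s$ (independent increments).
  4. The paths $t\mapsto W_t$ are almost surely continuous.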
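A minimal NumPy sketch of the definition on a uniform time grid (the function name `brownian_paths` and all numbers are illustrative): properties 1 to 3 become cumulative sums of independent $\mathcal N(0,\Delta t\,I_d)$ increments. Continuity (property 4) cannot be checked on a grid; only the grid values are simulated.

```python
import numpy as np

def brownian_paths(n_paths, n_steps, T=1.0, d=1, seed=0):
    """Simulate d-dimensional standard Brownian motion on a uniform grid over [0, T].

    Returns an array of shape (n_paths, n_steps + 1, d).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Properties 2 and 3: independent increments, each distributed as N(0, dt * I_d)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps, d))
    W = np.zeros((n_paths, n_steps + 1, d))  # property 1: W_0 = 0
    W[:, 1:, :] = np.cumsum(dW, axis=1)
    return W

W = brownian_paths(n_paths=20_000, n_steps=100, T=1.0, d=2)
# Increment over [0.5, 1.0]: empirical mean should be near 0, covariance near 0.5 * I_2
inc = W[:, 100, :] - W[:, 50, :]
print(inc.mean(axis=0))
print(np.cov(inc, rowvar=False))
```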

Itô Integral and Itô’s Lemma

Quadratic Variation.
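For a continuous semimartingale $X_t$, the quadratic variation is defined as
$$[X]_t = \lim_{\|\Pi\|\to0}\sum_{k=0}^{n-1}\left(X_{t_{k+1}} - X_{t_k}\right)^2,$$
where $\Pi = \{0 = t_0 < t_1 < \dots < t_n = t\}$ is a partition of $[0,t]$ with mesh $\|\Pi\| = \max_k(t_{k+1}-t_k)$, and the limit is in probability.

For one-dimensional Brownian motion $W_t$, $[W]_t = t$. In higher dimensions, for $W_t = (W_t^1,\dots,W_t^d)$, $[W^i,W^j]_t = \delta_{ij}\,t$, where $[\cdot,\cdot]_t$ denotes the quadratic covariation.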
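The limit can be checked numerically on a fine grid. In this sketch (helper names are hypothetical), the sum of squared increments of one simulated path approaches $[W]_T = T$, while the sum of products of increments of two independent paths approaches $[W^1,W^2]_T = 0$.

```python
import numpy as np

def brownian_path(n_steps, T, rng):
    """One path of 1-D standard Brownian motion on a uniform grid, starting at 0."""
    dt = T / n_steps
    return np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])

def quadratic_covariation(x, y):
    """Discrete [X, Y]_T: sum of products of increments over the grid."""
    return float(np.sum(np.diff(x) * np.diff(y)))

rng = np.random.default_rng(1)
T = 2.0
W1 = brownian_path(100_000, T, rng)
W2 = brownian_path(100_000, T, rng)     # independent of W1
qv = quadratic_covariation(W1, W1)      # should be close to [W^1]_T = T = 2
cross = quadratic_covariation(W1, W2)   # should be close to [W^1, W^2]_T = 0
print(qv, cross)
```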


Itô Integral.
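Let $W_t$ be a Brownian motion and let $H_t$ be an adapted process with
$$\mathbb E\left[\int_0^T H_t^2\,dt\right] < \infty.$$
Then the Itô integral is defined as a limit in $L^2$; for continuous $H$ it is the limit of left-endpoint Riemann sums,
$$\int_0^T H_t\,dW_t = \lim_{\|\Pi\|\to0}\sum_{k=0}^{n-1}H_{t_k}\left(W_{t_{k+1}} - W_{t_k}\right).$$
It satisfies the Itô isometry:
$$\mathbb E\left[\left(\int_0^T H_t\,dW_t\right)^2\right] = \mathbb E\left[\int_0^T H_t^2\,dt\right].$$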
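A Monte Carlo sketch with $H_t = W_t$ (all sizes are illustrative). Two standard facts are checked: the pathwise identity $\int_0^T W_t\,dW_t = \tfrac12(W_T^2 - T)$, whose $-\tfrac12T$ term comes from the quadratic variation, and the isometry $\mathbb E[(\int_0^T W_t\,dW_t)^2] = \int_0^T t\,dt = T^2/2$. Evaluating the integrand at the left endpoint is what makes this an Itô (rather than Stratonovich) sum.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps, T = 10_000, 400, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Itô sum for H_t = W_t: integrand evaluated at the LEFT endpoint of each interval
ito = np.sum(W[:, :-1] * dW, axis=1)

# Itô calculus gives the pathwise identity  int_0^T W dW = (W_T^2 - T) / 2
closed_form = 0.5 * (W[:, -1] ** 2 - T)
print(np.mean(np.abs(ito - closed_form)))  # shrinks as n_steps grows

# Itô isometry: E[(int W dW)^2] = E[int W_t^2 dt] = T^2 / 2
lhs = np.mean(ito ** 2)
rhs = np.mean(np.sum(W[:, :-1] ** 2, axis=1) * dt)
print(lhs, rhs)
```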


Itô’s Lemma (Itô formula, one-dimensional).
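Suppose $X_t$ satisfies the SDE
$$dX_t = \mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t,$$
and let $f(x,t)$ be $C^{2,1}$ (twice continuously differentiable in $x$ and once in $t$). Then
$$df(X_t,t) = \left(\partial_t f + \mu\,\partial_x f + \tfrac12\sigma^2\,\partial_{xx}f\right)dt + \sigma\,\partial_x f\,dW_t,$$
with all derivatives of $f$ evaluated at $(X_t,t)$.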
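As a sketch of the lemma at work, assuming geometric Brownian motion $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$ as the example (parameter values are illustrative): applying Itô's lemma to $f(x) = \log x$ gives $d\log X_t = (\mu - \tfrac12\sigma^2)\,dt + \sigma\,dW_t$, hence $X_t = X_0\exp\left((\mu - \tfrac12\sigma^2)t + \sigma W_t\right)$. The code compares this with an Euler–Maruyama simulation driven by the same Brownian increments, and with the classical-chain-rule answer that omits the $\tfrac12\sigma^2$ correction.

```python
import numpy as np

def gbm_em_vs_exact(x0=1.0, mu=0.3, sigma=0.5, T=1.0, n_steps=10_000, n_paths=1_000, seed=3):
    """Euler–Maruyama vs. the Itô closed form for dX = mu X dt + sigma X dW, on shared Brownian paths."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)   # Euler–Maruyama state
    w = np.zeros(n_paths)      # accumulated Brownian motion W_t (same increments)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x + mu * x * dt + sigma * x * dw
        w = w + dw
    # Itô's lemma with f(x) = log x: d log X = (mu - sigma^2 / 2) dt + sigma dW
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    # Classical chain rule (no Itô correction): off by the factor exp(sigma^2 T / 2)
    naive = x0 * np.exp(mu * T + sigma * w)
    return x, exact, naive

x_em, x_exact, x_naive = gbm_em_vs_exact()
print(np.mean(np.abs(x_em / x_exact - 1)))     # small; Euler–Maruyama error shrinks as dt -> 0
print(np.mean(np.abs(x_naive / x_exact - 1)))  # exp(0.125) - 1, about 0.133
```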


Multidimensional Itô’s Lemma.
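If $X_t\in\mathbb R^d$ satisfies $dX_t = \mu(X_t,t)\,dt + \Sigma(X_t,t)\,dW_t$, with $\Sigma\in\mathbb R^{d\times m}$ and $W_t$ an $m$-dimensional Brownian motion, and $f:\mathbb R^d\times[0,T]\to\mathbb R$ is $C^{2,1}$, then
$$df(X_t,t) = \left(\partial_t f + \nabla_x f^\top\mu + \tfrac12\operatorname{tr}\left(\Sigma\Sigma^\top\nabla_x^2 f\right)\right)dt + \nabla_x f^\top\Sigma\,dW_t.$$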


Remark.
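- The term $\tfrac12\sigma^2\,\partial_{xx}f$ (or $\tfrac12\operatorname{tr}\left(\Sigma\Sigma^\top\nabla_x^2 f\right)$ in higher dimensions) arises from the quadratic variation of Brownian motion, i.e. $(dW_t)^2 = dt$.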
- This correction term is what distinguishes Itô calculus from classical calculus and is fundamental in stochastic analysis.

Other

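The Kullback–Leibler (KL) divergence between two probability measures $P\ll Q$ is
$$D_{\mathrm{KL}}(P\,\|\,Q) = \int\log\frac{dP}{dQ}\,dP = \mathbb E_P\left[\log\frac{dP}{dQ}\right],$$
where $\frac{dP}{dQ}$ is the Radon–Nikodym derivative.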

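For the KL divergence between Gaussian distributions on $\mathbb R^d$, we have
$$D_{\mathrm{KL}}\left(\mathcal N(\mu_1,\Sigma_1)\,\|\,\mathcal N(\mu_2,\Sigma_2)\right) = \frac12\left[\operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^\top\Sigma_2^{-1}(\mu_2-\mu_1) - d + \log\frac{\det\Sigma_2}{\det\Sigma_1}\right].$$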
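The closed form can be sanity-checked against a Monte Carlo estimate of $\mathbb E_P[\log p(x) - \log q(x)]$. This is a sketch; the helper names and the example parameters are assumptions.

```python
import numpy as np

def kl_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form KL( N(mu1, cov1) || N(mu2, cov2) )."""
    d = mu1.shape[0]
    cov2_inv = np.linalg.inv(cov2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (np.trace(cov2_inv @ cov1) + diff @ cov2_inv @ diff - d + logdet2 - logdet1)

def gaussian_logpdf(x, mu, cov):
    """Row-wise log density of N(mu, cov) for x of shape (n, d)."""
    d = mu.shape[0]
    diff = x - mu
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2.0 * np.pi))

def kl_monte_carlo(mu1, cov1, mu2, cov2, n=200_000, seed=0):
    """Estimate E_P[log p(x) - log q(x)] using samples x ~ P."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu1, cov1, size=n)
    return np.mean(gaussian_logpdf(x, mu1, cov1) - gaussian_logpdf(x, mu2, cov2))

mu1, cov1 = np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 2.0]])
mu2, cov2 = np.array([0.5, -0.5]), np.array([[1.5, 0.0], [0.0, 1.0]])
print(kl_gaussian(mu1, cov1, mu2, cov2), kl_monte_carlo(mu1, cov1, mu2, cov2))
```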

Diffusion Models

DDPM

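How to sample from the data distribution $q(x_0)$?

We define a Markovian stochastic process $x_0, x_1, \dots, x_T$ as the forward process, such that
$$q(x_t\mid x_{t-1}) = \mathcal N\left(x_t;\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),$$
where $\beta_t\in(0,1)$.

We call $\{\beta_t\}_{t=1}^T$ the variance schedule.

By induction, with $\alpha_t := 1-\beta_t$ and $\bar\alpha_t := \prod_{s=1}^t\alpha_s$, we know
$$q(x_t\mid x_0) = \mathcal N\left(x_t;\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\right).$$
(Induction step: if $x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,x_0 + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon'$, then $x_t$ is a sum of independent Gaussians with variance $\alpha_t(1-\bar\alpha_{t-1}) + \beta_t = 1-\bar\alpha_t$.)

So we can sample $x_t$ directly: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon\sim\mathcal N(0,I)$.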
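A sketch checking that one-shot sampling from $q(x_t\mid x_0)$ agrees in distribution with running the chain step by step. The linear schedule $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$ with $T = 1000$ is the one used in the DDPM paper; the function names and the toy data point $x_0 = 2$ are hypothetical.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule and alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """One-shot sample x_t ~ q(x_t | x_0); t is 1-indexed (t = 1..T)."""
    ab = alpha_bars[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

def run_chain(x0, t, betas, rng):
    """Apply q(x_s | x_{s-1}) for s = 1..t, one step at a time."""
    x = x0.copy()
    for s in range(t):
        x = np.sqrt(1.0 - betas[s]) * x + np.sqrt(betas[s]) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
betas, alpha_bars = linear_beta_schedule()
x0 = np.full(50_000, 2.0)   # many copies of a single data point x_0 = 2
t = 300
a = q_sample(x0, t, alpha_bars, rng)
b = run_chain(x0, t, betas, rng)
# Both means should be near sqrt(alpha_bar_t) * 2, both variances near 1 - alpha_bar_t
print(a.mean(), b.mean(), np.sqrt(alpha_bars[t - 1]) * 2.0)
print(a.var(), b.var(), 1.0 - alpha_bars[t - 1])
```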

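Assuming $\bar\alpha_T\approx0$, we can regard $q(x_T)\approx\mathcal N(0,I)$.

Now, given $x_t$, we want to sample from $q(x_{t-1}\mid x_t)$.

Calculate the posterior probability density by Bayes' rule:
$$q(x_{t-1}\mid x_t) = \frac{q(x_t\mid x_{t-1})\,q(x_{t-1})}{q(x_t)},$$
which is impractical to compute, since the marginals $q(x_{t-1}) = \int q(x_{t-1}\mid x_0)\,q(x_0)\,dx_0$ and $q(x_t)$ require integrating over the unknown data distribution of $x_0$.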

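Instead, we condition on $x_0$. By Bayes' rule and the Markov property $q(x_t\mid x_{t-1},x_0) = q(x_t\mid x_{t-1})$,
$$q(x_{t-1}\mid x_t,x_0) = \frac{q(x_t\mid x_{t-1})\,q(x_{t-1}\mid x_0)}{q(x_t\mid x_0)}.$$
All three factors are Gaussian, so $q(x_{t-1}\mid x_t,x_0) = \mathcal N\left(x_{t-1};\tilde\mu_t(x_t,x_0),\ \tilde\beta_t I\right)$ is still a Gaussian distribution. Recall $\mathcal N(x;\mu,\sigma^2 I)\propto\exp\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right)$; collecting the quadratic and linear terms in $x_{t-1}$ in the exponent gives
$$\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t,\qquad \tilde\mu_t(x_t,x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t.$$

Note that $x_0 = \frac{1}{\sqrt{\bar\alpha_t}}\left(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon\right)$.

Substituting this into $\tilde\mu_t$ gives
$$\tilde\mu_t = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon\right).$$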
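The posterior formulas can be verified numerically. This sketch (same assumed linear schedule; helper names are hypothetical) computes $\tilde\mu_t$ and $\tilde\beta_t$ three ways: from the closed form, from the $\epsilon$-form of the mean, and by conditioning the jointly Gaussian pair $(x_{t-1},x_t)$ given $x_0$, where $\operatorname{Cov}(x_{t-1},x_t\mid x_0) = \sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})$. It uses the convention $\bar\alpha_0 = 1$, so $\tilde\beta_1 = 0$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ab_prev(t):
    """alpha_bar_{t-1}, with the convention alpha_bar_0 = 1 (t is 1-indexed)."""
    return alpha_bars[t - 2] if t > 1 else 1.0

def posterior_params(x_t, x0, t):
    """Mean tilde_mu_t(x_t, x_0) and variance tilde_beta_t of q(x_{t-1} | x_t, x_0)."""
    ab_t, abp = alpha_bars[t - 1], ab_prev(t)
    mean = (np.sqrt(abp) * betas[t - 1] / (1 - ab_t)) * x0 \
        + (np.sqrt(alphas[t - 1]) * (1 - abp) / (1 - ab_t)) * x_t
    var = (1 - abp) / (1 - ab_t) * betas[t - 1]
    return mean, var

def posterior_mean_from_eps(x_t, eps, t):
    """tilde_mu_t after substituting x_0 = (x_t - sqrt(1 - alpha_bar_t) eps) / sqrt(alpha_bar_t)."""
    return (x_t - betas[t - 1] / np.sqrt(1 - alpha_bars[t - 1]) * eps) / np.sqrt(alphas[t - 1])

def posterior_by_conditioning(x_t, x0, t):
    """Cross-check: condition the jointly Gaussian pair (x_{t-1}, x_t) on x_t, given x_0."""
    abp = ab_prev(t)
    cov = np.sqrt(alphas[t - 1]) * (1 - abp)   # Cov(x_{t-1}, x_t | x_0)
    var_t = 1 - alpha_bars[t - 1]               # Var(x_t | x_0)
    mean = np.sqrt(abp) * x0 + cov / var_t * (x_t - np.sqrt(alpha_bars[t - 1]) * x0)
    var = (1 - abp) - cov**2 / var_t
    return mean, var

t, x0, eps = 500, 1.3, -0.7
x_t = np.sqrt(alpha_bars[t - 1]) * x0 + np.sqrt(1 - alpha_bars[t - 1]) * eps
m1, v1 = posterior_params(x_t, x0, t)
m2 = posterior_mean_from_eps(x_t, eps, t)
m3, v3 = posterior_by_conditioning(x_t, x0, t)
print(m1, m2, m3)
print(v1, v3)
```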

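Now, to train the reverse model $p_\theta(x_{t-1}\mid x_t) = \mathcal N\left(x_{t-1};\mu_\theta(x_t,t),\ \sigma_t^2 I\right)$, fix the variance $\sigma_t^2$ (e.g. $\sigma_t^2 = \beta_t$ or $\tilde\beta_t$) and let the model predict the mean $\mu_\theta(x_t,t)$.

We minimize the KL divergence
$$D_{\mathrm{KL}}\left(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\right) = \frac{1}{2\sigma_t^2}\left\|\tilde\mu_t(x_t,x_0) - \mu_\theta(x_t,t)\right\|^2 + C,$$
where $C$ does not depend on $\theta$. Alternatively, let the model predict the noise $\epsilon_\theta(x_t,t)$ and set
$$\mu_\theta(x_t,t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t,t)\right).$$

Then minimize the simplified (unweighted) objective
$$\mathcal L_{\text{simple}}(\theta) = \mathbb E_{t,x_0,\epsilon}\left[\left\|\epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right)\right\|^2\right].$$
Ho et al. report that this noise-prediction parametrization, combined with the simplified loss, yields better sample quality than predicting $\tilde\mu_t$ directly.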
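To see the pieces fit together without training a network, this sketch assumes 1-D Gaussian data $x_0\sim\mathcal N(m,s^2)$, for which the MSE-optimal noise predictor $\mathbb E[\epsilon\mid x_t] = \frac{\sqrt{1-\bar\alpha_t}\,(x_t - \sqrt{\bar\alpha_t}\,m)}{\bar\alpha_t s^2 + 1-\bar\alpha_t}$ is available in closed form and stands in for a trained $\epsilon_\theta$. It estimates $\mathcal L_{\text{simple}}$ and runs ancestral sampling with $\sigma_t^2 = \tilde\beta_t$; the toy values of $m$, $s$ and all function names are hypothetical.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
m, s = 1.5, 2.0                       # hypothetical toy data distribution N(m, s^2)

def eps_oracle(x_t, t):
    """E[eps | x_t] for x_0 ~ N(m, s^2); stands in for a trained eps_theta. t is 1-indexed."""
    ab = alpha_bars[t - 1]
    return np.sqrt(1 - ab) * (x_t - np.sqrt(ab) * m) / (ab * s**2 + 1 - ab)

def eps_zero(x_t, t):
    """Trivial baseline predictor eps_theta = 0."""
    return np.zeros_like(x_t)

def simple_loss(eps_model, n=100_000, seed=0):
    """Monte Carlo estimate of E_{t, x_0, eps} ||eps - eps_model(x_t, t)||^2."""
    rng = np.random.default_rng(seed)
    t = rng.integers(1, T + 1, size=n)   # t ~ Uniform{1, ..., T}
    x0 = m + s * rng.standard_normal(n)
    eps = rng.standard_normal(n)
    ab = alpha_bars[t - 1]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

def ancestral_sample(eps_model, n=20_000, seed=1):
    """DDPM ancestral sampling from x_T ~ N(0, 1) with sigma_t^2 = tilde_beta_t."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    for t in range(T, 0, -1):
        ab = alpha_bars[t - 1]
        mean = (x - betas[t - 1] / np.sqrt(1 - ab) * eps_model(x, t)) / np.sqrt(alphas[t - 1])
        if t > 1:
            var = (1 - alpha_bars[t - 2]) / (1 - ab) * betas[t - 1]
            x = mean + np.sqrt(var) * rng.standard_normal(n)
        else:
            x = mean  # tilde_beta_1 = 0: no noise on the last step
    return x

print(simple_loss(eps_oracle), simple_loss(eps_zero))
samples = ancestral_sample(eps_oracle)
print(samples.mean(), samples.std())
```

The oracle predictor reaches a much lower $\mathcal L_{\text{simple}}$ than the trivial predictor $\epsilon_\theta\equiv0$ (whose loss is $\mathbb E\|\epsilon\|^2 = 1$), and the sampled mean and standard deviation should come out close to $m$ and $s$.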

Score-based Generative Model (SGM)

We view noise injection as an SDE and learn the score $\nabla_x\log p_t(x)$ of the noisy marginal $p_t$. Sampling is done by integrating a reverse-time SDE (or its ODE counterpart), where the score guides the dynamics back to data.

Forward SDE

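Let $X_t$ solve
$$dX_t = f(X_t,t)\,dt + g(t)\,dW_t,\qquad X_0\sim p_0 = p_{\text{data}},\quad t\in[0,T],$$
with drift $f:\mathbb R^d\times[0,T]\to\mathbb R^d$ and diffusion scale $g:[0,T]\to\mathbb R_{\ge0}$. Denote by $p_t$ the marginal density of $X_t$.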


DDPM as VP-SDE

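Set the Variance-Preserving (VP) SDE
$$dX_t = -\tfrac12\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dW_t$$
with integrable $\beta(t)>0$, i.e. $f(x,t) = -\tfrac12\beta(t)\,x$ and $g(t) = \sqrt{\beta(t)}$.

This linear SDE has the explicit solution
$$X_t = e^{-\frac12B(t)}X_0 + \int_0^t e^{-\frac12\left(B(t)-B(s)\right)}\sqrt{\beta(s)}\,dW_s,$$
where $B(t) = \int_0^t\beta(s)\,ds$.

Hence, by the Itô isometry, $X_t\mid X_0\sim\mathcal N\left(e^{-\frac12B(t)}X_0,\ (1-e^{-B(t)})I\right)$, and with $\bar\alpha(t) := e^{-B(t)}$ the marginal conditional matches DDPM's $q(x_t\mid x_0) = \mathcal N\left(\sqrt{\bar\alpha_t}\,x_0,(1-\bar\alpha_t)I\right)$.

Claim. The DDPM forward chain with $\{\beta_t\}$ is, to first order in $\beta_t$, an Euler–Maruyama discretization (step size $1$) of the VP-SDE with piecewise-constant $\beta(t) = \beta_t$ on $[t-1,t)$: the DDPM step $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$ and the Euler–Maruyama step $x_t = (1-\tfrac12\beta_t)\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$ differ by $O(\beta_t^2)$. The marginals agree to the same order, since $\prod_{s\le t}(1-\beta_s) = \exp\left(-\sum_{s\le t}\beta_s + O\left(\sum_{s\le t}\beta_s^2\right)\right)$; choosing the rate $-\log(1-\beta_t)$ on $[t-1,t)$ instead makes the marginals $q(x_t\mid x_0)$ coincide exactly.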
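A deterministic check of the claim under the same assumed linear schedule (variable names are illustrative): it compares DDPM's $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$ with the VP-SDE value $e^{-B(t)}$ for piecewise-constant $\beta(t) = \beta_t$, and with the rate $-\log(1-\beta_t)$ that makes them coincide.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule

# DDPM: alpha_bar_t = prod_{s <= t} (1 - beta_s)
alpha_bar_ddpm = np.cumprod(1.0 - betas)
# VP-SDE with beta(t) = beta_t on [t-1, t): alpha_bar(t) = exp(-sum_{s <= t} beta_s)
alpha_bar_sde = np.exp(-np.cumsum(betas))
# VP-SDE with rate -log(1 - beta_t) on [t-1, t): reproduces DDPM exactly
alpha_bar_exact = np.exp(np.cumsum(np.log1p(-betas)))

# One-step coefficients: DDPM sqrt(1 - beta) vs Euler–Maruyama (1 - beta / 2); gap is about beta^2 / 8
coef_gap = np.max(np.abs(np.sqrt(1.0 - betas) - (1.0 - 0.5 * betas)))
marginal_gap = np.max(np.abs(alpha_bar_sde - alpha_bar_ddpm))
print(coef_gap, marginal_gap)
print(np.allclose(alpha_bar_exact, alpha_bar_ddpm, rtol=1e-10, atol=0.0))
```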

Variance-Exploding SDE (VE-SDE)

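Alternatively, set $f\equiv0$ and $g(t) = \sqrt{\frac{d\,\sigma^2(t)}{dt}}$ for an increasing noise scale $\sigma(t)$, i.e. $dX_t = \sqrt{\frac{d\,\sigma^2(t)}{dt}}\,dW_t$. Then
$$X_t = X_0 + \int_0^t g(s)\,dW_s,\qquad X_t\mid X_0\sim\mathcal N\left(X_0,\ \left(\sigma^2(t)-\sigma^2(0)\right)I\right).$$
So $p_t = p_0 * \mathcal N\left(0,(\sigma^2(t)-\sigma^2(0))I\right)$, i.e. Gaussian smoothing.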
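A sketch of the VE-SDE with the geometric schedule $\sigma(t) = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$ on $t\in[0,1]$; this schedule and the values $\sigma_{\min} = 0.01$, $\sigma_{\max} = 10$ are assumptions chosen for illustration. For it, $g(t) = \sigma(t)\sqrt{2\log(\sigma_{\max}/\sigma_{\min})}$, and an Euler–Maruyama simulation started from a fixed $X_0$ should show $\operatorname{Var}(X_1\mid X_0)\approx\sigma^2(1)-\sigma^2(0)$.

```python
import numpy as np

sigma_min, sigma_max = 0.01, 10.0   # assumed geometric noise schedule on t in [0, 1]

def sigma(t):
    return sigma_min * (sigma_max / sigma_min) ** t

def g(t):
    """g(t) = sqrt(d sigma^2(t) / dt) for the geometric schedule."""
    return sigma(t) * np.sqrt(2.0 * np.log(sigma_max / sigma_min))

def simulate_ve(x0, n_steps=2_000, seed=0):
    """Euler–Maruyama for dX = g(t) dW on [0, 1] (zero drift)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        x = x + g(k * dt) * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

x1 = simulate_ve(np.full(20_000, 3.0))   # many copies of X_0 = 3
target_var = sigma(1.0) ** 2 - sigma(0.0) ** 2
print(x1.mean(), x1.var(), target_var)   # mean near 3, variance near sigma^2(1) - sigma^2(0)
```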

Reverse-time SDE and Probability Flow ODE

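The reverse SDE (Anderson, 1982) is
$$dX_t = \left[f(X_t,t) - g(t)^2\,\nabla_x\log p_t(X_t)\right]dt + g(t)\,d\bar W_t,$$
integrated from $t = T$ down to $t = 0$, where $\bar W_t$ is a backward Wiener process. Replacing $\nabla_x\log p_t$ with a learned $s_\theta(x,t)$ gives a generative sampler.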

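The deterministic probability flow ODE
$$\frac{dX_t}{dt} = f(X_t,t) - \tfrac12g(t)^2\,\nabla_x\log p_t(X_t)$$
is equivalent in the sense that its solutions have the same marginals $p_t$ as the forward SDE. This is the continuous analogue of DDIM.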
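For Gaussian data the marginals of the VP-SDE are known, so the probability flow ODE can be integrated with the exact score. This sketch assumes $p_0 = \mathcal N(m,s^2)$ in 1-D and a linear $\beta(t) = \beta_{\min} + t(\beta_{\max}-\beta_{\min})$ on $[0,1]$ with $\beta_{\min} = 0.1$, $\beta_{\max} = 20$ (illustrative values), and integrates backward with classical RK4. Because the 1-D flow is monotone and carries $p_1$ to $p_0$, it must map $x_1 = \mu_1 + \sqrt{v_1}\,z$ to $x_0 = m + s\,z$, where $\mu_1, v_1$ are the mean and variance of $p_1$.

```python
import numpy as np

beta_min, beta_max = 0.1, 20.0   # assumed linear beta(t) on t in [0, 1]
m, s = 1.5, 2.0                  # hypothetical toy data distribution N(m, s^2)

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def B(t):
    """B(t) = integral of beta from 0 to t."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t**2

def marginal(t):
    """Mean and variance of p_t for Gaussian data under the VP-SDE."""
    e = np.exp(-B(t))
    return m * np.sqrt(e), s**2 * e + 1.0 - e

def score(x, t):
    mu, var = marginal(t)
    return -(x - mu) / var

def pf_velocity(x, t):
    """dx/dt = f(x, t) - g(t)^2 / 2 * score, with f = -beta x / 2 and g^2 = beta."""
    return -0.5 * beta(t) * x - 0.5 * beta(t) * score(x, t)

def pf_ode_solve(x1, n_steps=1_000):
    """Integrate the probability flow ODE from t = 1 back to t = 0 with classical RK4."""
    h = -1.0 / n_steps
    x, t = np.asarray(x1, dtype=float), 1.0
    for _ in range(n_steps):
        k1 = pf_velocity(x, t)
        k2 = pf_velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = pf_velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = pf_velocity(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x

mu1, var1 = marginal(1.0)
z = np.array([-1.0, 0.0, 1.7])
x0 = pf_ode_solve(mu1 + np.sqrt(var1) * z)
print(x0, m + s * z)   # the two arrays should agree closely
```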

Noise Prediction vs. Score Prediction

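For VP, the conditional score is
$$\nabla_{x_t}\log p_t(x_t\mid x_0) = -\frac{x_t - \sqrt{\bar\alpha_t}\,x_0}{1-\bar\alpha_t} = -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}}.$$
Thus noise prediction and score prediction are equivalent:
$$s_\theta(x_t,t) = -\frac{\epsilon_\theta(x_t,t)}{\sqrt{1-\bar\alpha_t}}.$$

An alternative data predictor:
$$\hat x_0(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}} = \frac{x_t + (1-\bar\alpha_t)\,s_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}.$$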
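These conversions are plain algebra; the sketch below (function names are hypothetical) implements them and checks them on toy 1-D Gaussian data $x_0\sim\mathcal N(m,s^2)$, where $p_t = \mathcal N\left(\sqrt{\bar\alpha_t}\,m,\ \bar\alpha_t s^2 + 1-\bar\alpha_t\right)$ and the exact score is known. With the exact marginal score plugged in, the data predictor returns the posterior mean $\mathbb E[x_0\mid x_t]$ (Tweedie's formula).

```python
import numpy as np

def eps_to_score(eps_pred, alpha_bar_t):
    return -eps_pred / np.sqrt(1.0 - alpha_bar_t)

def score_to_eps(score_pred, alpha_bar_t):
    return -np.sqrt(1.0 - alpha_bar_t) * score_pred

def x0_from_eps(x_t, eps_pred, alpha_bar_t):
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def x0_from_score(x_t, score_pred, alpha_bar_t):
    return (x_t + (1.0 - alpha_bar_t) * score_pred) / np.sqrt(alpha_bar_t)

# Toy check: x_0 ~ N(m, s^2), so p_t = N(sqrt(ab) m, v) with v = ab s^2 + 1 - ab
m, s, ab = 1.5, 2.0, 0.3
x_t = np.linspace(-3.0, 3.0, 7)
v = ab * s**2 + 1.0 - ab
true_score = -(x_t - np.sqrt(ab) * m) / v                          # grad log p_t(x_t)
eps_opt = np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * m) / v           # E[eps | x_t]
post_mean = m + np.sqrt(ab) * s**2 / v * (x_t - np.sqrt(ab) * m)    # E[x_0 | x_t]

print(np.allclose(eps_to_score(eps_opt, ab), true_score))
print(np.allclose(x0_from_score(x_t, true_score, ab), post_mean))
print(np.allclose(x0_from_eps(x_t, eps_opt, ab), post_mean))
```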

Denoising Score Matching (DSM)

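Training objective:
$$\mathcal L_{\text{DSM}}(\theta) = \mathbb E_t\left[\lambda(t)\,\mathbb E_{x_0\sim p_0}\,\mathbb E_{x_t\sim p_t(\cdot\mid x_0)}\left\|s_\theta(x_t,t) - \nabla_{x_t}\log p_t(x_t\mid x_0)\right\|^2\right].$$

For VP, choosing $\lambda(t) = 1-\bar\alpha_t$ reduces DSM to the DDPM noise-prediction MSE: with $s_\theta = -\epsilon_\theta/\sqrt{1-\bar\alpha_t}$ and the conditional score $-\epsilon/\sqrt{1-\bar\alpha_t}$, the weighted integrand becomes $\|\epsilon - \epsilon_\theta(x_t,t)\|^2$.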
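A numerical check of the reduction, as a sketch: `eps_model` is a hypothetical stand-in for a trained network, `score_model` is the score predictor it implies, and the linear schedule is assumed. On identical draws of $(x_0,t,\epsilon)$ the two losses agree up to floating-point rounding.

```python
import numpy as np

T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed linear schedule

def eps_model(x_t, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t)."""
    return 0.5 * np.tanh(x_t) + 1e-4 * t

def score_model(x_t, t):
    """The score predictor implied by eps_model: s = -eps / sqrt(1 - alpha_bar_t)."""
    return -eps_model(x_t, t) / np.sqrt(1.0 - alpha_bars[t - 1])

def noisy_pair(x0, t, eps):
    ab = alpha_bars[t - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, ab

def dsm_loss(x0, t, eps):
    """Monte Carlo DSM objective with weighting lambda(t) = 1 - alpha_bar_t."""
    x_t, ab = noisy_pair(x0, t, eps)
    target = -(x_t - np.sqrt(ab) * x0) / (1.0 - ab)   # grad log p_t(x_t | x_0)
    return np.mean((1.0 - ab) * (score_model(x_t, t) - target) ** 2)

def eps_mse(x0, t, eps):
    """DDPM noise-prediction MSE on the same draws."""
    x_t, _ = noisy_pair(x0, t, eps)
    return np.mean((eps - eps_model(x_t, t)) ** 2)

rng = np.random.default_rng(0)
n = 10_000
x0 = rng.normal(1.5, 2.0, n)
t = rng.integers(1, T + 1, n)
eps = rng.standard_normal(n)
print(dsm_loss(x0, t, eps), eps_mse(x0, t, eps))  # equal up to floating-point rounding
```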

Derivation and Intuition

TODO: Weak form.

Flow Matching Models

See Flow Matching

Schrödinger Bridge Models

Relation with Entropy-Regularized Optimal Transport

IPF (DSB, S-DSB)

IMF

D-IMF (ADSB)

CDSB

TODO.


https://notdesigned.github.io/2025/09/18/Flow-matching-Score-based-Generative-Model-Schrodinger-Bridge-and-Optimal-Transport/
Author
Luocheng Liang
Posted on
September 18, 2025