Given two probability measures and on measurable spaces and , respectively, the
Optimal Transport Problem seeks to find a transport
plan that minimizes the cost of
transporting mass from to . The cost function is typically
defined as a function , which quantifies the cost of moving a unit mass
from point to point .
The problem can be formulated as follows (Monge’s formulation): where is a
transport map that pushes
forward to , i.e., .
Such a transport map is called
an optimal transport map if it minimizes the cost
function, although it is not always guaranteed to exist since the
pushforward is not necessarily possible.
More generally, we can express the problem as (Kantorovich’s
formulation): where is
the set of all joint distributions of and .
Dual Formulation
The primal problem is a convex optimization problem since it is
linear w.r.t. and the feasible
set is convex.
The constraint is that must
be a joint distribution of and
, which can be expressed as:
where and are the spaces of continuous
functions on and , respectively.
This form allows us to derive the dual formulation of the optimal
transport problem, whose Lagrangian is given by:
Thus the dual problem can be expressed as:
For any point , if , then
the value of
can be made arbitrarily to
by increasing .
Therefore, the optimal solution must satisfy:
And the dual problem now becomes:
Or more compactly:
The strong duality holds under mild conditions, such and are compact and is continuous. More generally, can be a lower semicontinuous function
and exist a feasible with
finite cost.
Wasserstein Distance
When is a
metric, the optimal transport problem induces the Wasserstein
distance: This distance is a metric on the space of probability measures
with finite -th moment, and it
satisfies the properties of a metric: 1.
Non-negativity: for all . 2. Identity of indiscernibles: if and only if . 3. Symmetry:
. 4.
Triangle inequality: for all .