Flow Matching

    Topic area: generative models
    Prerequisites: Diffusion Models, Optimal Transport, Neural Ordinary Differential Equations


    Overview

    Flow Matching is a simulation-free training framework for Continuous Normalizing Flows in which a neural network is regressed directly onto a target time-dependent vector field that transports a simple prior distribution into the data distribution. Introduced by Lipman, Chen, Ben-Hamu, Nickel, and Le in 2022, it generalizes and unifies several earlier approaches, including Score Matching and Rectified Flow, and it has become a leading paradigm for image, video, audio, and molecular generative modeling. Compared with classical maximum likelihood training of flows, which requires expensive trajectory simulation, and with Denoising Diffusion Probabilistic Models, which are built on stochastic forward and reverse processes, Flow Matching offers a deterministic and conceptually simple alternative: pick a probability path between noise and data, derive the vector field that generates it, and learn that vector field with a mean-squared-error loss.

    Intuition

    A continuous normalizing flow describes a curve in distribution space, transporting samples from an initial probability density $ p_0 $ at time $ t=0 $ to a target density $ p_1 $ at time $ t=1 $ via an ordinary differential equation (ODE). For each time $ t \in [0,1] $, a Vector Field $ u_t(x) $ specifies the instantaneous velocity at position $ x $; integrating this vector field along time pushes samples from the prior to the data distribution.

    The central challenge is that, in general, we cannot directly observe a vector field that transports a tractable prior into the empirical data distribution. Flow Matching sidesteps this by constructing the path conditionally, one data point at a time. For a fixed data sample $ x_1 $, it is easy to write down a smooth path from a noise sample to $ x_1 $ and read off the velocity that generates it. Averaging these per-sample velocities under the joint sampling of noise and data yields the unconditional velocity field that drives the entire population from prior to data. The remarkable insight is that regressing a neural network onto the conditional velocities recovers the unconditional one in expectation, eliminating the need to ever evaluate the marginal density.

    Probability Paths and Vector Fields

    A probability path is a time-indexed family of densities $ \{p_t\}_{t \in [0,1]} $ with $ p_0 $ a chosen prior (typically a standard Gaussian Distribution) and $ p_1 $ the data distribution. A vector field $ u_t $ generates the path when the Continuity Equation holds:

    $ {\displaystyle \frac{\partial p_t(x)}{\partial t} + \nabla \cdot (p_t(x)\, u_t(x)) = 0.} $

    Equivalently, samples drawn from $ p_0 $ and evolved by the ODE $ dx/dt = u_t(x) $ are distributed according to $ p_t $ at every intermediate time. Multiple vector fields can generate the same path, so additional structure (such as straightness or optimality with respect to a transport cost) is needed to single out a preferred one.
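
    As a concrete example (the Gaussian probability path of Lipman et al., taking the minimal noise scale to zero for brevity), the conditional path $ p_t(x \mid x_1) = \mathcal{N}(x \mid t\, x_1,\, (1-t)^2 I) $ is generated by

    $ {\displaystyle u_t(x \mid x_1) = \frac{x_1 - x}{1 - t},} $

    as can be checked by substitution into the continuity equation. Along the induced flow $ x_t = (1-t)\, x_0 + t\, x_1 $ this velocity is the constant $ x_1 - x_0 $, which reappears as the regression target in the next section.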

    Conditional Flow Matching

    Direct regression onto $ u_t $ is infeasible because $ u_t $ depends on the unknown marginal density. The Conditional Flow Matching (CFM) objective resolves this by conditioning on a target sample $ x_1 $. For a chosen conditional path $ p_t(x \mid x_1) $ (for example, a Gaussian whose mean linearly interpolates from $ 0 $ at $ t=0 $ to $ x_1 $ at $ t=1 $) and its generating conditional vector field $ u_t(x \mid x_1) $, the loss is

    $ {\displaystyle \mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\, x_1,\, x \sim p_t(\cdot \mid x_1)}\!\left[\, \lVert v_\theta(t, x) - u_t(x \mid x_1) \rVert^2 \right],} $

    where $ v_\theta $ is the learned vector field, $ t $ is sampled uniformly on $ [0,1] $, and $ x_1 $ is sampled from the data. Lipman et al. proved that this objective has the same gradient with respect to $ \theta $ as regression onto the marginal $ u_t $, even though the marginal is intractable. The crucial design choice is the conditional path; popular choices include variance-preserving Gaussian paths, variance-exploding paths, and the Optimal Transport-displacement linear interpolant $ x_t = (1-t)\, x_0 + t\, x_1 $, which yields the strikingly simple regression target $ u_t(x \mid x_0, x_1) = x_1 - x_0 $.
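
    The objective is short enough to state in code. The following is a minimal PyTorch sketch with the linear interpolant; the network interface v_theta(t, x), the batch shapes, and the standard Gaussian prior are illustrative assumptions rather than part of any reference implementation.

        import torch

        def cfm_loss(v_theta, x1):
            """Conditional Flow Matching loss with the linear (OT) interpolant.

            v_theta: callable (t, x) -> predicted velocity.
            x1:      data batch of shape (batch, dim).
            """
            x0 = torch.randn_like(x1)        # noise sample from the prior p_0
            t = torch.rand(x1.shape[0], 1)   # t ~ Uniform[0, 1], broadcast over dim
            xt = (1.0 - t) * x0 + t * x1     # sample from the conditional path p_t(. | x_1)
            target = x1 - x0                 # conditional velocity u_t(x | x_0, x_1)
            return ((v_theta(t, xt) - target) ** 2).mean()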

    Training and Inference

    Training requires only sampling a time, a noise vector, and a data point; computing the conditional velocity in closed form; and minimizing the squared error. There is no need to simulate the ODE during training, no auxiliary score network, and no variational lower bound to track. Mini-batches consist of independent triples $ (t, x_0, x_1) $ with $ x_0 $ drawn from the prior and $ x_1 $ drawn from the dataset.
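
    A toy end-to-end loop under the same assumptions; the two-hidden-layer network and the synthetic 2-D data batch are placeholders chosen only to make the sketch self-contained.

        import torch
        import torch.nn as nn

        class VelocityNet(nn.Module):
            """Small MLP over the concatenation (t, x); a placeholder architecture."""
            def __init__(self, dim, hidden=256):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(dim + 1, hidden), nn.SiLU(),
                    nn.Linear(hidden, hidden), nn.SiLU(),
                    nn.Linear(hidden, dim),
                )

            def forward(self, t, x):
                return self.net(torch.cat([t, x], dim=-1))

        model = VelocityNet(dim=2)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for step in range(2000):
            x1 = torch.randn(256, 2) * 0.5 + torch.tensor([2.0, 0.0])  # toy data batch
            loss = cfm_loss(model, x1)       # cfm_loss from the sketch above
            opt.zero_grad()
            loss.backward()
            opt.step()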

    At inference, samples are generated by integrating the learned ODE $ dx/dt = v_\theta(t, x) $ from $ t=0 $ to $ t=1 $ with an initial condition drawn from the prior. Any black-box ODE solver may be used; common choices include adaptive Runge-Kutta methods and fixed-step Euler or Heun's Method integrators. Because Flow Matching trained with linear (optimal-transport) interpolants tends to produce nearly straight trajectories, sample generation often requires only a handful of solver steps, in contrast to diffusion models that may need tens to hundreds.
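
    A fixed-step Euler sampler under the same interface (a sketch; the step count is a quality/speed knob, and higher-order or adaptive solvers drop in unchanged):

        import torch

        @torch.no_grad()
        def sample(v_theta, n, dim, steps=8):
            """Generate samples by Euler integration of dx/dt = v_theta(t, x)."""
            x = torch.randn(n, dim)          # initial condition x(0) ~ p_0
            dt = 1.0 / steps
            for k in range(steps):
                t = torch.full((n, 1), k * dt)
                x = x + dt * v_theta(t, x)   # one explicit Euler step
            return x                         # approximate samples from p_1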

    Variants

    Several variants of Flow Matching adjust either the conditional path, the coupling between $ x_0 $ and $ x_1 $, or the training procedure:

    • Rectified Flow (Liu et al., 2022) trains with the same linear interpolant described above and then iteratively retrains the model on its own straightened trajectories, producing increasingly straight flows that admit one- or few-step sampling.
    • Stochastic Interpolants (Albergo, Boffi, and Vanden-Eijnden, 2023) generalize the framework to allow stochastic dynamics, unifying flow- and diffusion-based generative modeling under a single interpolant formalism.
    • Optimal Transport Conditional Flow Matching (OT-CFM; Tong et al., 2023) replaces the independent coupling of $ x_0 $ and $ x_1 $ with a mini-batch Optimal Transport coupling, sharpening the alignment between noise and data and reducing path curvature (a minimal coupling sketch appears after this list).
    • Multisample Flow Matching (Pooladian et al., 2023) develops a related batch-coupling perspective and provides theoretical analysis of the resulting estimators.
    • Riemannian Flow Matching extends the construction to data on manifolds, replacing Euclidean interpolation with geodesic interpolation and using manifold-aware ODE integrators.
    • Discrete Flow Matching adapts the framework to categorical data via continuous-time Markov chains in place of ODEs.
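
    To illustrate the batch-coupling idea shared by OT-CFM and Multisample Flow Matching, the sketch below re-pairs a noise batch with a data batch by solving an exact assignment problem; this is one simple instantiation (practical implementations may instead use entropic or otherwise approximate OT solvers).

        import torch
        from scipy.optimize import linear_sum_assignment

        def ot_coupling(x0, x1):
            """Re-pair noise x0 and data x1 within a minibatch so that the total
            squared distance between paired samples is minimized (exact OT between
            uniform empirical measures, via the Hungarian algorithm)."""
            cost = torch.cdist(x0, x1) ** 2              # (batch, batch) pairwise costs
            rows, cols = linear_sum_assignment(cost.cpu().numpy())
            return x0[torch.as_tensor(rows)], x1[torch.as_tensor(cols)]

    The re-paired batch simply replaces the independent draws of $ x_0 $ and $ x_1 $ inside the CFM loss; the rest of training is unchanged.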

    Comparison with Diffusion Models

    Diffusion models and Flow Matching are closely related: both learn a time-dependent transformation from noise to data, and both can be cast as regression problems against a target field. The differences are in the choice of process and parameterization. Diffusion models are formulated through stochastic forward and reverse processes and learn the Score Function $ \nabla \log p_t(x) $; their training corresponds to a specific variance-preserving Gaussian path within the Flow Matching family. Flow Matching is purely deterministic at the ODE level, treats the path as a free design choice, and parameterizes the velocity rather than the score. Empirically, OT-style Flow Matching produces straighter trajectories and enables faster sampling, while diffusion's stochasticity can improve sample diversity in certain regimes. Score-based diffusion samplers can be reinterpreted as ODE integrators of a probability-flow ODE, exposing a precise mathematical bridge between the two families.
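
    One way to make this bridge concrete: for Gaussian conditional paths $ p_t(x \mid x_1) = \mathcal{N}(\alpha_t x_1,\, \sigma_t^2 I) $, a short computation (a standard identity in this literature, though sign and coefficient conventions vary across papers) gives an affine relation between the marginal velocity and the score,

    $ {\displaystyle u_t(x) = \frac{\dot{\alpha}_t}{\alpha_t}\, x + \left( \frac{\dot{\alpha}_t}{\alpha_t}\, \sigma_t^2 - \dot{\sigma}_t\, \sigma_t \right) \nabla \log p_t(x),} $

    so a trained score model induces a velocity field and vice versa.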

    Limitations

    Flow Matching inherits the standard difficulties of Continuous Normalizing Flows: ODE integration at inference can be costly when trajectories are curved or stiff, exact log-likelihood computation requires the Hutchinson Trace Estimator or expensive Jacobian evaluations, and high-dimensional manifolds may require careful prior choice to avoid wasted modeling capacity. The framework also assumes that a tractable conditional path is available, which is straightforward in Euclidean space but more delicate on manifolds, graphs, or discrete spaces. Conditioning, Classifier-Free Guidance, and likelihood-free evaluation transfer from diffusion to Flow Matching, but careful adaptation is sometimes needed because the underlying object is a vector field rather than a score.
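
    As an illustration of the likelihood computation mentioned above, the rough sketch below integrates the instantaneous change of variables $ \tfrac{d}{dt} \log p_t(x_t) = -\nabla \cdot v_\theta(t, x_t) $ backwards from the data, with a single-probe Hutchinson estimate of the divergence; the fixed-step Euler integrator and the single probe vector are simplifications (adaptive solvers and averaging over several probes are typical in practice).

        import math
        import torch

        def log_likelihood(v_theta, x1, steps=100):
            """Estimate log p_1(x1) by integrating the flow back to t = 0 while
            accumulating a Hutchinson estimate of the divergence of v_theta."""
            x = x1.detach().clone()
            dt = 1.0 / steps
            div_int = torch.zeros(x.shape[0])
            eps = torch.randn_like(x)                    # Hutchinson probe vector
            for k in reversed(range(steps)):
                t = torch.full((x.shape[0], 1), (k + 0.5) * dt)
                with torch.enable_grad():
                    x_in = x.detach().requires_grad_(True)
                    v = v_theta(t, x_in)
                    # div v ~= eps^T (dv/dx) eps via one vector-Jacobian product
                    g = torch.autograd.grad((v * eps).sum(), x_in)[0]
                div_int = div_int + dt * (g * eps).sum(dim=-1)
                x = x - dt * v.detach()                  # Euler step backwards in time
            d = x.shape[-1]
            log_p0 = -0.5 * (x ** 2).sum(dim=-1) - 0.5 * d * math.log(2 * math.pi)
            return log_p0 - div_int                      # log p_1 = log p_0 - int div dt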

    Applications

    Flow Matching has been applied to high-resolution image generation, including text-to-image models that scale OT-CFM to billions of parameters; speech and audio synthesis, where straight trajectories enable real-time generation; protein and molecular structure generation on $ \mathrm{SE}(3) $ manifolds; and trajectory generation in robotics. Many recent large-scale generative systems adopt rectified-flow or OT-CFM training because of its simplicity and few-step inference profile.

    References

    Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2022). "Flow Matching for Generative Modeling." arXiv:2210.02747.
    Liu, X., Gong, C., & Liu, Q. (2022). "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow." arXiv:2209.03003.
    Albergo, M. S., Boffi, N. M., & Vanden-Eijnden, E. (2023). "Stochastic Interpolants: A Unifying Framework for Flows and Diffusions." arXiv:2303.08797.
    Tong, A., Fatras, K., Malkin, N., Huguet, G., Zhang, Y., Rector-Brooks, J., Wolf, G., & Bengio, Y. (2023). "Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport." arXiv:2302.00482.
    Pooladian, A.-A., Ben-Hamu, H., Domingo-Enrich, C., Amos, B., Lipman, Y., & Chen, R. T. Q. (2023). "Multisample Flow Matching: Straightening Flows with Minibatch Couplings." arXiv:2304.14772.