Preprint, Working Paper. Year: 2025

Sobolev Diffusion Policy

Abstract

We present Sobolev diffusion policy (SDP), a novel framework that effectively combines the strengths of policy learning and trajectory optimization. On the one hand, we build upon diffusion policy, an expressive imitation learning method based on diffusion probabilistic generative models. On the other hand, we use gradient-based trajectory optimization solvers to generate locally optimal trajectories and leverage their associated feedback gains to enrich Sobolev training with first-order information. Combining both, we introduce a first-order loss for diffusion-based policies. The framework alternates between collecting trajectories with a solver warmstarted by the policy and training on them. Through comprehensive experiments, we demonstrate how the Sobolev component significantly reduces the number of trajectories required for the policy to converge globally. First-order information both avoids overfitting, despite the use of very few samples, and mitigates the compounding-error issue of imitation-based policies, even when predicting torques for tasks requiring high-frequency control. We benchmark the benefits of SDP on various robotics tasks of increasing complexity. In particular, SDP proves stable over extended horizons with fewer diffusion steps, shrinking the overall rollout time compared to vanilla diffusion models. When used to compute initial guesses for trajectory optimization, it reduces the solving time by a factor of 2 to 20.
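The first-order loss described above can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes an illustrative linear policy so the Jacobian is available in closed form, and all names (`sobolev_loss`, `lam`) are hypothetical. The idea is that the policy is supervised on both the solver's locally optimal action a* and its feedback gain K = da*/ds:

```python
import numpy as np

def sobolev_loss(W, b, s, a_star, K, lam=0.1):
    """Zeroth- plus first-order loss for a toy linear policy pi(s) = W s + b.

    W, b   : policy parameters (action_dim x state_dim, action_dim)
    s      : state sample (state_dim,)
    a_star : locally optimal action from the trajectory-optimization solver
    K      : feedback gain d a*/d s returned by the solver (action_dim x state_dim)
    lam    : weight of the first-order (Sobolev) term
    """
    a_pred = W @ s + b            # policy output
    J_pred = W                    # Jacobian of a linear policy is W itself
    l0 = np.sum((a_pred - a_star) ** 2)   # standard behavior-cloning term
    l1 = np.sum((J_pred - K) ** 2)        # gradient-matching (Sobolev) term
    return l0 + lam * l1

# Usage on synthetic data: an expert acting as a* = K s.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
s = rng.normal(size=3)
K = rng.normal(size=(2, 3))
loss = sobolev_loss(W, b, s, K @ s, K)
```

For a diffusion policy the Jacobian of the denoiser would instead be obtained by automatic differentiation, but the structure of the loss, a zeroth-order imitation term plus a weighted first-order matching term, is the same.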

Main file
Sobolev Diffusion Policy.pdf (1.17 MB)
Origin: Files produced by the author(s)
License

Dates and versions

hal-05179357, version 1 (23-07-2025)

Identifiers

  • HAL Id: hal-05179357, version 1

Cite

Théotime Le Hellard, Franki Nguimatsia Tiofack, Quentin Le Lidec, Justin Carpentier. Sobolev Diffusion Policy. 2025. ⟨hal-05179357⟩
392 Views
695 Downloads
