Preprint, Working Paper. Year: 2025

Sobolev Diffusion Policy

Abstract

We present Sobolev diffusion policy (SDP), a novel framework that effectively combines the strengths of policy learning and trajectory optimization. On the one hand, we build upon diffusion policy, an expressive imitation learning method based on diffusion probabilistic generative models. On the other hand, we use gradient-based trajectory optimization solvers to generate locally optimal trajectories and leverage their associated feedback gains to enrich Sobolev training with first-order information. Combining both, we introduce a first-order loss for diffusion-based policies. The framework alternates between collecting trajectories with a solver warmstarted by the policy and training on them. Through comprehensive experiments, we demonstrate how the Sobolev component significantly reduces the number of trajectories required for the policy to converge globally. First-order information both avoids overfitting, despite the use of very few samples, and mitigates the compounding-error issue of imitation-based policies, even when predicting torques for tasks requiring high-frequency control. We benchmark the benefits of SDP on various robotics tasks of increasing complexity. In particular, SDP proves stable over extended horizons with fewer diffusion steps, shrinking the overall rollout time compared to vanilla diffusion models. When used to compute initial guesses for trajectory optimization, it reduces the solving time by a factor of 2 to 20.
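The first-order loss described above can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes an illustrative linear policy so the Jacobian is available in closed form, and all names (`sobolev_loss`, `lam`) are hypothetical. The idea is that the policy is supervised on both the solver's locally optimal action a* and its feedback gain K = da*/ds:

```python
import numpy as np

def sobolev_loss(W, b, s, a_star, K, lam=0.1):
    """Zeroth- plus first-order loss for a toy linear policy pi(s) = W s + b.

    W, b   : policy parameters (action_dim x state_dim, action_dim)
    s      : state sample (state_dim,)
    a_star : locally optimal action from the trajectory-optimization solver
    K      : feedback gain d a*/d s returned by the solver (action_dim x state_dim)
    lam    : weight of the first-order (Sobolev) term
    """
    a_pred = W @ s + b            # policy output
    J_pred = W                    # Jacobian of a linear policy is W itself
    l0 = np.sum((a_pred - a_star) ** 2)   # standard behavior-cloning term
    l1 = np.sum((J_pred - K) ** 2)        # gradient-matching (Sobolev) term
    return l0 + lam * l1

# Usage on synthetic data: an expert acting as a* = K s.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
s = rng.normal(size=3)
K = rng.normal(size=(2, 3))
loss = sobolev_loss(W, b, s, K @ s, K)
```

For a diffusion policy the Jacobian of the denoiser would instead be obtained by automatic differentiation, but the structure of the loss, a zeroth-order imitation term plus a weighted first-order matching term, is the same.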

Main file
Sobolev Diffusion Policy.pdf (1.17 MB)
Origin: Files produced by the author(s)
License

Dates and versions

hal-05179357, version 1 (23-07-2025)

Identifiers

  • HAL Id: hal-05179357, version 1

Cite

Théotime Le Hellard, Franki Nguimatsia Tiofack, Quentin Le Lidec, Justin Carpentier. Sobolev Diffusion Policy. 2025. ⟨hal-05179357⟩
392 Views
695 Downloads
