Enhancing Shortcut Models with Cumulative Self-Consistency Loss for One-Step Diffusion

Abstract

Although iterative denoising (i.e., diffusion/flow) methods offer strong generative performance, they suffer from low generation efficiency, requiring hundreds of network forward passes to generate a single sample. Mitigating this requires taking larger step-sizes during simulation, thereby allowing one- or few-step generation. The recently proposed shortcut model learns larger step-sizes by enforcing alignment between its direction and the path defined by a base many-step flow-matching model through a self-consistency loss. However, its generation quality is significantly lower than that of the base model. In this paper, we interpret the self-consistency loss through the lens of optimal control by formulating few-step generation as a controlled base generative process. This perspective enables us to develop a general cumulative self-consistency loss that penalizes misalignment at both the current step and future steps along the trajectory. This encourages the model to take larger step-sizes that not only align with the base model at the current time step but also guide subsequent steps towards high-quality generation. Furthermore, we draw a connection between our approach and reinforcement learning, potentially opening the door to a new set of approaches for few-step generation. Extensive experiments show that we significantly improve one- and few-step generation quality under the same training budget.
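The core idea of the abstract can be sketched as follows: a shortcut model's self-consistency loss compares one large step of size 2d against two small steps of size d, and the cumulative variant additionally rolls both trajectories forward and penalizes the mismatch at future steps as well. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `shortcut_step` is a hypothetical linear stand-in for the neural network, and the rollout horizon and weighting are illustrative choices.

```python
import numpy as np

def shortcut_step(x, t, d, theta):
    # Toy stand-in for the shortcut network: takes a step of size d
    # from state x at time t (hypothetical linear dynamics).
    return x + d * (theta * x + t)

def cumulative_self_consistency_loss(x0, theta, n_steps=4, horizon=2):
    # Sketch of a cumulative self-consistency loss: at each step,
    # compare one large step (size 2d) against two small steps
    # (size d each), then roll both states forward `horizon` more
    # steps and also penalize the mismatch at those future steps.
    d = 1.0 / n_steps
    loss = 0.0
    x = x0
    for k in range(n_steps - 1):
        t = k * d
        # Reference path: two small steps of size d.
        x_two = shortcut_step(shortcut_step(x, t, d, theta), t + d, d, theta)
        # Shortcut path: one large step of size 2d.
        x_one = shortcut_step(x, t, 2 * d, theta)
        # Accumulate mismatch now and at `horizon` future steps.
        a, b = x_one, x_two
        for j in range(horizon + 1):
            loss += np.mean((a - b) ** 2)
            tt = t + 2 * d + j * d
            a = shortcut_step(a, tt, d, theta)
            b = shortcut_step(b, tt, d, theta)
        x = x_two
    return loss
```

With `horizon=0` this reduces to a plain per-step self-consistency loss; larger horizons add only nonnegative future-mismatch terms, so the cumulative loss upper-bounds the per-step one.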

Publication
In International Conference on Learning Representations (ICLR) 2026
Sandesh Ghimire
Staff Researcher and Engineer

My research interests include machine learning, computer vision and medical imaging.