PID-controlled Langevin Dynamics for Faster Sampling on Generative Models

💐 Paper accepted at NeurIPS 2025

Hongyi Chen1,3*, Jianhai Shu2*, Jingtao Ding2†, Yong Li2, Xiao-Ping Zhang1†

1 Shenzhen Key Laboratory of Ubiquitous Data Enabling, Shenzhen International Graduate School, Tsinghua University,
2 Department of Electronic Engineering, Tsinghua University,
3 Pengcheng Laboratory.

arXiv Paper · NeurIPS · Code
PID-controlled systems eliminate steady-state error through the integral term while damping overshoot via the derivative term.

🔍 Highlights

⚙️ Algorithm Workflow

The PID-controlled Langevin dynamics update is given by:

\[ x_{t+1}=x_t+\epsilon\Big(k_p\nabla_{x}U_\theta(x_t)+\frac{k_i}{t+1}\sum_{s=0}^{t}\nabla_{x}U_\theta(x_s)+k_d\big(\nabla_{x}U_\theta(x_t)-\nabla_xU_\theta(x_{t-1})\big)\Big)+\sqrt{2\epsilon}\,\xi_t, \]
where $k_p,k_i,k_d$ are the proportional, integral, and derivative gains, $U_{\theta}(\cdot)=-f_\theta(\cdot)$ is the negative energy, so that $\nabla_x U_\theta(x)=\nabla_x\log p_\theta(x)$ is the score, and $\xi_t\sim\mathcal{N}(\mathbf{0},\mathbf{I})$.

PIDLD Algorithm Flowchart
Require: score function \( \nabla_x U_\theta(x)=\nabla_x\log p_\theta(x)=\nabla_x(-f_\theta(x)) \); number of steps \(T\); step size \(\epsilon\); control gains \(k_p,k_i,k_d\); decay rate \(\gamma < 1\); initial point \(x_0\).
1. Initialize the integral term \(I_{-1}=0\) and the previous score \(s_{-1}=\nabla_x U_\theta(x_0)\), so that the derivative term vanishes at \(t=0\).
2. For \(t=0\) to \(T-1\) do:
3. Compute the score \(s_t=\nabla_x U_\theta(x_t)\).
4. \(P_t = s_t\) (proportional term).
5. \(I_t = \frac{1}{t+1}\big(t\cdot I_{t-1} + s_t\big)\) (integral term: running average of the scores so far).
6. \(D_t = s_t - s_{t-1}\) (derivative term).
7. Compute the control signal \(u_t = k_p P_t + k_i I_t + k_d D_t\).
8. Update the state:
\[ x_{t+1} = x_t + \epsilon\, u_t + \sqrt{2\epsilon}\,\xi_t,\qquad \xi_t \sim \mathcal{N}(\mathbf{0},\mathbf{I}) \]
9. Decay the integral gain: \(k_i \leftarrow \gamma\cdot k_i\).
10. End for.
11. Return \(\hat{x} = x_T\).
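Below is a minimal NumPy sketch of this procedure. The function name `pidld_sample`, the batching convention, and the default gains and decay rate are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of PID-controlled Langevin dynamics (PIDLD), assuming
# `score_fn(x)` returns grad_x log p_theta(x) for a batch of points.
# Default gains and the decay rate are placeholders, not tuned values.
import numpy as np


def pidld_sample(score_fn, x0, n_steps, step_size,
                 kp=1.0, ki=0.1, kd=0.1, gamma=0.9, rng=None):
    """Run n_steps PID-controlled Langevin updates starting from x0."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)

    prev_score = score_fn(x)       # s_{-1}: makes the derivative term zero at t = 0
    integral = np.zeros_like(x)    # I_{-1}: running average of past scores

    for t in range(n_steps):
        score = score_fn(x)                           # proportional term P_t
        integral = (t * integral + score) / (t + 1)   # integral term I_t (running mean)
        derivative = score - prev_score               # derivative term D_t

        control = kp * score + ki * integral + kd * derivative   # control signal u_t
        noise = rng.standard_normal(x.shape)
        x = x + step_size * control + np.sqrt(2.0 * step_size) * noise

        prev_score = score
        ki *= gamma                                   # decay the integral gain

    return x
```

With \(k_i = k_d = 0\) this reduces to standard Langevin dynamics, which serves as the vanilla baseline in the comparisons below.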

📊 Experiments

In the experiments, we evaluate our method against standard Langevin sampling on mainstream generative models (SGMs and EBMs). We first use toy experiments on 2-d point datasets to validate the effectiveness of the integral and derivative terms, and then use benchmark datasets from classical image generation and reasoning tasks to further demonstrate the advantages of our method.

Toy Experiments

In the toy experiments, we sample points from a 2-d Gaussian mixture distribution using PID-controlled Langevin dynamics with different control parameters, and compare the sampling speed and quality.

Adding the $I$ term, the $D$ term, or both significantly accelerates sampling and yields a lower KL divergence.
Increasing the derivative gain $k_d$ produces a lower KL divergence.
Increasing the integral gain $k_i$ produces a lower KL divergence, but an excessive $k_i$ may cause a rebounding phenomenon.
Decaying the integral gain $k_i$ ensures a smoother transition from early exploration to later convergence, alleviating the rebounding phenomenon.
Effect of $k_i$ and $k_d$ on bias. Adding the $I$ term (left) excels at mitigating bias.
Effect of $k_i$ and $k_d$ on oscillation. Adding the $D$ term (right) excels at reducing oscillation.
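As a rough illustration of this toy setting (reusing the `pidld_sample` sketch above), the snippet below feeds the analytic score of an equal-weight, two-component 2-d Gaussian mixture to the sampler; the means, variance, step size, and gains are made-up values, not the paper's configuration.

```python
# Illustrative 2-d Gaussian-mixture toy example; all constants are placeholders.
import numpy as np


def gmm_score(x, means, var=0.5):
    """grad_x log p(x) for an equal-weight isotropic Gaussian mixture."""
    # Component responsibilities, computed in a numerically stable way.
    d2 = np.stack([np.sum((x - m) ** 2, axis=-1) for m in means], axis=0)
    logw = -d2 / (2.0 * var)
    logw -= logw.max(axis=0, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=0, keepdims=True)
    # Score = responsibility-weighted sum of per-component scores (m_k - x) / var.
    return sum(w[k][:, None] * (means[k] - x) / var for k in range(len(means)))


means = np.array([[-4.0, 0.0], [4.0, 0.0]])
x0 = np.random.default_rng(0).standard_normal((2000, 2))

# PIDLD versus vanilla Langevin dynamics (k_i = k_d = 0) from the same start.
x_pid = pidld_sample(lambda x: gmm_score(x, means), x0,
                     n_steps=50, step_size=0.05, kp=1.0, ki=0.5, kd=0.2)
x_van = pidld_sample(lambda x: gmm_score(x, means), x0,
                     n_steps=50, step_size=0.05, kp=1.0, ki=0.0, kd=0.0)
```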

Computer Vision Tasks

For the computer vision tasks, we use a score-based generative model (NCSNv2) and an energy-based model (IGEBM) to sample images from the CIFAR10 and CelebA datasets. Our proposed PIDLD method outperforms all baselines across different settings.

CIFAR10 samples with 25 sampling steps using score-based generative model.
CelebA samples with 25 sampling steps using score-based generative model.
FID comparison for SGM and EBM models across different NFEs on the CIFAR10 and CelebA datasets (lower is better).

CIFAR10:

| NFEs | SGM 25×1 | SGM 100×1 | SGM 232×1 | SGM 232×3 | SGM 232×5 | EBM 10 | EBM 20 | EBM 30 | EBM 40 |
|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 46.8 | 17.2 | 16.0 | 12.8 | 12.5 | 135.8 | 58.1 | 40.3 | 35.3 |
| Ours | 18.3 | 12.1 | 11.7 | 11.6 | 11.4 | 99.0 | 46.1 | 32.8 | 33.2 |

CelebA:

| NFEs | SGM 50×1 | SGM 250×1 | SGM 500×1 | SGM 500×3 | SGM 500×5 | EBM 15 | EBM 20 | EBM 25 | EBM 30 |
|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 25.0 | 13.6 | 14.0 | 11.3 | 9.5 | 109.1 | 63.5 | 41.3 | 35.4 |
| Ours | 8.0 | 5.7 | 5.9 | 5.9 | 5.6 | 58.0 | 38.9 | 32.2 | 30.0 |
Ablation study of PIDLD on image sampling tasks. P+I+D denotes the complete model, while P+I, P+D, and P represent models with the derivative term, the integral term, or both terms removed, respectively. Numbers on the bars indicate the performance improvement of each ablation over P.
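In terms of the sketch above, the ablation variants simply zero out the corresponding gains; the numeric values below are placeholders, not the paper's settings.

```python
# Hypothetical gain settings mapping the ablation variants onto pidld_sample.
variants = {
    "P+I+D": dict(kp=1.0, ki=0.5, kd=0.2),  # full PIDLD
    "P+I":   dict(kp=1.0, ki=0.5, kd=0.0),  # derivative term removed
    "P+D":   dict(kp=1.0, ki=0.0, kd=0.2),  # integral term removed
    "P":     dict(kp=1.0, ki=0.0, kd=0.0),  # vanilla Langevin dynamics
}
```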

Reasoning Tasks

For the reasoning tasks, we use an energy-based model (IRED) to sample solutions for the Sudoku and Connectivity tasks. Our proposed PIDLD method outperforms all baselines across different settings.

Accuracy comparison of the baselines and our method on the Sudoku and Connectivity tasks under different NFEs.

Sudoku (accuracy %):

| NFEs | 5 | 10 | 15 | 30 | 40 | 80 |
|---|---|---|---|---|---|---|
| Vanilla | 45.99 | 51.00 | 50.93 | 50.77 | 53.63 | 55.02 |
| MILD | 49.75 | 54.82 | 53.55 | 55.25 | 56.56 | 56.64 |
| Ours | 50.54 | 55.48 | 55.55 | 55.94 | 57.02 | 56.64 |

Connectivity (accuracy %):

| NFEs | 1 | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|---|
| Vanilla | 86.16 | 87.22 | 87.22 | 87.48 | 87.38 | 87.49 |
| MILD | 86.16 | 88.54 | 89.21 | 89.75 | 90.15 | 90.33 |
| Ours | 86.16 | 91.32 | 92.31 | 92.82 | 92.95 | 93.28 |
Ablation study of PIDLD on reasoning tasks. P+I+D denotes the complete model, while P+I, P+D, and P represent models with the derivative term, the integral term, or both terms removed, respectively. Percentages indicate the performance improvement of each ablation over P.

📚 Citation

If you find this idea useful for your research, please consider citing:

```bibtex
@inproceedings{chen2025pidcontrolled,
  title     = {{PID}-controlled Langevin Dynamics for Faster Sampling on Generative Models},
  author    = {Hongyi Chen and Jianhai Shu and Jingtao Ding and Yong Li and Xiao-Ping Zhang},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=y9LHDCKeeN}
}
```