PID-controlled Langevin Dynamics for Faster Sampling on Generative Models
💐 Paper accepted at NeurIPS 2025
Hongyi Chen1,3*,
Jianhai Shu2*,
Jingtao Ding2†,
Yong Li2,
Xiao-Ping Zhang1†
1 Shenzhen Key Laboratory of Ubiquitous Data Enabling, Shenzhen International Graduate School, Tsinghua University,
2 Department of Electronic Engineering, Tsinghua University,
3 Pengcheng Laboratory.
🔍 Highlights
- Control-theoretic insight: Reinterprets Langevin dynamics as a feedback control system, where energy gradients act as feedback signals.
- PID-enhanced sampling: Integrates Proportional, Integral, and Derivative control terms into Langevin updates:
- P-term: basic gradient guidance;
- I-term: accumulates historical gradients for momentum-like acceleration;
- D-term: anticipates gradient trends for adaptive stabilization.
- Plug-and-play compatibility:
- Requires no retraining or prior information;
- Integrates directly with any Langevin-based sampler (EBMs, SGMs, diffusion models, etc.).
- Significant speedup:
- Achieves up to 10× faster sampling while maintaining or improving generation quality across image and reasoning tasks.
⚙️ Algorithm Workflow
The PID-controlled Langevin dynamics update is given by:
\[
x_{t+1}=x_t+\epsilon\Big(k_p\nabla_{x}U_\theta(x_t)+\frac{k_i}{t+1}\sum_{s=0}^{t}\nabla_{x}U_\theta(x_s)+k_d\big(\nabla_{x}U_\theta(x_t)-\nabla_xU_\theta(x_{t-1})\big)\Big)+\sqrt{2\epsilon}\,\xi_t,
\]
where $k_p,k_i,k_d$ are the proportional, integral, and derivative gains, $U_{\theta}(\cdot)=\log p_\theta(\cdot)$ is the log-density (negative energy), so that $\nabla_x U_\theta$ is the score, and $\xi_t\sim\mathcal{N}(\mathbf{0},\mathbf{I})$.
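As a quick sanity check (our reduction, not a result from the paper): setting \(k_p=1\) and \(k_i=k_d=0\) collapses the update to the standard unadjusted Langevin step,
\[
x_{t+1}=x_t+\epsilon\,\nabla_x U_\theta(x_t)+\sqrt{2\epsilon}\,\xi_t,
\]
so the P-term alone reproduces vanilla Langevin dynamics, and the I- and D-terms are strict add-ons.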
**PIDLD Algorithm**

1. **Require:** score function \( \nabla_x U_\theta(x)=\nabla_x\log p_\theta(x)=\nabla_x(-f_\theta(x)) \); number of steps \(T\); step size \(\epsilon\); control parameters \(k_p,k_i,k_d\); decay rate \(\gamma < 1\); initial point \(x_0\).
2. Initialize the integral term \(I_{-1}=0\).
3. Compute the initial score \(s_{-1}=\nabla_x U_\theta(x_0)\) (so that \(D_0=0\)).
4. **For** \(t=0\) to \(T-1\) **do**
5. &emsp;\(s_t=\nabla_x U_\theta(x_t)\)
6. &emsp;\(P_t = s_t\) (proportional term)
7. &emsp;\(I_t = \frac{1}{t+1}\big(t\cdot I_{t-1} + s_t\big)\) (integral term: running mean of past scores)
8. &emsp;\(D_t = s_t - s_{t-1}\) (derivative term)
9. &emsp;Compute the control signal \(u_t = k_p P_t + k_i I_t + k_d D_t\).
10. &emsp;Update the state: \(x_{t+1} = x_t + \epsilon\, u_t + \sqrt{2\epsilon}\,\xi_t\), with \(\xi_t \sim \mathcal{N}(0,I)\).
11. &emsp;Decay the integral gain: \(k_i \leftarrow \gamma\, k_i\).
12. **End for**
13. **Return** \(\hat{x} = x_T\).
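A minimal NumPy sketch of this loop, to make the bookkeeping concrete; `pidld_sample` and its argument names are our illustrative choices, not the authors' released implementation:

```python
import numpy as np

def pidld_sample(grad_U, x0, T, eps, kp, ki, kd, gamma, seed=0):
    """Sketch of the PIDLD loop above; grad_U returns the score ∇_x U_θ(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    I = np.zeros_like(x)   # integral term (step 2)
    s_prev = grad_U(x)     # initial score, so D_0 = 0 (step 3)
    for t in range(T):
        s = grad_U(x)                      # current score s_t (step 5)
        P = s                              # proportional term (step 6)
        I = (t * I + s) / (t + 1)          # integral term: running mean (step 7)
        D = s - s_prev                     # derivative term (step 8)
        u = kp * P + ki * I + kd * D       # control signal (step 9)
        x = x + eps * u + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)  # step 10
        ki *= gamma                        # decay the integral gain (step 11)
        s_prev = s
    return x  # x_hat = x_T (step 13)
```

The running mean keeps the integral term bounded, and the \(\gamma\) decay gradually shrinks its influence as sampling proceeds.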
📊 Experiments
In the experiments, we evaluate our method against standard Langevin sampling on mainstream generative models (SGMs and EBMs). We first use toy experiments on 2D point datasets to validate the effectiveness of the integral and derivative terms, and then use benchmark image generation and reasoning tasks to further demonstrate the advantages of our method.
Toy Experiments
In the toy experiments, we sample points from a 2D Gaussian mixture distribution using PID-controlled Langevin dynamics with different control parameters, and compare sampling speed and quality; a minimal sketch of this setup follows below.
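For illustration, here is how a 2D mixture setup can be wired into the `pidld_sample` sketch above; the mixture means, unit covariances, equal weights, and all gain values are arbitrary demo choices rather than the paper's tuned settings:

```python
import numpy as np

# Illustrative two-component 2D Gaussian mixture (equal weights, unit covariance).
MEANS = np.array([[-2.0, 0.0], [2.0, 0.0]])

def gmm_score(x):
    """Score ∇_x log p(x) for p(x) = (1/2) Σ_k N(x; μ_k, I)."""
    diffs = MEANS - x                          # (K, 2): μ_k - x
    logw = -0.5 * np.sum(diffs ** 2, axis=1)   # unnormalized log responsibilities
    w = np.exp(logw - logw.max())
    w /= w.sum()                               # responsibilities γ_k(x)
    return w @ diffs                           # Σ_k γ_k(x) (μ_k - x)

# Draw one sample from the mixture.
x_hat = pidld_sample(gmm_score, x0=np.zeros(2), T=100, eps=0.05,
                     kp=1.0, ki=0.5, kd=0.2, gamma=0.9)
```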
Computer Vision Tasks
For the computer vision tasks, we use a score-based generative model (NCSNv2) and an energy-based model (IGEBM) to sample images from the CIFAR10 and CelebA datasets. Our proposed PIDLD method outperforms the vanilla baseline across all settings.
FID comparison for SGM and EBM models across different NFEs (numbers of function evaluations) on the CIFAR10 and CelebA datasets; lower is better. Column headers give the sampler and its NFE setting.
**CIFAR10**

| Method | SGM 25×1 | SGM 100×1 | SGM 232×1 | SGM 232×3 | SGM 232×5 | EBM 10 | EBM 20 | EBM 30 | EBM 40 |
|---------|---------:|----------:|----------:|----------:|----------:|-------:|-------:|-------:|-------:|
| Vanilla | 46.8 | 17.2 | 16.0 | 12.8 | 12.5 | 135.8 | 58.1 | 40.3 | 35.3 |
| Ours | 18.3 | 12.1 | 11.7 | 11.6 | 11.4 | 99.0 | 46.1 | 32.8 | 33.2 |

**CelebA**

| Method | SGM 50×1 | SGM 250×1 | SGM 500×1 | SGM 500×3 | SGM 500×5 | EBM 15 | EBM 20 | EBM 25 | EBM 30 |
|---------|---------:|----------:|----------:|----------:|----------:|-------:|-------:|-------:|-------:|
| Vanilla | 25.0 | 13.6 | 14.0 | 11.3 | 9.5 | 109.1 | 63.5 | 41.3 | 35.4 |
| Ours | 8.0 | 5.7 | 5.9 | 5.9 | 5.6 | 58.0 | 38.9 | 32.2 | 30.0 |
Reasoning Tasks
For the reasoning tasks, we use an energy-based model (IRED) to sample solutions for Sudoku and graph connectivity problems. Our proposed PIDLD method matches or outperforms all baselines across settings.
Accuracy (%) comparison of baseline methods and ours on the Sudoku and Connectivity tasks under different NFEs; higher is better.
**Sudoku** (EBM accuracy, %)

| Method \ NFEs | 5 | 10 | 15 | 30 | 40 | 80 |
|---------------|------:|------:|------:|------:|------:|------:|
| Vanilla | 45.99 | 51.00 | 50.93 | 50.77 | 53.63 | 55.02 |
| MILD | 49.75 | 54.82 | 53.55 | 55.25 | 56.56 | 56.64 |
| Ours | 50.54 | 55.48 | 55.55 | 55.94 | 57.02 | 56.64 |

**Connectivity** (EBM accuracy, %)

| Method \ NFEs | 1 | 2 | 3 | 4 | 5 | 10 |
|---------------|------:|------:|------:|------:|------:|------:|
| Vanilla | 86.16 | 87.22 | 87.22 | 87.48 | 87.38 | 87.49 |
| MILD | 86.16 | 88.54 | 89.21 | 89.75 | 90.15 | 90.33 |
| Ours | 86.16 | 91.32 | 92.31 | 92.82 | 92.95 | 93.28 |
📚 Citation
If you find the idea useful for your research, please consider citing:
@inproceedings{chen2025pidcontrolled,
title={{PID}-controlled Langevin Dynamics for Faster Sampling on Generative Models},
author={Hongyi Chen and Jianhai Shu and Jingtao Ding and Yong Li and Xiao-Ping Zhang},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=y9LHDCKeeN}
}