PID-controlled Langevin Dynamics for Faster Sampling on Generative Models

💐 Paper accepted at NeurIPS 2025

Hongyi Chen1,3*, Jianhai Shu2*, Jingtao Ding2†, Yong Li2, Xiao-Ping Zhang1†

1 Shenzhen Key Laboratory of Ubiquitous Data Enabling, Shenzhen International Graduate School, Tsinghua University,
2 Department of Electronic Engineering, Tsinghua University,
3 Pengcheng Laboratory.

arXiv Paper · NeurIPS · Code
PID-controlled systems eliminate steady-state error through the integral term while damping overshoot via the derivative term.

🔍 Highlights

⚙️ Algorithm Workflow

The PID-controlled Langevin dynamics update is given by:

\[ x_{t+1}=x_t+\epsilon\Big(k_p\nabla_{x}U_\theta(x_t)+\frac{k_i}{t+1}\sum_{s=0}^{t}\nabla_{x}U_\theta(x_s)+k_d\big(\nabla_{x}U_\theta(x_t)-\nabla_xU_\theta(x_{t-1})\big)\Big)+\sqrt{2\epsilon}\,\xi_t, \]
where $k_p,k_i,k_d$ are the proportional, integral, and derivative gains, $U_{\theta}(\cdot)=-f_\theta(\cdot)$ is the negative energy, so that $\nabla_x U_\theta(x)=\nabla_x\log p_\theta(x)$ is the score, and $\xi_t\sim\mathcal{N}(\mathbf{0},\mathbf{I})$.

PIDLD Algorithm Flowchart
Require: score function \( \nabla_x U_\theta(x)=\nabla_x\log p_\theta(x)=\nabla_x(-f_\theta(x)) \); number of steps \(T\); step size \(\epsilon\); control gains \(k_p,k_i,k_d\); decay rate \(\gamma < 1\); initial point \(x_0\).
1. Initialize the integral term \(I_{-1}=0\) and the previous score \(s_{-1}=\nabla_x U_\theta(x_0)\), so that the derivative term vanishes at \(t=0\).
2. For \(t=0\) to \(T-1\) do:
3. Compute the score \(s_t=\nabla_x U_\theta(x_t)\).
4. \(P_t = s_t\) (proportional term).
5. \(I_t = \frac{1}{t+1}\big(t\cdot I_{t-1} + s_t\big)\) (integral term: running average of the scores so far).
6. \(D_t = s_t - s_{t-1}\) (derivative term).
7. Compute the control signal \(u_t = k_p P_t + k_i I_t + k_d D_t\).
8. Update the state:
\[ x_{t+1} = x_t + \epsilon\, u_t + \sqrt{2\epsilon}\,\xi_t,\qquad \xi_t \sim \mathcal{N}(\mathbf{0},\mathbf{I}) \]
9. Decay the integral gain: \(k_i \leftarrow \gamma\cdot k_i\).
10. End for.
11. Return \(\hat{x} = x_T\).
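Below is a minimal NumPy sketch of this procedure. The function name `pidld_sample`, the batching convention, and the default gains and decay rate are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of PID-controlled Langevin dynamics (PIDLD), assuming
# `score_fn(x)` returns grad_x log p_theta(x) for a batch of points.
# Default gains and the decay rate are placeholders, not tuned values.
import numpy as np


def pidld_sample(score_fn, x0, n_steps, step_size,
                 kp=1.0, ki=0.1, kd=0.1, gamma=0.9, rng=None):
    """Run n_steps PID-controlled Langevin updates starting from x0."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)

    prev_score = score_fn(x)       # s_{-1}: makes the derivative term zero at t = 0
    integral = np.zeros_like(x)    # I_{-1}: running average of past scores

    for t in range(n_steps):
        score = score_fn(x)                           # proportional term P_t
        integral = (t * integral + score) / (t + 1)   # integral term I_t (running mean)
        derivative = score - prev_score               # derivative term D_t

        control = kp * score + ki * integral + kd * derivative   # control signal u_t
        noise = rng.standard_normal(x.shape)
        x = x + step_size * control + np.sqrt(2.0 * step_size) * noise

        prev_score = score
        ki *= gamma                                   # decay the integral gain

    return x
```

With \(k_i = k_d = 0\) this reduces to standard Langevin dynamics, which serves as the vanilla baseline in the comparisons below.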

📊 Experiments

In the experiments, we evaluate our method against standard Langevin sampling on mainstream generative models (SGMs and EBMs). We first use toy experiments on 2-d point datasets to validate the effectiveness of the integral and derivative terms, and then use benchmark datasets from classical image generation and reasoning tasks to further demonstrate the advantages of our method.

Toy Experiments

In the toy experiments, we sample points from a 2-d Gaussian mixture distribution using PID-controlled Langevin dynamics with different control parameters, and compare the sampling speed and quality.

Adding the $I$ term, the $D$ term, or both significantly accelerates sampling and yields a lower KL divergence.
Increasing the derivative gain $k_d$ produces a lower KL divergence.
Increasing the integral gain $k_i$ produces a lower KL divergence, but an excessive $k_i$ may cause a rebounding phenomenon.
Decaying the integral gain $k_i$ ensures a smoother transition from early exploration to later convergence, alleviating the rebounding phenomenon.
Effect of $k_i$ and $k_d$ on bias. Adding the $I$ term (left) excels at mitigating bias.
Effect of $k_i$ and $k_d$ on oscillation. Adding the $D$ term (right) excels at reducing oscillation.
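As a rough illustration of this toy setting (reusing the `pidld_sample` sketch above), the snippet below feeds the analytic score of an equal-weight, two-component 2-d Gaussian mixture to the sampler; the means, variance, step size, and gains are made-up values, not the paper's configuration.

```python
# Illustrative 2-d Gaussian-mixture toy example; all constants are placeholders.
import numpy as np


def gmm_score(x, means, var=0.5):
    """grad_x log p(x) for an equal-weight isotropic Gaussian mixture."""
    # Component responsibilities, computed in a numerically stable way.
    d2 = np.stack([np.sum((x - m) ** 2, axis=-1) for m in means], axis=0)
    logw = -d2 / (2.0 * var)
    logw -= logw.max(axis=0, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=0, keepdims=True)
    # Score = responsibility-weighted sum of per-component scores (m_k - x) / var.
    return sum(w[k][:, None] * (means[k] - x) / var for k in range(len(means)))


means = np.array([[-4.0, 0.0], [4.0, 0.0]])
x0 = np.random.default_rng(0).standard_normal((2000, 2))

# PIDLD versus vanilla Langevin dynamics (k_i = k_d = 0) from the same start.
x_pid = pidld_sample(lambda x: gmm_score(x, means), x0,
                     n_steps=50, step_size=0.05, kp=1.0, ki=0.5, kd=0.2)
x_van = pidld_sample(lambda x: gmm_score(x, means), x0,
                     n_steps=50, step_size=0.05, kp=1.0, ki=0.0, kd=0.0)
```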

Computer Vision Tasks

For the computer vision tasks, we use a score-based generative model (NCSNv2) and an energy-based model (IGEBM) to sample images from the CIFAR10 and CelebA datasets. Our proposed PIDLD method outperforms all baselines across different settings.

CIFAR10 samples with 25 sampling steps using score-based generative model.
CelebA samples with 25 sampling steps using score-based generative model.
FID comparison for SGM and EBM models across different NFEs on the CIFAR10 and CelebA datasets (lower is better).

CIFAR10:

| NFEs | SGM 25×1 | SGM 100×1 | SGM 232×1 | SGM 232×3 | SGM 232×5 | EBM 10 | EBM 20 | EBM 30 | EBM 40 |
|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 46.8 | 17.2 | 16.0 | 12.8 | 12.5 | 135.8 | 58.1 | 40.3 | 35.3 |
| Ours | 18.3 | 12.1 | 11.7 | 11.6 | 11.4 | 99.0 | 46.1 | 32.8 | 33.2 |

CelebA:

| NFEs | SGM 50×1 | SGM 250×1 | SGM 500×1 | SGM 500×3 | SGM 500×5 | EBM 15 | EBM 20 | EBM 25 | EBM 30 |
|---|---|---|---|---|---|---|---|---|---|
| Vanilla | 25.0 | 13.6 | 14.0 | 11.3 | 9.5 | 109.1 | 63.5 | 41.3 | 35.4 |
| Ours | 8.0 | 5.7 | 5.9 | 5.9 | 5.6 | 58.0 | 38.9 | 32.2 | 30.0 |
Ablation study of PIDLD on image sampling tasks. P+I+D denotes the complete model, while P+I, P+D, and P represent models with the derivative term, the integral term, or both terms removed, respectively. Numbers on the bars indicate the performance improvement of each ablation over P.
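In terms of the sketch above, the ablation variants simply zero out the corresponding gains; the numeric values below are placeholders, not the paper's settings.

```python
# Hypothetical gain settings mapping the ablation variants onto pidld_sample.
variants = {
    "P+I+D": dict(kp=1.0, ki=0.5, kd=0.2),  # full PIDLD
    "P+I":   dict(kp=1.0, ki=0.5, kd=0.0),  # derivative term removed
    "P+D":   dict(kp=1.0, ki=0.0, kd=0.2),  # integral term removed
    "P":     dict(kp=1.0, ki=0.0, kd=0.0),  # vanilla Langevin dynamics
}
```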

Reasoning Tasks

For the reasoning tasks, we use an energy-based model (IRED) to sample solutions for the Sudoku and Connectivity tasks. Our proposed PIDLD method outperforms all baselines across different settings.

Accuracy comparison of the baselines and our method on the Sudoku and Connectivity tasks under different NFEs.

Sudoku (accuracy %):

| NFEs | 5 | 10 | 15 | 30 | 40 | 80 |
|---|---|---|---|---|---|---|
| Vanilla | 45.99 | 51.00 | 50.93 | 50.77 | 53.63 | 55.02 |
| MILD | 49.75 | 54.82 | 53.55 | 55.25 | 56.56 | 56.64 |
| Ours | 50.54 | 55.48 | 55.55 | 55.94 | 57.02 | 56.64 |

Connectivity (accuracy %):

| NFEs | 1 | 2 | 3 | 4 | 5 | 10 |
|---|---|---|---|---|---|---|
| Vanilla | 86.16 | 87.22 | 87.22 | 87.48 | 87.38 | 87.49 |
| MILD | 86.16 | 88.54 | 89.21 | 89.75 | 90.15 | 90.33 |
| Ours | 86.16 | 91.32 | 92.31 | 92.82 | 92.95 | 93.28 |
Ablation study of PIDLD on reasoning tasks. P+I+D denotes the complete model, while P+I, P+D, and P represent models with the derivative term, the integral term, or both terms removed, respectively. Percentages indicate the performance improvement of each ablation over P.

📚 Citation

If you find this idea useful for your research, please consider citing:

```bibtex
@inproceedings{chen2025pidcontrolled,
  title     = {{PID}-controlled Langevin Dynamics for Faster Sampling on Generative Models},
  author    = {Hongyi Chen and Jianhai Shu and Jingtao Ding and Yong Li and Xiao-Ping Zhang},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=y9LHDCKeeN}
}
```