WeightFlow: Learning Stochastic Dynamics via Evolving Weight of Neural Network

💐 Paper accepted as AAAI'26 Oral

Ruikun Li1, Jiazhen Liu2, Huandong Wang2*, Qingmin Liao1, Yong Li2

1 Shenzhen International Graduate School, Tsinghua University,
2 Department of Electronic Engineering, BNRist, Tsinghua University

* Corresponding Author (wanghuandong@tsinghua.edu.cn)

arXiv Paper | Appendix | Code
WeightFlow Conceptual Overview
Learning stochastic dynamics: WeightFlow projects the evolution of probability distributions (middle) from stochastic state trajectories (top) into a continuous path in the parameterized weight space of a neural network (bottom).

🔍 Highlights

⚙️ WeightFlow Framework

WeightFlow models the neural network weights as a graph and employs a graph neural differential equation to learn the continuous dynamics of this weight graph. The framework consists of two main parts:

1. Backbone ($\theta_t$): A backbone network with parameters $\theta_t$ models the static probability distribution at time $t$ using an autoregressive factorization:

\[ p(x,t)=p_{\theta_{t}}(x)=\prod_{i=1}^{d}p_{\theta_{t}}(x_{i}|x_{1},...,x_{i-1}) \]

2. Hypernetwork ($g_{\phi}$): A graph hypernetwork $g_{\phi}$ then models the continuous evolution of these weights $\theta_t$ as a Controlled Differential Equation (CDE), as illustrated in the code sketch below:

\[ \theta_{\tau}=\theta_{0}+\int_{0}^{\tau}g_{\phi}(\theta_{t},t)\frac{dZ_{t}}{dt}dt \]
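For concreteness, here is a minimal PyTorch sketch of both components. Everything in it is illustrative: the sizes (`d`, `L`, `h`), the plain MLP standing in for the paper's graph hypernetwork, and a control path $Z_t$ assumed linear in $t$ are our assumptions, not the released implementation.

```python
# Minimal sketch (illustrative only): an autoregressive backbone p_theta(x)
# and a hypernetwork that evolves its flattened weight vector theta_t.
# A plain MLP stands in for the paper's graph hypernetwork, and Z_t is
# assumed linear in t, so dZ/dt is a constant folded into g_phi.
import torch
import torch.nn as nn

d, L, h = 10, 4, 8  # system dimensions, states per dimension, hidden size

class Backbone(nn.Module):
    """p_theta(x) = prod_i p_theta(x_i | x_1, ..., x_{i-1})."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(L + 1, h)      # +1 for a BOS token
        self.rnn = nn.GRUCell(h, h)
        self.head = nn.Linear(h, L)            # logits over L states

    def log_prob(self, x):                     # x: (batch, d) integer states
        tok = torch.full((x.shape[0],), L)     # start from BOS
        hid = torch.zeros(x.shape[0], h)
        logp = torch.zeros(x.shape[0])
        for i in range(d):                     # O(d) sequential steps
            hid = self.rnn(self.emb(tok), hid)
            logits = self.head(hid).log_softmax(-1)
            logp = logp + logits.gather(1, x[:, i:i+1]).squeeze(1)
            tok = x[:, i]
        return logp

backbone = Backbone()
theta_0 = nn.utils.parameters_to_vector(backbone.parameters()).detach()

# Hypernetwork g_phi(theta_t, t); Euler steps approximate the CDE integral.
n = theta_0.numel()
g_phi = nn.Sequential(nn.Linear(n + 1, 64), nn.Tanh(), nn.Linear(64, n))

def evolve(theta, tau, steps=20):
    dt = tau / steps
    t = torch.zeros(1)
    for _ in range(steps):
        theta = theta + g_phi(torch.cat([theta, t])) * dt  # dZ/dt == 1 here
        t = t + dt
    return theta

# Load the evolved weights back into the backbone for inference. (Training
# would instead use a functional call, e.g. torch.func.functional_call,
# so gradients can flow into g_phi.)
nn.utils.vector_to_parameters(evolve(theta_0, tau=1.0), backbone.parameters())
```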

WeightFlow Framework Diagram
The framework of WeightFlow, illustrating the backbone, weight graph, path projection, and the Graph Neural Differential Equation.

📊 Experimental Results

We empirically evaluate WeightFlow on a diverse set of simulated and real-world stochastic dynamics, demonstrating its superior performance and robustness.

Simulated Datasets: Discrete State Systems

We first benchmarked WeightFlow against several state-of-the-art baselines on five discrete stochastic systems. As shown in Table 1, WeightFlow significantly outperforms all baselines, reducing the Wasserstein ($\mathcal{W}$) and Jensen-Shannon (JSD) distances by 32.04% and 53.99% on average, respectively (a sketch of how both metrics are computed from samples follows the table).

Table 1: Statistical results on various stochastic dynamical systems. (All values $\times 10^{-1}$; lower is better.)

| Model | Epidemic $\mathcal{W}\downarrow$ | Epidemic JSD $\downarrow$ | Toggle Switch $\mathcal{W}\downarrow$ | Toggle Switch JSD $\downarrow$ | Signalling Cascade 1 $\mathcal{W}\downarrow$ | Signalling Cascade 1 JSD $\downarrow$ | Signalling Cascade 2 $\mathcal{W}\downarrow$ | Signalling Cascade 2 JSD $\downarrow$ | Ecological Evolution $\mathcal{W}\downarrow$ | Ecological Evolution JSD $\downarrow$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Latent SDE | 3.14±0.25 | 4.22±0.26 | 2.34±0.15 | 1.27±0.12 | 3.04±0.17 | 0.85±0.14 | 3.59±0.13 | 1.02±0.06 | 8.04±0.33 | 3.52±0.23 |
| Neural MJP | 1.88±0.14 | 1.61±0.14 | 2.13±0.26 | 0.94±0.14 | 1.69±0.15 | 0.30±0.04 | 1.68±0.11 | 0.36±0.01 | 1.68±0.18 | 0.51±0.03 |
| T-IB | 2.62±0.17 | 3.52±0.29 | 1.59±0.20 | 0.88±0.11 | 1.66±0.16 | 0.32±0.04 | 2.16±0.17 | 0.40±0.03 | 2.17±0.24 | 0.56±0.06 |
| NLSB | 3.27±0.28 | 1.65±0.14 | 2.97±0.30 | 1.32±0.20 | 1.50±0.10 | 0.39±0.05 | 1.83±0.15 | 0.48±0.05 | 3.09±0.26 | 2.80±0.32 |
| DeepRUOT | 1.78±0.13 | 1.08±0.09 | 1.37±0.17 | 0.77±0.05 | 0.52±0.02 | 0.07±0.00 | 0.51±0.01 | 0.08±0.00 | 3.27±0.31 | 2.47±0.36 |
| WeightFlow (Ours) | 1.10±0.14 | 0.34±0.01 | 0.82±0.07 | 0.33±0.02 | 0.48±0.03 | 0.04±0.00 | 0.49±0.07 | 0.06±0.01 | 0.51±0.07 | 0.12±0.02 |
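As a reference for how these two metrics are computed from samples, here is a hedged NumPy/SciPy sketch on a single discrete marginal; the Poisson stand-in data and the unit-width histogram binning are assumptions for illustration.

```python
# Hedged sketch: empirical Wasserstein-1 and Jensen-Shannon divergence
# between predicted and ground-truth samples of one discrete marginal.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
pred = rng.poisson(4.0, size=5000)   # stand-in predicted samples
true = rng.poisson(5.0, size=5000)   # stand-in ground-truth samples

w = wasserstein_distance(pred, true)

# JSD on state histograms; jensenshannon returns the *distance*
# (the square root of the divergence), so square it to get JSD.
edges = np.arange(max(pred.max(), true.max()) + 2)
p, _ = np.histogram(pred, bins=edges, density=True)
q, _ = np.histogram(true, bins=edges, density=True)
jsd = jensenshannon(p, q, base=2) ** 2

print(f"W = {w:.3f}, JSD = {jsd:.3f}")
```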

In the ecological evolution system (visualized below), a 2D genetic phenotype (Locus 1, Locus 2) evolves towards a global peak on a fitness landscape. WeightFlow accurately predicts the distribution's evolution, capturing both macroscopic landscape shifts and fine-grained local dynamics.

Ecological Evolution Results
Joint and marginal distributions predicted by WeightFlow over time on the Ecological Evolution system.

Real-World Datasets: Single-Cell Data

We also evaluated WeightFlow on high-dimensional, continuous-space single-cell differentiation datasets. The visualization below shows our model's predictions along the pancreatic β-cell differentiation path. WeightFlow is significantly more accurate on higher-order moments such as skewness and kurtosis, reproducing fine-grained distribution structures (see the sketch after Table 2).

Beta-cell Differentiation Results
Weight prediction for β-cell differentiation, showing continuous evolution and comparison to DeepRUOT.
Table 2: Statistical results on real-world cell datasets.

| Model | $\beta$-cell $\mathcal{W}\downarrow$ | $\beta$-cell MMD $\downarrow$ | Embryoid $\mathcal{W}\downarrow$ | Embryoid MMD $\downarrow$ |
|---|---|---|---|---|
| NLSB | 11.18±0.22 | 0.07±0.01 | 14.39±0.40 | 0.10±0.03 |
| RUOT | 10.99±0.20 | 0.06±0.01 | 14.71±0.49 | 0.15±0.03 |
| WeightFlow (Ours) | 9.73±0.27 | 0.02±0.01 | 14.18±0.43 | 0.03±0.01 |
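For the MMD column and the higher-order-moment claim above, here is a hedged sketch of an unbiased RBF-kernel MMD² estimate plus per-feature skewness and kurtosis; the Gaussian stand-in data, bandwidth `sigma`, and sample sizes are assumptions.

```python
# Hedged sketch: unbiased RBF-kernel MMD^2 plus skewness/kurtosis comparison
# between predicted and observed cell populations (all sizes illustrative).
import numpy as np
from scipy.stats import skew, kurtosis

def mmd2_rbf(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate with a Gaussian kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)          # drop self-terms for unbiasedness
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2 * kxy.mean()

rng = np.random.default_rng(0)
pred_cells = rng.normal(0.0, 1.0, size=(500, 2))   # stand-in predictions
real_cells = rng.normal(0.1, 1.1, size=(500, 2))   # stand-in observations

print("MMD^2:", mmd2_rbf(pred_cells, real_cells))
# Higher-order moments per feature, the statistics discussed above.
print("skewness:", skew(pred_cells), skew(real_cells))
print("kurtosis:", kurtosis(pred_cells), kurtosis(real_cells))
```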

Ablation Studies

We performed ablation studies to validate key design choices of WeightFlow.

Component Ablations

Autoregressive Order Ablation
Autoregressive Order: A random order performs similarly to the sequential one, confirming robustness.
Backbone Architecture Ablation
Backbone Architecture: Performance is similar for both GRU and Transformer backbones.
Sequential Aligning Ablation
Sequential Aligning: Disabling the warm-start strategy (w/o Aligned) leads to significant performance degradation.

Sensitivity Analysis

We analyzed WeightFlow's sensitivity to various hyperparameters, demonstrating its robustness.

Impact of Backbone, Data, and Path Dimension

Backbone Size Sensitivity
Backbone Size: A small hidden dimension (e.g., 8) is sufficient.
Data Ratio Sensitivity
Data Size: Performance is stable even with only 20% of the data.
Path Dimension Sensitivity
Path Dimension: A 1-dim path is sufficient to capture dynamics.

Time and Space Cost Analysis

WeightFlow is designed to be scalable and efficient. The backbone's size is independent of the system dimension $d$: it has $O(L)$ space complexity, where $L$ is the number of candidate states per dimension, and $O(d)$ inference time. The hypernetwork's $O(N_{\text{nodes}}^2)$ complexity, in the number of weight-graph nodes, is likewise independent of $d$. This design effectively avoids the curse of dimensionality.
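A back-of-envelope sketch of this scaling argument: with a GRU backbone of hidden size $h$ over $L$ candidate states (matching the illustrative backbone sketched earlier, not the released code), the parameter count grows with $L$ but never references $d$.

```python
# Hedged sketch: backbone parameter count depends on L and h, never on d.
def backbone_params(L, h):
    emb = (L + 1) * h                     # state embedding (incl. BOS token)
    gru = 2 * 3 * (h * h) + 2 * 3 * h     # GRUCell: W_ih, W_hh, two biases
    head = h * L + L                      # output projection to L logits
    return emb + gru + head               # O(L) for fixed h

for L in (4, 16, 64):
    print(f"L={L:3d}: {backbone_params(L, h=8)} params")  # linear in L
# Inference over a d-dimensional state still takes d sequential
# autoregressive steps, i.e. O(d) time with an O(L)-sized model.
```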

Inference Time vs Error
Inference Time vs. Error: WeightFlow achieves the Pareto frontier, offering the best trade-off compared to baselines.
Model Size Scaling
Model Size: The parameter size of both the backbone and hypernetwork scales only linearly with the number of candidate states, $L$.

📚 Citation

If you find our work useful for your research, please consider citing:

```bibtex
@article{li2025weightflow,
  title={WeightFlow: Learning Stochastic Dynamics via Evolving Weight of Neural Network},
  author={Li, Ruikun and Liu, Jiazhen and Wang, Huandong and Liao, Qingmin and Li, Yong},
  journal={arXiv preprint arXiv:2508.00451},
  year={2025}
}
```