src.model.relative_model module

class src.model.relative_model.RelativeModel(*args: Any, **kwargs: Any)

Bases: Model

Relative-coordinate model.

Inherits from Model. The main differences are:

  1. It relies on relative positions and velocities rather than embedding absolute positions directly.

  2. It uses Fourier positional encoding to encode location explicitly.

  3. Vehicle features are also processed in relative coordinates, which can improve generalization across scenes that use different coordinate systems.
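The relative-coordinate idea in the list above can be illustrated with a small helper. This is a minimal sketch under stated assumptions, not RelativeModel's own code: the function name to_relative and the choice of which quantities to re-express are hypothetical. It expresses destinations, history, and pairwise pedestrian offsets relative to each pedestrian's current position.

```python
import torch


def to_relative(pos: torch.Tensor, des: torch.Tensor, hst: torch.Tensor):
    """Hypothetical helper: re-express inputs relative to each pedestrian's current position.

    pos: (B, N, 2), des: (B, N, 2), hst: (B, N, T, 2)
    """
    rel_des = des - pos                          # destination offset from current position
    rel_hst = hst - pos.unsqueeze(2)             # history offsets, broadcast over the T axis
    # Pairwise offsets: rel_pair[b, i, j] = pos[b, j] - pos[b, i], shape (B, N, N, 2)
    rel_pair = pos.unsqueeze(1) - pos.unsqueeze(2)
    return rel_des, rel_hst, rel_pair
```

Because every output is a difference of two positions, adding the same offset to all inputs cancels out. Absolute location therefore has to be supplied by a separate channel, which is the role item 2 assigns to the Fourier positional encoding.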

__init__(args)

Initialize model layers and embedding modules.

Parameters:

args (Namespace) – Configuration object containing the following fields:

    - model_dim (int): Internal model feature dimension.
    - map_feature_dim (int): Intermediate map feature dimension.
    - lstm_layer_num (int): Number of LSTM layers for temporal data.
    - head_num (int): Number of attention heads.
    - attention_layer_num (int): Number of Transformer decoder layers.
    - latent_token_num (int): Number of latent tokens used to compress map features.
    - dropout (float): Dropout ratio.
    - pred_step (int): Prediction horizon.
    - use_spatial_anchor (bool): Whether to enhance map positional encoding with spatial anchors.
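A configuration object with the fields listed above can be built directly with argparse.Namespace. The values below are illustrative only, since this page documents no defaults. predict_noise is included because forward() reads args.predict_noise (see its Returns section). The constructor call is commented out because it needs the project's src package.

```python
from argparse import Namespace

# Illustrative values only; the documentation does not specify defaults.
args = Namespace(
    model_dim=128,            # should typically be divisible by head_num for multi-head attention
    map_feature_dim=64,
    lstm_layer_num=1,
    head_num=8,
    attention_layer_num=3,
    latent_token_num=16,
    dropout=0.1,
    pred_step=12,
    use_spatial_anchor=True,
    predict_noise=True,       # read by forward(): predict epsilon instead of x_0
)

# model = RelativeModel(args)  # requires the project's src package
```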

set_ped_embedding(pos: torch.FloatTensor, vel: torch.FloatTensor, hst: torch.FloatTensor, des: torch.FloatTensor, spd: torch.FloatTensor)

Compute and store the joint pedestrian embedding.

This method maps pedestrian position, velocity, history, destination, and desired speed into a high-dimensional space and sums them to form the initial pedestrian feature vector. It also computes the positional encoding of the current position.

Unlike Model, RelativeModel does not use the absolute coordinates in pos directly. Instead, it encodes pos with FourierPositionalEncoding.
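The idea behind a Fourier positional encoding can be shown with a small module. This is a minimal sketch, not the project's FourierPositionalEncoding: the class name FourierFeatures, the geometric frequency spacing, and the final linear projection are all assumptions. Each coordinate is multiplied by a bank of frequencies, passed through sin and cos, and then projected to the model dimension.

```python
import math

import torch
import torch.nn as nn


class FourierFeatures(nn.Module):
    """Hypothetical Fourier positional encoding for 2-D coordinates."""

    def __init__(self, num_freqs: int, out_dim: int, max_freq: float = 10.0):
        super().__init__()
        # Geometrically spaced frequencies from 1 to max_freq (assumed scheme).
        freqs = torch.logspace(0.0, math.log10(max_freq), num_freqs)
        self.register_buffer("freqs", freqs * 2 * math.pi)
        # 2 coordinates x (sin, cos) x num_freqs features -> out_dim
        self.proj = nn.Linear(2 * 2 * num_freqs, out_dim)

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        # xy: (..., 2) -> angles: (..., 2, num_freqs)
        angles = xy.unsqueeze(-1) * self.freqs
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (..., 2, 2 * num_freqs)
        return self.proj(feats.flatten(-2))                       # (..., out_dim)
```

Low frequencies vary slowly across the scene and high frequencies resolve fine detail, so location stays available to the network without feeding it raw coordinates.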

Parameters:
  • pos (torch.FloatTensor) – Current pedestrian coordinates (x, y). Shape: (batch_size, num_peds, 2)

  • vel (torch.FloatTensor) – Current pedestrian velocity (vx, vy). Shape: (batch_size, num_peds, 2)

  • hst (torch.FloatTensor) – Pedestrian history trajectory sequence. Shape: (batch_size, num_peds, hist_step, 2)

  • des (torch.FloatTensor) – Pedestrian destination coordinates. Shape: (batch_size, num_peds, 2)

  • spd (torch.FloatTensor) – Pedestrian desired speed scalar. Shape: (batch_size, num_peds, 1)
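The "embed each input and sum" step can be sketched as follows, using the shapes listed above. This is a minimal sketch under assumptions: the module name PedEmbeddingSketch, the per-input linear layers, and the LSTM-based history encoder are hypothetical. In line with the relative-coordinate design, history and destination are encoded as offsets from pos. The absolute position itself is left out here, since this page says it is encoded separately with FourierPositionalEncoding.

```python
import torch
import torch.nn as nn


class PedEmbeddingSketch(nn.Module):
    """Hypothetical: embed each pedestrian input separately and sum the results."""

    def __init__(self, model_dim: int, lstm_layer_num: int = 1):
        super().__init__()
        self.vel_emb = nn.Linear(2, model_dim)
        self.des_emb = nn.Linear(2, model_dim)
        self.spd_emb = nn.Linear(1, model_dim)
        self.hst_lstm = nn.LSTM(2, model_dim, num_layers=lstm_layer_num, batch_first=True)

    def forward(self, pos, vel, hst, des, spd):
        B, N, T, _ = hst.shape
        # Encode history relative to the current position; keep the last layer's final hidden state.
        rel_hst = (hst - pos.unsqueeze(2)).reshape(B * N, T, 2)
        _, (h, _) = self.hst_lstm(rel_hst)
        hst_feat = h[-1].reshape(B, N, -1)
        # Destination is expressed as an offset from the current position.
        return self.vel_emb(vel) + self.des_emb(des - pos) + self.spd_emb(spd) + hst_feat
```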

Side Effects:

  • Sets self.ped_embedding: fused pedestrian features.

  • Sets self.pos: cached current positions for later map indexing.

  • Sets self.pe: positional encoding features.

set_veh_embedding(veh: torch.FloatTensor)

Compute and store vehicle feature embeddings.

Processes vehicle trajectory history with an LSTM. If the current scene contains no vehicles, NaN padding is inserted automatically.

Parameters:

veh (torch.FloatTensor) – Vehicle history trajectory sequence. Shape: (batch_size, num_vehs, hist_step + 1, 2)

Side Effects:

Sets self.veh_embedding: vehicle feature vectors.
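The LSTM encoding and the empty-scene padding described for set_veh_embedding can be sketched as below. This is a hypothetical illustration, not the project's code. The function name encode_vehicles is invented. Feeding frame-to-frame displacements, which turns hist_step + 1 positions into hist_step steps, is one assumed way to work in relative coordinates. The placeholder vehicle is a single NaN-filled entry.

```python
import torch
import torch.nn as nn


def encode_vehicles(veh: torch.Tensor, lstm: nn.LSTM) -> torch.Tensor:
    """Hypothetical: LSTM-encode vehicle histories of shape (B, V, T, 2).

    When V == 0, one NaN-filled placeholder vehicle is inserted so the
    vehicle axis is never empty; its features come out as NaN and are
    expected to be masked downstream (e.g. via veh_length).
    """
    B, V, T, _ = veh.shape
    if V == 0:
        veh = torch.full((B, 1, T, 2), float("nan"), dtype=veh.dtype, device=veh.device)
        V = 1
    # Relative motion: frame-to-frame displacements instead of absolute coordinates.
    disp = veh[:, :, 1:] - veh[:, :, :-1]                # (B, V, T - 1, 2)
    _, (h, _) = lstm(disp.reshape(B * V, T - 1, 2))
    return h[-1].reshape(B, V, -1)                       # (B, V, hidden_size)
```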

forward(denoise_t: torch.LongTensor, noisy_acc: torch.FloatTensor, ped_length: torch.LongTensor, veh_length: torch.LongTensor, timer: NamedTimer | None = None)

Forward pass: predict denoised trajectories from noisy inputs.

This method must be called after the set_*_embedding methods. It fuses the following information through Transformer decoder layers:

  1. Diffusion timestep embedding (t)

  2. Current noisy acceleration embedding (x_t)

  3. Social interaction (Ped-Ped Attention)

  4. Pedestrian-vehicle interaction (Ped-Veh Attention)

  5. Environment interaction (Ped-Map Attention)
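One decoder layer that combines the three interaction types can be sketched with torch.nn.MultiheadAttention. This is a pre-norm sketch under assumptions, not the project's layer. The class name InteractionLayerSketch, the attention order, and the feed-forward block are hypothetical. The query x is assumed to already contain the timestep (t) and noisy-acceleration (x_t) embeddings. map_tokens stands in for the compressed latent map features.

```python
import torch
import torch.nn as nn


class InteractionLayerSketch(nn.Module):
    """Hypothetical layer: ped-ped self-attention, then ped-veh and ped-map cross-attention."""

    def __init__(self, dim: int, heads: int, dropout: float = 0.0):
        super().__init__()
        self.ped_ped = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.ped_veh = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.ped_map = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, veh, map_tokens, ped_pad, veh_pad):
        # x: (B, N, D) pedestrian queries; *_pad: (B, L) bool, True marks padded slots.
        q = self.norms[0](x)
        x = x + self.ped_ped(q, q, q, key_padding_mask=ped_pad)[0]              # social
        q = self.norms[1](x)
        x = x + self.ped_veh(q, veh, veh, key_padding_mask=veh_pad)[0]          # vehicles
        q = self.norms[2](x)
        x = x + self.ped_map(q, map_tokens, map_tokens)[0]                      # environment
        return x + self.ff(self.norms[3](x))
```

Stacking attention_layer_num such layers would give a decoder of the depth the configuration describes. Residual connections let each interaction type refine the pedestrian features without overwriting the timestep and noise information.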

Parameters:
  • denoise_t (torch.LongTensor) – Current diffusion timestep t. Shape: (batch_size,)

  • noisy_acc (torch.FloatTensor) – Noisy future acceleration sequence, the diffusion input x_t. Shape: (batch_size, num_peds, pred_step, 2)

  • ped_length (torch.LongTensor) – Number of valid pedestrians per sample in the batch, used for masking. Shape: (batch_size,)

  • veh_length (torch.LongTensor) – Number of valid vehicles per sample in the batch, used for masking. Shape: (batch_size,)

  • timer (NamedTimer, optional) – Timer object for performance profiling.
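ped_length and veh_length give per-sample counts of valid entries. A common way to use such counts is to turn them into padding masks. The sketch below follows the key_padding_mask convention of torch.nn.MultiheadAttention, where True means "ignore". Whether RelativeModel uses the same convention internally is an assumption, and the helper name is hypothetical.

```python
import torch


def length_to_padding_mask(length: torch.Tensor, max_len: int) -> torch.Tensor:
    """Return a (B, max_len) bool mask that is True at padded (invalid) slots."""
    idx = torch.arange(max_len, device=length.device)
    # Slot j of sample b is padding when j >= length[b].
    return idx.unsqueeze(0) >= length.unsqueeze(1)
```

For example, length_to_padding_mask(torch.tensor([2, 0, 3]), 3) marks the third slot of the first sample, all slots of the second sample, and nothing in the third sample as padding.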

Returns:

Model prediction.

If args.predict_noise is True, this is the predicted noise epsilon; otherwise it is the predicted original signal x_0 in acceleration space. Shape: (batch_size, num_peds, pred_step, 2)
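When args.predict_noise is True, the output is epsilon rather than x_0. Assuming the model uses the standard DDPM forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon (this page does not state the noise schedule), a noise prediction can be converted back to acceleration space as sketched below. The function name eps_to_x0 is hypothetical.

```python
import torch


def eps_to_x0(x_t: torch.Tensor, eps: torch.Tensor, alpha_bar_t: torch.Tensor) -> torch.Tensor:
    """Recover x_0 from a noise prediction under the standard DDPM forward process.

    x_t, eps: (batch_size, num_peds, pred_step, 2)
    alpha_bar_t: (batch_size,) cumulative product of (1 - beta) at each sample's timestep.
    """
    a = alpha_bar_t.view(-1, 1, 1, 1)  # broadcast over (num_peds, pred_step, 2)
    return (x_t - (1 - a).sqrt() * eps) / a.sqrt()
```

Given the true epsilon, this inverts the forward process exactly. With a predicted epsilon it yields the model's implied estimate of the clean acceleration sequence.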

Return type:

torch.FloatTensor