src.dataset.base_dataset module

class src.dataset.base_dataset.RasterizedMap(map: numpy.array | None = None, xmin: float | None = None, ymin: float | None = None, xmax: float | None = None, ymax: float | None = None)

Bases: object

Rasterized map data structure.

map

2D raster map array, where 0 usually means walkable area and 1 means obstacle.

Type:

np.array

xmin

Minimum x value of the map in world coordinates.

Type:

float

ymin

Minimum y value of the map in world coordinates.

Type:

float

xmax

Maximum x value of the map in world coordinates.

Type:

float

ymax

Maximum y value of the map in world coordinates.

Type:

float

map: numpy.array = None
xmin: float = None
ymin: float = None
xmax: float = None
ymax: float = None
__init__(map: numpy.array | None = None, xmin: float | None = None, ymin: float | None = None, xmax: float | None = None, ymax: float | None = None) → None
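
The bounds make it possible to look up the raster cell under a world coordinate. The following is a minimal sketch, not the project's code: `RasterizedMapSketch` is a stand-in dataclass with the documented fields, `world_to_cell` is a hypothetical helper, and the axis convention (rows follow y, columns follow x, cells evenly spaced between the bounds) is an assumption that the documentation does not confirm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class RasterizedMapSketch:
    """Stand-in mirroring the documented fields; not the project's class."""
    map: Optional[np.ndarray] = None
    xmin: Optional[float] = None
    ymin: Optional[float] = None
    xmax: Optional[float] = None
    ymax: Optional[float] = None


def world_to_cell(m: RasterizedMapSketch, x: float, y: float) -> Tuple[int, int]:
    """Hypothetical helper: map a world point to a (row, col) index.

    Assumes rows span [ymin, ymax] and columns span [xmin, xmax].
    """
    rows, cols = m.map.shape
    col = int((x - m.xmin) / (m.xmax - m.xmin) * cols)
    row = int((y - m.ymin) / (m.ymax - m.ymin) * rows)
    # Clamp so points on the maximum edge still land in the last cell.
    return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)


grid = np.zeros((4, 8), dtype=np.uint8)
grid[1, 6] = 1  # mark one obstacle cell
rmap = RasterizedMapSketch(map=grid, xmin=0.0, ymin=0.0, xmax=8.0, ymax=4.0)
row, col = world_to_cell(rmap, x=6.5, y=1.2)
is_obstacle = bool(rmap.map[row, col] == 1)
```

If the real raster stores y top-down (image convention), the row computation would need to be flipped.
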
exception src.dataset.base_dataset.EmptyDatasetError

Bases: BaseException
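
Because the exception derives from BaseException rather than Exception, a generic `except Exception` handler will not catch it. The sketch below illustrates this with a stand-in class that has the same base; `build_or_none` and `make_empty` are hypothetical names, and the documentation does not state when the real exception is raised.

```python
class EmptyDatasetError(BaseException):
    """Stand-in with the same base class as the documented exception."""


def build_or_none(make_dataset):
    """Hypothetical helper: return None when dataset construction reports emptiness."""
    try:
        return make_dataset()
    except EmptyDatasetError:
        # Must be named explicitly; `except Exception` would not match it.
        return None


def make_empty():
    """Hypothetical factory that always signals an empty dataset."""
    raise EmptyDatasetError("dataset is empty")


result = build_or_none(make_empty)
caught_by_exception = issubclass(EmptyDatasetError, Exception)
```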

class src.dataset.base_dataset.BaseDataset(*args: Any, **kwargs: Any)

Bases: Dataset

Base class for all pedestrian-trajectory prediction datasets.

Provides shared logic for sample splitting, resampling, coordinate normalization, cache management, and the PyTorch DataLoader collate_fn. Dataset-specific loading logic is implemented by subclasses through load_data.

__init__(name: str, args: Namespace, df_data: pandas.DataFrame, map_data: RasterizedMap | None = None)

Initialize the dataset.

Parameters:
  • name (str) – Dataset name, e.g. “eth” or “zara01”.

  • args (Namespace) – Global configuration containing fps, hist_step, pred_step, and related fields.

  • df_data (pd.DataFrame) – DataFrame containing all trajectory data. Required columns: [‘f’, ‘id’, ‘x’, ‘y’, ‘type’]:
      - f: frame index
      - id: trajectory ID
      - x, y: coordinates
      - type: ‘pedestrian’ or ‘vehicle’

  • map_data (RasterizedMap, optional) – Scene map data.

classmethod load_data(args) → BaseDataset

[Abstract method] Load data from a file path and return a dataset instance.

Subclasses must implement this method to handle dataset-specific raw formats.

Parameters:

args (Namespace) – Global configuration.

Returns:

Loaded dataset instance.

Return type:

BaseDataset

Raises:

NotImplementedError – Raised when a subclass does not implement this method.
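
The override pattern can be sketched as follows. Everything here is illustrative: `BaseDatasetSketch` is a stand-in that keeps only the constructor arguments and the abstract classmethod, `ToyDataset` is a hypothetical subclass, the rows are a plain list of dicts rather than a DataFrame, and the values in `args` are placeholders.

```python
from argparse import Namespace


class BaseDatasetSketch:
    """Stand-in for BaseDataset; only the parts needed to show the pattern."""

    def __init__(self, name, args, df_data, map_data=None):
        self.name = name
        self.args = args
        self.df_data = df_data
        self.map_data = map_data

    @classmethod
    def load_data(cls, args):
        raise NotImplementedError("subclasses must implement load_data")


class ToyDataset(BaseDatasetSketch):
    """Hypothetical subclass that builds rows in memory instead of reading files."""

    @classmethod
    def load_data(cls, args):
        # A real subclass would parse its dataset-specific raw format here.
        rows = [
            {"f": 0, "id": 1, "x": 0.0, "y": 0.0, "type": "pedestrian"},
            {"f": 1, "id": 1, "x": 0.4, "y": 0.0, "type": "pedestrian"},
        ]
        return cls(name="toy", args=args, df_data=rows)


# Placeholder configuration values, chosen only for illustration.
args = Namespace(fps=2.5, hist_step=8, pred_step=12)
dataset = ToyDataset.load_data(args)
```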

split_samples(df_data, use_tqdm=True)

Split continuous trajectory data into sliding-window samples for training and evaluation.

Samples are generated according to args.hist_step, args.pred_step, and args.skip_step. Each sample contains the pedestrian and vehicle history, future labels, and context for the current scene.

Parameters:
  • df_data (pd.DataFrame) – DataFrame containing complete trajectories.

  • use_tqdm (bool, optional) – Whether to show a progress bar.

Returns:

List of sample dictionaries, including:
  • pos: current positions (#ped, 2)

  • vel: current velocities (#ped, 2)

  • des: destinations (#ped, 2)

  • spd: desired speeds (#ped, 1)

  • hst: history trajectories (#ped, hist_step, 2)

  • veh: vehicle history (#veh, hist_step+1, 2)

  • future_acc: future acceleration labels (#ped, pred_step, 2)

Return type:

List[dict]
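
The windowing idea can be sketched for a single trajectory as shown below. This is a simplified illustration under stated assumptions: `sliding_windows` is a hypothetical helper, `skip_step` is assumed to be the stride between window starts, and the `future` key holds raw positions rather than the documented acceleration labels. The real method also assembles per-scene fields such as pos, vel, des, and veh.

```python
import numpy as np


def sliding_windows(traj, hist_step, pred_step, skip_step=1):
    """Hypothetical helper: cut one (T, 2) trajectory into (history, future) pairs.

    Assumes skip_step is the stride between consecutive window starts.
    """
    window = hist_step + pred_step
    samples = []
    for start in range(0, len(traj) - window + 1, skip_step):
        hist = traj[start:start + hist_step]
        future = traj[start + hist_step:start + window]
        samples.append({"hst": hist, "future": future})
    return samples


# Ten frames of a pedestrian moving along x at 0.5 units per frame.
traj = np.stack([np.arange(10) * 0.5, np.zeros(10)], axis=1)
samples = sliding_windows(traj, hist_step=3, pred_step=2, skip_step=2)
```

With ten frames, a window of five, and a stride of two, the windows start at frames 0, 2, and 4.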

static collate_fn(batch)

Custom DataLoader collate function for padding variable-length sequences.

Parameters:

batch (List[dict]) – Sample list returned by __getitem__.

Returns:

Batched tensors after padding and stacking.

Includes keys such as ‘pos’, ‘vel’, ‘ped_length’, and ‘veh_length’. Padding values are usually 0 for coordinates or -1 for IDs.

Return type:

dict
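
A padding-and-stacking step of this kind can be sketched as follows. The sketch uses NumPy arrays in place of PyTorch tensors so that it runs without extra dependencies; `pad_and_stack` and `collate_sketch` are hypothetical names, and only the 'pos', 'vel', and 'ped_length' keys are handled.

```python
import numpy as np


def pad_and_stack(arrays, pad_value=0.0):
    """Hypothetical helper: pad along axis 0 to the longest item, then stack."""
    lengths = np.array([a.shape[0] for a in arrays])
    max_len = int(lengths.max())
    out = np.full((len(arrays), max_len) + arrays[0].shape[1:], pad_value,
                  dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        out[i, :a.shape[0]] = a
    return out, lengths


def collate_sketch(batch):
    """Batch the 'pos' and 'vel' entries and record pedestrian counts."""
    pos, ped_length = pad_and_stack([s["pos"] for s in batch])
    vel, _ = pad_and_stack([s["vel"] for s in batch])
    return {"pos": pos, "vel": vel, "ped_length": ped_length}


batch = [
    {"pos": np.ones((2, 2)), "vel": np.zeros((2, 2))},  # scene with 2 pedestrians
    {"pos": np.ones((3, 2)), "vel": np.zeros((3, 2))},  # scene with 3 pedestrians
]
out = collate_sketch(batch)
```

Keeping the true lengths alongside the padded arrays lets downstream code mask out the padded rows.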

static resample_dataframe(df_data, raw_fps=30, target_fps=2.5)

Resample trajectory data to the target frame rate expected by the model.

Parameters:
  • df_data (pd.DataFrame) – Raw trajectory data.

  • raw_fps (float) – Original frame rate.

  • target_fps (float) – Target frame rate.

Returns:

Resampled DataFrame with interpolated coordinates and updated frame indices.

Return type:

pd.DataFrame
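
For one trajectory, the resampling can be sketched with linear interpolation as shown below. This is an assumption-laden illustration: `resample_track` is a hypothetical helper operating on plain arrays rather than a DataFrame, and the documentation does not specify which interpolation scheme the real method uses.

```python
import numpy as np


def resample_track(frames, xs, ys, raw_fps=30.0, target_fps=2.5):
    """Hypothetical helper: linearly interpolate one track onto target_fps.

    frames are raw frame indices; the returned indices count target frames.
    """
    t = np.asarray(frames, dtype=float) / raw_fps  # timestamps in seconds
    step = 1.0 / target_fps
    # Small epsilon keeps the last sample when the duration is an exact multiple.
    new_t = np.arange(t[0], t[-1] + 1e-9, step)
    new_x = np.interp(new_t, t, xs)
    new_y = np.interp(new_t, t, ys)
    new_f = np.round(new_t * target_fps).astype(int)
    return new_f, new_x, new_y


frames = np.arange(0, 25)  # 25 raw frames at 30 fps
xs = frames * 0.1          # constant motion along x
ys = np.zeros_like(xs)
new_f, new_x, new_y = resample_track(frames, xs, ys)
```

With the default rates, one target frame corresponds to 12 raw frames, so raw frames 0, 12, and 24 become target frames 0, 1, and 2.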

static normalize_xy(df_data, map_data)

Apply Z-score-style normalization to coordinate data.

The current implementation fixes std = 1.0, so it only centers the coordinates and does not rescale them. Code that is commented out in the source shows how full standard-deviation scaling would work.

Parameters:
  • df_data (pd.DataFrame) – Trajectory data.

  • map_data (RasterizedMap) – Map data.

Returns:

(normalized_df_data, updated_map_data)

Return type:

tuple
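
The centering behaviour can be sketched as follows. `center_xy` is a hypothetical helper working on a plain array and a bounds tuple. Two points are assumptions rather than documented behaviour: that the center is the mean over all rows, and that the map bounds are shifted by the same offset.

```python
import numpy as np


def center_xy(xy, bounds, std=1.0):
    """Hypothetical helper: subtract the mean and divide by std.

    bounds is (xmin, ymin, xmax, ymax); it is shifted by the same offset so
    the map stays aligned with the trajectories (an assumption).
    """
    mean = xy.mean(axis=0)
    xy_norm = (xy - mean) / std
    xmin, ymin, xmax, ymax = bounds
    new_bounds = ((xmin - mean[0]) / std, (ymin - mean[1]) / std,
                  (xmax - mean[0]) / std, (ymax - mean[1]) / std)
    return xy_norm, new_bounds


xy = np.array([[1.0, 2.0], [3.0, 6.0]])
xy_norm, new_bounds = center_xy(xy, bounds=(0.0, 0.0, 4.0, 8.0))
```

With std left at 1.0 the division is a no-op, which matches the centering-only behaviour described above.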

static save_cache(obj, cache_path)

Serialize and save a dataset object to disk cache.

static load_cache(cache_path)

Load a dataset object from disk cache.
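
A save/load round trip of this kind can be sketched with the standard-library pickle module. The serialization format actually used by these methods is not documented, so pickle is an assumption here, and `save_cache_sketch` and `load_cache_sketch` are hypothetical names.

```python
import os
import pickle
import tempfile


def save_cache_sketch(obj, cache_path):
    """Write obj to cache_path with pickle (assumed format)."""
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache_sketch(cache_path):
    """Read back an object written by save_cache_sketch."""
    with open(cache_path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_dataset.pkl")
    save_cache_sketch({"name": "toy", "samples": [1, 2, 3]}, path)
    restored = load_cache_sketch(path)
```

If pickle is indeed the format, cache files should only be loaded from trusted locations, since unpickling can execute arbitrary code.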