src.utils.auto_gpu module

class src.utils.auto_gpu.AutoGPU

Bases: object

Automatic GPU memory manager used to select a GPU with sufficient free memory.

__init__()

Initialize AutoGPU and get the currently visible CUDA device list.

static allocate_gpu(device, memory_MB: int, block_MB: int | None = None)

[Internal method] Allocate placeholder memory on the target device.

Use it either to confirm that memory is genuinely available by actually allocating it, or to reserve GPU memory ahead of time.

Parameters:
  • device (str or torch.device) – Target device.

  • memory_MB (int) – Amount of memory to allocate in MB.

  • block_MB (int, optional) – Block size in MB. If None, the full amount is allocated in a single block.

Returns:

References to the allocated tensors.

Return type:

torch.Tensor or List[torch.Tensor]
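
The block-wise allocation described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper names `split_into_blocks` and `allocate_placeholder` are hypothetical, and the sketch always returns a list, whereas `allocate_gpu` may return a single tensor.

```python
from typing import List, Optional


def split_into_blocks(memory_MB: int, block_MB: Optional[int] = None) -> List[int]:
    """Return per-block sizes (in MB) that add up to memory_MB.

    Hypothetical helper: with block_MB=None the whole amount is one block.
    """
    if memory_MB <= 0:
        return []
    if block_MB is None:
        return [memory_MB]
    if block_MB <= 0:
        raise ValueError("block_MB must be positive")
    full, rest = divmod(memory_MB, block_MB)
    # Full-size blocks first, then one smaller block for any remainder.
    return [block_MB] * full + ([rest] if rest else [])


def allocate_placeholder(device, memory_MB: int, block_MB: Optional[int] = None):
    """Allocate uninitialized uint8 tensors totalling memory_MB on device.

    Assumption: one uint8 element is one byte, so 1 MB = 1024 * 1024 elements.
    Keep the returned list alive for as long as the memory should stay reserved.
    """
    import torch  # imported lazily; only the real allocation needs PyTorch

    return [
        torch.empty(size * 1024 * 1024, dtype=torch.uint8, device=device)
        for size in split_into_blocks(memory_MB, block_MB)
    ]
```

Splitting into blocks can let part of the request succeed on a fragmented device, and dropping the references (and, if needed, calling `torch.cuda.empty_cache()`) releases the placeholder memory.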

choice_gpu(memory_MB, interval=600, force=True)

Select a GPU with enough free memory.

In addition to querying nvidia-smi, this method attempts a real allocation to confirm that the memory is actually available. If no GPU has enough free memory and force=True, it blocks and polls every interval seconds until one does.

Parameters:
  • memory_MB (int) – Minimum memory required by the task in MB.

  • interval (int, optional) – Polling interval in seconds. Default is 600.

  • force (bool, optional) – Whether to wait until a GPU becomes available. If False and no GPU is available, returns "cpu". Default is True.

Returns:

Selected device string, such as "cuda:0" or "cpu".

Return type:

str
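
The query-and-wait behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the class's actual code: the function names are hypothetical, the sketch picks the GPU with the most free memory, it omits the trial allocation step, and it assumes nvidia-smi indices match the CUDA device indices (which may not hold when `CUDA_VISIBLE_DEVICES` is set).

```python
import subprocess
import time
from typing import Callable, List


def parse_free_memory(smi_output: str) -> List[int]:
    """Parse one free-memory value (MiB) per line, skipping blank lines."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_free_memory_all() -> List[int]:
    """Ask nvidia-smi for the free memory of every GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_free_memory(out)


def choose_device(
    memory_MB: int,
    interval: int = 600,
    force: bool = True,
    query: Callable[[], List[int]] = query_free_memory_all,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "cuda:<index>" for a GPU with enough free memory, else wait or "cpu"."""
    while True:
        free = query()
        # (free_MB, index) pairs for GPUs that satisfy the request.
        candidates = [(mb, idx) for idx, mb in enumerate(free) if mb >= memory_MB]
        if candidates:
            _, best = max(candidates)
            return f"cuda:{best}"
        if not force:
            return "cpu"
        sleep(interval)  # all GPUs busy: wait, then poll again
```

Passing `query` and `sleep` as parameters is a design choice of this sketch so the loop can be exercised without a GPU, for example `choose_device(8000, force=False, query=lambda: [100, 200])` falls back to `"cpu"`.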

query_free_memory(gpu_id)

update_free_memory()