07 · Configure Scaling with ScalingConfig

The ScalingConfig tells Ray Train how many workers to launch and what resources each worker should use.

  • num_workers=8 → Run the training loop on 8 parallel workers. Each worker runs the same code on a different shard of the data.

  • use_gpu=True → Assign one GPU per worker. If you set this to False, each worker trains on CPU instead (see the sketch after this list).
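
For the CPU path, here is a minimal sketch of what such a config could look like. The cpu_scaling_config name and the resources_per_worker values are illustrative, not required:

from ray.train import ScalingConfig

# CPU-only variant: no GPUs are requested, and each worker is given a
# fixed CPU budget (the numbers here are placeholders, not recommendations).
cpu_scaling_config = ScalingConfig(
    num_workers=8,
    use_gpu=False,
    resources_per_worker={"CPU": 2},
)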

This declarative config is what allows Ray to handle cluster orchestration for you — you don’t need to manually start processes or set CUDA devices.
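
To make that concrete, each worker can ask the Ray Train context for its rank and world size instead of wiring this up itself. The helper name below is hypothetical, but the get_context() calls are standard Ray Train API:

from ray import train

def report_worker_placement():
    # Called from inside the per-worker training loop: Ray has already
    # started the process and assigned its resources, so we only query them.
    ctx = train.get_context()
    print("world size:", ctx.get_world_size())   # total number of workers
    print("world rank:", ctx.get_world_rank())   # this worker's global index
    print("local rank:", ctx.get_local_rank())   # index on this node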

Later, we’ll pass this scaling_config into the TorchTrainer to launch distributed training.

# 07. Configure the scaling of the training job

# ScalingConfig defines how many parallel training workers Ray should launch
# and whether each worker should be assigned a GPU or CPU.
# → Each worker runs train_loop_ray_train(config) independently,
#    with Ray handling synchronization via DDP under the hood.

from ray.train import ScalingConfig

scaling_config = ScalingConfig(
    num_workers=8,   # Launch 8 training workers (1 process per worker)
    use_gpu=True     # Allocate 1 GPU to each worker
)
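
As a preview of the later step, here is a rough sketch of how this scaling_config plugs into the trainer, assuming the train_loop_ray_train function defined earlier in the tutorial:

from ray.train.torch import TorchTrainer

# The trainer combines the per-worker loop with the scaling config;
# calling fit() launches the workers and runs the loop on each of them.
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_ray_train,
    scaling_config=scaling_config,
)
result = trainer.fit()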

See the Ray Train documentation on ScalingConfig, along with the guides on configuring scale and GPUs, for more details.