07 · Configure Scaling with ScalingConfig
The ScalingConfig tells Ray Train how many workers to launch and what resources each worker should use.
- num_workers=8 → Run the training loop on 8 parallel workers. Each worker runs the same code on a different shard of the data.
- use_gpu=True → Assign one GPU per worker. If you set this to False, each worker would train on CPU instead.
This declarative config is what allows Ray to handle cluster orchestration for you — you don’t need to manually start processes or set CUDA devices.
Later, we’ll pass this scaling_config into the TorchTrainer to launch distributed training.
# 07. Configure the scaling of the training job
# ScalingConfig defines how many parallel training workers Ray should launch
# and whether each worker should be assigned a GPU or CPU.
# → Each worker runs train_loop_ray_train(config) independently,
# with Ray handling synchronization via DDP under the hood.
from ray.train import ScalingConfig

scaling_config = ScalingConfig(
    num_workers=8,  # Launch 8 training workers (1 process per worker)
    use_gpu=True,   # Allocate 1 GPU to each worker
)
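As a rough preview of that later step, here is a minimal sketch of how scaling_config plugs into a TorchTrainer. The train_loop_ray_train function is the per-worker training loop referenced in the comments above; the train_loop_config dict shown here is a hypothetical example, not something defined in this section.

from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_ray_train,  # the per-worker training function
    train_loop_config={"num_epochs": 2},         # hypothetical hyperparameters passed to each worker
    scaling_config=scaling_config,               # the ScalingConfig defined above
)
result = trainer.fit()  # launches the 8 GPU workers and blocks until training completes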
For more details, see the Ray documentation on ScalingConfig and the guide on configuring scale and GPUs.