09 · Build the DataLoader with prepare_data_loader()

Now let’s define a helper that builds the MNIST DataLoader and makes it Ray Train–ready.

  • Apply standard preprocessing:

    • ToTensor() → convert PIL images to PyTorch tensors

    • Normalize((0.5,), (0.5,)) → center and scale pixel values

  • Construct a PyTorch DataLoader with batching and shuffling.

  • Finally, wrap it with prepare_data_loader(), which automatically:

    • Moves each batch to the correct device (GPU or CPU).

    • Copies data from host memory to device memory as needed.

    • Injects a PyTorch DistributedSampler when running with multiple workers, so that each worker processes a unique shard of the dataset.

This utility lets you use the same DataLoader code whether you’re training on one GPU or many — Ray handles the distributed sharding and device placement for you.

# 09. Build a Ray Train–ready DataLoader for MNIST
import torch
import ray.train.torch
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, Normalize, ToTensor


def build_data_loader_ray_train(batch_size: int) -> torch.utils.data.DataLoader:
    # Define preprocessing: convert PIL images to tensors + normalize pixel values
    transform = Compose([ToTensor(), Normalize((0.5,), (0.5,))])
    # Load the MNIST training set from persistent cluster storage
    train_data = MNIST(
        root="/mnt/cluster_storage/data",
        train=True,
        download=True,
        transform=transform,
    )

    # Standard PyTorch DataLoader (batching, shuffling, drop last incomplete batch)
    train_loader = torch.utils.data.DataLoader(
        train_data, batch_size=batch_size, shuffle=True, drop_last=True
    )

    # prepare_data_loader():
    # - Adds a DistributedSampler when using multiple workers
    # - Moves batches to the correct device automatically
    train_loader = ray.train.torch.prepare_data_loader(train_loader)
    
    return train_loader
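
For reference, here is a minimal sketch of how the prepared loader might be consumed inside a per-worker training function. The function name and config keys (train_func_per_worker, batch_size, num_epochs) are illustrative, not part of this tutorial's code; the point is that batches arrive already on the worker's device, so no explicit .to(device) calls are needed.

# Sketch: consuming the prepared DataLoader inside a training function (illustrative names)

def train_func_per_worker(config: dict):
    train_loader = build_data_loader_ray_train(batch_size=config["batch_size"])

    for epoch in range(config["num_epochs"]):
        # With multiple workers, advance the DistributedSampler each epoch
        # so that shuffling differs from epoch to epoch.
        if ray.train.get_context().get_world_size() > 1:
            train_loader.sampler.set_epoch(epoch)

        for images, labels in train_loader:
            # Batches are already on this worker's device; no .to(device) needed.
            ...  # forward pass, loss, backward, optimizer step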

Ray Data integration

This step isn’t necessary if you are integrating your Ray Train workload with Ray Data. Ray Data is especially useful when preprocessing is CPU-heavy and you want to run preprocessing and training on separate instances.
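
As a rough sketch of that alternative, the training function reads a dataset shard directly instead of building a DataLoader. The dataset name ("train"), column names ("image", "label"), and config keys below are assumptions for illustration; they depend on how you construct the Ray Dataset and pass it to the trainer.

# Sketch: Ray Data alternative (assumes a dataset passed to the trainer under the name "train")

def train_func_with_ray_data(config: dict):
    # Each worker automatically receives its own shard of the dataset;
    # no DataLoader, DistributedSampler, or prepare_data_loader() is needed.
    train_shard = ray.train.get_dataset_shard("train")

    for epoch in range(config["num_epochs"]):
        for batch in train_shard.iter_torch_batches(batch_size=config["batch_size"]):
            # Column names are assumed; adjust to your dataset's schema.
            images, labels = batch["image"], batch["label"]
            ...  # training step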