Distributed training

Distributed training#

 

 

This tutorial executes a distributed training workload that connects the following heterogeneous workloads:

  • preprocess the dataset prior to training

  • distributed training with Ray Train and PyTorch with observability

  • evaluation (batch inference and eval logic)

  • save model artifacts to a model registry (MLOps)

Note: this tutorial doesn’t tune the model but see Ray Tune for experiment execution and hyperparameter tuning at any scale.

%%bash
pip install -q -r /home/ray/default/requirements.txt
pip install -q -e /home/ray/default/doggos
Successfully registered `ipywidgets, matplotlib` and 4 other packages to be installed on all cluster nodes.
View and update dependencies here: https://console.anyscale.com/cld_kvedZWag2qA8i5BjxUevf5i7/prj_cz951f43jjdybtzkx1s5sjgz99/workspaces/expwrk_1dp3fa7w5hu3i83ldsi7lqvp9t?workspace-tab=dependencies
Successfully registered `doggos` package to be installed on all cluster nodes.
View and update dependencies here: https://console.anyscale.com/cld_kvedZWag2qA8i5BjxUevf5i7/prj_cz951f43jjdybtzkx1s5sjgz99/workspaces/expwrk_1dp3fa7w5hu3i83ldsi7lqvp9t?workspace-tab=dependencies

Note: A kernel restart may be required for all dependencies to become available.

If using uv, then:

  1. Turn off the runtime dependencies (Dependencies tab up top > Toggle off Pip packages). And no need to run the pip install commands above.

  2. Change the python kernel of this notebook to use the venv (Click on base (Python x.yy.zz) on top right cordern of notebook > Select another Kernel > Python Environments... > Create Python Environment > Venv > Use Existing) and done! Now all the notebook’s cells will use the virtual env.

  3. Change the py executable to use uv run instead of python by adding this line after importing ray.

import os
os.environ.pop("RAY_RUNTIME_ENV_HOOK", None)
import ray
ray.init(runtime_env={"py_executable": "uv run", "working_dir": "/home/ray/default"})
%load_ext autoreload
%autoreload all
import os
import ray
import sys

sys.path.append(os.path.abspath("../doggos/"))
# If using UV
# os.environ.pop("RAY_RUNTIME_ENV_HOOK", None)
# Enable Ray Train v2. It's too good to wait for public release!
os.environ["RAY_TRAIN_V2_ENABLED"] = "1"
ray.init(
    # connect to existing ray runtime (from previous notebook if still running)
    address=os.environ.get("RAY_ADDRESS", "auto"),
    runtime_env={
        "env_vars": {"RAY_TRAIN_V2_ENABLED": "1"},
        # "py_executable": "uv run", # if using uv
        # "working_dir": "/home/ray/default",  # if using uv
    },
)
%%bash
# This will be removed once Ray Train v2 is enabled by default.
echo "RAY_TRAIN_V2_ENABLED=1" > /home/ray/default/.env
# Load env vars in notebooks.
from dotenv import load_dotenv

load_dotenv()
True