KubeRay integration with MCAD (Multi-Cluster-App-Dispatcher)
The multi-cluster-app-dispatcher (MCAD) is a Kubernetes controller that provides mechanisms for applications to manage batch jobs in a single-cluster or multi-cluster environment. For more details, refer to the multi-cluster-app-dispatcher project.
MCAD allows you to deploy a Ray cluster with a guarantee that sufficient resources are available before any pods are created in the Kubernetes cluster. It supports features such as:
- Integration with the upstream Kubernetes scheduling stack for features such as co-scheduling and packing on the GPU dimension.
- Ability to wrap any Kubernetes objects.
- Increased control plane stability through just-in-time (JIT) object creation.
- Queuing with policies.
- Quota management that goes across namespaces.
- Support for multiple Kubernetes clusters; dispatching jobs to any one of a number of Kubernetes clusters.
To queue Ray clusters and gang dispatch them when aggregated resources are available, create a KinD cluster using the instructions below, then follow the KubeRay-MCAD integration setup for a Kubernetes cluster or an OpenShift cluster.
On OpenShift, MCAD and KubeRay are already part of the Open Data Hub Distributed Workload Stack. The stack provides a simple, user-friendly abstraction for scaling, queuing and resource management of distributed AI/ML and Python workloads. Please follow the Quick Start in the Distributed Workloads for installation.
Create KinD cluster
Note: Without Podman, a KinD worker node is allowed to see the cpu/memory resources on the host. In addition, this environment is created to run the tutorial on a resource-constrained local Kubernetes environment. It is not recommended for real workloads or production.
podman machine init --cpus 8 --memory 8196
podman machine start
podman machine list
Expect the Podman Machine to be running with the following CPU and MEMORY resources:
NAME                     VM TYPE  CREATED        LAST UP            CPUS  MEMORY   DISK SIZE
podman-machine-default*  qemu     2 minutes ago  Currently running  8     8.594GB  107.4GB
Create the KinD cluster on the Podman Machine:
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster
Creating a KinD cluster should take less than 1 minute. Expect output similar to:
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
Describe the single-node cluster:
kubectl describe node kind-control-plane
Expect the cpu and memory in the Allocatable section to be similar to:
Allocatable:
  cpu:            8
  hugepages-1Gi:  0
  hugepages-2Mi:  0
  memory:         8118372Ki
  pods:           110
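To read only the allocatable values instead of scanning the full describe output, a JSONPath query can help. The following is a minimal sketch: the function name `node_allocatable` is hypothetical, and it assumes the node is named kind-control-plane as in the tutorial.

```shell
# Print the allocatable cpu and memory of a node on one line.
# Defaults to the KinD node name used in this tutorial (an assumption).
node_allocatable() {
  node="${1:-kind-control-plane}"
  kubectl get node "$node" \
    -o jsonpath='{.status.allocatable.cpu}{" "}{.status.allocatable.memory}{"\n"}'
}
```

Calling `node_allocatable` with no argument queries kind-control-plane; pass another node name to inspect a different node.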
Submitting KubeRay cluster to MCAD
After the KinD cluster is created using the instructions above, make sure to install the KubeRay-MCAD integration Prerequisites for KinD cluster.
Let's create two RayClusters using the AppWrapper custom resource (CR) on the same Kubernetes cluster. The AppWrapper is the custom resource definition provided by MCAD to dispatch resources and manage batch jobs on Kubernetes clusters.
- We submit the first RayCluster with the AppWrapper CR aw-raycluster.yaml:
kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster.yaml
The AppWrapper wraps the RayCluster object in the generictemplate. We also specify matching resources for the RayCluster head node and worker node in the custompodresources. MCAD uses the custompodresources to reserve the resources required to run the RayCluster without creating pending Pods.
Note: Within the same AppWrapper, you may also wrap any individual k8s resources (i.e. configMap, secret, etc) associated with this job as a generictemplate to be dispatched together with the RayCluster.
Check the AppWrapper status by describing the job:
kubectl describe appwrapper raycluster-complete -n default
The Status: stanza shows a State of Running if the wrapped RayCluster has been deployed. The 2 Pods associated with the RayCluster are also created.
Status:
  Canrun:  true
  Conditions:
    Last Transition Micro Time:  2023-08-29T02:50:18.829462Z
    Last Update Micro Time:      2023-08-29T02:50:18.829462Z
    Status:                      True
    Type:                        Init
    Last Transition Micro Time:  2023-08-29T02:50:18.829496Z
    Last Update Micro Time:      2023-08-29T02:50:18.829496Z
    Reason:                      AwaitingHeadOfLine
    Status:                      True
    Type:                        Queueing
    Last Transition Micro Time:  2023-08-29T02:50:18.842010Z
    Last Update Micro Time:      2023-08-29T02:50:18.842010Z
    Reason:                      FrontOfQueue.
    Status:                      True
    Type:                        HeadOfLine
    Last Transition Micro Time:  2023-08-29T02:50:18.902379Z
    Last Update Micro Time:      2023-08-29T02:50:18.902379Z
    Reason:                      AppWrapperRunnable
    Status:                      True
    Type:                        Dispatched
  Controllerfirsttimestamp:      2023-08-29T02:50:18.829462Z
  Filterignore:                  true
  Queuejobstate:                 Dispatched
  Sender:                        before manageQueueJob - afterEtcdDispatching
  State:                         Running
Events:                          <none>

$ kubectl get pod -n default
NAME                                           READY   STATUS    RESTARTS   AGE
raycluster-complete-head-9s4x5                 1/1     Running   0          47s
raycluster-complete-worker-small-group-4s6jv   1/1     Running   0          47s
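When only the state or the condition history is of interest, the same information can be pulled out with JSONPath instead of reading the whole describe output. This is a sketch under assumptions: the function names are hypothetical, and the field paths `.status.state` and `.status.conditions` are inferred from the State and Conditions entries in the describe output above, so verify them against your installed AppWrapper CRD.

```shell
# Print the State of an AppWrapper, for example Running or Pending.
# Field path .status.state is inferred from the describe output (assumption).
appwrapper_state() {
  kubectl get appwrapper "$1" -n "${2:-default}" \
    -o jsonpath='{.status.state}{"\n"}'
}

# Print one line per condition: its type, a tab, then its reason (if any).
appwrapper_conditions() {
  kubectl get appwrapper "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\n"}{end}'
}
```

For example, `appwrapper_state raycluster-complete` queries the first AppWrapper in the default namespace; the optional second argument selects a different namespace.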
- Let's submit another RayCluster with the AppWrapper CR and see it queued without creating pending Pods, using the command:
kubectl create -f https://raw.githubusercontent.com/project-codeflare/multi-cluster-app-dispatcher/main/doc/usage/examples/kuberay/config/aw-raycluster-1.yaml
Check the raycluster-complete-1 AppWrapper:
kubectl describe appwrapper raycluster-complete-1 -n default
The Status: stanza should show a State of Pending if the wrapped object (RayCluster) has been queued. No pods from the second AppWrapper were created, due to "Insufficient resources to dispatch AppWrapper."
Status:
  Conditions:
    Last Transition Micro Time:  2023-08-29T17:39:08.406401Z
    Last Update Micro Time:      2023-08-29T17:39:08.406401Z
    Status:                      True
    Type:                        Init
    Last Transition Micro Time:  2023-08-29T17:39:08.406452Z
    Last Update Micro Time:      2023-08-29T17:39:08.406451Z
    Reason:                      AwaitingHeadOfLine
    Status:                      True
    Type:                        Queueing
    Last Transition Micro Time:  2023-08-29T17:39:08.423208Z
    Last Update Micro Time:      2023-08-29T17:39:08.423208Z
    Reason:                      FrontOfQueue.
    Status:                      True
    Type:                        HeadOfLine
    Last Transition Micro Time:  2023-08-29T17:39:08.439753Z
    Last Update Micro Time:      2023-08-29T17:39:08.439753Z
    Message:                     Insufficient resources to dispatch AppWrapper.
    Reason:                      AppWrapperNotRunnable.
    Status:                      True
    Type:                        Backoff
  Controllerfirsttimestamp:      2023-08-29T17:39:08.406399Z
  Filterignore:                  true
  Queuejobstate:                 Backoff
  Sender:                        before ScheduleNext - setHOL
  State:                         Pending
Events:                          <none>
We may manually check the allocated resources:
kubectl describe node kind-control-plane
The Allocated resources section shows cpu Requests of 6050m (75%), so the remaining cpu resources cannot satisfy the second AppWrapper.
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                6050m (75%)       5200m (65%)
  memory             6824650Ki (84%)   6927050Ki (85%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
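The headroom left on the node can be worked out from the two numbers already shown: 8 allocatable CPUs and 6050m requested. The sketch below does that arithmetic in millicores. The function names are hypothetical, and the conversion only handles whole-core values such as `8` and millicore values such as `6050m`, not decimal quantities like `0.5`.

```shell
# Convert a Kubernetes cpu quantity to millicores.
# Handles only whole cores ("8") and millicores ("6050m"); decimals are not supported.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Remaining cpu in millicores = allocatable - already requested.
remaining_cpu() {
  echo $(( $(cpu_to_millicores "$1") - $(cpu_to_millicores "$2") ))
}

remaining_cpu 8 6050m   # prints 1950
```

With 1950m of cpu left, any AppWrapper whose custompodresources request more than that in total stays queued until resources are released.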
For example, observe the second RayCluster being created after deleting the first AppWrapper using:
kubectl delete appwrapper raycluster-complete -n default
Note: This would also simultaneously remove any K8s resources you may have wrapped as generictemplates within this AppWrapper.
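To confirm that the queued AppWrapper is dispatched once resources are freed, its state can be polled until it reaches Running. This is a minimal sketch, not part of MCAD: the function name, the retry count, and the 2-second interval are illustrative choices, and the `.status.state` field path is inferred from the State entry in the describe output.

```shell
# Poll an AppWrapper in the default namespace until it reports the wanted state.
# Gives up after a number of tries (default 30), sleeping 2 seconds between tries.
wait_for_appwrapper_state() {
  name="$1"; want="$2"; tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # .status.state is inferred from the describe output (assumption).
    state="$(kubectl get appwrapper "$name" -n default -o jsonpath='{.status.state}')"
    if [ "$state" = "$want" ]; then
      echo "$name is $state"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "$name did not reach $want" >&2
  return 1
}
```

After deleting the first AppWrapper, `wait_for_appwrapper_state raycluster-complete-1 Running` returns successfully once the second RayCluster has been dispatched, and `kubectl get pod -n default` should then list its Pods.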