Ray: 0.4 Release

We are pleased to announce the 0.4 release of Ray. This release introduces improvements to Ray’s scheduling, substantial backend improvements, and the start of Pandas on Ray, as well as many improvements to RLlib and Tune (you can read more about the improvements in RLlib in this blog post).

To upgrade to the latest version, run

pip install -U ray

Scheduling

This release includes two major changes to the scheduling behavior: spillback scheduling and support for custom resources.

Spillback Scheduling

Because Ray takes a bottom-up hierarchical approach to scheduling in which scheduling decisions are made by local schedulers on each machine (in order to avoid a centralized scheduling bottleneck), scheduling decisions are often made with a slightly stale view of the system state. As a consequence, it is possible for race conditions to occur and for too many tasks to be assigned to a single machine. For example, consider the case in which there are 100 GPUs scattered around the cluster and workers on the various machines submit a total of exactly 100 GPU tasks. If these tasks take a while to execute, then the desired behavior is for one task to be assigned to each GPU. However, without a single scheduling bottleneck like a centralized scheduler, too many tasks may be assigned to a single machine resulting in delays.

Spillback scheduling provides a mechanism for correcting for bad scheduling decisions. At a high-level, if a local scheduler decides that it does not want to execute a task, it can spill the task back to the global scheduler (or in principle to another local scheduler). This mechanism allows us to achieve perfect load balancing along with high task throughput.

Custom Resources

Remote functions and actors now support scheduling with arbitrary custom resource requirements. We can specify that a remote function requires 1 CPU, 2 GPUs, and 3 of some custom resource with syntax like the following.

@ray.remote(num_cpus=1, num_gpus=2, resources={'Custom': 3})
def f():
    pass

To tell Ray that a node has 6 of the custom resource, Ray should be started on that machine with a command like the following.

ray start ... --resources='{"Custom": 6}'

Custom resources can be used for a variety of different purposes. For example, the can be used to do bookkeeping of a concrete resource like memory, or to indicate that a particular dataset lives on a particular machine, or to give machines certain roles (such as a “parameter server” machine or a “worker” machine).

Libraries

This release also includes the start of Pandas on Ray, which is a project aimed at speeding up Pandas DataFrames. It also includes substantial improvements to RLlib, such as high-quality implementations of algorithms like Ape-X, and substantial improvements to Tune, such as implementations of state of the art algorithms like Population Based Training.