Ray Crash Course - Actors¶
© 2019-2021, Anyscale. All Rights Reserved
Using Ray tasks is great for distributing work around a cluster, but we've said nothing so far about managing distributed state, one of the big challenges in distributed computing. Ray tasks are great for stateless computation, but we need something for stateful computation.
Python classes are a familiar mechanism for encapsulating state. Just as Ray tasks extend the familiar concept of Python functions, Ray addresses stateful computation by extending classes to become Ray actors.
Tip: For more about Ray, see ray.io or the Ray documentation.
What We Mean by Distributed State¶
If you've worked with data processing libraries like Pandas or big data tools like Apache Spark, you know that they provide rich features for manipulating large, structured data sets, i.e., the analogs of tables in a database. Some tools even support partitioning of these data sets over clusters for scalability.
This isn't the kind of distributed "state" Ray addresses. Instead, it's the more open-ended graph of objects found in more general-purpose applications. For example, it could be the state of a game engine used in a reinforcement learning (RL) application or the total set of parameters in a giant neural network, some of which now have hundreds of millions of parameters.
Conway's Game of Life¶
Let's explore Ray's actor model using Conway's Game of Life, a famous cellular automaton.
Here is an example of a notable pattern of game evolution, Gosper's glider gun:
(credit: Lucas Vieira - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=101736)
We'll use an implementation of Conway's Game of Life as a nontrivial example of maintaining state, the current grid of living and dead cells. We'll see how to leverage Ray to scale it.
Note: Sadly, John Horton Conway, the inventor of this automaton, passed away from COVID-19 on April 11, 2020. This lesson is dedicated to Professor Conway.
Let's start with some imports
import ray, time, statistics, sys, os
import numpy as np
sys.path.append("..") # For library helper functions
I've never seen this done anywhere else, but our implementation of Game of Life doesn't just use 1 for living cells; it uses the number of iterations they've been alive, so 1 to N. I'll exploit this when we graph the game.
from game_of_life import Game, State, ConwaysRules
Utility functions for plotting using Holoviews and Bokeh, as well as running and timing games.
from actor_lesson_util import new_game_of_life_graph, new_game_of_life_grid, run_games, run_ray_games, show_cmap
The implementation is a bit long, so all the code is contained in game_of_life.py.
(You can also run that file as a standalone script from the command line; try python game_of_life.py --help. On MacOS and Linux machines, the script is executable, so you can omit the python.)
The first class is State, which encapsulates the board state as an N x N grid of cells, where N is specified by the user. (For simplicity, we just use square grids.) There are two ways to initialize the game: specifying a starting grid or a size, in which case the cells are set randomly. The sample below just shows the size option. State instances are immutable, because the Game (discussed below) keeps a sequence of them, representing the lifetime states of the game.
For smaller grids, it's often possible that the game reaches a terminal state where it stops evolving. Larger grids are more likely to develop cyclic patterns that would repeat forever, making those runs appear immortal, although even those patterns eventually get disrupted by evolving neighbors.
class State:
    def __init__(self, size = 10):
        # The version in the file also lets you pass in a grid of initial cells.
        self.size = size
        self.grid = np.random.randint(2, size = size*size).reshape((size, size))

    def living_cells(self):
        cells = [(i,j) for i in range(self.size) for j in range(self.size) if self.grid[i][j] != 0]
        return zip(*cells)
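The zip(*cells) call transposes the list of (row, column) pairs into two parallel tuples, one of row indices and one of column indices, a convenient form for plotting. Here's a minimal sketch of how you might inspect it; the exact values will vary because the grid is random, and it assumes the grid has at least one living cell:
state = State(size = 5)
rows, columns = state.living_cells()   # two parallel tuples of coordinates
# (rows[k], columns[k]) is the k-th living cell; its age is state.grid[rows[k]][columns[k]]
print(rows)
print(columns)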
Next, ConwaysRules encapsulates the logic of computing the new state of a game from the current state, using the update rules defined as follows:
- Any live cell with fewer than two live neighbours dies, as if by underpopulation.
- Any live cell with two or three live neighbours lives on to the next generation.
- Any live cell with more than three live neighbours dies, as if by overpopulation.
- Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
This class is stateless; step() is passed a State instance and returns a new instance for the updated state.
class ConwaysRules:
    def step(self, state):
        """
        Determine the next values for all the cells, based on the current
        state. Creates a new State with the changes.
        """
        new_grid = state.grid.copy()
        for i in range(state.size):
            for j in range(state.size):
                new_grid[i][j] = self.apply_rules(i, j, state)
        new_state = State(grid = new_grid)
        return new_state

    def apply_rules(self, i, j, state):
        # Compute and return the next state for grid[i][j]
        return ...
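The body of apply_rules is elided above. Here's a sketch of one way it could be written, directly following the four rules and using the cell-age twist mentioned earlier (surviving cells increment their age). The actual version in game_of_life.py may differ in details such as boundary handling:
    def apply_rules(self, i, j, state):
        """A sketch only; see game_of_life.py for the real implementation."""
        grid, size = state.grid, state.size
        # Count living neighbors, treating cells beyond the edges as dead.
        live_neighbors = 0
        for i2 in range(max(0, i-1), min(size, i+2)):
            for j2 in range(max(0, j-1), min(size, j+2)):
                if (i2, j2) != (i, j) and grid[i2][j2] != 0:
                    live_neighbors += 1
        if grid[i][j] != 0:                          # the cell is currently alive
            if live_neighbors in (2, 3):
                return grid[i][j] + 1                # survives; age increments
            return 0                                 # under- or overpopulation
        return 1 if live_neighbors == 3 else 0       # reproduction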
Finally, the game holds a sequence of states and the rules "engine".
class Game:
    def __init__(self, initial_state, rules):
        self.states = [initial_state]
        self.rules = rules

    def step(self, num_steps = 1):
        """Take 1 or more steps, returning a list of new states."""
        # Each step must start from the most recent state, including states
        # computed earlier in this call, so we loop rather than stepping from
        # self.states[-1] every time.
        new_states = []
        for _ in range(num_steps):
            previous = new_states[-1] if new_states else self.states[-1]
            new_states.append(self.rules.step(previous))
        self.states.extend(new_states)
        return new_states
Okay, let's try it out!!
steps = 100 # Use a larger number for a long-running game.
game_size = 100
plot_size = 800
max_cell_age = 10 # clip the age of cells for graphing.
use_fixed_cell_sizes = True # Keep the points the same size. Try False, too!
For the graphs, we'll use a "greenish" background that looks good with the RdYlBu color map.
However, if you have red-green color blindness, change the bgcolor string to white! Or, try the second combination with a custom color map cmap and background color white or darkgrey.
# Color maps from Bokeh:
cmap = 'RdYlBu' # others: 'Turbo' 'YlOrBr'
bgcolor = '#C0CfC8' # a greenish color, but not great for forms of red-green color blindness, where 'white' is better.
# A custom color map created at https://projects.susielu.com/viz-palette. Works best with white or dark grey background
#cmap=['#ffd700', '#ffb14e', '#fa8775', '#ea5f94', '#cd34b5', '#9d02d7', '#0000ff']
#bgcolor = 'darkgrey' # 'white'
def new_game(game_size):
    initial_state = State(size = game_size)
    rules = ConwaysRules()
    game = Game(initial_state=initial_state, rules=rules)
    return game
game = new_game(10)
print(game.states[0])
Now let's create a graph for a game of life using the imported utility function, new_game_of_life_grid (with only one graph in the "grid" for now).
Note: It will be empty for now.
_, graphs = new_game_of_life_grid(game_size, plot_size, x_grid=1, y_grid=1, shrink_factor=1.0,
                                  bgcolor=bgcolor, cmap=cmap,
                                  use_fixed_cell_sizes=use_fixed_cell_sizes, max_cell_age=max_cell_age)
graphs[0]
To make sure we don't consume too much driver memory, since games can grow large, let's write a function, do_trial, to run the experiment. When it returns, the games go out of scope and their memory is reclaimed. It uses the library function we imported, run_games, and the new_game function above to do most of the work.
(You might wonder why we don't create the graphs inside the function. In a notebook cell, it's essentially impossible to show the grid before the games run and then update the visualization afterwards from inside a single function. We have to build the grid, render it separately, then call do_trial.)
def do_trial(graphs, num_games=1, steps=steps, batch_size=1, game_size_for_each=game_size, pause_between_batches=0.0):
    games = [new_game(game_size_for_each) for _ in range(num_games)]
    return run_games(games, graphs, steps, batch_size, pause_between_batches)
%time num_games, steps, batch_size, duration = do_trial(graphs, steps=steps, pause_between_batches=0.1)
num_games, steps, batch_size, duration
If you can't see the plot or see it update, click here for a screen shot:
(Want to run longer? Pass a larger value for steps in the previous cell. 1000 takes several minutes, but you'll see interesting patterns develop.)
The first line of output is written by run_games, which is called by do_trial. The next two lines are output from the %time "magic". The fourth line shows the values returned by run_games through do_trial, which we'll use more fully in the exercise below.
How much time did it take? Note that there were steps*0.1 seconds of sleep time between steps, so the rest is compute time. Does that account for the difference between the user time and the wall time?
steps*0.1
Yes, this covers most of the extra wall time.
A point's color changes as it lives longer. Here is the color map used, where the top color corresponds to the longest-lived cells.
show_cmap(cmap=cmap, max_index=max_cell_age)
If you can't see the color map in the previous cell output, click here for the color map RdYlBu.
You could experiment with different values for max_cell_age.
Mini Exercise: Change the value passed for use_fixed_cell_sizes to False (in the cell that calls new_game_of_life_grid). Then rerun the %time do_trial() cell. What happens to the graph?
Running Lots of Games¶
Suppose we wanted to run many of these games at the same time. For example, we might use reinforcement learning to find the initial state that maximizes some reward, like the most live cells after N steps, or to search for immortal games. You could write a loop that starts M games and runs the previous step loop, interleaving the games. Let's try that, with smaller grids.
x_grid = 5
y_grid = 3
shrink_factor = y_grid # Instead of 1 N-size game, build N/shrink_factor size games
small_game_size = round(game_size/shrink_factor)
First build a grid of graphs, like before:
gridspace, all_graphs = new_game_of_life_grid(small_game_size, plot_size, x_grid, y_grid, shrink_factor,
                                              bgcolor=bgcolor, cmap=cmap,
                                              use_fixed_cell_sizes=use_fixed_cell_sizes, max_cell_age=max_cell_age)
gridspace
%time num_games, steps, batch_size, duration = do_trial(all_graphs, num_games=x_grid*y_grid, steps=steps, batch_size=1, game_size_for_each=small_game_size, pause_between_batches=0.1)
num_games, steps, batch_size, duration
If you can't see the plot or see it update, click here for a screen shot:
- colored background
- white background (captured earlier in the run)
How much time did it take? You can perceive a "wave" across the graphs at each time step, because the games aren't running concurrently. Sometimes a "spurt" of updates happens, and so on. Not ideal...
There were the same steps*0.1 seconds of sleep time between steps, not dependent on the number of games, so the rest is compute time.
Improving Performance with Ray¶
Let's start Ray, as we did in the first lesson.
ray.init(ignore_reinit_error=True)
Running on your laptop? Click the output of the next cell to open the Ray Dashboard.
If you are running on the Anyscale platform, use the dashboard URL provided to you.
Actors - Ray's Tool for Distributed State¶
Python is an object-oriented language. We often encapsulate bits of state in classes, like we did for State above. Ray leverages this familiar mechanism to manage distributed state.
Recall that adding the @ray.remote annotation to a function turned it into a task. If we use the same annotation on a Python class, we get an actor.
Why "Actor"¶
The Actor Model of Concurrency is almost 50 years old! It's a message-passing model, where autonomous blocks of code, the actors, receive messages from other actors asking them to perform work or return results. Implementations provide thread safety while the messages are processed, one at a time. This means the user of an actor model implementation doesn't have to worry about writing thread-safe code. Because many messages might arrive while one is being processed, they are stored in a queue and processed one at a time, in the order of arrival.
There are many other implementations of the actor model, including Erlang, the first system to create a production-grade implementation, initially used for telecom switches, and Akka, a JVM implementation inspired by Erlang.
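To see the "one message at a time" behavior in Ray terms, here's a minimal sketch using a made-up SlowCounter actor (not part of this lesson's game code). The five method calls are all submitted at once, but the actor processes them sequentially, in submission order:
@ray.remote
class SlowCounter:
    def __init__(self):
        self.count = 0
    def increment(self):
        time.sleep(0.1)    # simulate work done while holding the actor's state
        self.count += 1
        return self.count

counter = SlowCounter.remote()
futures = [counter.increment.remote() for _ in range(5)]
print(ray.get(futures))    # [1, 2, 3, 4, 5], because the calls were processed one at a time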
Tip: The Ray Package Reference in the Ray Docs is useful for exploring the API features we'll learn.
Let's start by simply making Game an actor. We'll just subclass it and add @ray.remote to the subclass.
There's one other change we have to make: if we want to access the states and rules instances in an actor, we can't just use mygame.state, for example, as you would normally do for Python instances. Instead, we have to add "getter" methods for them.
Here's our Game actor definition.
@ray.remote
class RayGame(Game):
    def __init__(self, initial_state, rules):
        super().__init__(initial_state, rules)

    def get_states(self):
        return self.states

    def get_rules(self):
        return self.rules
To construct an instance and call methods, you use .remote as for tasks:
def new_ray_game(game_size):
    initial_state = State(size = game_size)
    rules = ConwaysRules()
    ray_game_actor = RayGame.remote(initial_state, rules) # Note that .remote(...) is used to construct the instance.
    return ray_game_actor
We'll use the following function to try out the implementation, then let the Ray actor go out of scope when we're done. This is because actors remain pinned to a worker as long as the driver (this notebook) holds a reference to them, and we don't want to waste that space...
def try_ray_game_actor():
    ray_game_actor = new_ray_game(small_game_size)
    print(f'Actor for game: {ray_game_actor}')
    init_states = ray.get(ray_game_actor.step.remote())
    print(f'\nInitial state:\n{init_states[0]}')
    new_states = ray.get(ray_game_actor.step.remote())
    print(f'\nState after step #1:\n{new_states[0]}')
try_ray_game_actor()
Key Points: To summarize:
- Declare an actor by annotating a class with @ray.remote, just like declaring a task from a function.
- Add accessor methods for any data members that you need to read or write, because direct access, such as my_game.state, doesn't work for actors.
- Construct actor instances with my_instance = MyClass.remote(...).
- Call methods with my_instance.some_method.remote(...).
- Use ray.get() and ray.wait() to retrieve results, just like you do for task results. (A minimal sketch pulling these points together follows the tip below.)
Tip: If you start getting warnings about lots of Python processes running or you have too many actors scheduled, you can safely ignore these messages for now, but the performance measurements below won't be as accurate.
Okay, now let's repeat our grid experiment with a Ray-enabled Game of Life. Let's define a helper function, do_ray_trial, which is analogous to do_trial above. It encapsulates some of the steps for the same reasons mentioned above: so that our actors go out of scope and the worker slots are reclaimed when the function call returns.
We call a library function, run_ray_games, to run these games. It's somewhat complicated, because it uses ray.wait() to process updates as soon as they are available, and also has hooks for batch processing and running without graphing (see below).
We'll create the graphs separately and pass them into do_ray_trial.
def do_ray_trial(graphs, num_games=1, steps=steps, batch_size=1, game_size_for_each=game_size, pause_between_batches=0.0):
    game_actors = [new_ray_game(game_size_for_each) for _ in range(num_games)]
    return run_ray_games(game_actors, graphs, steps, batch_size, pause_between_batches)
ray_gridspace, ray_graphs = new_game_of_life_grid(small_game_size, plot_size, x_grid, y_grid, shrink_factor,
                                                  bgcolor=bgcolor, cmap=cmap,
                                                  use_fixed_cell_sizes=use_fixed_cell_sizes, max_cell_age=max_cell_age)
ray_gridspace
%time do_ray_trial(ray_graphs, num_games=x_grid*y_grid, steps=steps, batch_size=1, game_size_for_each=small_game_size, pause_between_batches=0.1)
(Can't see the image? It's basically the same as the previous grid example.)
How did your times compare? For example, using a recent model MacBook Pro laptop, this run took roughly 19 seconds vs. 21 seconds for the previous run without Ray. That's not much of an improvement. Why?
In fact, updating the graphs causes enough overhead to remove most of the speed advantage of using Ray. We also sleep briefly between generations for nicer output. However, using Ray does produce smoother graph updates.
So, if we want to study more performance optimizations, we should remove the graphing overhead, which we'll do for the rest of this lesson.
Let's run the two trials without graphs and compare the performance. We'll use no pauses between "batches" and run the same number of games as the number of CPUs (cores) Ray says we have. This is actually the number of workers Ray started for us; on a typical laptop with hyperthreading, it is 2x the number of physical cores:
num_cpus_float = ray.cluster_resources()['CPU']
num_cpus_float
As soon as you start the next cell, switch to the Ray Dashboard and watch the CPU utilization. You'll see that the Ray workers are idle, because we aren't using them yet, and the total CPU utilization will be well under 100%. For example, on a four-core laptop, the total CPU utilization will be 20-25%, roughly 1/4th capacity.
Why? We're running the whole computation in the Python process for this notebook, which only utilizes one core.
%time do_trial(None, num_games=round(num_cpus_float), steps=steps, batch_size=1, game_size_for_each=game_size, pause_between_batches=0.0)
Now use Ray. Again, as soon as you start the next cell, switch to the Ray Dashboard and watch the CPU utilization. Now, the Ray workers will be utilized (but not 100%) and the total CPU utilization will be higher. You'll probably see 70-80% utilization.
Hence, now we're running on all cores.
%time do_ray_trial(None, num_games=round(num_cpus_float), steps=steps, batch_size=1, game_size_for_each=game_size, pause_between_batches=0.0)
So, using Ray does help when running parallel games. On a typical laptop, the performance boost is about 2-3 times better. It's not 15 times better (the number of concurrent games), because each game's computation is CPU intensive with frequent memory access, so all the available cores are fully utilized. We would see much more impressive improvements on a cluster with a lot of CPU cores when running a massive number of games.
Notice the user and total times reported for the non-Ray and Ray runs (which are printed by the %time "magic"). They only measure the time for the notebook's Python process, i.e., our "driver" program, not the whole application. Without Ray, all the work is done in this process, as we said previously, so the user and total times roughly equal the wall clock time. However, for Ray, these times are very low; the notebook is mostly idle, while the work is done in the separate Ray worker processes.
More about Actors¶
Let's finish with a discussion of additional important information about actors, including recapping some points mentioned above.
Actor Scheduling and Lifetimes¶
For the most part, when Ray runs actor code, it uses the same task mechanisms we discussed in the Ray Tasks lesson. Actor constructor and method invocations work just like task invocations. However, there are a few notable differences:
- Once a task finishes, it is removed from the worker that executed it, while an actor is pinned to the worker until all Python references to it in the driver program are out of scope. That is, the usual garbage collection mechanism in Python determines when an actor is no longer needed and is removed from a worker (see the sketch after this list). The reason the actor must remain in memory is that it holds state that might be needed, whereas tasks are stateless.
- Currently, each actor instance uses tens of MB of memory overhead. Hence, just as you should avoid having too many fine-grained tasks, you should avoid too many actor instances. (Reducing the overhead per actor is an ongoing improvement project.)
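Here's the sketch referenced in the first point, using the RayGame actor and new_ray_game function defined earlier. It's only an illustration of the reference-counting behavior described above:
def short_lived_game():
    actor = new_ray_game(10)               # the actor is pinned to a worker...
    return ray.get(actor.step.remote())    # ...as long as this reference exists

states = short_lived_game()
# When short_lived_game() returns, 'actor' goes out of scope, no references remain,
# and Ray reclaims the actor and its worker resources.

# You can also release an actor explicitly by deleting the last reference to it:
another_actor = new_ray_game(10)
del another_actor                          # lets Ray clean up this actor, too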
We explore actor scheduling and lifecycles in much greater depth in lesson 03: Ray Internals in the Advanced Ray tutorial.
Durability of Actor State¶
At this time, Ray provides no built-in mechanism for persisting actor state, i.e., writing to disk or a database in case of process failure. Hence, if a worker or whole server goes down with actor instances, their state is lost.
This is an area where Ray will evolve and improve in the future. For now, an important design consideration is to decide when you need to checkpoint state and to use an appropriate mechanism for this purpose. Some of the Ray APIs explored in other tutorials have built-in checkpoint features, such as for saving snapshots of trained models to a file system.
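As an illustration only, here's a minimal sketch of manual checkpointing: a hypothetical counter actor that writes its own state to a local JSON file. The path, format, and checkpoint frequency are arbitrary choices for this sketch, and on a multi-node cluster you would checkpoint to shared or external storage instead:
import json, os

@ray.remote
class CheckpointedCounter:
    """Hypothetical actor that periodically saves its state to disk."""
    def __init__(self, checkpoint_path='counter_checkpoint.json'):
        self.checkpoint_path = checkpoint_path
        self.count = 0
        if os.path.exists(self.checkpoint_path):       # restore a previous checkpoint, if any
            with open(self.checkpoint_path) as f:
                self.count = json.load(f)['count']

    def next(self):
        self.count += 1
        if self.count % 100 == 0:                      # checkpoint every 100 updates
            with open(self.checkpoint_path, 'w') as f:
                json.dump({'count': self.count}, f)
        return self.count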
Extra - Does It Help to Run with Larger Batch Sizes?¶
You can read this section but skip running the code, for time's sake. The outcomes are discussed at the end.
You'll notice that we defined run_games and do_trial, as well as run_ray_games and do_ray_trial, to take an optional batch_size that defaults to 1. The idea is that maybe running game steps in batches, rather than one step at a time, will improve performance (but look less pleasing in the graphs).
This concept works in some contexts, such as minimizing the number of messages sent in networks (that is, fewer, but larger payloads), but it doesn't help much here, because each game is played in a single process, whether using Ray or not (at least as currently implemented...). Batching reduces the number of method invocations, but that isn't a significant amount of overhead in our case.
Let's confirm our suspicion about batching, that it doesn't help a lot.
Let's time several batch sizes without and with Ray. We'll run several times with each batch size to get an informal sense of the variation possible.
Once again, watch the Ray Dashboard while the next two code cells run.
for batch in [1, 10, 25, 50]:
    for run in [0, 1]:
        do_trial(graphs = None, num_games=1, steps=steps, batch_size=batch, game_size_for_each=game_size, pause_between_batches=0.0)
There isn't a significant difference based on batch size.
What about Ray? If we're running just one game, the results should be about the same.
for batch in [1, 10, 25, 50]:
    for run in [0, 1]:
        do_ray_trial(graphs = None, num_games=1, steps=steps, batch_size=batch, game_size_for_each=game_size, pause_between_batches=0.0)
With Ray's background activity, there is likely to be a little more variation in the numbers, but the conclusion is the same: the batch size doesn't matter, because batching doesn't introduce any additional concurrency.
Exercises¶
When we needed to run multiple games concurrently as fast as possible, Ray was an easy win. If we graph them while running, the wall-clock time is about the same, due to the graphics overhead, but the graphs update more smoothly and each one looks independent.
Just as for Ray tasks, actors add some overhead, so there will be a crossing point for small problems where the concurrency provided by Ray won't be as beneficial. This exercise uses a simple actor example to explore this tradeoff.
See the solutions notebook for a discussion of questions posed in this exercise.
Consider the following class and actor, which simulate a busy process using time.sleep():
class Counter:
    """Remember how many times ``next()`` has been called."""
    def __init__(self, pause):
        self.count = 0
        self.pause = pause

    def next(self):
        time.sleep(self.pause)
        self.count += 1
        return self.count

@ray.remote
class RayCounter(Counter):
    """Remember how many times ``next()`` has been called."""
    def __init__(self, pause):
        super().__init__(pause)

    def get_count(self):
        return self.count
Recall that for an actor we need an accessor method to get the current count.
Here are functions to time them.
def counter_trial(count_to, num_counters = 1, pause = 0.01):
    print('not ray: count_to = {:5d}, num counters = {:4d}, pause = {:5.3f}: '.format(count_to, num_counters, pause), end='')
    start = time.time()
    counters = [Counter(pause) for _ in range(num_counters)]
    for i in range(num_counters):
        for n in range(count_to):
            counters[i].next()
    duration = time.time() - start
    print('time = {:9.5f} seconds'.format(duration))
    return count_to, num_counters, pause, duration

def ray_counter_trial(count_to, num_counters = 1, pause = 0.01):
    print('ray: count_to = {:5d}, num counters = {:4d}, pause = {:5.3f}: '.format(count_to, num_counters, pause), end='')
    start = time.time()
    final_count_futures = []
    counters = [RayCounter.remote(pause) for _ in range(num_counters)]
    for i in range(num_counters):
        for n in range(count_to):
            counters[i].next.remote()
        final_count_futures.append(counters[i].get_count.remote())
    ray.get(final_count_futures) # Discard result, but wait until finished!
    duration = time.time() - start
    print('time = {:9.5f} seconds'.format(duration))
    return count_to, num_counters, pause, duration
Let's get a sense of what the performance looks like:
count_to = 10
for num_counters in [1, 2, 3, 4]:
    counter_trial(count_to, num_counters, 0.0)
for num_counters in [1, 2, 3, 4]:
    counter_trial(count_to, num_counters, 0.1)
for num_counters in [1, 2, 3, 4]:
    counter_trial(count_to, num_counters, 0.2)
not ray: count_to = 10, num counters = 1, pause = 0.000: time = 0.00039 seconds
not ray: count_to = 10, num counters = 2, pause = 0.000: time = 0.00007 seconds
not ray: count_to = 10, num counters = 3, pause = 0.000: time = 0.00053 seconds
not ray: count_to = 10, num counters = 4, pause = 0.000: time = 0.00071 seconds
not ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.03061 seconds
not ray: count_to = 10, num counters = 2, pause = 0.100: time = 2.06037 seconds
not ray: count_to = 10, num counters = 3, pause = 0.100: time = 3.07844 seconds
not ray: count_to = 10, num counters = 4, pause = 0.100: time = 4.12418 seconds
not ray: count_to = 10, num counters = 1, pause = 0.200: time = 2.02095 seconds
not ray: count_to = 10, num counters = 2, pause = 0.200: time = 4.04027 seconds
not ray: count_to = 10, num counters = 3, pause = 0.200: time = 6.08269 seconds
not ray: count_to = 10, num counters = 4, pause = 0.200: time = 8.11051 seconds
When there is no sleep pause, the results are almost instantaneous. For nonzero pauses, the times scale linearly in the pause size and the number of Counter instances. This is expected, since Counter and counter_trial are completely synchronous.
What about for Ray?
count_to = 10
for num_counters in [1, 2, 3, 4]:
    ray_counter_trial(count_to, num_counters, 0.0)
for num_counters in [1, 2, 3, 4]:
    ray_counter_trial(count_to, num_counters, 0.1)
for num_counters in [1, 2, 3, 4]:
    ray_counter_trial(count_to, num_counters, 0.2)
ray: count_to = 10, num counters = 1, pause = 0.000:
2021-10-18 06:45:20,430 INFO services.py:1252 -- View the Ray dashboard at http://127.0.0.1:8265
time = 2.64261 seconds
ray: count_to = 10, num counters = 2, pause = 0.000: time = 0.75267 seconds
ray: count_to = 10, num counters = 3, pause = 0.000: time = 0.72335 seconds
ray: count_to = 10, num counters = 4, pause = 0.000: time = 0.75695 seconds
ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.71324 seconds
ray: count_to = 10, num counters = 2, pause = 0.100: time = 1.75841 seconds
ray: count_to = 10, num counters = 3, pause = 0.100: time = 1.77187 seconds
ray: count_to = 10, num counters = 4, pause = 0.100: time = 1.78451 seconds
ray: count_to = 10, num counters = 1, pause = 0.200: time = 2.75683 seconds
ray: count_to = 10, num counters = 2, pause = 0.200: time = 2.74667 seconds
ray: count_to = 10, num counters = 3, pause = 0.200: time = 2.77803 seconds
ray: count_to = 10, num counters = 4, pause = 0.200: time = 2.79148 seconds
Ray has higher overhead, so the zero-pause times for RayCounter are much longer than for Counter, but the times are roughly independent of the number of counters, because the instances are now running in parallel, unlike before. However, the times per counter still grow linearly in the pause time, and they are very close to the times per counter for Counter instances. Here's a repeat run to show what we mean:
count_to=10
num_counters = 1
for pause in range(0,6):
    counter_trial(count_to, num_counters, pause*0.1)
    ray_counter_trial(count_to, num_counters, pause*0.1)
not ray: count_to = 10, num counters = 1, pause = 0.000: time = 0.00024 seconds
ray: count_to = 10, num counters = 1, pause = 0.000: time = 0.72115 seconds
not ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.01609 seconds
ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.80469 seconds
not ray: count_to = 10, num counters = 1, pause = 0.200: time = 2.01647 seconds
ray: count_to = 10, num counters = 1, pause = 0.200: time = 2.74465 seconds
not ray: count_to = 10, num counters = 1, pause = 0.300: time = 3.03504 seconds
ray: count_to = 10, num counters = 1, pause = 0.300: time = 3.73886 seconds
not ray: count_to = 10, num counters = 1, pause = 0.400: time = 4.01807 seconds
ray: count_to = 10, num counters = 1, pause = 0.400: time = 4.74108 seconds
not ray: count_to = 10, num counters = 1, pause = 0.500: time = 5.01897 seconds
ray: count_to = 10, num counters = 1, pause = 0.500: time = 5.75326 seconds
Ignoring pause = 0, can you explain why the Ray times are consistently slightly larger than the non-Ray times? Study the implementations of ray_counter_trial and RayCounter. What code is synchronous and blocking vs. concurrent? In fact, is there any code that is actually concurrent when you have just one instance of Counter or RayCounter?
To finish, let's look at the behavior for smaller pause steps, 0.0 to 0.1, and plot the times.
count_to=10
num_counters = 1
pauses=[]
durations=[]
ray_durations=[]
for pause in range(0,11):
    pauses.append(pause*0.01)
    _, _, _, duration = counter_trial(count_to, num_counters, pause*0.01)
    durations.append(duration)
    _, _, _, duration = ray_counter_trial(count_to, num_counters, pause*0.01)
    ray_durations.append(duration)
not ray: count_to = 10, num counters = 1, pause = 0.000: time = 0.00023 seconds
ray: count_to = 10, num counters = 1, pause = 0.000: time = 0.70282 seconds
not ray: count_to = 10, num counters = 1, pause = 0.010: time = 0.12034 seconds
ray: count_to = 10, num counters = 1, pause = 0.010: time = 0.82181 seconds
not ray: count_to = 10, num counters = 1, pause = 0.020: time = 0.22484 seconds
ray: count_to = 10, num counters = 1, pause = 0.020: time = 0.92805 seconds
not ray: count_to = 10, num counters = 1, pause = 0.030: time = 0.32882 seconds
ray: count_to = 10, num counters = 1, pause = 0.030: time = 1.06352 seconds
not ray: count_to = 10, num counters = 1, pause = 0.040: time = 0.42238 seconds
ray: count_to = 10, num counters = 1, pause = 0.040: time = 1.13944 seconds
not ray: count_to = 10, num counters = 1, pause = 0.050: time = 0.52658 seconds
ray: count_to = 10, num counters = 1, pause = 0.050: time = 1.26892 seconds
not ray: count_to = 10, num counters = 1, pause = 0.060: time = 0.63016 seconds
ray: count_to = 10, num counters = 1, pause = 0.060: time = 1.36237 seconds
not ray: count_to = 10, num counters = 1, pause = 0.070: time = 0.73195 seconds
ray: count_to = 10, num counters = 1, pause = 0.070: time = 1.48872 seconds
not ray: count_to = 10, num counters = 1, pause = 0.080: time = 0.82932 seconds
ray: count_to = 10, num counters = 1, pause = 0.080: time = 1.61250 seconds
not ray: count_to = 10, num counters = 1, pause = 0.090: time = 0.92547 seconds
ray: count_to = 10, num counters = 1, pause = 0.090: time = 1.62813 seconds
not ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.01302 seconds
ray: count_to = 10, num counters = 1, pause = 0.100: time = 1.73065 seconds
from bokeh_util import two_lines_plot # utility we used in the previous lesson
from bokeh.plotting import show, figure
from bokeh.layouts import gridplot
two_lines = two_lines_plot(
    "Pause vs. Execution Times (Smaller Is Better)", 'Pause', 'Time', 'No Ray', 'Ray',
    pauses, durations, pauses, ray_durations,
    x_axis_type='linear', y_axis_type='linear')
show(two_lines, plot_width=800, plot_height=400)
(Can't see the plot? Click here for a screen shot.)
Once past zero pauses, the Ray overhead is constant. It doesn't grow with the pause time. Can you explain why it doesn't grow?
Run the next cell when you are finished with this notebook:
ray.shutdown() # "Undo ray.init()". Terminate all the processes started in this notebook.
The next lesson, Why Ray?, takes a step back and explores the origin and motivations for Ray, and Ray's growing ecosystem of libraries and tools.