3. Lazy execution mode
In Ray Data, most operations are not executed immediately. Transformations are lazy: each one adds a step to an execution plan instead of running right away.
The plan runs only when you call a method that materializes or consumes the dataset.
To materialize a small subset of the data, you can use the take_batch method.
batch = ds.take_batch(batch_size=3)
batch
Let’s visualize an example image:
img = batch["image"][0]
title = batch["path"][0]
plt.title(title)
plt.axis("off")
plt.imshow(img, cmap='gray')
Note on execution-triggering methods in Ray Data
To determine whether an operation triggers execution, look for methods decorated with ConsumptionAPI in dataset.py.
These categories of operations trigger execution (with some examples):
Methods designed to consume Datasets for writing, for example write_parquet
Methods designed to consume Datasets for distributed training, for example streaming_split
Methods that attempt to show data, for example take and show
Aggregations, which attempt to reduce a dataset to a single value per column, for example min and sum
Another way to trigger execution is to call materialize() explicitly. This executes the underlying plan and stores all of the resulting data blocks in the cluster's memory.