Intro to Ray Data: Ray Data + Unstructured Data
© 2025, Anyscale. All Rights Reserved
💻 Launch Locally: You can run this notebook locally, but performance will be reduced.
🚀 Launch on Cloud: A Ray cluster is recommended for running this notebook (you can start one on Anyscale).
This notebook gives an overview of Ray Data and shows how to use it to read, transform, and write data in a distributed manner.
Here is the roadmap for this notebook:
- When and why to use Ray Data?
- How to work with Ray Data
- Loading data
- Ray Data key concepts
- Lazy execution mode
- Transforming data
- Stateful transformations with Ray Actors
- Materializing data
- Data operations: grouping, aggregation, and shuffling
- Persisting data
- Ray Data in production
Imports
import subprocess
import torch
import matplotlib.pyplot as plt
import numpy as np
from torchvision.transforms import Compose, ToTensor, Normalize
import ray