0. What is Ray Data?
Ray Data is a distributed data processing library with a Python API for processing data in parallel.
It is built on top of Ray, a framework for building and running distributed applications. Ray Data is designed to be easy to use, scalable, and fault-tolerant.
1. How to Use Ray Data?
A typical Ray Data workflow has three steps:
Create a Ray Dataset from external storage or in-memory data.
Apply transformations to the data.
Write the outputs to external storage or feed the outputs to training workers.
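The three steps above can be sketched roughly as follows. This is a minimal illustration, not a recommended production setup: the `add_square` transform, the `x` and `x_squared` column names, the `run_pipeline` helper, and the output path are all assumptions made up for this example. It uses in-memory data and a local directory, and assumes Ray is installed with its data extras (for instance via `pip install "ray[data]"`).

```python
from typing import Any, Dict


def add_square(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical row-level transformation: return a new row with an extra column."""
    return {**row, "x_squared": row["x"] ** 2}


def run_pipeline(output_dir: str) -> None:
    # Imported here so add_square can be reused and tested without Ray installed.
    import ray

    # Step 1: create a Dataset from in-memory data.
    ds = ray.data.from_items([{"x": i} for i in range(100)])

    # Step 2: apply a transformation to every row.
    ds = ds.map(add_square)

    # Step 3: write the outputs to storage (a local directory in this sketch).
    ds.write_parquet(output_dir)


if __name__ == "__main__":
    run_pipeline("/tmp/ray_data_example")  # hypothetical output path
```

For data read from external storage, a reader such as `ray.data.read_parquet` would replace `from_items` in step 1. To feed training workers instead of writing files, step 3 would iterate over the dataset, for example with `ds.iter_batches()`.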