0. What is Ray Data?

Ray Data is a distributed data processing library with a Python API for processing data in parallel.

It is built on top of Ray, a fast and simple framework for building and running distributed applications. Ray Data is designed to be easy to use, scalable, and fault-tolerant.

1. How to Use Ray Data?

A typical Ray Data workflow has three steps:

  1. Create a Ray Dataset from external storage or in-memory data.

  2. Apply transformations to the data.

  3. Write the outputs to external storage or feed the outputs to training workers.