Key Concepts

Deployments

Deployments are the central concept in Ray Serve. They allow you to define and update your business logic or models that will handle incoming requests as well as how this is exposed over HTTP or in Python.

A deployment is defined using @serve.deployment <ray.serve.api.deployment> on a Python class (or function for simple use cases). You can specify arguments to be passed to the constructor when you call Deployment.deploy(), shown below.

@serve.deployment
class MyFirstDeployment:
  # Take the message to return as an argument to the constructor.
  def __init__(self, msg):
      self.msg = msg

  def __call__(self, request):
      return self.msg

  def other_method(self, arg):
      return self.msg

MyFirstDeployment.deploy("Hello world!")

Deployments can be exposed in two ways: over HTTP or in Python via the servehandle-api. By default, HTTP requests will be forwarded to the __call__ method of the class (or the function) and a Starlette Request object will be the sole argument. You can also define a deployment that wraps a FastAPI app for more flexible handling of HTTP requests. See serve-fastapi-http for details.

Replicas

A deployment consists of a number of replicas, which are individual copies of the function or class that are started in separate Ray Actors (processes).