Example: Getting Structured JSON Output#
Many applications need consistent, parseable output from LLMs. Ray Serve LLM supports structured output generation, ensuring your model returns data in the exact format you need.
Why Structured Output Matters#
Consistent Format: Guaranteed JSON structure for downstream processing
Integration Ready: Easy to parse and use in applications
Reliability: Reduces parsing errors and improves system robustness
Type Safety: Enforces data types and required fields
Example: Car type description#
Let’s deploy a model. Before committing to one, it’s worth checking how the model performs on structured output benchmarks. The following Serve config deploys Qwen/Qwen2.5-3B-Instruct on an L4 GPU:
# serve_my_qwen.yaml
applications:
- name: json-mode-app
  route_prefix: "/"
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
      - model_loading_config:
          model_id: my-qwen
          model_source: Qwen/Qwen2.5-3B-Instruct
        accelerator_type: L4
        ### Uncomment if your model is gated and needs your Hugging Face token to access it
        # runtime_env:
        #   env_vars:
        #     HF_TOKEN: <YOUR-TOKEN-HERE>
        engine_kwargs:
          max_model_len: 8192
!serve run serve_my_qwen.yaml --non-blocking
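Because --non-blocking returns immediately while the application keeps starting in the background, it can help to wait until the model is actually being served before sending requests. Below is a minimal readiness-check sketch that polls the OpenAI-compatible /v1/models endpoint; the script name, retry count, and sleep interval are illustrative, not part of Ray Serve.
# wait_for_model.py (illustrative helper)
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="FAKE_KEY")

# Poll /v1/models until "my-qwen" appears or we give up after ~10 minutes.
for _ in range(60):
    try:
        served = [m.id for m in client.models.list()]
        if "my-qwen" in served:
            print("Model is ready:", served)
            break
    except Exception:
        pass  # server not reachable yet
    time.sleep(10)
else:
    raise RuntimeError("Timed out waiting for the deployment to become ready")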
Using Structured Output#
Now let’s test structured output with a car description request:
# json_method1.py
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8000/v1", api_key="FAKE_KEY")


# (Optional) Use a Pydantic model to handle schema definition/validation
class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


# 1. Define your schema
json_schema = CarDescription.model_json_schema()

# 2. Send a request
response = client.chat.completions.create(
    model="my-qwen",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    # 3. Set `response_format` of type `json_schema`
    response_format={
        "type": "json_schema",
        # 4. Provide `name` and `schema` (both required)
        "json_schema": {
            "name": "car-description",  # arbitrary
            "schema": json_schema,  # your JSON schema
        },
    },
)

print(response.choices[0].message.content)
Expected Output#
The model will return a consistent JSON structure like:
{
  "brand": "Lexus",
  "model": "IS F",
  "car_type": "SUV"
}
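Because the returned content is guaranteed to match the schema, you can load it straight back into the Pydantic model for type-checked downstream use. A small follow-up sketch, assuming the response and CarDescription objects from json_method1.py are still in scope:
# Validate the model's JSON reply against the CarDescription schema
car = CarDescription.model_validate_json(response.choices[0].message.content)
print(car.brand, car.model, car.car_type.value)  # e.g. Lexus IS F SUV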
Finally, shut down the Serve application:
!serve shutdown -y
Key Benefits#
Guaranteed Structure: Always returns valid JSON matching your schema
Type Safety: Enforces data types (strings, numbers, arrays)
Required Fields: Ensures all specified fields are present (see the schema check after this list)
Easy Integration: Directly usable in applications without brittle ad-hoc parsing
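The first three guarantees come from the JSON schema itself: model_json_schema() lists every CarDescription field under required, and the vLLM engine's guided decoding constrains generation to that schema. A quick way to inspect this, continuing from json_method1.py:
# Inspect the schema sent to the server: every field is required
print(json_schema["required"])            # ['brand', 'model', 'car_type']
print(sorted(json_schema["properties"]))  # ['brand', 'car_type', 'model']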
Learn More#
For comprehensive structured output guides, see:
LLM deployment with structured output on Anyscale - Complete guide with all output formats
Request structured output (vLLM documentation) - Complete guide to the vLLM API for structured output