Example: Getting Structured JSON Output#
Many applications need consistent, parseable output from LLMs. Ray Serve LLM supports structured output generation, ensuring your model returns data in the exact format you need.
Why Structured Output Matters#
Consistent Format: Guaranteed JSON structure for downstream processing
Integration Ready: Easy to parse and use in applications
Reliability: Reduces parsing errors and improves system robustness
Type Safety: Enforces data types and required fields
Example: Car type description#
Let’s deploy a model. Before committing to one, it’s worth checking how the model performs on structured output benchmarks. The following Serve config deploys Qwen/Qwen2.5-3B-Instruct on an L4 GPU:
# serve_my_qwen.yaml
applications:
- name: json-mode-app
  route_prefix: "/"
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
      - model_loading_config:
          model_id: my-qwen
          model_source: Qwen/Qwen2.5-3B-Instruct
        accelerator_type: L4
        ### Uncomment if your model is gated and needs your Hugging Face token to access it
        # runtime_env:
        #   env_vars:
        #     HF_TOKEN: <YOUR-TOKEN-HERE>
        engine_kwargs:
          max_model_len: 8192
!serve run serve_my_qwen.yaml --non-blocking
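Because --non-blocking returns immediately while the application keeps starting in the background, it can help to wait until the model is actually being served before sending requests. Below is a minimal readiness-check sketch that polls the OpenAI-compatible /v1/models endpoint; the script name, retry count, and sleep interval are illustrative, not part of Ray Serve.
# wait_for_model.py (illustrative helper)
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="FAKE_KEY")

# Poll /v1/models until "my-qwen" appears or we give up after ~10 minutes.
for _ in range(60):
    try:
        served = [m.id for m in client.models.list()]
        if "my-qwen" in served:
            print("Model is ready:", served)
            break
    except Exception:
        pass  # server not reachable yet
    time.sleep(10)
else:
    raise RuntimeError("Timed out waiting for the deployment to become ready")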
Using Structured Output#
Now let’s test structured output with a car description request:
# json_method1.py
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8000/v1", api_key="FAKE_KEY")


# (Optional) Use a Pydantic model to handle schema definition/validation
class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


# 1. Define your schema
json_schema = CarDescription.model_json_schema()

# 2. Send a request
response = client.chat.completions.create(
    model="my-qwen",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
    # 3. Set `response_format` of type `json_schema`
    response_format={
        "type": "json_schema",
        # 4. Provide `name` and `schema` (both required)
        "json_schema": {
            "name": "car-description",  # arbitrary
            "schema": json_schema,  # your JSON schema
        },
    },
)

print(response.choices[0].message.content)
Expected Output#
The model will return a consistent JSON structure like:
{
  "brand": "Lexus",
  "model": "IS F",
  "car_type": "SUV"
}
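Because the returned content is guaranteed to match the schema, you can load it straight back into the Pydantic model for type-checked downstream use. A small follow-up sketch, assuming the response and CarDescription objects from json_method1.py are still in scope:
# Validate the model's JSON reply against the CarDescription schema
car = CarDescription.model_validate_json(response.choices[0].message.content)
print(car.brand, car.model, car.car_type.value)  # e.g. Lexus IS F SUV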
Finally, shut down the Serve application:
!serve shutdown -y
Key Benefits#
Guaranteed Structure: Always returns valid JSON matching your schema
Type Safety: Enforces data types (strings, numbers, arrays)
Required Fields: Ensures all specified fields are present (see the schema check after this list)
Easy Integration: Directly usable in applications without brittle ad-hoc parsing
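The first three guarantees come from the JSON schema itself: model_json_schema() lists every CarDescription field under required, and the vLLM engine's guided decoding constrains generation to that schema. A quick way to inspect this, continuing from json_method1.py:
# Inspect the schema sent to the server: every field is required
print(json_schema["required"])            # ['brand', 'model', 'car_type']
print(sorted(json_schema["properties"]))  # ['brand', 'car_type', 'model']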
Learn More#
For comprehensive structured output guides, see:
LLM deployment with structured output on Anyscale - Complete guide with all output formats
Request structured output (vLLM documentation) - Complete guide to the vLLM API for structured output