# /responses [Beta]

LiteLLM provides a BETA endpoint that follows the spec of OpenAI's `/responses` API.
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | See the sketch below the table |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported LiteLLM Versions | 1.63.8+ | |
| Supported LLM providers | `openai` | |
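End-user tracking works by attributing each request to a caller-supplied user identifier. The snippet below is a hedged sketch: it assumes `litellm.responses()` forwards the standard OpenAI `user` parameter and that LiteLLM attributes usage to it; the identifier shown is hypothetical.

```python
import litellm

# Hedged sketch: attribute this request to an end user for tracking.
# Assumes `user` (the standard OpenAI parameter) is forwarded and tracked;
# "customer-1234" is a hypothetical end-user identifier.
response = litellm.responses(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    user="customer-1234",
)

print(response)
```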
## Usage
### Create a model response
You can create a model response either directly with the LiteLLM SDK or with the OpenAI SDK pointed at the LiteLLM Proxy.

#### LiteLLM SDK
##### Non-streaming

```python
import litellm

# Non-streaming response
response = litellm.responses(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)
```
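To pull just the generated text out of the response, you can walk the `output` items. A minimal sketch, assuming the LiteLLM response object mirrors the OpenAI Responses API shape (message items whose content parts are of type `output_text`):

```python
import litellm

response = litellm.responses(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
)

# Hedged: assumes the response mirrors the OpenAI Responses API shape,
# where `output` holds message items with `output_text` content parts.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.text)
```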
##### Streaming

```python
import litellm

# Streaming response
response = litellm.responses(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
```
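If you only want the incremental text, you can filter the stream for text-delta events. A hedged sketch, assuming LiteLLM emits OpenAI-style Responses streaming events (type `response.output_text.delta` carrying a `delta` string):

```python
import litellm

# Hedged sketch: stream and print only the incremental text, assuming
# OpenAI-style events (`response.output_text.delta` with a `delta` field).
stream = litellm.responses(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
)

for event in stream:
    if getattr(event, "type", None) == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```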
#### OpenAI SDK with LiteLLM Proxy

First, add this to your LiteLLM proxy `config.yaml` (the `os.environ/OPENAI_API_KEY` value tells the proxy to read the key from the `OPENAI_API_KEY` environment variable):

```yaml
model_list:
  - model_name: o1-pro
    litellm_params:
      model: openai/o1-pro
      api_key: os.environ/OPENAI_API_KEY
```

Start your LiteLLM proxy:

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```
Then use the OpenAI SDK, pointed at your proxy:
##### Non-streaming

```python
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
```
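Continuing from the example above, the OpenAI SDK's `Response` object also exposes an `output_text` convenience property that concatenates the generated text, which is handy when you only need the string (this assumes the proxy returns a spec-compliant `/responses` payload):

```python
# Convenience accessor on the OpenAI SDK's Response object; assumes the proxy
# returns a spec-compliant /responses payload.
print(response.output_text)
```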
##### Streaming

```python
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
```
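In practice you usually want to accumulate the streamed text and keep the final response object. A hedged sketch, assuming OpenAI-style streaming events come through the proxy unchanged (`response.output_text.delta` deltas followed by a `response.completed` event):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Hedged: assumes OpenAI-style streaming events come through the proxy
# unchanged (`response.output_text.delta` deltas, then `response.completed`).
stream = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
)

text_chunks = []
final_response = None

for event in stream:
    if event.type == "response.output_text.delta":
        text_chunks.append(event.delta)
    elif event.type == "response.completed":
        final_response = event.response  # full Response object for the run

print("".join(text_chunks))
```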