API Documentation
Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.
Overview
Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, proxying requests to DeepSeek, OpenAI, Anthropic, and Groq based on the model name. The chat endpoint auto-detects modality — text, vision, image generation, TTS, and sound effects — routing to the right model without client-side model selection. Authentication uses WebAuthn passkeys (Touch ID, Face ID, security keys) with session cookies and API keys.
Base URL: https://sage-api.devblocktechnologies.com
Provider Routing
Sage automatically routes your request to the correct upstream provider based on the model field in the request body.
| Model Prefix | Upstream Provider | Endpoint |
|---|---|---|
| gpt-*, o1-*, o3-*, o4-* | OpenAI | api.openai.com/v1/chat/completions |
| claude-* | Anthropic | api.anthropic.com/v1/messages |
| llama-*, mixtral-*, gemma-*, deepseek-r1 | Groq | api.groq.com/openai/v1/chat/completions |
| deepseek-* | DeepSeek | api.deepseek.com/v1/chat/completions |
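To predict which provider a model name will reach, the table can be mirrored on the client side. The sketch below is illustrative only and is not Sage's actual routing code. The names `EXACT_ROUTES`, `PREFIX_ROUTES`, and `provider_for` are hypothetical, and it assumes that the exact name `deepseek-r1` is matched before the general `deepseek-*` prefix.

```python
from typing import Optional

# Assumption: exact names win over prefixes, so "deepseek-r1" goes to Groq
# even though it also matches the "deepseek-" prefix.
EXACT_ROUTES = {"deepseek-r1": "Groq"}

# Prefix groups copied from the routing table above.
PREFIX_ROUTES = [
    (("gpt-", "o1-", "o3-", "o4-"), "OpenAI"),
    (("claude-",), "Anthropic"),
    (("llama-", "mixtral-", "gemma-"), "Groq"),
    (("deepseek-",), "DeepSeek"),
]


def provider_for(model: str) -> Optional[str]:
    """Return the upstream provider for a model name, or None if nothing matches."""
    if model in EXACT_ROUTES:
        return EXACT_ROUTES[model]
    for prefixes, provider in PREFIX_ROUTES:
        # str.startswith accepts a tuple of candidate prefixes.
        if model.startswith(prefixes):
            return provider
    return None
```

A call such as `provider_for("claude-sonnet-4-20250514")` returns `"Anthropic"` under these assumptions. How Sage itself responds to an unmatched model name is not documented here.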
Modality Auto-Detection
The /v1/chat/completions endpoint auto-detects non-text modalities from your messages — no separate endpoints needed. Send a regular chat message and Sage routes to the right backend automatically.
| Pattern (last user message) | Auto-detected As | Backend | Credit Cost |
|---|---|---|---|
| "generate an image of...", "draw a...", "create a picture..." | Text-to-image | CF Workers AI (SDXL) | 0.5 |
| "speak...", "say...", "read aloud...", "narrate..." | Text-to-speech | ElevenLabs | 1.0 |
| "generate a sound of...", "create a sound effect of...", "make an sfx of..." | Sound effect | ElevenLabs | 1.0 |
| Message contains image_url content parts | Vision | GPT-4o (auto-routed) | per-token |
The response comes back in standard chat.completion JSON format with the generated media as a data URI or descriptive markdown. You can still use the explicit /v1/images/generations, /v1/audio/speech, and /v1/audio/generation endpoints for fine-grained control.
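To anticipate how a message might be classified (and what it may cost), the patterns in the table can be approximated locally. This is a rough sketch, not Sage's detection logic: the regular expressions are guesses built from the example phrases above, the assumption that phrases must start the message is mine, and `detect_modality` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns derived from the example phrases in the table.
# Sound effects are checked before speech so "generate a sound of" wins.
PATTERNS = [
    ("image", re.compile(r"^(generate an image of|draw a|create a picture)\b", re.I)),
    ("sound_effect", re.compile(
        r"^(generate a sound of|create a sound effect of|make an sfx of)\b", re.I)),
    ("speech", re.compile(r"^(speak|say|read aloud|narrate)\b", re.I)),
]


def detect_modality(messages: List[Dict[str, Any]]) -> str:
    """Guess the modality from the last user message; defaults to "text"."""
    last_user = next((m for m in reversed(messages) if m.get("role") == "user"), None)
    if last_user is None:
        return "text"
    content = last_user.get("content")
    if isinstance(content, list):
        parts = [p for p in content if isinstance(p, dict)]
        # Any image_url content part marks the request as vision.
        if any(p.get("type") == "image_url" for p in parts):
            return "vision"
        text = " ".join(p.get("text", "") for p in parts if p.get("type") == "text")
    else:
        text = content or ""
    text = text.strip()
    for name, pattern in PATTERNS:
        if pattern.search(text):
            return name
    return "text"
```

If a prompt could be misread (for example, a text question that begins with "say"), the explicit endpoints avoid relying on detection.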
Models & Pricing
Sage uses credit-based pricing. Each model listed below is billed per token at the rate shown. See the billing page for current credit bundles.
| Model | Provider | Cost |
|---|---|---|
| deepseek-v4-flash | DeepSeek | 0.0004 credits/token |
| deepseek-v4-pro | DeepSeek | 0.002 credits/token |
| deepseek-reasoner | DeepSeek | 0.002 credits/token |
| claude-sonnet-4-20250514 | Anthropic | 0.006 credits/token |
| claude-haiku-3.5 | Anthropic | 0.002 credits/token |
| o3-mini | OpenAI | 0.004 credits/token |
| gpt-4o | OpenAI | 0.01 credits/token |
| gpt-4o-mini | OpenAI | 0.0008 credits/token |
| llama-3.3-70b-versatile | Groq | 0.0008 credits/token |
| llama-3.1-8b-instant | Groq | 0.0004 credits/token |
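As a worked example of the arithmetic, the rates above can be multiplied by a token count to estimate a request's cost. The sketch below assumes that prompt and completion tokens are billed at the same flat rate, which this page does not state. `estimate_credits` is a hypothetical helper, and only a few models are included for brevity.

```python
from decimal import Decimal

# Rates copied from the table above, in credits per token.
CREDITS_PER_TOKEN = {
    "deepseek-v4-flash": Decimal("0.0004"),
    "gpt-4o": Decimal("0.01"),
    "gpt-4o-mini": Decimal("0.0008"),
    "claude-sonnet-4-20250514": Decimal("0.006"),
}


def estimate_credits(model: str, total_tokens: int) -> Decimal:
    """Estimate credits for a request.

    Assumption: every token (prompt and completion) is billed at the
    single listed rate for the model.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    # Decimal avoids binary floating-point rounding on small rates.
    return CREDITS_PER_TOKEN[model] * total_tokens
```

Under that assumption, 1,000 tokens on gpt-4o come to 0.01 × 1000 = 10 credits, while the same 1,000 tokens on deepseek-v4-flash come to 0.4 credits.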
POST /v1/images/generations Public
Generate images from text prompts. Powered by Cloudflare Workers AI using Stable Diffusion XL Lightning. Requires API key auth. Free tier: 10k neurons/day. 0.5 credits per generation.
Request body:
curl -X POST https://sage-api.devblocktechnologies.com/v1/images/generations \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cf/stable-diffusion-xl-lightning",
    "prompt": "A serene mountain lake at sunset",
    "n": 1
  }'
Response
{
  "data": [
    { "url": "https://..." }
  ]
}
POST /v1/images/analysis Public
Analyze images using vision models. Powered by Cloudflare Workers AI (Mistral Small 3.1 24B). Uses the OpenAI vision format — pass image_url content parts in messages. Requires API key auth. 0.5 credits per analysis.
Request:
curl -X POST https://sage-api.devblocktechnologies.com/v1/images/analysis \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cf/mistral-small-3.1-24b",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What'\''s in this image?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
        ]
      }
    ]
  }'
Response
{
  "id": "chatcmpl-...",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The image shows a mountain range..."
    }
  }]
}
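The same request can be assembled in Python with only the standard library. This is a minimal sketch that builds the request object without sending it. `build_analysis_request` and its parameters are hypothetical names, and the body simply mirrors the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_analysis_request(api_key: str, image_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/images/analysis."""
    body = {
        "model": "cf/mistral-small-3.1-24b",
        "messages": [
            {
                "role": "user",
                # OpenAI vision format: a text part followed by an image_url part.
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/analysis",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read `choices[0].message.content` from the decoded JSON response.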
POST /v1/audio/speech Public
Convert text to speech. Powered by ElevenLabs (Adam voice, eleven_multilingual_v2). Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.
Request body:
curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/speech \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to Sage Inference API.",
    "voice": "adam",
    "model": "eleven_multilingual_v2"
  }' \
  --output speech.mp3
Response
Binary audio stream (audio/mpeg)
POST /v1/audio/generation Public
Generate sound effects from text descriptions. Powered by ElevenLabs sound generation API. Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.
Request body:
curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/generation \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Rain falling on a tin roof with distant thunder",
    "duration_seconds": 5
  }' \
  --output effect.mp3
Response
Binary audio stream (audio/mpeg)
POST /v1/audio/transcriptions Public
Transcribe speech to text. Powered by Cloudflare Workers AI (OpenAI Whisper). Accepts multipart/form-data with an audio file or JSON with base64-encoded audio. Requires API key auth. 0.5 credits per transcription.
Request (multipart/form-data):
curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -F "file=@speech.mp3"
Or JSON with base64 audio:
curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "audio": "//uQx...base64...",
    "model": "whisper-1"
  }'
Response
{
  "text": "Hello, this is a test of the transcription service."
}
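For the JSON variant, the audio bytes have to be base64-encoded before they go into the `audio` field. The sketch below shows one way to build that body. It assumes plain standard base64 with no data-URI prefix, which matches the `//uQx...` sample above but is not confirmed by this page; `transcription_payload` is a hypothetical helper.

```python
import base64
import json


def transcription_payload(audio_bytes: bytes, model: str = "whisper-1") -> str:
    """Return the JSON request body for /v1/audio/transcriptions.

    Assumption: the "audio" field takes standard base64 text with no
    "data:" prefix.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio": encoded, "model": model})
```

Typical use would read a file with `open("speech.mp3", "rb").read()` and send the returned string as the request body with a `Content-Type: application/json` header.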
Error Codes
Sage returns standard HTTP status codes and JSON error responses.
| Code | Description |
|---|---|
| 400 | Bad request: invalid body or parameters |
| 401 | Unauthorized: missing or invalid API key or session |
| 402 | Payment Required: insufficient credits |
| 429 | Too Many Requests: rate limit exceeded |
| 500 | Internal server error: upstream provider failure |
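A client typically maps these status codes to a next step. The sketch below is one possible policy, not a documented recommendation: the action labels are hypothetical, and treating 429 and 500 as retryable with exponential backoff is an assumption about what is sensible for rate limits and upstream failures.

```python
from typing import List

# Status codes from the table above mapped to hypothetical client actions.
ACTIONS = {
    400: "fix_request",      # invalid body or parameters: retrying will not help
    401: "reauthenticate",   # missing or invalid API key or session
    402: "add_credits",      # insufficient credits
    429: "retry",            # rate limit exceeded (assumed transient)
    500: "retry",            # upstream provider failure (assumed transient)
}


def action_for(status: int) -> str:
    """Return the suggested client action for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "fail")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

With the defaults, `backoff_delays(5)` yields waits of 0.5, 1, 2, 4, and 8 seconds between retry attempts.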
Device Codes
Use device codes to authenticate desktop and CLI applications. Generate a code from your dashboard and enter it in the app.
# From the dashboard, click "Generate device code"
# Enter the 8-character code in your desktop app:
sage auth ********
GET /health Public
Health check endpoint. Returns database connectivity status.
curl https://sage-api.devblocktechnologies.com/health
Response
{"ok":true,"uptime":1712345678000}