API Documentation

Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.

Overview

Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, proxying requests to DeepSeek, OpenAI, Anthropic, and Groq based on the model name. The chat endpoint auto-detects modality — text, vision, image generation, TTS, and sound effects — routing to the right model without client-side model selection. Authentication uses WebAuthn passkeys (Touch ID, Face ID, security keys) with session cookies and API keys.

Base URL: https://sage-api.devblocktechnologies.com

Provider Routing

Sage automatically routes your request to the correct upstream provider based on the model field in the request body.

Model Prefix	Upstream Provider	Endpoint
`gpt-`, `o1-`, `o3-`, `o4-`	OpenAI	`api.openai.com/v1/chat/completions`
`claude-`	Anthropic	`api.anthropic.com/v1/messages`
`llama-`, `mixtral-`, `gemma-`, `deepseek-r1`	Groq	`api.groq.com/openai/v1/chat/completions`
`deepseek-`	DeepSeek	`api.deepseek.com/v1/chat/completions`
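
The routing table above can be approximated with an ordered prefix lookup. This is a minimal sketch, not Sage's actual implementation: the names `ROUTES` and `resolve_provider` are hypothetical, and the code assumes the more specific `deepseek-r1` rule is checked before the general DeepSeek rule, which the table implies but does not state.

```python
from typing import Optional

# Ordered (prefix, provider) rules mirroring the routing table.
# Assumption: the specific "deepseek-r1" rule wins over the general
# "deepseek-" rule, so it is listed first.
ROUTES = [
    ("deepseek-r1", "Groq"),
    ("gpt-", "OpenAI"),
    ("o1-", "OpenAI"),
    ("o3-", "OpenAI"),
    ("o4-", "OpenAI"),
    ("claude-", "Anthropic"),
    ("llama-", "Groq"),
    ("mixtral-", "Groq"),
    ("gemma-", "Groq"),
    ("deepseek-", "DeepSeek"),
]


def resolve_provider(model: str) -> Optional[str]:
    """Return the upstream provider for a model name, or None if no rule matches."""
    for prefix, provider in ROUTES:
        if model.startswith(prefix):
            return provider
    return None
```

A client can use a lookup like this to predict which upstream will serve a request before sending it; how Sage responds to an unmatched model name is not documented here.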

Modality Auto-Detection

The /v1/chat/completions endpoint auto-detects non-text modalities from your messages — no separate endpoints needed. Send a regular chat message and Sage routes to the right backend automatically.

Pattern (last user message)	Auto-detected As	Backend	Credit Cost
`"generate an image of..."`, `"draw a..."`, `"create a picture..."`	Text-to-image	CF Workers AI (SDXL)	0.5
`"speak..."`, `"say..."`, `"read aloud..."`, `"narrate..."`	Text-to-speech	ElevenLabs	1.0
`"generate a sound of..."`, `"create a sound effect of..."`, `"make an sfx of..."`	Sound effect	ElevenLabs	1.0
Message contains `image_url` content parts	Vision	GPT-4o (auto-routed)	per-token

The response comes back in standard chat.completion JSON format with the generated media as a data URI or descriptive markdown. You can still use the explicit /v1/images/generations, /v1/audio/speech, and /v1/audio/generation endpoints for fine-grained control.
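
To make the detection rules concrete, the sketch below classifies a message list the way the pattern table describes. The regular expressions and the function name `detect_modality` are illustrative assumptions; the real matcher may be looser or stricter, and only the trigger phrases themselves come from the table.

```python
import re

# Hypothetical patterns built from the trigger phrases in the table.
IMAGE_RE = re.compile(r"^\s*(generate an image of|draw a|create a picture)\b", re.I)
SFX_RE = re.compile(
    r"^\s*(generate a sound of|create a sound effect of|make an sfx of)\b", re.I
)
TTS_RE = re.compile(r"^\s*(speak|say|read aloud|narrate)\b", re.I)


def detect_modality(messages):
    """Classify a chat request by inspecting its last user message."""
    user_msgs = [m for m in messages if m.get("role") == "user"]
    if not user_msgs:
        return "text"
    content = user_msgs[-1]["content"]
    if isinstance(content, list):
        # OpenAI-style content parts: any image_url part means vision.
        if any(part.get("type") == "image_url" for part in content):
            return "vision"
        text = " ".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    else:
        text = content
    if IMAGE_RE.search(text):
        return "text-to-image"
    if SFX_RE.search(text):
        return "sound-effect"
    if TTS_RE.search(text):
        return "text-to-speech"
    return "text"
```

Only the last user message is inspected, so an earlier "draw a..." prompt does not affect how a later plain-text follow-up is classified.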

Models & Pricing

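Sage uses credit-based pricing. Chat models are billed per token at the rates below; the media endpoints (image generation, image analysis, speech, sound effects, transcription) charge a flat credit cost per request, listed in their own sections. See the billing page for current credit bundles.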

Model	Provider	Cost
`deepseek-v4-flash`	DeepSeek	0.0004 credits/token
`deepseek-v4-pro`	DeepSeek	0.002 credits/token
`deepseek-reasoner`	DeepSeek	0.002 credits/token
`claude-sonnet-4-20250514`	Anthropic	0.006 credits/token
`claude-haiku-3.5`	Anthropic	0.002 credits/token
`o3-mini`	OpenAI	0.004 credits/token
`gpt-4o`	OpenAI	0.01 credits/token
`gpt-4o-mini`	OpenAI	0.0008 credits/token
`llama-3.3-70b-versatile`	Groq	0.0008 credits/token
`llama-3.1-8b-instant`	Groq	0.0004 credits/token
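
As a worked example of the per-token rates, the sketch below estimates the credit cost of a request. It assumes the listed rate applies uniformly to prompt and completion tokens, which the table does not state; `CREDITS_PER_TOKEN` and `estimate_credits` are hypothetical names. `Decimal` is used so small rates do not pick up floating-point error.

```python
from decimal import Decimal

# Rates copied from the pricing table (credits per token).
CREDITS_PER_TOKEN = {
    "deepseek-v4-flash": Decimal("0.0004"),
    "deepseek-v4-pro": Decimal("0.002"),
    "deepseek-reasoner": Decimal("0.002"),
    "claude-sonnet-4-20250514": Decimal("0.006"),
    "claude-haiku-3.5": Decimal("0.002"),
    "o3-mini": Decimal("0.004"),
    "gpt-4o": Decimal("0.01"),
    "gpt-4o-mini": Decimal("0.0008"),
    "llama-3.3-70b-versatile": Decimal("0.0008"),
    "llama-3.1-8b-instant": Decimal("0.0004"),
}


def estimate_credits(model: str, total_tokens: int) -> Decimal:
    """Estimate credits for a request.

    Assumption: prompt and completion tokens are billed at the same rate.
    """
    if model not in CREDITS_PER_TOKEN:
        raise KeyError(f"no published rate for model {model!r}")
    return CREDITS_PER_TOKEN[model] * total_tokens
```

Under that assumption, a 1,500-token exchange on `gpt-4o` costs 15 credits, while the same exchange on `deepseek-v4-flash` costs 0.6 credits.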

POST /v1/images/generations Public

Generate images from text prompts. Powered by Cloudflare Workers AI using Stable Diffusion XL Lightning. Requires API key auth. Free tier: 10k neurons/day. 0.5 credits per generation.

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/images/generations \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "cf/stable-diffusion-xl-lightning",
  "prompt": "A serene mountain lake at sunset",
  "n": 1
}'

Response

{
  "data": [
    { "url": "https://..." }
  ]
}
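
The same call can be made from Python with only the standard library. This sketch builds the request object without sending it; the helper names `build_image_request` and `first_image_url` are hypothetical, and the response shape is taken from the example above.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_image_request(api_key: str, prompt: str, n: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/images/generations."""
    body = json.dumps(
        {"model": "cf/stable-diffusion-xl-lightning", "prompt": prompt, "n": n}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_image_url(response_json: dict) -> str:
    """Extract the first image URL from a response shaped like the example."""
    return response_json["data"][0]["url"]
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`.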

POST /v1/images/analysis Public

Analyze images using vision models. Powered by Cloudflare Workers AI (Mistral Small 3.1 24B). Uses the OpenAI vision format — pass image_url content parts in messages. Requires API key auth. 0.5 credits per analysis.

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/images/analysis \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "cf/mistral-small-3.1-24b",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
      ]
    }
  ]
}'

Response

{
  "id": "chatcmpl-...",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The image shows a mountain range..."
    }
  }]
}
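
The message structure is easy to get wrong by hand, so a small builder can help. The sketch below assembles the request body in the OpenAI vision format shown above and reads the reply text from the response; `build_analysis_body` and `extract_reply` are hypothetical helper names.

```python
def build_analysis_body(
    question: str, image_url: str, model: str = "cf/mistral-small-3.1-24b"
) -> dict:
    """Build a request body with one text part and one image_url part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def extract_reply(response: dict) -> str:
    """Return the assistant text from a chat.completion-style response."""
    return response["choices"][0]["message"]["content"]
```

The resulting dictionary can be serialized with `json.dumps` and posted to /v1/images/analysis with the same headers as the curl example.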

POST /v1/audio/speech Public

Convert text to speech. Powered by ElevenLabs (Adam voice, eleven_multilingual_v2). Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/speech \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "input": "Hello, welcome to Sage Inference API.",
  "voice": "adam",
  "model": "eleven_multilingual_v2"
}' \
  --output speech.mp3

Response

Binary audio stream (audio/mpeg)

POST /v1/audio/generation Public

Generate sound effects from text descriptions. Powered by ElevenLabs sound generation API. Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/generation \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "text": "Rain falling on a tin roof with distant thunder",
  "duration_seconds": 5
}' \
  --output effect.mp3

Response

Binary audio stream (audio/mpeg)

POST /v1/audio/transcriptions Public

Transcribe speech to text. Powered by Cloudflare Workers AI (OpenAI Whisper). Accepts multipart/form-data with an audio file or JSON with base64-encoded audio. Requires API key auth. 0.5 credits per transcription.

Request (multipart/form-data):

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -F "file=@speech.mp3"

Or JSON with base64 audio:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "audio": "//uQx...base64...",
  "model": "whisper-1"
}'

Response

{
  "text": "Hello, this is a test of the transcription service."
}
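
For the JSON variant, the audio bytes must be base64-encoded before they go into the `audio` field. A minimal sketch, assuming standard (not URL-safe) base64 as the `//uQx...` prefix in the example suggests; `build_transcription_body` is a hypothetical helper name.

```python
import base64
import json


def build_transcription_body(audio_bytes: bytes, model: str = "whisper-1") -> str:
    """Return the JSON request body with the audio encoded as standard base64."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio": encoded, "model": model})
```

In practice the bytes would come from reading a file in binary mode, for example `open("speech.mp3", "rb").read()`.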

Error Codes

Sage returns standard HTTP status codes and JSON error responses.

Code	Description
`400`	Bad request — invalid body or parameters
`401`	Unauthorized — missing or invalid API key or session
`402`	Payment Required — insufficient credits
`429`	Too Many Requests — rate limit exceeded
`500`	Internal server error — upstream provider failure
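
One way a client might act on these codes is sketched below. The action labels and the exponential backoff schedule are assumptions made for illustration; Sage does not document a retry policy, and `classify_status` and `backoff_delays` are hypothetical names.

```python
# Suggested client-side action for each documented status code (assumed policy).
ERROR_ACTIONS = {
    400: "fix_request",     # invalid body or parameters; retrying will not help
    401: "reauthenticate",  # missing or invalid API key or session
    402: "add_credits",     # insufficient credits
    429: "retry_later",     # rate limit exceeded
    500: "retry_later",     # upstream provider failure
}


def classify_status(status: int) -> str:
    """Map an HTTP status code to a suggested client action."""
    if 200 <= status < 300:
        return "ok"
    return ERROR_ACTIONS.get(status, "unknown")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Only the `retry_later` cases are worth retrying automatically; the others need a change to the request, credentials, or credit balance first.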

Device Codes

Use device codes to authenticate desktop and CLI applications. Generate a code from your dashboard and enter it in the app.

# From the dashboard, click "Generate device code"
# Enter the 8-character code in your desktop app:
sage auth ********

GET /health Public

Health check endpoint. Returns database connectivity status.

curl https://sage-api.devblocktechnologies.com/health

Response

{"ok":true,"uptime":1712345678000}
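
A monitoring script can treat the service as healthy only when the body parses as JSON and `ok` is exactly `true`. This is a minimal sketch based on the response shape above; `is_healthy` is a hypothetical helper name.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if the body is a JSON object with "ok": true."""
    try:
        return json.loads(body).get("ok") is True
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or valid JSON that is not an object.
        return False
```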