API Documentation

Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.

Overview

Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, proxying requests to DeepSeek, OpenAI, Anthropic, and Groq based on the model name. The chat endpoint auto-detects modality — text, vision, image generation, TTS, and sound effects — routing to the right model without client-side model selection. Authentication uses WebAuthn passkeys (Touch ID, Face ID, security keys) with session cookies and API keys.

Base URL: https://sage-api.devblocktechnologies.com

Provider Routing

Sage automatically routes your request to the correct upstream provider based on the model field in the request body.

Model Prefix                             | Upstream Provider | Endpoint
gpt-*, o1-*, o3-*, o4-*                  | OpenAI            | api.openai.com/v1/chat/completions
claude-*                                 | Anthropic         | api.anthropic.com/v1/messages
llama-*, mixtral-*, gemma-*, deepseek-r1 | Groq              | api.groq.com/openai/v1/chat/completions
deepseek-*                               | DeepSeek          | api.deepseek.com/v1/chat/completions
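
The prefix rules above can be sketched as a small lookup. This is an illustrative sketch, not Sage's actual router: the names ROUTES and resolve_provider are hypothetical, and checking deepseek-r1 before deepseek-* is an assumption, since both patterns match that model name.

```python
from fnmatch import fnmatchcase

# Routing rules copied from the table above. Order matters in this sketch:
# "deepseek-r1" (Groq) is checked before "deepseek-*" (DeepSeek). That
# precedence is an assumption; the documentation does not state it.
ROUTES = [
    (("gpt-*", "o1-*", "o3-*", "o4-*"), "OpenAI",
     "api.openai.com/v1/chat/completions"),
    (("claude-*",), "Anthropic", "api.anthropic.com/v1/messages"),
    (("llama-*", "mixtral-*", "gemma-*", "deepseek-r1"), "Groq",
     "api.groq.com/openai/v1/chat/completions"),
    (("deepseek-*",), "DeepSeek", "api.deepseek.com/v1/chat/completions"),
]


def resolve_provider(model: str) -> tuple[str, str]:
    """Return (provider, endpoint) for a model name, or raise ValueError."""
    for patterns, provider, endpoint in ROUTES:
        if any(fnmatchcase(model, pattern) for pattern in patterns):
            return provider, endpoint
    raise ValueError(f"no upstream provider matches model {model!r}")
```

For example, resolve_provider("gpt-4o-mini") selects the OpenAI row, while an unrecognized name raises ValueError rather than silently falling back to a default provider.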

Modality Auto-Detection

The /v1/chat/completions endpoint auto-detects non-text modalities from your messages — no separate endpoints needed. Send a regular chat message and Sage routes to the right backend automatically.

Pattern (last user message)                                                   | Auto-detected As | Backend               | Credit Cost
"generate an image of...", "draw a...", "create a picture..."                 | Text-to-image    | CF Workers AI (SDXL)  | 0.5
"speak...", "say...", "read aloud...", "narrate..."                           | Text-to-speech   | ElevenLabs            | 1.0
"generate a sound of...", "create a sound effect of...", "make an sfx of..." | Sound effect     | ElevenLabs            | 1.0
Message contains image_url content parts                                      | Vision           | GPT-4o (auto-routed)  | per-token

The response comes back in standard chat.completion JSON format with the generated media as a data URI or descriptive markdown. You can still use the explicit /v1/images/generations, /v1/audio/speech, and /v1/audio/generation endpoints for fine-grained control.
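
To make the detection rules concrete, here is a rough sketch of how a client could predict which modality a message will trigger. How Sage actually matches these phrases (anchoring, case sensitivity, other languages) is not documented, so the regular expressions and the detect_modality name are assumptions for illustration only.

```python
import re

# Trigger phrases taken from the table above. Matching them at the start of
# the message, case-insensitively, is an assumption of this sketch.
PATTERNS = [
    ("text-to-image",
     re.compile(r"^(generate an image of|draw a|create a picture)\b", re.I)),
    ("sound-effect",
     re.compile(r"^(generate a sound of|create a sound effect of|make an sfx of)\b", re.I)),
    ("text-to-speech",
     re.compile(r"^(speak|say|read aloud|narrate)\b", re.I)),
]


def detect_modality(messages: list[dict]) -> str:
    """Classify a chat request by inspecting its last user message."""
    user_messages = [m for m in messages if m.get("role") == "user"]
    if not user_messages:
        return "text"
    content = user_messages[-1].get("content", "")
    if isinstance(content, list):
        # OpenAI-style content parts: any image_url part means vision.
        if any(part.get("type") == "image_url" for part in content):
            return "vision"
        content = " ".join(
            part.get("text", "") for part in content if part.get("type") == "text"
        )
    text = content.strip()
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "text"
```

A check like this is useful mainly for estimating credit cost before sending a request; the server's own decision remains authoritative.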

Models & Pricing

Sage uses credit-based pricing. Chat models are billed per token at the rates below; the media endpoints documented later charge a flat credit cost per request. See the billing page for current credit bundles.

Model                    | Provider  | Cost
deepseek-v4-flash        | DeepSeek  | 0.0004 credits/token
deepseek-v4-pro          | DeepSeek  | 0.002 credits/token
deepseek-reasoner        | DeepSeek  | 0.002 credits/token
claude-sonnet-4-20250514 | Anthropic | 0.006 credits/token
claude-haiku-3.5         | Anthropic | 0.002 credits/token
o3-mini                  | OpenAI    | 0.004 credits/token
gpt-4o                   | OpenAI    | 0.01 credits/token
gpt-4o-mini              | OpenAI    | 0.0008 credits/token
llama-3.3-70b-versatile  | Groq      | 0.0008 credits/token
llama-3.1-8b-instant     | Groq      | 0.0004 credits/token
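
As a worked example of the per-token rates, the sketch below estimates the credit cost of a request. It assumes prompt and completion tokens are billed at the same listed rate, which the table does not confirm; estimate_credits is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

# Per-token rates copied from the table above (credits per token).
RATES = {
    "deepseek-v4-flash": Decimal("0.0004"),
    "deepseek-v4-pro": Decimal("0.002"),
    "deepseek-reasoner": Decimal("0.002"),
    "claude-sonnet-4-20250514": Decimal("0.006"),
    "claude-haiku-3.5": Decimal("0.002"),
    "o3-mini": Decimal("0.004"),
    "gpt-4o": Decimal("0.01"),
    "gpt-4o-mini": Decimal("0.0008"),
    "llama-3.3-70b-versatile": Decimal("0.0008"),
    "llama-3.1-8b-instant": Decimal("0.0004"),
}


def estimate_credits(model: str, total_tokens: int) -> Decimal:
    """Estimate credits as rate * tokens (assumes one rate for all tokens)."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return RATES[model] * total_tokens
```

Under that assumption, a 1,500-token exchange on gpt-4o comes to 0.01 × 1500 = 15 credits, while the same exchange on deepseek-v4-flash comes to 0.6 credits. Decimal is used so the small rates do not pick up floating-point rounding noise.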

POST /v1/images/generations Public

Generate images from text prompts. Powered by Cloudflare Workers AI using Stable Diffusion XL Lightning. Requires API key auth. Free tier: 10k neurons/day. 0.5 credits per generation.

POST /v1/images/generations

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/images/generations \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "cf/stable-diffusion-xl-lightning",
  "prompt": "A serene mountain lake at sunset",
  "n": 1
}'

Response

{
  "data": [
    { "url": "https://..." }
  ]
}
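
The same call can be sketched in Python with only the standard library. The helper names build_image_request and image_urls are hypothetical; the URL, headers, and body fields mirror the curl example above, and the response shape is taken from the sample response.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_image_request(api_key: str, prompt: str, n: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/images/generations request."""
    body = {
        "model": "cf/stable-diffusion-xl-lightning",
        "prompt": prompt,
        "n": n,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def image_urls(response_body: dict) -> list[str]:
    """Collect the url field from each entry in the response's data array."""
    return [item["url"] for item in response_body.get("data", []) if "url" in item]
```

To send the request, pass the result to urllib.request.urlopen, decode the JSON body with json.load, and hand it to image_urls.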

POST /v1/images/analysis Public

Analyze images using vision models. Powered by Cloudflare Workers AI (Mistral Small 3.1 24B). Uses the OpenAI vision format — pass image_url content parts in messages. Requires API key auth. 0.5 credits per analysis.

POST /v1/images/analysis

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/images/analysis \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "cf/mistral-small-3.1-24b",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
      ]
    }
  ]
}'

Response

{
  "id": "chatcmpl-...",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The image shows a mountain range..."
    }
  }]
}
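
A minimal sketch of building the vision payload and reading the answer back, using the field names from the example request and response above. The helper names are hypothetical, and only the fields shown in the sample response are assumed to exist.

```python
def build_analysis_payload(question: str, image_url: str,
                           model: str = "cf/mistral-small-3.1-24b") -> dict:
    """Build the JSON body for POST /v1/images/analysis."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }


def first_answer(response_body: dict) -> str:
    """Return the assistant text from the first choice, or "" if absent."""
    choices = response_body.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content", "")
```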

POST /v1/audio/speech Public

Convert text to speech. Powered by ElevenLabs (Adam voice, eleven_multilingual_v2). Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.

POST /v1/audio/speech

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/speech \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "input": "Hello, welcome to Sage Inference API.",
  "voice": "adam",
  "model": "eleven_multilingual_v2"
}' \
  --output speech.mp3

Response

Binary audio stream (audio/mpeg)
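
Because the response is a binary stream rather than JSON, a client should copy it to disk in chunks instead of decoding it as text. The sketch below shows one way to do that; speech_payload and save_audio are hypothetical helper names, and the default voice and model values are the ones from the curl example above.

```python
import io
import json
import shutil


def speech_payload(text: str, voice: str = "adam",
                   model: str = "eleven_multilingual_v2") -> bytes:
    """Encode the JSON body for POST /v1/audio/speech."""
    return json.dumps({"input": text, "voice": voice, "model": model}).encode("utf-8")


def save_audio(stream: io.BufferedIOBase, dest: io.BufferedIOBase) -> int:
    """Copy an audio/mpeg stream to dest in chunks; return bytes written."""
    start = dest.tell()
    shutil.copyfileobj(stream, dest)
    return dest.tell() - start
```

In practice, stream would be the response object returned by urllib.request.urlopen and dest a file opened with open("speech.mp3", "wb"), which matches the --output speech.mp3 flag in the curl example.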

POST /v1/audio/generation Public

Generate sound effects from text descriptions. Powered by ElevenLabs sound generation API. Returns an audio/mpeg binary stream. Requires API key auth. 1 credit per request.

POST /v1/audio/generation

Request:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/generation \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "text": "Rain falling on a tin roof with distant thunder",
  "duration_seconds": 5
}' \
  --output effect.mp3

Response

Binary audio stream (audio/mpeg)

POST /v1/audio/transcriptions Public

Transcribe speech to text. Powered by Cloudflare Workers AI (OpenAI Whisper). Accepts multipart/form-data with an audio file or JSON with base64-encoded audio. Requires API key auth. 0.5 credits per transcription.

POST /v1/audio/transcriptions

Request (multipart/form-data):

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -F "file=@speech.mp3"

Or JSON with base64 audio:

curl -X POST https://sage-api.devblocktechnologies.com/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
  "audio": "//uQx...base64...",
  "model": "whisper-1"
}'

Response

{
  "text": "Hello, this is a test of the transcription service."
}
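
For the JSON variant, the audio bytes need to be base64-encoded first. The sample value "//uQx..." contains "/" characters, which suggests the standard base64 alphabet rather than the URL-safe one; the sketch below assumes that. transcription_payload is a hypothetical helper name.

```python
import base64
import json


def transcription_payload(audio: bytes, model: str = "whisper-1") -> str:
    """Build the JSON body for POST /v1/audio/transcriptions.

    Uses the standard base64 alphabet, inferred from the "//uQx..." sample.
    """
    encoded = base64.b64encode(audio).decode("ascii")
    return json.dumps({"audio": encoded, "model": model})
```

A typical call reads the file with open("speech.mp3", "rb").read() and sends the returned string as the request body with Content-Type: application/json.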

Error Codes

Sage returns standard HTTP status codes and JSON error responses.

Code | Description
400  | Bad request: invalid body or parameters
401  | Unauthorized: missing or invalid API key or session
402  | Payment Required: insufficient credits
429  | Too Many Requests: rate limit exceeded
500  | Internal server error: upstream provider failure
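
One way a client might act on these codes is sketched below. The retry policy is an assumption of this example, not documented Sage behavior: it treats 429 and 500 as transient and everything else as a caller error. The names should_retry and backoff_seconds are hypothetical.

```python
# Status codes and descriptions from the table above.
ERROR_DESCRIPTIONS = {
    400: "Bad request: invalid body or parameters",
    401: "Unauthorized: missing or invalid API key or session",
    402: "Payment Required: insufficient credits",
    429: "Too Many Requests: rate limit exceeded",
    500: "Internal server error: upstream provider failure",
}

# Assumption: only rate limits and upstream failures are worth retrying.
RETRYABLE = frozenset({429, 500})


def should_retry(status: int) -> bool:
    """Return True if the status code is treated as transient."""
    return status in RETRYABLE


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff delay for a zero-based retry attempt."""
    return min(cap, base * (2 ** attempt))
```

A 402 is deliberately not retried here: repeating the request cannot succeed until more credits are added.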

Device Codes

Use device codes to authenticate desktop and CLI applications. Generate a code from your dashboard and enter it in the app.

# From the dashboard, click "Generate device code"
# Enter the 8-character code in your desktop app:
sage auth ********

GET /health Public

Health check endpoint. Returns database connectivity status.

curl https://sage-api.devblocktechnologies.com/health

Response

{"ok":true,"uptime":1712345678000}
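
A monitoring script could parse this response as sketched below. parse_health is a hypothetical helper; the unit of the uptime field is not documented, so the sketch returns it unchanged rather than interpreting it.

```python
import json
from typing import Optional


def parse_health(raw: str) -> tuple[bool, Optional[int]]:
    """Return (ok, uptime) from a /health response body.

    ok is True only when the "ok" field is literally true; uptime is passed
    through as-is because its unit is not documented.
    """
    body = json.loads(raw)
    return body.get("ok") is True, body.get("uptime")
```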