Quickstart
Get Kalibr routing your model calls in 5 minutes. Four steps.
Using a coding agent? Works with Claude Code, Cursor, Windsurf, OpenClaw, Hermes, or any agent that can read URLs. Copy this one line:
Read https://kalibr.systems/llms.txt and integrate Kalibr into this project.
Your agent reads the setup reference and handles install, credentials, and instrumentation automatically. Agents can also discover Kalibr autonomously via kalibr.systems/llms.txt, kalibr.systems/setup.txt, and kalibr.systems/.well-known/agent.json.
1 Install
pip install kalibr
npm install @kalibr/sdk
You also need provider SDKs for the models you want to route between:
# Install whichever providers you'll use
pip install openai # for gpt-4o, o1, etc.
pip install anthropic # for claude-sonnet, etc.
2 Set your keys
Option A — Link via terminal (recommended)
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.
Option B — Manual setup
Get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:
export KALIBR_API_KEY=sk_... # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant # from your Kalibr dashboard
export OPENAI_API_KEY=sk-... # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-... # if using Anthropic models
export DEEPSEEK_API_KEY=sk-... # if using DeepSeek models (deepseek-chat, deepseek-reasoner)
export HF_API_TOKEN=hf_... # if using HuggingFace models (private or rate-limit bypass)
You need API keys for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
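If you want to fail fast on a missing key, a pre-flight check along these lines can help. This is a sketch: the model-to-env-var mapping below is illustrative, not part of the SDK.

```python
import os

# Illustrative mapping from each model path to the provider key it needs.
REQUIRED_KEYS = {
    "gpt-4o": "OPENAI_API_KEY",
    "claude-sonnet-4-20250514": "ANTHROPIC_API_KEY",
    "deepseek-chat": "DEEPSEEK_API_KEY",
}

def missing_keys(paths, env=None):
    """Return the env vars required by `paths` that are not set."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[p] for p in paths if not env.get(REQUIRED_KEYS[p])]
```

Run it once at startup and raise if the returned list is non-empty.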
3 Replace your LLM call
Before — hardcoded to one model:
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)
After — Kalibr picks the best model and learns from outcomes:
from kalibr import Router
router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0
)
response = router.completion(
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe"}]
)
print(response.choices[0].message.content)
# That's it. Kalibr picked the model, made the call, and reported the outcome.
Before — hardcoded to one model:
import OpenAI from 'openai';
const client = new OpenAI();
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Extract the company: Hi from Stripe' }],
});
After — Kalibr picks the best model and learns from outcomes:
import { Router } from '@kalibr/sdk';
const router = new Router({
  goal: 'extract_company',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'],
  successWhen: (output) => output.length > 0,
});
const response = await router.completion([
  { role: 'user', content: 'Extract the company: Hi from Stripe' }
]);
console.log(response.choices[0].message.content);
// That's it. Kalibr picked the model, made the call, and reported the outcome.
What changed: you swapped 3 lines. Router instead of OpenAI, router.completion() instead of client.chat.completions.create(). The response object is the same — response.choices[0].message.content works exactly like before.
success_when tells Kalibr how to auto-evaluate each response. For simple checks (non-empty, contains "@", valid JSON), this is all you need. For complex validation, skip success_when and call router.report() manually — see Manual outcome reporting below.
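For reference, the simple checks mentioned above can be written as ordinary callables; any function that takes the output string and returns a bool works. These are illustrative sketches, not SDK helpers:

```python
import json

non_empty = lambda output: len(output.strip()) > 0   # response has content
has_email = lambda output: "@" in output             # contains an email-like token

def is_valid_json(output: str) -> bool:
    """True when the model returned parseable JSON."""
    try:
        json.loads(output)
        return True
    except ValueError:
        return False
```

Pass any of these as success_when, e.g. `Router(..., success_when=is_valid_json)`.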
Want finer-grained quality signals? Add score_when for continuous scoring (0.0-1.0). This lets Kalibr distinguish between "barely passed" and "excellent" — routing toward higher quality, not just success:
router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 0,
    score_when=lambda output: min(1.0, len(output) / 500),  # quality score 0-1
)
Auto-scoring: When you omit both success_when and score_when, Kalibr still auto-scores every completion using built-in heuristics (response length, structure, finish reason). You get routing intelligence from day one with zero evaluation code. Add success_when or score_when when you want custom quality signals.
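To make the heuristic idea concrete, here is a toy scorer combining the three signals mentioned above. This is hypothetical: Kalibr's real heuristics are internal and will differ in weights and detail.

```python
def heuristic_score(output: str, finish_reason: str = "stop") -> float:
    """Toy sketch of length/structure/finish-reason heuristics.
    Hypothetical weights; not Kalibr's actual scoring code."""
    score = 0.0
    if output.strip():
        score += 0.4                          # non-empty response
    if finish_reason == "stop":
        score += 0.3                          # finished cleanly, not truncated
    score += min(len(output) / 500, 1.0) * 0.3  # longer answers up to a cap
    return round(score, 3)
```

The point is only that some usable quality signal exists even with zero evaluation code; custom success_when / score_when checks replace guesswork with your own definition of quality.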
4 Check your dashboard
Go to dashboard.kalibr.systems. You should see:
- Your goal (extract_company) registered
- Which model Kalibr routed to
- The outcome (success/failure)
After 20-50 outcomes per path, routing stabilizes and Kalibr will favor the model that works best for your goal. Early on, it explores both paths to gather data.
That's it. You're routing.
Every call now goes through Kalibr. When a provider degrades, Kalibr reroutes to the one that's working — before your users notice.
Manual outcome reporting
If your success criterion is too complex for a lambda (it needs API calls, multi-step checks, or human review), skip success_when and call report() yourself:
router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
    # no success_when — we'll report manually
)
response = router.completion(messages=[...])
result = response.choices[0].message.content
# ... your validation logic ...
meeting_booked = check_calendar(result)
if meeting_booked:
    router.report(success=True, score=0.9)
else:
    router.report(success=False, score=0.1, reason="meeting not found in calendar")
const router = new Router({
  goal: 'book_meeting',
  paths: ['gpt-4o', 'claude-sonnet-4-20250514']
  // no successWhen — we'll report manually
});
const response = await router.completion(messages);
const result = response.choices[0].message.content;
// ... your validation logic ...
const meetingBooked = await checkCalendar(result);
if (meetingBooked) {
  await router.report(true);
} else {
  await router.report(false, 'meeting not found in calendar');
}
Using a framework?
Kalibr integrates with LangChain, CrewAI, and OpenAI Agents SDK. See Framework Integrations for setup instructions.
Or if you just want tracing without routing, add one line to the top of your entry point:
import kalibr # must be the first import — auto-patches OpenAI, Anthropic, Google SDKs
All LLM calls are now traced automatically to your dashboard. No other code changes needed.
What just happened
- Kalibr registered your two paths (gpt-4o and claude-sonnet)
- On completion(), Kalibr picked a model (exploring early on, exploiting the best one later)
- The call went directly to the provider — Kalibr is not a proxy
- On report() (or via success_when/score_when), Kalibr recorded the outcome
- Next time, Kalibr uses Thompson Sampling to make a better routing decision — using both binary success and continuous quality scores when available
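The Thompson Sampling step can be sketched in a few lines: maintain a success/failure count per path, sample a win-rate from each path's Beta posterior, and pick the highest draw. This is an illustration of the general technique only; Kalibr's actual policy also folds in continuous quality scores.

```python
import random

def pick_path(stats: dict[str, tuple[int, int]]) -> str:
    """Thompson Sampling sketch: for each path, draw from
    Beta(successes + 1, failures + 1) and pick the highest draw.
    Illustrative only; not Kalibr's production policy."""
    draws = {
        path: random.betavariate(wins + 1, losses + 1)
        for path, (wins, losses) in stats.items()
    }
    return max(draws, key=draws.get)

# A path with a better observed record is chosen more often,
# but the weaker path still gets occasional exploratory picks.
stats = {"gpt-4o": (40, 10), "claude-sonnet-4-20250514": (5, 15)}
```

This is why routing "stabilizes" after a few dozen outcomes per path: the posteriors narrow, and random draws from the better path win almost every time.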
Common mistakes
- Missing provider SDK — If you get "No module named 'anthropic'", run pip install anthropic
- Forgetting to report — Kalibr can't learn without outcomes. Always use success_when or call report()
- Wrong Python version — Kalibr requires Python 3.10+
- Missing KALIBR_TENANT_ID — Both KALIBR_API_KEY and KALIBR_TENANT_ID are required
Next steps
- Core Concepts — goals, paths, how the routing algorithm works
- Framework Integrations — LangChain, CrewAI, OpenAI Agents SDK
- API Reference — full Router API
- Production Guide — error handling, multi-turn, threading