Quickstart

Get Kalibr routing your LLM calls in 5 minutes. Four steps.

1. Install

Python:

shell
pip install kalibr

TypeScript:

shell
npm install @kalibr/sdk

You also need the provider SDKs for whichever models you want to route between:

shell
# Install whichever providers you'll use
pip install openai        # for gpt-4o, o1, etc.
pip install anthropic     # for claude-sonnet, etc.

2. Set your keys

Option A: Link via terminal (recommended)

shell
kalibr auth
# Opens your browser. Sign in or create an account, enter the code shown in your terminal.
# KALIBR_API_KEY and KALIBR_TENANT_ID saved to .env automatically.

Option B: Manual setup

Get your Kalibr credentials from dashboard.kalibr.systems/settings, then set them alongside your provider keys:

shell
export KALIBR_API_KEY=sk_...           # from your Kalibr dashboard
export KALIBR_TENANT_ID=your-tenant    # from your Kalibr dashboard
export OPENAI_API_KEY=sk-...           # if using OpenAI models
export ANTHROPIC_API_KEY=sk-ant-...    # if using Anthropic models
export DEEPSEEK_API_KEY=sk-...         # if using DeepSeek models

You need API keys for each provider in your paths. Using gpt-4o? You need OPENAI_API_KEY. Using claude-sonnet-4-20250514? You need ANTHROPIC_API_KEY.
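A quick pre-flight check can catch a missing key before the first routed call. The sketch below is not part of the Kalibr SDK: `PROVIDER_KEYS` and `missing_keys` are hypothetical names, and the prefix-to-variable mapping is an assumption inferred from the model names and environment variables shown above.

```python
import os

# Assumed mapping from model-name prefix to the environment variable that
# provider needs. Inferred from the examples above; extend it as needed.
PROVIDER_KEYS = {
    "gpt-": "OPENAI_API_KEY",
    "o1": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}


def missing_keys(paths, env=None):
    """Return the sorted env var names required by `paths` but unset in `env`."""
    env = os.environ if env is None else env
    missing = set()
    for path in paths:
        for prefix, var in PROVIDER_KEYS.items():
            if path.startswith(prefix) and not env.get(var):
                missing.add(var)
    return sorted(missing)


if __name__ == "__main__":
    paths = ["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"]
    for var in missing_keys(paths):
        print(f"Missing {var}")
```

Running this before constructing a Router surfaces configuration gaps early instead of at the first call that happens to route to an unconfigured provider.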

3. Replace your LLM call

Before (hardcoded to one model):

python
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)

After (Kalibr picks the best model and learns from outcomes):

python
import kalibr  # must be first import
from kalibr import Router

router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514", "deepseek-chat"],
    success_when=lambda out: len(out) > 50
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
print(response.choices[0].message.content)
# router.report() is called automatically when success_when is set

The same thing in TypeScript:

typescript
import kalibr from "@kalibr/sdk";
import { Router } from "@kalibr/sdk";

const router = new Router({
  goal: "summarize",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
  successWhen: (out) => out.length > 50,
});
const response = await router.completion([
  { role: "user", content: "Summarize: ..." }
]);
console.log(response.choices[0].message.content);

What changed: you replaced OpenAI with Router, and client.chat.completions.create() with router.completion(). The response object has the same shape, so response.choices[0].message.content works exactly as before.

success_when tells Kalibr how to evaluate each response automatically. For simple checks (non-empty, contains a keyword, valid JSON), this is all you need. For complex validation, skip success_when and call router.report() manually instead.
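The simple checks mentioned above can be written as ordinary functions and passed as success_when. This sketch assumes, as the lambda in the earlier example suggests, that the callable receives the response text as a string. The function names are illustrative and not part of the SDK.

```python
import json


def non_empty(out: str) -> bool:
    """Pass when the output contains something other than whitespace."""
    return bool(out.strip())


def contains_keyword(keyword: str):
    """Build a check that passes when `keyword` appears, ignoring case."""
    def check(out: str) -> bool:
        return keyword.lower() in out.lower()
    return check


def valid_json(out: str) -> bool:
    """Pass when the output parses as JSON."""
    try:
        json.loads(out)
    except (json.JSONDecodeError, TypeError):
        return False
    return True
```

Under that assumption, any of these could be supplied directly, for example success_when=valid_json or success_when=contains_keyword("summary").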

Continuous scoring: Use score_when for a 0.0 to 1.0 quality signal instead of binary pass/fail. This lets Kalibr distinguish "barely passed" from "excellent" and route toward higher quality.

score_when=lambda out: min(1.0, len(out) / 800)
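A scoring function can combine several signals as long as the result stays between 0.0 and 1.0. Below is a minimal sketch, again assuming the callable receives the response text as a string. `summary_score`, the 800-character target, and the weights are illustrative choices rather than Kalibr defaults.

```python
def clamp01(value: float) -> float:
    """Keep a score inside the 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))


def summary_score(out: str, target_len: int = 800) -> float:
    """Blend a length signal with a completeness signal into one score."""
    text = out.strip()
    if not text:
        return 0.0
    length_signal = min(1.0, len(text) / target_len)
    # Reward output that ends on sentence punctuation rather than mid-thought.
    ends_cleanly = 1.0 if text[-1] in ".!?" else 0.0
    return clamp01(0.8 * length_signal + 0.2 * ends_cleanly)
```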

4. Check your dashboard

Go to dashboard.kalibr.systems. You should see:

After 20 to 50 outcomes per path, routing stabilizes and Kalibr favors the model that works best for your goal. Early on, it explores all paths to gather data.
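The explore-then-stabilize behavior described above is typical of bandit-style routing. The source does not specify Kalibr's algorithm, so the following is a generic Thompson-sampling simulation in plain Python, with made-up path names and success rates, meant only to build intuition.

```python
import random


def simulate(success_rates, rounds, seed=0):
    """Thompson sampling over Bernoulli arms; returns how often each path was picked."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per path: one pseudo-success and one pseudo-failure.
    wins = {path: 1 for path in success_rates}
    losses = {path: 1 for path in success_rates}
    picks = {path: 0 for path in success_rates}
    for _ in range(rounds):
        # Sample a plausible success rate per path and pick the highest draw.
        draws = {p: rng.betavariate(wins[p], losses[p]) for p in success_rates}
        choice = max(draws, key=draws.get)
        picks[choice] += 1
        if rng.random() < success_rates[choice]:
            wins[choice] += 1
        else:
            losses[choice] += 1
    return picks


if __name__ == "__main__":
    rates = {"path-a": 0.95, "path-b": 0.20, "path-c": 0.20}  # hypothetical
    print(simulate(rates, rounds=500))
```

Early rounds spread picks across all paths because the sampled rates are noisy. As outcomes accumulate, the draws for the strongest path dominate and it receives most of the traffic.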

That's it. You're routing.

Every call now goes through Kalibr. When a model degrades, Kalibr reroutes to one that's working, before your users notice.

Manual outcome reporting

If your success criteria are too complex for a lambda (they need API calls, multi-step validation, or human review), skip success_when and call report() yourself:

Python:

python
from kalibr import Router

router = Router(
    goal="book_meeting",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)
response = router.completion(messages=[...])
result = response.choices[0].message.content
# your validation logic (check_calendar is a placeholder for your own function)
meeting_booked = check_calendar(result)
router.report(success=meeting_booked)

TypeScript:

typescript
const router = new Router({
  goal: "book_meeting",
  paths: ["gpt-4o", "claude-sonnet-4-20250514"],
});
const response = await router.completion([
  { role: "user", content: "Book a meeting with..." }
]);
const result = response.choices[0].message.content;
const meetingBooked = await checkCalendar(result);
await router.report(meetingBooked);
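When validation takes several steps, it helps to collapse them into one function that returns the boolean you hand to report(). The sketch below is illustrative: `validate_booking`, its required fields, and the expected JSON shape are assumptions made for this example, not part of Kalibr.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = ("attendee", "start", "end")  # hypothetical schema


def validate_booking(result: str) -> bool:
    """Multi-step check: JSON object, required fields present, start before end."""
    try:
        data = json.loads(result)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(data, dict) or any(f not in data for f in REQUIRED_FIELDS):
        return False
    try:
        start = datetime.fromisoformat(data["start"])
        end = datetime.fromisoformat(data["end"])
        return start < end
    except (TypeError, ValueError):
        # Non-string values, malformed timestamps, or mixed naive/aware times.
        return False
```

The result would then feed the manual report from the Python example, for instance router.report(success=validate_booking(result)).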

Auto-healing

Pass healing=True to let Kalibr automatically recover from a failed call. If the response fails the success contract, Kalibr classifies the failure, then either repairs the meta prompt or swaps to an alternative model, and retries, all within one call.

python
from kalibr import Router

router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda out: len(out) > 50,
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
)

For finer control over retry behavior, pass a HealConfig:

python
from kalibr import Router, HealConfig

config = HealConfig(
    max_retries=2,
    gate2_enabled=True,
    meta_prompt_enabled=True,
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)
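Conceptually, a healing loop is a bounded retry that changes something between attempts. The sketch below is not Kalibr's implementation: it uses plain callables as stand-ins for models, covers only the swap-to-another-model branch, and `call_with_fallback` is a hypothetical name. It shows the role a retry cap such as max_retries plays.

```python
from typing import Callable, Optional, Sequence, Tuple


def call_with_fallback(
    models: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    success_when: Callable[[str], bool],
    max_retries: int = 2,
) -> Tuple[Optional[str], list]:
    """Try models in order until one passes `success_when` or retries run out.

    Returns (output or None, list of (model_name, outcome) attempts).
    """
    attempts = []
    # One initial attempt plus up to `max_retries` retries.
    for name, model in list(models)[: max_retries + 1]:
        try:
            out = model(prompt)
        except Exception as exc:  # a provider error counts as a failed attempt
            attempts.append((name, f"error: {exc}"))
            continue
        if success_when(out):
            attempts.append((name, "ok"))
            return out, attempts
        attempts.append((name, "failed check"))
    return None, attempts
```

Returning the attempt log alongside the output keeps the failure history available to the caller, which is useful when every attempt fails.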

Multi-step pipelines

Use router.pipeline() to run an end-to-end workflow where each step routes, evaluates, and heals on its own. Set "chain": True on any step to feed the previous step's output into it.

python
result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="my-pipeline",
)

Every step runs the full self-healing loop independently. If one step fails irrecoverably, the pipeline returns the partial result with the failure attached so you can decide what to do next.
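The chaining and partial-result behavior can be pictured with a few lines of plain Python. This is a simplified model, not the SDK's implementation: steps are ordinary functions, and `run_pipeline` and the shape of its return value are assumptions made for illustration.

```python
def run_pipeline(steps, initial_input):
    """Run steps in order; a chained step receives the previous step's output.

    Each step is a dict with "goal", "run" (a callable), and optional "chain".
    Returns {"outputs": [...], "failure": None or {"goal": ..., "error": ...}}.
    """
    outputs = []
    for step in steps:
        # Chained steps consume the previous output; others get the initial input.
        step_input = outputs[-1] if step.get("chain") and outputs else initial_input
        try:
            outputs.append(step["run"](step_input))
        except Exception as exc:
            # Stop here and hand back what completed, plus the failure.
            return {
                "outputs": outputs,
                "failure": {"goal": step["goal"], "error": str(exc)},
            }
    return {"outputs": outputs, "failure": None}
```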

Pipeline isolation with pipeline_id

Passing pipeline_id scopes outcome learning to that pipeline. Two agents that share a goal but live in different pipelines will not bleed routing signals into each other. Useful when one pipeline runs on production data and another runs on tests, or when separate teams share goals but want isolated bandits.

python
router.completion(
    messages=[...],
    healing=True,
    pipeline_id="sales-outreach-prod",
)

Use the same pipeline_id across the calls that should share a learning context, and a different one for anything you want kept separate.
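One way to picture the isolation is to key outcome statistics by pipeline as well as by goal and path, so the same goal in two pipelines never shares counters. The `OutcomeStore` class below is a hypothetical illustration, not Kalibr's storage model.

```python
from collections import defaultdict


class OutcomeStore:
    """Success/failure counts keyed by (pipeline_id, goal, path)."""

    def __init__(self):
        self._counts = defaultdict(lambda: {"success": 0, "failure": 0})

    def record(self, pipeline_id, goal, path, success):
        key = (pipeline_id, goal, path)
        self._counts[key]["success" if success else "failure"] += 1

    def success_rate(self, pipeline_id, goal, path):
        """Observed success rate, or None when this scope has no outcomes yet."""
        counts = self._counts.get((pipeline_id, goal, path))
        if counts is None:
            return None
        total = counts["success"] + counts["failure"]
        return counts["success"] / total
```

A failure recorded under a test pipeline leaves the production pipeline's rate for the same goal and path untouched, which is the property the paragraph above describes.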

Session-aware routing

Pass session_id to the Router to enable session-aware routing. When provided, the intelligence service tracks session momentum across calls and may escalate the model choice if the session is widening (the user is frustrated or not getting what they need).

python
from kalibr import Router

router = Router(
    goal="draft_email",
    paths=["gpt-4o-mini", "gpt-4o", "claude-sonnet-4-20250514"],
    session_id=user_session_id,
)
response = router.completion(messages=[...])

You can also set the KALIBR_SESSION_ID environment variable instead of passing it explicitly. The Router reads it as a fallback.
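The fallback order can be sketched as follows. `resolve_session_id` is a hypothetical helper that mirrors the described precedence (explicit argument first, then KALIBR_SESSION_ID). How the Router handles this internally is not shown in the source.

```python
import os
from typing import Mapping, Optional


def resolve_session_id(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the explicit value; otherwise fall back to KALIBR_SESSION_ID."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    # An unset or empty variable is treated here as "no session".
    return env.get("KALIBR_SESSION_ID") or None
```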

What just happened

Thread safety

Router instances are not thread-safe. Create one Router per request context, not one shared instance per process.

In async Python, create the Router inside your handler:

python
from kalibr import Router

# Correct: one router per request
async def handle_request(messages):
    router = Router(goal="my_goal", paths=["gpt-4o-mini", "deepseek-chat"])
    return await router.completion(messages)

# Wrong: shared instance across concurrent requests
router = Router(goal="my_goal", paths=["gpt-4o-mini"])

async def handle_request(messages):
    return await router.completion(messages)

Router creation is cheap. The path registration call is async and non-blocking.
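To see why a shared instance is risky, consider a stand-in class that keeps per-call state on the instance. The source does not say what state a real Router holds, so `FakeRouter` below is purely hypothetical. It only demonstrates how concurrent calls on one object can overwrite each other's state, while per-request instances cannot.

```python
import asyncio


class FakeRouter:
    """Hypothetical stand-in: stores the in-flight request on the instance."""

    def __init__(self):
        self.current_request = None

    async def completion(self, request_id):
        self.current_request = request_id
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        # If another call ran meanwhile on this instance, the state was overwritten.
        return self.current_request


async def shared_instance():
    router = FakeRouter()
    return await asyncio.gather(router.completion("a"), router.completion("b"))


async def per_request():
    return await asyncio.gather(
        FakeRouter().completion("a"), FakeRouter().completion("b")
    )


if __name__ == "__main__":
    print(asyncio.run(shared_instance()))  # both calls see the last writer's state
    print(asyncio.run(per_request()))      # each call sees only its own state
```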

Common mistakes

Next steps