# API Reference

## Router

What this does: Routes requests to the best model based on learned outcomes. Works across any modality.
When to use: Create one Router per goal and reuse it for multiple requests within the same thread/async context.
```python
from kalibr import Router

router = Router(
    goal="extract_company",                         # required
    paths=["gpt-4o", "claude-sonnet-4-20250514"],   # required
    success_when=lambda x: len(x) > 0,              # optional, returns bool
    score_when=lambda x: min(1.0, len(x) / 500),    # optional, returns float 0-1
    exploration_rate=0.1,                           # optional
    auto_register=True,                             # optional, default True
)
```
```typescript
import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company',                                    // required
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'],              // required
  successWhen: (output) => output.length > 0,                 // optional, returns boolean
  scoreWhen: (output) => Math.min(1.0, output.length / 500),  // optional, returns float 0-1
  explorationRate: 0.1,                                       // optional
  autoRegister: true,                                         // optional, default true
});
```
### Required arguments

| Argument | Type | Description |
|---|---|---|
| goal | str / string | Name of the goal (e.g., "extract_company") |
| paths | list / array | List of models or path configs |
### Optional arguments

| Python | TypeScript | Type | Default | Description |
|---|---|---|---|---|
| success_when | successWhen | callable / function | None | Function that takes the output string and returns a bool. Auto-calls report(). |
| score_when | scoreWhen | callable / function | None | Function that takes the output string and returns a float (0.0-1.0), providing a continuous quality signal to routing. If success_when is also set, both are used. If only score_when is set, success is derived as score >= 0.5. The score is clamped to [0, 1]. |
| exploration_rate | explorationRate | float / number | None | Override exploration rate (0.0-1.0) |
| auto_register | autoRegister | bool / boolean | True | Register paths on init |
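As a hedged illustration of the score_when rules (score clamped to [0, 1]; success derived as score >= 0.5 when only score_when is set), the derivation can be sketched as a pure function. `derive_outcome` is a hypothetical helper, not part of the SDK:

```python
def derive_outcome(raw_score: float) -> tuple[float, bool]:
    # Illustrative sketch only, not SDK code. Mirrors the documented rules:
    # clamp the score to [0, 1], then derive success as score >= 0.5.
    score = min(1.0, max(0.0, raw_score))
    return score, score >= 0.5
```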
### Thread safety

Router is NOT thread-safe: internal state (trace_id) will be corrupted if one instance is shared across threads or async contexts.

- Python: create one Router instance per thread.
- TypeScript: create one Router instance per async context (request handler, worker, etc.).
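One way to honor the one-instance-per-thread rule in Python is a threading.local cache. This is a sketch under the assumption that you construct the Router yourself; the `factory` argument stands in for your own `Router(...)` call:

```python
import threading

_local = threading.local()

def get_router(factory):
    # Lazily create one Router per thread so internal state is never shared.
    if not hasattr(_local, "router"):
        _local.router = factory()
    return _local.router
```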
### Common mistakes

- Creating a new Router per request – reuse it across requests
- Forgetting to call report()
- Using the same goal for different tasks
- Using one Router across threads/async contexts – create separate instances
## Router.completion()

What this does: Makes a completion request with intelligent routing.
When to call: Every time your agent needs a model response for this goal.

### Example

```python
response = router.completion(
    messages=[
        {"role": "system", "content": "Extract company names."},
        {"role": "user", "content": "Hi, I'm Sarah from Stripe."}
    ],
    max_tokens=100
)
print(response.choices[0].message.content)
```
```typescript
const response = await router.completion(
  [
    { role: 'system', content: 'Extract company names.' },
    { role: 'user', content: "Hi, I'm Sarah from Stripe." },
  ],
  { maxTokens: 100 }
);
console.log(response.choices[0].message.content);
```
### Required arguments

| Argument | Type | Description |
|---|---|---|
| messages | list / array | OpenAI-format messages |

### Optional arguments

| Python | TypeScript | Type | Description |
|---|---|---|---|
| force_model | forceModel | str / string | Override routing and use this model |
| max_tokens | maxTokens | int / number | Maximum tokens in the response |
| **kwargs | options | any / object | Passed to the provider (temperature, etc.) |
### Common mistakes

- Passing model in kwargs (Kalibr picks the model; use force_model/forceModel to override)
- Not handling exceptions (provider errors still raise)

### Exceptions

Provider errors propagate to the caller:

- Python: openai.OpenAIError, anthropic.AnthropicError
- TypeScript: OpenAI.APIError, Anthropic.APIError

Intelligence service failures do NOT raise – the Router falls back to the first path.
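Because provider errors propagate, completion calls are typically wrapped at the call site. A minimal sketch, with `call_completion` standing in for a zero-argument `router.completion(...)` closure and a broad `Exception` standing in for the provider error types listed above:

```python
def safe_completion(call_completion, fallback=None):
    # Provider errors (e.g. openai.OpenAIError) propagate from completion(),
    # so catch them here and decide how to degrade.
    try:
        return call_completion()
    except Exception:
        return fallback
```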
## as_langchain()

Returns a LangChain-compatible LLM that uses Kalibr for routing. Use this to integrate with frameworks like CrewAI and LangChain.

```python
from kalibr import Router
from langchain_core.prompts import ChatPromptTemplate

router = Router(
    goal="my_task",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)

# Returns a LangChain BaseChatModel
llm = router.as_langchain()

# Use with any LangChain chain or CrewAI agent
chain = ChatPromptTemplate.from_template("{input}") | llm
```

Note: You still need to call router.report() after the chain completes to report the outcome.
## Router.execute()

What this does: Routes any HuggingFace task through the same outcome-learning loop as completion(). Works for transcription, image generation, embeddings, classification, and all 17 HuggingFace task types.
When to call: When your agent needs to run a non-chat model task (audio, image, embedding, classification).

| Parameter | Type | Required | Description |
|---|---|---|---|
| task | str | Yes | HuggingFace task type (e.g. "automatic_speech_recognition", "text_to_image") |
| input_data | Any | Yes | Task-appropriate input (audio bytes, text prompt, image, etc.) |
| **kwargs | Any | No | Passed to the HuggingFace provider |

Returns: Task-native response (transcription text, PIL image, embedding vector, classification labels, etc.)
### Examples

```python
# Transcription
result = router.execute(task="automatic_speech_recognition", input_data=audio_bytes)

# Image generation
result = router.execute(task="text_to_image", input_data="a product photo of a laptop")

# Embeddings
result = router.execute(task="feature_extraction", input_data="search query text")

# Classification
result = router.execute(task="text_classification", input_data="This product is amazing!")
```

Supported task types: chat_completion, text_generation, summarization, translation, fill_mask, table_question_answering, automatic_speech_recognition, text_to_speech, audio_classification, text_to_image, image_to_text, image_classification, image_segmentation, object_detection, feature_extraction, text_classification, token_classification
## Router.report()

What this does: Reports the outcome of the last completion. This is how Kalibr learns.
When to call: After you know whether the task succeeded or failed.

### Example

```python
# Success
router.report(success=True)

# Failure with reason
router.report(success=False, reason="invalid_json")

# Success with quality score
router.report(success=True, score=0.8)

# Failure with structured category
router.report(success=False, failure_category="timeout", reason="Provider timed out after 30s")
```
```typescript
// Success
await router.report(true);

// Failure with reason
await router.report(false, 'invalid_json');

// Success with score
await router.report(true, undefined, 0.8);
```
### Required arguments

| Argument | Type | Description |
|---|---|---|
| success | bool / boolean | Whether the task succeeded |

### Optional arguments

| Argument | Type | Description |
|---|---|---|
| reason | str / string | Failure reason (for debugging) |
| score | float / number | Quality score 0.0-1.0 that feeds directly into routing. A path that consistently scores 0.85 will be preferred over one scoring 0.6, even if both "succeed." When provided, the score is used as a continuous signal in Thompson Sampling, Beta(alpha + score, beta + (1 - score)), giving finer-grained routing than a boolean alone. The score is clamped to [0, 1]. |
| failure_category | str / string | Structured failure category for clustering. Valid values: timeout, context_exceeded, tool_error, rate_limited, validation_failed, hallucination_detected, user_unsatisfied, empty_response, malformed_output, auth_error, provider_error, unknown. Raises ValueError if invalid. |
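The fractional-score update described above can be written out explicitly. This is an illustrative reimplementation of the stated Beta(alpha + score, beta + (1 - score)) rule, not the SDK's internal code:

```python
def update_beta(alpha: float, beta: float, score: float) -> tuple[float, float]:
    # A score of 1.0 behaves like a boolean success, 0.0 like a failure,
    # and anything in between moves the posterior fractionally.
    score = min(1.0, max(0.0, score))  # clamp to [0, 1]
    return alpha + score, beta + (1.0 - score)
```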
### Common mistakes

- Calling report() multiple times for one completion (the second call is ignored)
- Not calling report() at all (routing never improves)
- Reporting success based on "a response exists" instead of "the task actually worked"

### When to call

- After you know whether the task succeeded or failed
- Once per completion() call
- For multi-turn conversations, report once at the end

### Validation

Calling report() without a prior completion() raises an error. Calling report() twice for the same completion logs a warning and ignores the second call.
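These rules suggest one canonical loop: complete once, judge the actual task, report once. A sketch, where `router` is any object exposing the completion/report interface described above and `validate` is your own task-level check (both names are placeholders here):

```python
def run_once(router, messages, validate):
    # One completion, one report. Judge the task itself,
    # not merely whether a response exists.
    response = router.completion(messages=messages)
    text = response.choices[0].message.content
    ok = validate(text)
    router.report(success=ok, reason=None if ok else "validation_failed")
    return text
```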
## get_policy()

Get the recommended path for a goal without making a completion call. Useful for inspecting what Kalibr would choose, or for custom routing logic.

```python
from kalibr import get_policy

policy = get_policy(goal="book_meeting")

print(policy["recommended_model"])     # "gpt-4o"
print(policy["recommended_tool"])      # "calendar_api"
print(policy["outcome_success_rate"])  # 0.87
print(policy["confidence"])            # 0.92
```

```typescript
import { getPolicy } from '@kalibr/sdk';

const policy = await getPolicy({ goal: 'book_meeting' });

console.log(policy.recommendedModel);    // "gpt-4o"
console.log(policy.recommendedTool);     // "calendar_api"
console.log(policy.outcomeSuccessRate);  // 0.87
console.log(policy.confidence);          // 0.92
```
### With constraints

```python
policy = get_policy(
    goal="book_meeting",
    constraints={
        "max_cost_usd": 0.05,
        "max_latency_ms": 2000,
        "min_quality": 0.8
    }
)
```

```typescript
const policy = await getPolicy({
  goal: 'book_meeting',
  constraints: {
    maxCostUsd: 0.05,
    maxLatencyMs: 2000,
    minQuality: 0.8,
  },
});
```

Kalibr will only recommend paths that meet all constraints. If no path meets them, the response indicates that no recommendation is available.
### Parameters

| Argument | Required | Description |
|---|---|---|
| goal | Yes | Goal name |
| constraints | No | Object with max_cost_usd, max_latency_ms, min_quality |

### Returns

| Field | Description |
|---|---|
| recommended_model | Best model for this goal |
| recommended_tool | Best tool (if tools are tracked) |
| recommended_params | Best parameters (if params are tracked) |
| outcome_success_rate | Historical success rate for this path |
| confidence | Statistical confidence (0-1) |
| alternatives | Other viable paths ranked by performance |
## decide()

Get a routing decision for a goal. Uses Thompson Sampling to balance exploration and exploitation. This is what Router.completion() calls internally, exposed here for low-level control.

```python
from kalibr import decide

decision = decide(goal="book_meeting", task_risk_level="low")

print(decision["model_id"])    # "gpt-4o"
print(decision["tool_id"])     # "calendar_api" or None
print(decision["params"])      # {"temperature": 0.3} or {}
print(decision["trace_id"])    # "abc123..."; pass this to report_outcome
print(decision["confidence"])  # 0.85
```

### Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | Goal name |
| task_risk_level | No | "low" | Risk tolerance: "low", "medium", or "high" |
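At this low level you join decide() and report_outcome() yourself via the trace_id. A sketch with injected callables, where `decide_fn`, `run_model`, and `report_fn` are stand-ins for kalibr.decide, your own model call, and kalibr.report_outcome:

```python
def routed_run(goal, decide_fn, run_model, report_fn):
    # 1) ask for a routing decision, 2) run the chosen model,
    # 3) report the outcome against the same trace_id.
    decision = decide_fn(goal=goal)
    output = run_model(decision["model_id"], decision.get("params", {}))
    report_fn(trace_id=decision["trace_id"], goal=goal, success=output is not None)
    return output
```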
## report_outcome()

Report an execution outcome directly (without using Router). This is the feedback loop that teaches Kalibr what works.

```python
from kalibr import report_outcome

report_outcome(
    trace_id="abc123",
    goal="book_meeting",
    success=True,
    score=0.95,                  # optional quality score 0-1
    failure_category="timeout",  # optional structured category
    model_id="gpt-4o",           # optional
)
```
### Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID from decide() or completion |
| goal | Yes | - | Goal name |
| success | Yes | - | Whether the goal was achieved |
| score | No | None | Quality score 0-1 |
| failure_reason | No | None | Free-text failure reason |
| failure_category | No | None | Structured failure category (see FAILURE_CATEGORIES) |
| metadata | No | None | Additional context as a dict |
| model_id | No | None | Model used |
| tool_id | No | None | Tool used |
| execution_params | No | None | Parameters used |
## update_outcome()

Update an existing outcome with a late-arriving signal. Use this when the real success signal arrives after the initial report, for example when a customer reopens a ticket 48 hours after it was reported as resolved.

Only fields that are explicitly passed (not None) are updated; other fields retain their original values.

```python
from kalibr import update_outcome

# Customer reopened the ticket 48 hours after "resolution"
result = update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_category="user_unsatisfied",
)

print(result["fields_updated"])  # ["success", "failure_category"]
```
### Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal (must match the original outcome) |
| success | No | None | Updated success status |
| score | No | None | Updated quality score 0-1 |
| failure_reason | No | None | Updated failure reason |
| failure_category | No | None | Updated failure category |
| metadata | No | None | Additional metadata to merge |

Returns 404 if no outcome exists for the given trace_id + goal combination.
## get_insights()

Get structured diagnostics about what Kalibr has learned. Returns machine-readable intelligence per goal, designed for coding agents that need to decide what to improve.

The response includes schema_version: "1.0" for forward compatibility.

```python
from kalibr import get_insights

# All goals
insights = get_insights()

# Single goal, custom window
insights = get_insights(goal="resolve_ticket", window_hours=48)

for goal in insights["goals"]:
    print(f"{goal['goal']}: {goal['status']} ({goal['success_rate']:.0%})")
    for signal in goal["actionable_signals"]:
        if signal["severity"] == "critical":
            print(f"  ⚠ {signal['type']}: {signal['data']}")
```
### Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | No | None | Filter to a specific goal (returns all goals if None) |
| window_hours | No | 168 | Time window for analysis (default 1 week) |

### Response structure (per goal)

| Field | Description |
|---|---|
| goal | Goal name |
| status | healthy, degrading, failing, or insufficient_data |
| success_rate | Overall success rate (0-1) |
| sample_count | Total outcomes in the window |
| trend | improving, stable, or degrading |
| confidence | Statistical confidence (0-1) |
| top_failure_modes | Failure categories ranked by frequency |
| paths | Per-path performance (success_rate, trend, cost, latency) |
| param_sensitivity | Parameters that significantly affect outcomes |
| actionable_signals | Machine-readable signals (see below) |
### Actionable signal types

| Signal | Description |
|---|---|
| path_underperforming | A path is >15pp below the best path |
| failure_mode_dominant | One failure category accounts for >50% of failures |
| param_sensitivity_detected | A parameter value significantly affects outcomes (>10pp spread) |
| drift_detected | Path performance is degrading over time |
| cost_inefficiency | A cheaper path has a similar success rate (within 5pp) |
| low_confidence | Path has fewer than 20 samples |
| goal_healthy | No action needed; the goal is performing well |
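Since actionable_signals are machine-readable, a coding agent can triage them with plain dict access. A sketch over the response structure documented above (the field names are as documented; the helper function itself is hypothetical):

```python
def critical_signals(insights):
    # Collect (goal, signal_type) pairs for every critical signal
    # in a get_insights()-shaped response.
    found = []
    for goal in insights.get("goals", []):
        for signal in goal.get("actionable_signals", []):
            if signal.get("severity") == "critical":
                found.append((goal["goal"], signal["type"]))
    return found
```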
## register_path()

Register a new routing path for a goal.

```python
from kalibr import register_path

result = register_path(
    goal="book_meeting",
    model_id="gpt-4o",
    tool_id="calendar_api",       # optional
    params={"temperature": 0.3},  # optional
    risk_level="low",             # optional: "low", "medium", "high"
)

print(result["path_id"])
```
## FAILURE_CATEGORIES

Constant containing all valid failure category values. Import it for client-side validation.

```python
from kalibr import FAILURE_CATEGORIES

print(FAILURE_CATEGORIES)
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "unknown"]

# Used by report() and report_outcome(); both raise ValueError if an invalid category is passed
```
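If you prefer not to risk a ValueError at report time, you can coerce unknown categories client-side first. A sketch: the list below mirrors FAILURE_CATEGORIES as printed above, and `coerce_category` is a hypothetical helper, not part of the SDK:

```python
# Mirror of the documented FAILURE_CATEGORIES constant.
FAILURE_CATEGORIES = [
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error", "unknown",
]

def coerce_category(category):
    # Map anything outside the valid set to "unknown" before reporting.
    return category if category in FAILURE_CATEGORIES else "unknown"
```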
## Intelligence API (TypeScript)

The TypeScript SDK exports convenience functions for direct access to the Intelligence API.

```typescript
import {
  KalibrIntelligence,
  getPolicy,
  reportOutcome,
  registerPath,
  decide,
  getRecommendation,
  listPaths,
  disablePath,
  setExplorationConfig,
  getExplorationConfig,
} from '@kalibr/sdk';

// Initialize the singleton
KalibrIntelligence.init({
  apiKey: process.env.KALIBR_API_KEY!,
  tenantId: process.env.KALIBR_TENANT_ID!,
});

// Get a routing decision
const decision = await decide('extract_company');
console.log(decision.model_id, decision.confidence);

// Report an outcome directly
await reportOutcome(traceId, 'extract_company', true, {
  score: 0.95,
  modelId: 'gpt-4o',
});

// List registered paths
const { paths } = await listPaths({ goal: 'extract_company' });

// Configure exploration
await setExplorationConfig({
  goal: 'extract_company',
  explorationRate: 0.1,
  minSamplesBeforeExploit: 20,
});
```
## Auto-Instrumentation (TypeScript)

Wrap OpenAI or Anthropic clients to automatically trace all LLM calls.

```typescript
import { createTracedOpenAI, createTracedAnthropic } from '@kalibr/sdk';

// Wrap the OpenAI client – all calls are auto-traced
const openai = createTracedOpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Wrap the Anthropic client
const anthropic = createTracedAnthropic();
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
## Context Management (TypeScript)

Run code within goal or trace contexts using async local storage.

```typescript
import { withGoal, withTraceId, traceContext } from '@kalibr/sdk';

// Run code within a goal context
await withGoal('extract_company', async () => {
  // All Kalibr operations inherit this goal
  const response = await openai.chat.completions.create({...});
});

// Run code with a specific trace ID
await withTraceId('my-custom-trace-id', async () => {
  // All operations use this trace ID
});

// Combined trace context
await traceContext({ traceId: 'my-trace', goal: 'summarize' }, async () => {
  // Both the trace ID and goal are available
});
```
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| KALIBR_API_KEY | Yes | - | Your API key from the dashboard |
| KALIBR_TENANT_ID | Yes | default | Your tenant ID |
| KALIBR_AUTO_INSTRUMENT | No | true | Auto-instrument the OpenAI/Anthropic/Google SDKs |
| KALIBR_INTELLIGENCE_URL | No | https://kalibr-intelligence.fly.dev | Intelligence service endpoint |
| OPENAI_API_KEY | For OpenAI | - | OpenAI API key |
| ANTHROPIC_API_KEY | For Anthropic | - | Anthropic API key |
| GOOGLE_API_KEY | For Google | - | Google API key |
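A minimal shell setup covering the two required variables plus one provider key; all values are placeholders:

```shell
# Placeholders only: substitute your real credentials.
export KALIBR_API_KEY="your-key"
export KALIBR_TENANT_ID="your-tenant"
export OPENAI_API_KEY="sk-your-openai-key"
```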
## REST API Endpoints

Intelligence service: https://kalibr-intelligence.fly.dev

### POST /api/v1/routing/decide

Get a routing decision for a goal.

```shell
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/decide \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company"}'
```

### POST /api/v1/intelligence/report-outcome

Report an execution outcome.

```shell
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/report-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "extract_company", "success": true}'
```

### POST /api/v1/routing/paths

Register a new path.

```shell
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/paths \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "model_id": "gpt-4o"}'
```

### POST /api/v1/intelligence/update-outcome

Update an existing outcome with a late-arriving signal.

```shell
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/update-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "resolve_ticket", "success": false, "failure_category": "user_unsatisfied"}'
```

### GET /api/v1/intelligence/insights

Get structured diagnostics per goal. Quote the URL: an unquoted & would background the command in most shells.

```shell
curl "https://kalibr-intelligence.fly.dev/api/v1/intelligence/insights?window_hours=168&goal=resolve_ticket" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"
```

### GET /api/v1/routing/paths

List registered paths for a goal.

```shell
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/paths?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"
```

### DELETE /api/v1/routing/paths/{path_id}

Disable a path.

```shell
curl -X DELETE https://kalibr-intelligence.fly.dev/api/v1/routing/paths/path_abc123 \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"
```

### GET /api/v1/routing/stats

Get goal statistics.

```shell
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/stats?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"
```
## Default Values

| Parameter | Default | Description |
|---|---|---|
| exploration_rate | 0.1 (10%) | Percentage of requests that explore non-optimal paths |
| min_samples | 20 | Outcomes needed per path before routing stabilizes |
| success_when | None | When unset, heuristic auto-scoring is used (response length, structure, finish reason) |
| score_when | None | When unset, heuristic auto-scoring is used (applies when both callbacks are None) |

## Next

- Production Guide – exploration and failure modes