API Reference

Python SDK v1.4.3 | TypeScript SDK v1.3.0

Router

What this does: Routes requests to the best model based on learned outcomes. Works across any modality.

When to use: Create one Router per goal. Reuse for multiple requests in the same thread/async context.

Python:


from kalibr import Router

router = Router(
    goal="extract_company",           # required
    paths=["gpt-4o", "claude-sonnet-4-20250514"],  # required
    success_when=lambda x: len(x) > 0,  # optional — bool
    score_when=lambda x: min(1.0, len(x) / 500),  # optional — float 0-1
    exploration_rate=0.1,             # optional
    auto_register=True,               # optional, default True
)

TypeScript:

import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company',           // required
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'], // required
  successWhen: (output) => output.length > 0, // optional — boolean
  scoreWhen: (output) => Math.min(1.0, output.length / 500), // optional — float 0-1
  explorationRate: 0.1,              // optional
  autoRegister: true,                // optional, default true
});

Required arguments

| Argument | Type | Description |
|---|---|---|
| goal | str / string | Name of the goal (e.g., "extract_company") |
| paths | list / array | List of models or path configs |

Optional arguments

| Python | TypeScript | Type | Default | Description |
|---|---|---|---|---|
| success_when | successWhen | callable / function | None | Function that takes the output string and returns a bool. Auto-calls report(). |
| score_when | scoreWhen | callable / function | None | Function that takes the output string and returns a float (0.0-1.0), providing a continuous quality signal to routing. If success_when is also set, both are used. If only score_when is set, success is derived as score >= 0.5. The score is clamped to [0, 1]. |
| exploration_rate | explorationRate | float / number | None | Override the exploration rate (0.0-1.0) |
| auto_register | autoRegister | bool / boolean | True | Register paths on init |
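
A minimal sketch of how success and score combine, following the rules in the table above (the helper name `derive_outcome` is ours, not the SDK's):

```python
def derive_outcome(score_when, success_when, output):
    """Derive (success, score) from the optional callbacks, per the table above."""
    score = None
    if score_when is not None:
        # The raw score is clamped to [0, 1]
        score = max(0.0, min(1.0, score_when(output)))
    if success_when is not None:
        success = bool(success_when(output))
    elif score is not None:
        # Only score_when set: success is derived from the 0.5 threshold
        success = score >= 0.5
    else:
        success = None  # no automatic signal; call report() yourself

    return success, score

# Score-only configuration: a 400-char output scores 0.8 and counts as a success
success, score = derive_outcome(lambda out: len(out) / 500, None, "x" * 400)
```

With both callbacks set, success_when decides the boolean while score_when supplies the quality signal.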

Thread Safety

Router is NOT thread-safe. Internal state (trace_id) will be corrupted if used across threads/async contexts.

  • Python: Create one Router instance per thread.
  • TypeScript: Create one Router instance per async context (request handler, worker, etc.)

Common mistakes

  • Creating new Router per request – Reuse across requests
  • Forgetting to call report()
  • Using same goal for different tasks
  • Using Router across threads/async contexts – Create separate instances

Router.completion()

What this does: Makes a completion request with intelligent routing.

When to call: Every time your agent needs a model response for this goal.

Example

Python:

response = router.completion(
    messages=[
        {"role": "system", "content": "Extract company names."},
        {"role": "user", "content": "Hi, I'm Sarah from Stripe."}
    ],
    max_tokens=100
)
print(response.choices[0].message.content)

TypeScript:

const response = await router.completion(
  [
    { role: 'system', content: 'Extract company names.' },
    { role: 'user', content: "Hi, I'm Sarah from Stripe." },
  ],
  { maxTokens: 100 }
);

console.log(response.choices[0].message.content);

Required arguments

| Argument | Type | Description |
|---|---|---|
| messages | list / array | OpenAI-format messages |

Optional arguments

| Python | TypeScript | Type | Description |
|---|---|---|---|
| force_model | forceModel | str / string | Override routing and use this model |
| max_tokens | maxTokens | int / number | Maximum tokens in the response |
| **kwargs | options | any / object | Passed to the provider (temperature, etc.) |

Common mistakes

  • Passing model in kwargs (Kalibr picks the model; use force_model/forceModel to override)
  • Not handling exceptions (provider errors still raise)

Exceptions

Provider errors propagate to caller:

  • Python: openai.OpenAIError, anthropic.AnthropicError
  • TypeScript: OpenAI.APIError, Anthropic.APIError

Intelligence service failures do NOT raise – Router falls back to first path.
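
That split can be sketched as follows; the function names here are illustrative, not the SDK's internals:

```python
def choose_path(paths, fetch_decision):
    """Ask the intelligence service for a routing decision; if it is
    unreachable, fall back to the first configured path instead of raising.
    Provider errors raised later by the actual completion still propagate."""
    try:
        return fetch_decision()
    except Exception:
        return paths[0]

def unreachable():
    # Simulates the intelligence service being down
    raise ConnectionError("intelligence service down")

model = choose_path(["gpt-4o", "claude-sonnet-4-20250514"], unreachable)
```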


as_langchain()

Returns a LangChain-compatible LLM that uses Kalibr for routing. Use this to integrate with frameworks like CrewAI and LangChain.

from kalibr import Router

router = Router(
    goal="my_task",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)

# Returns a LangChain BaseChatModel
llm = router.as_langchain()

# Use with any LangChain chain or CrewAI agent
from langchain_core.prompts import ChatPromptTemplate
chain = ChatPromptTemplate.from_template("{input}") | llm

Note: You still need to call router.report() after the chain completes to report the outcome.


Router.execute()

What this does: Routes any HuggingFace task with the same outcome-learning loop as completion(). Works for transcription, image generation, embeddings, classification, and all 17 HuggingFace task types.

When to call: When your agent needs to run a non-chat model task (audio, image, embedding, classification).

| Parameter | Type | Required | Description |
|---|---|---|---|
| task | str | Yes | HuggingFace task type (e.g. "automatic_speech_recognition", "text_to_image") |
| input_data | Any | Yes | Task-appropriate input (audio bytes, text prompt, image, etc.) |
| **kwargs | - | No | Passed to the HuggingFace provider |

Returns: Task-native response (transcription text, PIL image, embedding vector, classification labels, etc.)

Examples

# Transcription
result = router.execute(task="automatic_speech_recognition", input_data=audio_bytes)

# Image generation
result = router.execute(task="text_to_image", input_data="a product photo of a laptop")

# Embeddings
result = router.execute(task="feature_extraction", input_data="search query text")

# Classification
result = router.execute(task="text_classification", input_data="This product is amazing!")

Supported task types: chat_completion, text_generation, summarization, translation, fill_mask, table_question_answering, automatic_speech_recognition, text_to_speech, audio_classification, text_to_image, image_to_text, image_classification, image_segmentation, object_detection, feature_extraction, text_classification, token_classification


Router.report()

What this does: Reports outcome for the last completion. This is how Kalibr learns.

When to call: After you know whether the task succeeded or failed.

Example

Python:

# Success
router.report(success=True)

# Failure with reason
router.report(success=False, reason="invalid_json")

# Success with quality score
router.report(success=True, score=0.8)

# Failure with structured category
router.report(success=False, failure_category="timeout", reason="Provider timed out after 30s")

TypeScript:

// Success
await router.report(true);

// Failure with reason
await router.report(false, 'invalid_json');

// Success with score
await router.report(true, undefined, 0.8);

Required arguments

| Argument | Type | Description |
|---|---|---|
| success | bool / boolean | Whether the task succeeded |

Optional arguments

| Argument | Type | Description |
|---|---|---|
| reason | str / string | Failure reason (for debugging) |
| score | float / number | Quality score 0.0-1.0 that feeds directly into routing. A path that consistently scores 0.85 will be preferred over one scoring 0.6, even if both "succeed." When provided, the score is used as a continuous signal in Thompson Sampling, Beta(alpha + score, beta + (1 - score)), giving finer-grained routing than the boolean alone. |
| failure_category | str / string | Structured failure category for clustering. Valid: timeout, context_exceeded, tool_error, rate_limited, validation_failed, hallucination_detected, user_unsatisfied, empty_response, malformed_output, auth_error, provider_error, unknown. Raises ValueError if invalid. |
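
The fractional update quoted in the score row can be sketched as (the helper is ours; the formula is the one stated above):

```python
def update_beta(alpha, beta, success, score=None):
    """Update a path's Beta posterior after one reported outcome.
    With a score: Beta(alpha + score, beta + (1 - score)).
    Without one, success counts as 1.0 and failure as 0.0."""
    s = score if score is not None else (1.0 if success else 0.0)
    return alpha + s, beta + (1.0 - s)

# Starting from a uniform Beta(1, 1) prior, a 0.85 outcome shifts the
# posterior mean further toward success than a 0.6 outcome does
a1, b1 = update_beta(1.0, 1.0, True, score=0.85)
a2, b2 = update_beta(1.0, 1.0, True, score=0.6)
```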

Common mistakes

  • Calling report() multiple times for one completion (second call is ignored)
  • Not calling report() at all (routing never improves)
  • Reporting success based on "response exists" instead of "task actually worked"

When to call

  • After you know if task succeeded/failed
  • Once per completion() call
  • For multi-turn, report once at end

Validation

Calling report() without a prior completion() raises an error. Calling report() twice for the same completion logs a warning and ignores the second call.


get_policy()

Get the recommended path for a goal without making a completion call. Useful for inspecting what Kalibr would choose or for custom routing logic.

Python:

from kalibr import get_policy

policy = get_policy(goal="book_meeting")

print(policy["recommended_model"])       # "gpt-4o"
print(policy["recommended_tool"])        # "calendar_api"
print(policy["outcome_success_rate"])    # 0.87
print(policy["confidence"])              # 0.92

TypeScript:

import { getPolicy } from '@kalibr/sdk';

const policy = await getPolicy({ goal: 'book_meeting' });

console.log(policy.recommendedModel);      // "gpt-4o"
console.log(policy.recommendedTool);       // "calendar_api"
console.log(policy.outcomeSuccessRate);    // 0.87
console.log(policy.confidence);            // 0.92

With Constraints

Python:

policy = get_policy(
    goal="book_meeting",
    constraints={
        "max_cost_usd": 0.05,
        "max_latency_ms": 2000,
        "min_quality": 0.8
    }
)

TypeScript:

const policy = await getPolicy({
    goal: 'book_meeting',
    constraints: {
        maxCostUsd: 0.05,
        maxLatencyMs: 2000,
        minQuality: 0.8,
    },
});

Kalibr will only recommend paths that meet all constraints. If no paths meet the constraints, the response will indicate no recommendation is available.
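
A sketch of that filtering step (the per-path fields cost_usd, latency_ms, and quality are hypothetical names for illustration):

```python
def eligible_paths(paths, constraints):
    """Keep only paths that meet every constraint; an empty result means
    no recommendation is available."""
    def meets(p):
        return (p["cost_usd"] <= constraints.get("max_cost_usd", float("inf"))
                and p["latency_ms"] <= constraints.get("max_latency_ms", float("inf"))
                and p["quality"] >= constraints.get("min_quality", 0.0))
    return [p for p in paths if meets(p)]

paths = [
    {"model": "gpt-4o", "cost_usd": 0.04, "latency_ms": 1800, "quality": 0.9},
    {"model": "bigger-model", "cost_usd": 0.12, "latency_ms": 3500, "quality": 0.95},
]
allowed = eligible_paths(
    paths, {"max_cost_usd": 0.05, "max_latency_ms": 2000, "min_quality": 0.8}
)
```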

Parameters

| Argument | Required | Description |
|---|---|---|
| goal | Yes | Goal name |
| constraints | No | Object with max_cost_usd, max_latency_ms, min_quality |

Returns

| Field | Description |
|---|---|
| recommended_model | Best model for this goal |
| recommended_tool | Best tool (if tools are tracked) |
| recommended_params | Best parameters (if params are tracked) |
| outcome_success_rate | Historical success rate for this path |
| confidence | Statistical confidence (0-1) |
| alternatives | Other viable paths ranked by performance |

decide()

Get a routing decision for a goal. Uses Thompson Sampling to balance exploration and exploitation. This is what Router.completion() calls internally, exposed here for low-level control.

from kalibr import decide

decision = decide(goal="book_meeting", task_risk_level="low")

print(decision["model_id"])    # "gpt-4o"
print(decision["tool_id"])     # "calendar_api" or None
print(decision["params"])      # {"temperature": 0.3} or {}
print(decision["trace_id"])    # "abc123..." — pass this to report_outcome
print(decision["confidence"])  # 0.85

Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | Goal name |
| task_risk_level | No | "low" | Risk tolerance: "low", "medium", or "high" |
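
A sketch of Thompson Sampling selection (the posterior numbers are invented; the real service maintains its own statistics):

```python
import random

def thompson_pick(posteriors, seed=None):
    """Sample once from each path's Beta posterior and pick the highest draw.
    Well-sampled strong paths usually win, but paths with wide posteriors
    (few samples) still win occasionally: that is the exploration."""
    rng = random.Random(seed)
    samples = {path: rng.betavariate(a, b) for path, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)

posteriors = {
    "gpt-4o": (90.0, 10.0),                  # ~90% success, well sampled
    "claude-sonnet-4-20250514": (3.0, 2.0),  # few samples, wide posterior
}
choice = thompson_pick(posteriors, seed=42)
```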

report_outcome()

Report execution outcome directly (without using Router). The feedback loop that teaches Kalibr what works.

from kalibr import report_outcome

report_outcome(
    trace_id="abc123",
    goal="book_meeting",
    success=True,
    score=0.95,                    # optional quality score 0-1
    failure_category="timeout",    # optional structured category
    model_id="gpt-4o",            # optional
)

Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID from decide() or completion |
| goal | Yes | - | Goal name |
| success | Yes | - | Whether the goal was achieved |
| score | No | None | Quality score 0-1 |
| failure_reason | No | None | Free-text failure reason |
| failure_category | No | None | Structured failure category (see FAILURE_CATEGORIES) |
| metadata | No | None | Additional context as dict |
| model_id | No | None | Model used |
| tool_id | No | None | Tool used |
| execution_params | No | None | Parameters used |

update_outcome()

Update an existing outcome with a late-arriving signal. Use when the real success signal arrives after the initial report — for example, updating 48 hours later when a customer reopens a ticket that was initially reported as resolved.

Only fields that are explicitly passed (not None) will be updated. Other fields retain their original values.

from kalibr import update_outcome

# Customer reopened ticket 48 hours after "resolution"
result = update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_category="user_unsatisfied",
)
print(result["fields_updated"])  # ["success", "failure_category"]
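
The only-non-None merge rule can be sketched as (the helper is ours, not the SDK's):

```python
def apply_update(outcome, **updates):
    """Merge only explicitly provided (non-None) fields into an existing
    outcome; everything else keeps its original value."""
    changed = []
    for field, value in updates.items():
        if value is not None:
            outcome[field] = value
            changed.append(field)
    return changed

outcome = {"trace_id": "abc123", "success": True, "score": 0.9}
fields = apply_update(outcome, success=False, failure_category="user_unsatisfied")
```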

Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal (must match the original outcome) |
| success | No | None | Updated success status |
| score | No | None | Updated quality score 0-1 |
| failure_reason | No | None | Updated failure reason |
| failure_category | No | None | Updated failure category |
| metadata | No | None | Additional metadata to merge |

The request fails with a 404 error if no outcome exists for the given trace_id + goal combination.


get_insights()

Get structured diagnostics about what Kalibr has learned. Returns machine-readable intelligence per goal, designed for coding agents that need to decide what to improve.

Response includes schema_version: "1.0" for forward compatibility.

from kalibr import get_insights

# All goals
insights = get_insights()

# Single goal, custom window
insights = get_insights(goal="resolve_ticket", window_hours=48)

for goal in insights["goals"]:
    print(f"{goal['goal']}: {goal['status']} ({goal['success_rate']:.0%})")
    for signal in goal["actionable_signals"]:
        if signal["severity"] == "critical":
            print(f"  ⚠ {signal['type']}: {signal['data']}")

Parameters

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | No | None | Filter to a specific goal (returns all if None) |
| window_hours | No | 168 | Time window for analysis (default 1 week) |

Response structure (per goal)

| Field | Description |
|---|---|
| goal | Goal name |
| status | healthy, degrading, failing, or insufficient_data |
| success_rate | Overall success rate (0-1) |
| sample_count | Total outcomes in window |
| trend | improving, stable, or degrading |
| confidence | Statistical confidence (0-1) |
| top_failure_modes | Failure categories ranked by frequency |
| paths | Per-path performance (success_rate, trend, cost, latency) |
| param_sensitivity | Parameters that significantly affect outcomes |
| actionable_signals | Machine-readable signals (see below) |

Actionable signal types

| Signal | Description |
|---|---|
| path_underperforming | A path is >15pp below the best path |
| failure_mode_dominant | One failure category accounts for >50% of failures |
| param_sensitivity_detected | A parameter value significantly affects outcomes (>10pp spread) |
| drift_detected | Path performance is degrading over time |
| cost_inefficiency | A cheaper path has a similar success rate (within 5pp) |
| low_confidence | Path has fewer than 20 samples |
| goal_healthy | No action needed; goal is performing well |

register_path()

Register a new routing path for a goal.

from kalibr import register_path

result = register_path(
    goal="book_meeting",
    model_id="gpt-4o",
    tool_id="calendar_api",              # optional
    params={"temperature": 0.3},         # optional
    risk_level="low",                    # optional: "low", "medium", "high"
)
print(result["path_id"])

FAILURE_CATEGORIES

Constant containing all valid failure category values. Import and use for client-side validation.

from kalibr import FAILURE_CATEGORIES

print(FAILURE_CATEGORIES)
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "unknown"]

# Used by report() and report_outcome() — raises ValueError if invalid category passed
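
Client-side validation with the constant might look like this; in real code you would import FAILURE_CATEGORIES from kalibr, but the list is inlined here (mirroring the values printed above) so the sketch stands alone:

```python
FAILURE_CATEGORIES = [
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error",
    "unknown",
]

def validate_category(category):
    """Raise ValueError for unknown categories, mirroring report()/report_outcome()."""
    if category is not None and category not in FAILURE_CATEGORIES:
        raise ValueError(
            f"invalid failure_category {category!r}; expected one of {FAILURE_CATEGORIES}"
        )
    return category
```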

Intelligence API (TypeScript)

The TypeScript SDK exports convenience functions for direct access to the Intelligence API.

import {
  KalibrIntelligence,
  getPolicy,
  reportOutcome,
  registerPath,
  decide,
  getRecommendation,
  listPaths,
  disablePath,
  setExplorationConfig,
  getExplorationConfig,
} from '@kalibr/sdk';

// Initialize singleton
KalibrIntelligence.init({
  apiKey: process.env.KALIBR_API_KEY!,
  tenantId: process.env.KALIBR_TENANT_ID!,
});

// Get routing decision
const decision = await decide('extract_company');
console.log(decision.model_id, decision.confidence);

// Report outcome directly
await reportOutcome(traceId, 'extract_company', true, {
  score: 0.95,
  modelId: 'gpt-4o',
});

// List registered paths
const { paths } = await listPaths({ goal: 'extract_company' });

// Configure exploration
await setExplorationConfig({
  goal: 'extract_company',
  explorationRate: 0.1,
  minSamplesBeforeExploit: 20,
});

Auto-Instrumentation (TypeScript)

Wrap OpenAI or Anthropic clients to automatically trace all LLM calls.

import { createTracedOpenAI, createTracedAnthropic } from '@kalibr/sdk';

// Wrap OpenAI client - all calls auto-traced
const openai = createTracedOpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Wrap Anthropic client
const anthropic = createTracedAnthropic();
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

Context Management (TypeScript)

Run code within goal or trace contexts using async local storage.

import { withGoal, withTraceId, traceContext } from '@kalibr/sdk';

// Run code within a goal context
await withGoal('extract_company', async () => {
  // All Kalibr operations inherit this goal
  const response = await openai.chat.completions.create({...});
});

// Run code with a specific trace ID
await withTraceId('my-custom-trace-id', async () => {
  // All operations use this trace ID
});

// Combined trace context
await traceContext({ traceId: 'my-trace', goal: 'summarize' }, async () => {
  // Both trace ID and goal available
});

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| KALIBR_API_KEY | Yes | - | Your API key from the dashboard |
| KALIBR_TENANT_ID | Yes | default | Your tenant ID |
| KALIBR_AUTO_INSTRUMENT | No | true | Auto-instrument OpenAI/Anthropic/Google SDKs |
| KALIBR_INTELLIGENCE_URL | No | https://kalibr-intelligence.fly.dev | Intelligence service endpoint |
| OPENAI_API_KEY | For OpenAI | - | OpenAI API key |
| ANTHROPIC_API_KEY | For Anthropic | - | Anthropic API key |
| GOOGLE_API_KEY | For Google | - | Google API key |

REST API Endpoints

Intelligence service: https://kalibr-intelligence.fly.dev

POST /api/v1/routing/decide

Get routing decision for a goal.

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/decide \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company"}'

POST /api/v1/intelligence/report-outcome

Report execution outcome.

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/report-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "extract_company", "success": true}'

POST /api/v1/routing/paths

Register a new path.

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/paths \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "model_id": "gpt-4o"}'

POST /api/v1/intelligence/update-outcome

Update an existing outcome with a late-arriving signal.

curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/update-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "resolve_ticket", "success": false, "failure_category": "user_unsatisfied"}'

GET /api/v1/intelligence/insights

Get structured diagnostics per goal.

curl "https://kalibr-intelligence.fly.dev/api/v1/intelligence/insights?window_hours=168&goal=resolve_ticket" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/routing/paths

List registered paths for a goal.

curl "https://kalibr-intelligence.fly.dev/api/v1/routing/paths?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

DELETE /api/v1/routing/paths/{path_id}

Disable a path.

curl -X DELETE https://kalibr-intelligence.fly.dev/api/v1/routing/paths/path_abc123 \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/routing/stats

Get goal statistics.

curl "https://kalibr-intelligence.fly.dev/api/v1/routing/stats?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

Default Values

| Parameter | Default | Description |
|---|---|---|
| exploration_rate | 0.1 (10%) | Percentage of requests that explore non-optimal paths |
| min_samples | 20 | Outcomes needed per path before stable routing |
| success_when | None | When unset, heuristic auto-scoring applies (response length, structure, finish reason) |
| score_when | None | When both callbacks are unset, heuristic auto-scoring is used |
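
What such a heuristic might look like; the real one is internal to the SDK, and the weights and structure check below are invented purely for illustration:

```python
def heuristic_score(output, finish_reason="stop"):
    """Illustrative auto-scorer: reward non-empty, reasonably long output
    that finished cleanly. Not the SDK's actual heuristic."""
    if not output or finish_reason != "stop":
        return 0.0
    # Longer outputs score higher, saturating at 500 characters
    length_component = min(1.0, len(output) / 500)
    # Crude structure check: did the output end like a sentence or JSON?
    structure_component = 1.0 if output.strip().endswith((".", "}", "]")) else 0.5
    return 0.7 * length_component + 0.3 * structure_component
```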
