What this does: Routes requests to the best model based on learned outcomes. Works across any modality.
When to use: Create one Router per goal. Reuse for multiple requests in the same thread/async context.
from kalibr import Router

router = Router(
    goal="extract_company",  # required
    paths=["gpt-4o", "claude-sonnet-4-20250514"],  # required
    success_when=lambda x: len(x) > 0,  # optional, bool
    score_when=lambda x: min(1.0, len(x) / 500),  # optional, float 0-1
    auto_register=True,  # optional, default True
)

import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company', // required
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'], // required
  successWhen: (output) => output.length > 0, // optional, boolean
  scoreWhen: (output) => Math.min(1.0, output.length / 500), // optional, float 0-1
  autoRegister: true, // optional, default true
});

| Argument | Type | Description |
|---|---|---|
| goal | str / string | Name of the goal (e.g., "extract_company") |
| paths | list / array | List of models or path configs |

| Python | TypeScript | Type | Default | Description |
|---|---|---|---|---|
| success_when | successWhen | callable / function | None | Function that takes output string and returns bool. Auto-calls report(). |
| score_when | scoreWhen | callable / function | None | Function that takes output string and returns float (0.0-1.0). Provides continuous quality signal to routing. Takes priority over success_when when both are set. Success is derived as score >= 0.5. Score is clamped to [0, 1]. |
| session_id | sessionId | str / string | None | Optional session identifier for session-aware routing. When provided, the intelligence service reads recent session momentum and may escalate the model if the session is widening (user frustrated). Falls back to the KALIBR_SESSION_ID environment variable when not passed. |
| prefer_cached | preferCached | bool / boolean | False | When True, future routing decisions will prefer providers that have warm prompt caches for this goal. Wired in for forward compatibility. |
| judge_model | — | str | None | Python only. Model ID to use as a Gate 2 quality judge (e.g. "deepseek-chat"). When set, Kalibr runs an LLM quality check on each response; if the score falls below judge_threshold, it tries the next path (or repairs the prompt if repair_prompt=True). Requires DEEPSEEK_API_KEY or OPENAI_API_KEY in env. |
| judge_threshold | — | float | 0.7 | Python only. Quality score threshold for Gate 2. Responses scoring below this value trigger a model swap (or prompt repair). Range: 0.0–1.0. |
| repair_prompt | — | bool | False | Python only. When True and a Gate 2 quality check fails, Kalibr rewrites the user prompt using the judge model before trying the next path. The rewritten prompt is derived from the original request and the bad output. Only active when a judge_model is also set. |
| exploration_rate | explorationRate | float / number | None / 0.1 | Override the Thompson Sampling exploration rate (0.0–1.0). Higher values explore less-proven paths more aggressively. When None (Python default), the intelligence service controls exploration automatically. |
| auto_register | autoRegister | bool / boolean | True | Register paths on init |
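The interaction between success_when and score_when described in the table can be sketched in plain Python. This is an illustration only, not Kalibr's implementation: derive_outcome is a hypothetical helper that applies the documented rules (score_when takes priority, the score is clamped to [0, 1], and success is derived as score >= 0.5).

```python
from typing import Callable, Optional, Tuple


def derive_outcome(
    output: str,
    success_when: Optional[Callable[[str], bool]] = None,
    score_when: Optional[Callable[[str], float]] = None,
) -> Tuple[Optional[bool], Optional[float]]:
    """Hypothetical helper mirroring the documented callback rules."""
    if score_when is not None:
        # score_when takes priority; clamp the score into [0, 1].
        score = max(0.0, min(1.0, float(score_when(output))))
        # Success is derived from the clamped score.
        return score >= 0.5, score
    if success_when is not None:
        return bool(success_when(output)), None
    # Neither callback set: the docs describe heuristic auto-scoring,
    # which is not modeled in this sketch.
    return None, None


text = "x" * 300
print(derive_outcome(text, score_when=lambda x: len(x) / 500))
print(derive_outcome("", success_when=lambda x: len(x) > 0))
```

Note that when both callbacks are passed, the boolean callback is never consulted in this sketch, which matches the "takes priority" wording above.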
Router is NOT thread-safe. Internal state (trace_id) will be corrupted if used across threads/async contexts.
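One way to respect this constraint is to keep one Router per thread. The sketch below uses threading.local from the standard library; make_router is a hypothetical factory standing in for the real Router(...) constructor so that the example runs without the SDK installed.

```python
import threading

_local = threading.local()


def make_router():
    """Hypothetical stand-in for Router(goal=..., paths=[...])."""
    return object()


def get_router():
    """Return this thread's router, creating it on first use."""
    if not hasattr(_local, "router"):
        _local.router = make_router()
    return _local.router


seen = []


def worker():
    # Repeated calls within one thread reuse the same instance.
    assert get_router() is get_router()
    seen.append(get_router())


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread created its own instance.
print(len({id(r) for r in seen}))
```

For asyncio tasks, a contextvars.ContextVar can play the same role that threading.local plays here.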
What this does: Makes a completion request with intelligent routing.
When to call: Every time your agent needs a model response for this goal.
response = router.completion(
    messages=[
        {"role": "system", "content": "Extract company names."},
        {"role": "user", "content": "Hi, I'm Sarah from Stripe."},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)

const response = await router.completion(
  [
    { role: 'system', content: 'Extract company names.' },
    { role: 'user', content: "Hi, I'm Sarah from Stripe." },
  ],
  { maxTokens: 100 }
);
console.log(response.choices[0].message.content);

| Argument | Type | Description |
|---|---|---|
| messages | list / array | OpenAI-format messages |

| Python | TypeScript | Type | Description |
|---|---|---|---|
| force_model | forceModel | str / string | Override routing, use this model |
| max_tokens | maxTokens | int / number | Maximum tokens in response |
| healing | healing | bool / boolean | Default False. When True, the Router runs the structural gate after each call, classifies failures, repairs the meta prompt or swaps to the next path, and retries inside the same call. New in v1.14. |
| heal_config | healConfig | HealConfig | Optional. Tunes retry budget and which gates run during healing. See HealConfig below. |
| pipeline_id | pipelineId | str / string | Optional. Scopes outcome learning to this pipeline so routing signals don't bleed between unrelated agents that share a goal. New in v1.14. |
| **kwargs | options | any / object | Passed to provider (temperature, etc.) |
Provider errors propagate to the caller, so wrap the call in your own error handling.
Intelligence service failures do NOT raise: the Router falls back to the first path.
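The difference between the two failure modes can be illustrated with a small sketch. It is not the SDK's code: choose_path, broken_decide, and call_provider are hypothetical stand-ins for the routing lookup and the provider call.

```python
from typing import Callable, List


def choose_path(paths: List[str], decide: Callable[[], str]) -> str:
    """Pick a path; fall back to the first one if routing is unavailable."""
    try:
        return decide()
    except Exception:
        # Intelligence service failure: do not raise, use the first path.
        return paths[0]


def broken_decide() -> str:
    """Stand-in for an unreachable intelligence service."""
    raise ConnectionError("intelligence service unreachable")


def call_provider(model: str) -> str:
    """Stand-in provider call that always fails, to show propagation."""
    raise RuntimeError(f"provider error from {model}")


paths = ["gpt-4o", "claude-sonnet-4-20250514"]
model = choose_path(paths, broken_decide)
print(model)

try:
    call_provider(model)
except RuntimeError as exc:
    # Provider errors reach the caller, so handle them at the call site.
    print(f"caught: {exc}")
```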
What this does: Configures how router.completion(healing=True) retries failed calls. Pass via heal_config=.
When to use: When the default heal budget or gate set does not fit the goal. For example: to enable an LLM-judge quality gate, disable meta-prompt repair, or change the retry count.
from kalibr import Router, HealConfig

config = HealConfig(
    max_retries=2,  # heal attempts before giving up
    gate2_enabled=True,  # LLM-judge quality gate
    meta_prompt_enabled=True,  # repair meta prompt before model swap
)
response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)

| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 2 | Maximum heal attempts before returning the last response. |
| gate2_enabled | bool | False | Run the LLM-judge quality gate in addition to the structural gate. Uses the caller's model/keys. |
| judge_model | str | "deepseek-chat" | Model used for the Gate 2 LLM judge when gate2_enabled=True. |
| repair_model | str or None | None | Optional override model for repair calls. When None, reuses the same model that produced the failing output. |
| meta_prompt_enabled | bool | False | Generate a task-specific system prompt via a cheap LLM before each heal step. Combined with repair prompts on retry. Fails open. |
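To make the max_retries budget concrete, here is a minimal sketch of a retry loop that returns the last response once the budget is spent. It only models the retry count and a swap to the next path; the real healing loop also classifies failures and repairs prompts. heal_loop, call, and gate are hypothetical names, not SDK functions.

```python
from typing import Callable, List, Tuple


def heal_loop(
    paths: List[str],
    call: Callable[[str], str],
    gate: Callable[[str], bool],
    max_retries: int = 2,
) -> Tuple[str, int]:
    """Call the first path, then retry on following paths while the gate fails.

    Returns the last response and the number of heal attempts used.
    """
    response = call(paths[0])
    attempts = 0
    while not gate(response) and attempts < max_retries:
        attempts += 1
        # Swap to the next path, wrapping around if paths run out.
        response = call(paths[attempts % len(paths)])
    return response, attempts


def fake_call(model: str) -> str:
    """Stand-in model call: only model 'b' produces output."""
    return "ok" if model == "b" else ""


def non_empty(response: str) -> bool:
    """Stand-in structural gate: pass any non-empty response."""
    return len(response) > 0


print(heal_loop(["a", "b"], fake_call, non_empty))
print(heal_loop(["a"], fake_call, non_empty, max_retries=2))
```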
What this does: Runs a sequence of routed, gated, healed steps as a single call. Each step picks its own goal and messages and runs the full self-healing loop independently.
When to use: Multi-step agent workflows (research → outreach, extract → enrich → classify) where you want every intermediate step to be evaluated and healed without writing orchestration code.
result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="sales-outreach-prod",
)

| Argument | Type | Description |
|---|---|---|
| steps | list[dict] | Ordered list of step specs. Each step requires goal and messages. Set "chain": True to feed the previous step's output into the current step. |

| Argument | Type | Default | Description |
|---|---|---|---|
| healing | bool | False | Enable the self-healing loop for every step. |
| heal_config | HealConfig | None | Override default heal budget / gates for every step. |
| pipeline_id | str | None | Scope outcome learning to this pipeline so routing signals don't bleed between unrelated pipelines. |
Returns: Pipeline result with per-step outputs and metadata. If a step fails after exhausting retries, the pipeline returns a partial result with the failure attached.
Returns a LangChain-compatible LLM that uses Kalibr for routing. Use this to integrate with frameworks like CrewAI and LangChain.
from kalibr import Router

router = Router(
    goal="my_task",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
)

# Returns a LangChain BaseChatModel
llm = router.as_langchain()

# Use with any LangChain chain or CrewAI agent
from langchain_core.prompts import ChatPromptTemplate

chain = ChatPromptTemplate.from_template("{input}") | llm

Note: You still need to call router.report() after the chain completes to report the outcome.
What this does: Routes any HuggingFace task with the same outcome-learning loop as completion(). Works for transcription, image generation, embeddings, classification, and all 17 HuggingFace task types.
When to call: When your agent needs to run a non-chat model task (audio, image, embedding, classification).
| Parameter | Type | Required | Description |
|---|---|---|---|
| task | str | Yes | HuggingFace task type (e.g. "automatic_speech_recognition", "text_to_image") |
| input_data | Any | Yes | Task-appropriate input (audio bytes, text prompt, image, etc.) |
| **kwargs | - | No | Passed to HuggingFace provider |
Returns: Task-native response (transcription text, PIL image, embedding vector, classification labels, etc.)
# Transcription
result = router.execute(task="automatic_speech_recognition", input_data=audio_bytes)

# Image generation
result = router.execute(task="text_to_image", input_data="a product photo of a laptop")

# Embeddings
result = router.execute(task="feature_extraction", input_data="search query text")

# Classification
result = router.execute(task="text_classification", input_data="This product is amazing!")
Supported task types: chat_completion, text_generation, summarization, translation, fill_mask, table_question_answering, automatic_speech_recognition, text_to_speech, audio_classification, text_to_image, image_to_text, image_classification, image_segmentation, object_detection, feature_extraction, text_classification, token_classification
What this does: Routes a text-to-speech (TTS) request to the best voice model, with the same outcome-learning loop as completion(). Supports ElevenLabs, OpenAI TTS, and Deepgram Aura.
router = Router(
    goal="narrate",
    paths=["tts-1", "eleven_multilingual_v2"],  # OpenAI TTS or ElevenLabs
)
result = router.synthesize("Hello world", voice="alloy")
# result.audio: audio bytes
# result.cost_usd: cost of this call
# result.model: which model was selected
# result.kalibr_trace_id: for manual report() if needed

Provider detection: tts-* / whisper-* -- OpenAI · eleven_* -- ElevenLabs · nova-* / aura-* -- Deepgram
Required env vars: OPENAI_API_KEY for OpenAI TTS · ELEVENLABS_API_KEY for ElevenLabs · DEEPGRAM_API_KEY for Deepgram
Install: pip install kalibr[voice] (includes ElevenLabs + Deepgram SDK)
What this does: Routes a speech-to-text (STT) request to the best model. Supports OpenAI Whisper and Deepgram Nova.
router = Router(
    goal="transcribe_meeting",
    paths=["whisper-1", "nova-2"],  # OpenAI Whisper or Deepgram Nova
)
result = router.transcribe(audio_bytes, audio_duration_seconds=120.0)
# result.text: transcribed text
# result.cost_usd: cost of this call
# result.kalibr_trace_id: for manual report() if needed

Provider detection: whisper-* -- OpenAI · nova-* / enhanced / base -- Deepgram
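The provider detection rules for the voice paths can be written out as a small prefix match. This is a sketch of the documented mapping, not the SDK's own function: detect_provider is a hypothetical name, and raising ValueError for unknown IDs is an assumption.

```python
def detect_provider(model_id: str) -> str:
    """Map a voice model ID to a provider using the documented prefixes."""
    if model_id.startswith(("tts-", "whisper-")):
        return "openai"
    if model_id.startswith("eleven_"):
        return "elevenlabs"
    if model_id.startswith(("nova-", "aura-")) or model_id in ("enhanced", "base"):
        return "deepgram"
    # Assumption: unknown IDs are rejected rather than guessed.
    raise ValueError(f"unrecognized voice model: {model_id}")


for model in ["tts-1", "whisper-1", "eleven_multilingual_v2", "nova-2", "base"]:
    print(model, "->", detect_provider(model))
```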
What this does: Reports outcome for the last completion. This is how Kalibr learns.
When to call: After you know whether the task succeeded or failed.
# Success
router.report(success=True)

# Failure with reason
router.report(success=False, reason="invalid_json")

# Success with quality score
router.report(success=True, score=0.8)

# Failure with structured category
router.report(success=False, failure_category="timeout", reason="Provider timed out after 30s")

// Success
await router.report(true);

// Failure with reason
await router.report(false, 'invalid_json');

// Success with score
await router.report(true, undefined, 0.8);

| Argument | Type | Description |
|---|---|---|
| success | bool / boolean | Whether the task succeeded |

| Argument | Type | Description |
|---|---|---|
| reason | str / string | Failure reason (for debugging) |
| score | float / number | Quality score 0.0-1.0. Feeds directly into routing. A path that consistently scores 0.85 will be preferred over one scoring 0.6, even if both "succeed." When provided, score is used as a continuous signal in the routing engine, giving finer-grained path selection than boolean alone. A score of 0.85 counts as 0.85 successes and 0.15 failures. |
| failure_category | str / string | Structured failure category for clustering. Valid: timeout, context_exceeded, tool_error, rate_limited, validation_failed, hallucination_detected, user_unsatisfied, empty_response, malformed_output, auth_error, provider_error, healed, unknown. Raises ValueError if invalid. |
Calling report() without a prior completion() raises an error. Calling report() twice for the same completion logs a warning and ignores the second call.
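The statement that a score of 0.85 counts as 0.85 successes and 0.15 failures maps naturally onto a Beta posterior, which is also what Thompson Sampling draws from. The sketch below is a generic illustration of that idea, not Kalibr's routing engine: PathStats and pick are hypothetical names, and the uniform Beta(1, 1) prior is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PathStats:
    """Beta posterior over a path's success rate (Beta(1, 1) prior assumed)."""

    alpha: float = 1.0
    beta: float = 1.0

    def record(self, success: bool, score: Optional[float] = None) -> None:
        # With a score, credit fractional successes and failures.
        credit = score if score is not None else (1.0 if success else 0.0)
        self.alpha += credit
        self.beta += 1.0 - credit

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)


def pick(stats: Dict[str, PathStats], rng: random.Random) -> str:
    """Thompson Sampling: draw once per path, route to the highest draw."""
    return max(stats, key=lambda name: stats[name].sample(rng))


rng = random.Random(0)
stats = {"gpt-4o": PathStats(), "claude-sonnet-4-20250514": PathStats()}
for _ in range(20):
    # Both paths "succeed", but with different quality scores.
    stats["gpt-4o"].record(True, score=0.85)
    stats["claude-sonnet-4-20250514"].record(True, score=0.6)

wins = sum(pick(stats, rng) == "gpt-4o" for _ in range(1000))
print(f"gpt-4o chosen in {wins} of 1000 draws")
```

The higher-scoring path is chosen most of the time while the other path still gets occasional traffic, which is the exploration behavior the score signal is meant to shape.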
Get the recommended path for a goal without making a completion call. Useful for inspecting what Kalibr would choose or for custom routing logic.
from kalibr import get_policy

policy = get_policy(goal="book_meeting")
print(policy["recommended_model"])  # "gpt-4o"
print(policy["recommended_tool"])  # "calendar_api"
print(policy["outcome_success_rate"])  # 0.87
print(policy["confidence"])  # 0.92

import { getPolicy } from '@kalibr/sdk';

const policy = await getPolicy({ goal: 'book_meeting' });
console.log(policy.recommendedModel); // "gpt-4o"
console.log(policy.recommendedTool); // "calendar_api"
console.log(policy.outcomeSuccessRate); // 0.87
console.log(policy.confidence); // 0.92

policy = get_policy(
    goal="book_meeting",
    constraints={
        "max_cost_usd": 0.05,
        "max_latency_ms": 2000,
        "min_quality": 0.8,
    },
)

const policy = await getPolicy({
  goal: 'book_meeting',
  constraints: {
    maxCostUsd: 0.05,
    maxLatencyMs: 2000,
    minQuality: 0.8,
  },
});

Kalibr will only recommend paths that meet all constraints. If no paths meet the constraints, the response will indicate no recommendation is available.
| Argument | Required | Description |
|---|---|---|
| goal | Yes | Goal name |
| constraints | No | Object with max_cost_usd, max_latency_ms, min_quality |

| Field | Description |
|---|---|
| recommended_model | Best model for this goal |
| recommended_tool | Best tool (if tools are tracked) |
| recommended_params | Best parameters (if params are tracked) |
| outcome_success_rate | Historical success rate for this path |
| confidence | Statistical confidence (0-1) |
| alternatives | Other viable paths ranked by performance |
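The "meet all constraints" rule for get_policy can be pictured as a filter followed by a ranking. The sketch below is an assumption about the shape of that logic, not the service's code: filter_paths and recommend are hypothetical helpers, the per-path fields (cost_usd, latency_ms, quality) are invented for illustration, and the numbers are made up.

```python
from typing import Dict, List, Optional

Path = Dict[str, float]


def filter_paths(paths: Dict[str, Path], constraints: Dict[str, float]) -> List[str]:
    """Keep only paths that satisfy every constraint that is present."""
    max_cost = constraints.get("max_cost_usd", float("inf"))
    max_latency = constraints.get("max_latency_ms", float("inf"))
    min_quality = constraints.get("min_quality", 0.0)
    return [
        name
        for name, p in paths.items()
        if p["cost_usd"] <= max_cost
        and p["latency_ms"] <= max_latency
        and p["quality"] >= min_quality
    ]


def recommend(paths: Dict[str, Path], constraints: Dict[str, float]) -> Optional[str]:
    """Return the highest-quality eligible path, or None if nothing qualifies."""
    eligible = filter_paths(paths, constraints)
    if not eligible:
        return None  # no recommendation available
    return max(eligible, key=lambda name: paths[name]["quality"])


# Illustrative numbers only.
paths = {
    "gpt-4o": {"cost_usd": 0.08, "latency_ms": 1500, "quality": 0.92},
    "claude-sonnet-4-20250514": {"cost_usd": 0.04, "latency_ms": 1800, "quality": 0.88},
    "deepseek-chat": {"cost_usd": 0.01, "latency_ms": 2500, "quality": 0.81},
}
constraints = {"max_cost_usd": 0.05, "max_latency_ms": 2000, "min_quality": 0.8}
print(recommend(paths, constraints))
print(recommend(paths, {"max_cost_usd": 0.001}))
```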
Get a routing decision for a goal. This is what Router.completion() calls internally, exposed here for low-level control.
from kalibr import decide
decision = decide(goal="book_meeting", task_risk_level="low")
print(decision["model_id"]) # "gpt-4o"
print(decision["tool_id"]) # "calendar_api" or None
print(decision["params"]) # {"temperature": 0.3} or {}
print(decision["trace_id"]) # "abc123...", pass this to report_outcome
print(decision["confidence"]) # 0.85

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | Goal name |
| task_risk_level | No | "low" | Risk tolerance: "low", "medium", or "high" |
Report an execution outcome directly (without using Router). This is the feedback loop that teaches Kalibr what works.
from kalibr import report_outcome
report_outcome(
    trace_id="abc123",
    goal="book_meeting",
    success=True,
    score=0.95,  # optional quality score 0-1
    failure_category="timeout",  # optional structured category
    model_id="gpt-4o",  # optional
)

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID from decide() or completion |
| goal | Yes | - | Goal name |
| success | Yes | - | Whether the goal was achieved |
| score | No | None | Quality score 0-1 |
| failure_reason | No | None | Free-text failure reason |
| failure_category | No | None | Structured failure category (see FAILURE_CATEGORIES) |
| metadata | No | None | Additional context as dict |
| model_id | No | None | Model used |
| tool_id | No | None | Tool used |
| execution_params | No | None | Parameters used |
Update an existing outcome with a late-arriving signal. Use when the real success signal arrives after the initial report, for example, updating 48 hours later when a customer reopens a ticket that was initially reported as resolved.
Only fields that are explicitly passed (not None) will be updated. Other fields retain their original values.
from kalibr import update_outcome
# Customer reopened ticket 48 hours after "resolution"
result = update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_category="user_unsatisfied",
)
print(result["fields_updated"])  # ["success", "failure_category"]

| Argument | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal (must match original outcome) |
| success | No | None | Updated success status |
| score | No | None | Updated quality score 0-1 |
| failure_reason | No | None | Updated failure reason |
| failure_category | No | None | Updated failure category |
| metadata | No | None | Additional metadata to merge |
Returns 404 if no outcome exists for the given trace_id + goal combination.
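The "only explicitly passed fields are updated" rule can be sketched as a merge over a stored record. This is an illustration of the documented semantics, not the service's implementation: apply_update is a hypothetical helper and the stored outcome is a plain dict.

```python
from typing import Any, Dict, List


def apply_update(existing: Dict[str, Any], **fields: Any) -> List[str]:
    """Apply only non-None fields to the record; return the changed names."""
    updated = []
    for name, value in fields.items():
        if value is None:
            continue  # not passed: keep the original value
        if name == "metadata":
            # Metadata is merged into the existing dict, not replaced.
            merged = {**(existing.get("metadata") or {}), **value}
            if merged != existing.get("metadata"):
                existing["metadata"] = merged
                updated.append(name)
        elif existing.get(name) != value:
            existing[name] = value
            updated.append(name)
    return updated


outcome = {
    "success": True,
    "score": 0.9,
    "failure_category": None,
    "metadata": {"agent": "v1"},
}
changed = apply_update(
    outcome, success=False, score=None, failure_category="user_unsatisfied"
)
print(changed)  # ['success', 'failure_category']
print(outcome["score"])  # 0.9
```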
Get structured diagnostics about what Kalibr has learned. Returns machine-readable intelligence per goal, designed for coding agents that need to decide what to improve.
Response includes schema_version: "1.0" for forward compatibility.
from kalibr import get_insights
# All goals
insights = get_insights()
# Single goal, custom window
insights = get_insights(goal="resolve_ticket", window_hours=48)
for goal in insights["goals"]:
    print(f"{goal['goal']}: {goal['status']} ({goal['success_rate']:.0%})")
    for signal in goal["actionable_signals"]:
        if signal["severity"] == "critical":
            print(f"  {signal['type']}: {signal['data']}")

| Argument | Required | Default | Description |
|---|---|---|---|
| goal | No | None | Filter to a specific goal (returns all if None) |
| window_hours | No | 168 | Time window for analysis (default 1 week) |

| Field | Description |
|---|---|
| goal | Goal name |
| status | healthy, degrading, failing, or insufficient_data |
| success_rate | Overall success rate (0-1) |
| sample_count | Total outcomes in window |
| trend | improving, stable, or degrading |
| confidence | Statistical confidence (0-1) |
| top_failure_modes | Failure categories ranked by frequency |
| paths | Per-path performance (success_rate, trend, cost, latency) |
| param_sensitivity | Parameters that significantly affect outcomes |
| actionable_signals | Machine-readable signals (see below) |

| Signal | Description |
|---|---|
| path_underperforming | A path is >15pp below the best path |
| failure_mode_dominant | One failure category accounts for >50% of failures |
| param_sensitivity_detected | A parameter value significantly affects outcomes (>10pp spread) |
| drift_detected | Path performance is degrading over time |
| cost_inefficiency | A cheaper path has similar success rate (within 5pp) |
| low_confidence | Path has fewer than 20 samples |
| goal_healthy | No action needed, goal is performing well |
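The thresholds in the signal table can be made concrete with a small sketch that derives three of the signals from per-path statistics. This is an assumption about how such checks could be written, not the service's logic: compute_signals is a hypothetical function, the input shape is invented, and the numbers are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Signal = Tuple[str, Optional[str]]


def compute_signals(paths: Dict[str, dict], min_samples: int = 20) -> List[Signal]:
    """Derive a few actionable signals from per-path stats.

    Each path is {"success_rate": float, "samples": int, "cost_usd": float}.
    """
    confident = {n: p for n, p in paths.items() if p["samples"] >= min_samples}
    best = max((p["success_rate"] for p in confident.values()), default=0.0)
    out: List[Signal] = []
    for name, p in paths.items():
        if name not in confident:
            out.append(("low_confidence", name))  # fewer than 20 samples
            continue
        if best - p["success_rate"] > 0.15:
            out.append(("path_underperforming", name))  # >15pp below best
        cheaper_and_similar = any(
            q["cost_usd"] < p["cost_usd"]
            and abs(q["success_rate"] - p["success_rate"]) <= 0.05
            for other, q in confident.items()
            if other != name
        )
        if cheaper_and_similar:
            out.append(("cost_inefficiency", name))  # cheaper path within 5pp
    return out or [("goal_healthy", None)]


# Illustrative numbers only.
stats = {
    "A": {"success_rate": 0.90, "samples": 200, "cost_usd": 0.05},
    "B": {"success_rate": 0.88, "samples": 150, "cost_usd": 0.01},
    "C": {"success_rate": 0.60, "samples": 80, "cost_usd": 0.02},
    "D": {"success_rate": 0.95, "samples": 5, "cost_usd": 0.03},
}
for signal in compute_signals(stats):
    print(signal)
```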
Register a new routing path for a goal.
from kalibr import register_path
result = register_path(
    goal="book_meeting",
    model_id="gpt-4o",
    tool_id="calendar_api",  # optional
    params={"temperature": 0.3},  # optional
    risk_level="low",  # optional: "low", "medium", "high"
)
print(result["path_id"])

Constant containing all valid failure category values. Import and use for client-side validation.

from kalibr import FAILURE_CATEGORIES

print(FAILURE_CATEGORIES)
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "healed", "unknown"]
# Used by report() and report_outcome(), which raise ValueError if an invalid category is passed
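Client-side validation against this constant might look like the sketch below. To keep it runnable without the SDK, it uses a local copy of the documented values instead of importing FAILURE_CATEGORIES; validate_failure_category is a hypothetical helper.

```python
from typing import Optional

# Local copy of the documented values; in real code, import the constant.
FAILURE_CATEGORIES = [
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error",
    "healed", "unknown",
]


def validate_failure_category(category: Optional[str]) -> Optional[str]:
    """Return the category unchanged, or raise ValueError if it is not valid."""
    if category is not None and category not in FAILURE_CATEGORIES:
        raise ValueError(
            f"invalid failure_category {category!r}; "
            f"expected one of {FAILURE_CATEGORIES}"
        )
    return category


print(validate_failure_category("timeout"))
try:
    validate_failure_category("slow")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Validating before the call surfaces a typo in your own code path instead of at report time.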
The TypeScript SDK exports convenience functions for direct access to the Intelligence API.
import {
  KalibrIntelligence,
  getPolicy,
  reportOutcome,
  registerPath,
  decide,
  getRecommendation,
  listPaths,
  disablePath,
  setExplorationConfig,
  getExplorationConfig,
} from '@kalibr/sdk';

// Initialize singleton
KalibrIntelligence.init({
  apiKey: process.env.KALIBR_API_KEY!,
  tenantId: process.env.KALIBR_TENANT_ID!,
});

// Get routing decision
const decision = await decide('extract_company');
console.log(decision.model_id, decision.confidence);

// Report outcome directly
await reportOutcome(traceId, 'extract_company', true, {
  score: 0.95,
  modelId: 'gpt-4o',
});

// List registered paths
const { paths } = await listPaths({ goal: 'extract_company' });

// Same call, keeping the full response object instead of destructuring
const allPaths = await listPaths({ goal: 'extract_company' });

Wrap OpenAI or Anthropic clients to automatically trace all LLM calls.
import { createTracedOpenAI, createTracedAnthropic } from '@kalibr/sdk';

// Wrap OpenAI client, all calls auto-traced
const openai = createTracedOpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Wrap Anthropic client
const anthropic = createTracedAnthropic();
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

Run code within goal or trace contexts using async local storage.
import { withGoal, withTraceId, traceContext } from '@kalibr/sdk';

// Run code within a goal context
await withGoal('extract_company', async () => {
  // All Kalibr operations inherit this goal
  const response = await openai.chat.completions.create({...});
});

// Run code with a specific trace ID
await withTraceId('my-custom-trace-id', async () => {
  // All operations use this trace ID
});

// Combined trace context
await traceContext({ traceId: 'my-trace', goal: 'summarize' }, async () => {
  // Both trace ID and goal available
});

| Variable | Required | Default | Description |
|---|---|---|---|
| KALIBR_API_KEY | Yes | - | Your API key from dashboard |
| KALIBR_TENANT_ID | Yes | default | Your tenant ID |
| KALIBR_AUTO_INSTRUMENT | No | true | Auto-instrument OpenAI/Anthropic/Google SDKs |
| KALIBR_INTELLIGENCE_URL | No | https://kalibr-intelligence.fly.dev | Intelligence service endpoint |
| OPENAI_API_KEY | For OpenAI | - | OpenAI models (gpt-4o, tts-1, whisper-1) |
| ANTHROPIC_API_KEY | For Anthropic | - | Claude models |
| GOOGLE_API_KEY | For Google | - | Gemini models |
| DEEPSEEK_API_KEY | For DeepSeek | - | deepseek-chat, deepseek-reasoner, deepseek-coder |
| HF_API_TOKEN | For HuggingFace | - | Private models or free-tier rate limit bypass |
| ELEVENLABS_API_KEY | For ElevenLabs | - | ElevenLabs TTS (eleven_multilingual_v2, etc) |
| DEEPGRAM_API_KEY | For Deepgram | - | Deepgram STT (nova-2) and TTS (aura-*) |
Intelligence service: https://kalibr-intelligence.fly.dev
All endpoints except GET /health require two headers: X-API-Key and X-Tenant-ID.
Get a routing decision for a goal. Selects the best registered path based on outcome history. Returns a trace_id that must be passed to report-outcome.
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/decide \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"goal": "extract_company"}'
| Field | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | The goal to achieve (e.g. "extract_company") |
| task_risk_level | No | "low" | Risk level: "low", "medium", or "high" |

| Field | Type | Description |
|---|---|---|
| trace_id | string | Unique ID -- pass this to report-outcome |
| path_id | string | The selected path identifier |
| model_id | string | Model to use (e.g. "gpt-4o") |
| tool_id | string | null | Tool to use, if any |
| params | object | Execution parameters for this path |
| goal | string | Echo of the requested goal |
| reason | string | Why this path was chosen: "optimal" (best known path by success rate), "cost_optimized" (tied on quality, lower cost wins), or "fallback" (learning in progress, not enough data yet) |
| confidence | float | Confidence in this path (0-1) |
| exploration | bool | True if this is an exploration decision |
| success_rate | float | Historical success rate for this path |
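The same decide request can be built from Python with only the standard library. The sketch below constructs the request without sending it, so it runs offline; build_decide_request is a hypothetical helper, while the URL, headers, and body fields are the ones documented above.

```python
import json
import urllib.request

BASE_URL = "https://kalibr-intelligence.fly.dev"


def build_decide_request(
    api_key: str, tenant_id: str, goal: str, task_risk_level: str = "low"
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/routing/decide."""
    body = json.dumps({"goal": goal, "task_risk_level": task_risk_level})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/routing/decide",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "X-Tenant-ID": tenant_id,
            "Content-Type": "application/json",
        },
    )


req = build_decide_request("your-key", "your-tenant", "extract_company")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10), then json.load() the
# response and keep its trace_id for the report-outcome call.
```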
Report execution outcome. This is the feedback loop that teaches Kalibr what works. Updates both ClickHouse (durable) and Redis (real-time).
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/report-outcome \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"trace_id": "abc123", "goal": "extract_company", "success": true}'
| Field | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID from decide() or Router.completion() |
| goal | Yes | - | The goal this execution was trying to achieve |
| success | Yes | - | Whether the goal was achieved (boolean) |
| score | No | null | Quality score 0.0-1.0 |
| model_id | No | null | Model that was used. If omitted, looked up from trace. |
| failure_reason | No | null | Free-text failure description |
| failure_category | No | null | Structured category -- must be a value from FAILURE_CATEGORIES |
| tool_id | No | null | Tool that was used |
| execution_params | No | null | Parameters used (e.g. {"temperature": 0.3}) |
| metadata | No | null | Additional context as an object |

| Field | Type | Description |
|---|---|---|
| status | string | Always "accepted" on success |
| trace_id | string | Echo of the submitted trace ID |
| goal | string | Echo of the submitted goal |
Update an existing outcome with a late-arriving signal. Only fields explicitly passed (not null) are updated. Use for async validation, user feedback, or downstream confirmation.
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/update-outcome \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"trace_id": "abc123", "goal": "resolve_ticket", "success": false, "failure_category": "user_unsatisfied"}'
| Field | Required | Default | Description |
|---|---|---|---|
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal name -- must match the original |
| success | No | null | Updated success status |
| score | No | null | Updated quality score 0.0-1.0 |
| failure_reason | No | null | Updated failure reason |
| failure_category | No | null | Updated failure category |
| metadata | No | null | Additional metadata to merge into existing |

| Field | Type | Description |
|---|---|---|
| status | string | "updated" or "no_changes" if nothing changed |
| trace_id | string | Echo of trace ID |
| goal | string | Echo of goal |
| fields_updated | string[] | List of field names that were actually changed |
Get structured diagnostics about what Kalibr has learned. Returns health status, failure mode breakdown, path comparisons, and actionable signals per goal.
curl "https://kalibr-intelligence.fly.dev/api/v1/intelligence/insights?window_hours=168&goal=resolve_ticket" \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant"
| Param | Default | Description |
|---|---|---|
| window_hours | 168 | Time window in hours (default 1 week) |
| goal | null | Filter to a specific goal. Omit for all goals. |

| Field | Type | Description |
|---|---|---|
| schema_version | string | Always "1.0" |
| tenant_id | string | Your tenant ID |
| generated_at | string | ISO timestamp |
| goals | object[] | Per-goal insight objects |
| cross_goal_summary | object | Aggregate counts: total_goals, healthy, degrading, failing, insufficient_data, total_outcomes |
Each goals[] entry contains: goal, status ("healthy" / "degrading" / "failing" / "insufficient_data"), success_rate, sample_count, trend, trend_delta, confidence, top_failure_modes, paths, and actionable_signals.
Get the historically best-performing execution path for a goal. Unlike /decide, this is deterministic -- returns the historically best path with no sampling. Returns 404 if no execution data exists yet.
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/policy \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"goal": "summarize_ticket"}'
| Field | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | The goal to get policy for |
| task_type | No | null | Optional task type filter |
| window_hours | No | 168 | Time window for pattern analysis |
| include_tools | No | true | Whether to include tool recommendations |
| include_params | No | [] | Parameter keys to include recommendations for |
| constraints | No | null | Cost/latency/quality constraints object |

| Field | Type | Description |
|---|---|---|
| goal | string | Echo of the goal |
| recommended_model | string | Best model ID based on outcomes |
| recommended_provider | string | Provider name |
| outcome_success_rate | float | Historical success rate for this path |
| outcome_sample_count | int | Number of outcome reports used |
| confidence | float | Statistical confidence 0-1 |
| risk_score | float | Risk score 0-1, lower is better |
| reasoning | string | Human-readable explanation |
| alternatives | object[] | Other viable paths |
| source | string | "realtime" or "historical" |
| recommended_tool | string | null | Best tool for this goal |
| recommended_params | object | null | Recommended parameter values |
Get the next-best path after a primary model fails. Use for retry logic: call /decide, execute, fail, then call this with the failed models in exclude_models. Returns 404 if all registered paths are exhausted.
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/get-alternative \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"goal": "extract_company", "exclude_models": ["gpt-4o"]}'
| Field | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | The goal to get an alternative for |
| exclude_models | Yes | - | List of model IDs already tried |
| task_type | No | null | Optional task type filter |
| window_hours | No | 168 | Time window for pattern analysis |
| constraints | No | null | Cost/latency/quality constraints |

| Field | Type | Description |
|---|---|---|
| goal | string | Echo of the goal |
| recommended_model | string | Next-best model ID |
| recommended_provider | string | Provider name |
| outcome_success_rate | float | Historical success rate |
| confidence | float | Statistical confidence 0-1 |
| reasoning | string | Why this alternative was chosen |
| remaining_alternatives | int | Number of other alternatives still available |
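The retry flow described for this endpoint (decide, execute, fail, ask for an alternative with the failed models excluded) can be sketched with injected callables. Nothing here calls the real API: run_with_alternatives and the three stand-in functions are hypothetical, and returning None models the 404 that signals exhausted paths.

```python
from typing import Callable, List, Optional


def run_with_alternatives(
    decide: Callable[[], str],
    get_alternative: Callable[[List[str]], Optional[str]],
    execute: Callable[[str], bool],
) -> Optional[str]:
    """Try the decided model, then alternatives, excluding models already tried.

    Returns the model that succeeded, or None when all paths are exhausted.
    """
    tried: List[str] = []
    model: Optional[str] = decide()
    while model is not None:
        if execute(model):
            return model
        tried.append(model)
        # Pass every failed model so it is not recommended again.
        model = get_alternative(list(tried))
    return None


REGISTERED = ["gpt-4o", "claude-sonnet-4-20250514"]


def fake_decide() -> str:
    return REGISTERED[0]


def fake_get_alternative(exclude_models: List[str]) -> Optional[str]:
    remaining = [m for m in REGISTERED if m not in exclude_models]
    return remaining[0] if remaining else None  # None stands in for a 404


def fake_execute(model: str) -> bool:
    return model.startswith("claude")  # pretend only this path succeeds


print(run_with_alternatives(fake_decide, fake_get_alternative, fake_execute))
```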
Register a new execution path for a goal. Kalibr only routes to explicitly registered paths.
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/paths \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant" \
-H "Content-Type: application/json" \
-d '{"goal": "extract_company", "model_id": "gpt-4o"}'
| Field | Required | Default | Description |
|---|---|---|---|
| goal | Yes | - | The goal this path achieves |
| model_id | Yes | - | Model to use (e.g. "gpt-4o") |
| tool_id | No | null | Tool to use |
| params | No | {} | Execution parameters |
| risk_level | No | "low" | Risk level: "low", "medium", or "high" |
List registered paths for a goal.
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/paths?goal=extract_company" \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant"
Disable a path. Soft-deleted -- marked disabled, traffic stops, outcome history preserved.
curl -X DELETE https://kalibr-intelligence.fly.dev/api/v1/routing/paths/path_abc123 \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant"
Get goal-level routing statistics including success rates, sample counts, and path performance.
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/stats?goal=extract_company" \
-H "X-API-Key: your-key" \
-H "X-Tenant-ID: your-tenant"
Check service health. Does not require authentication.
curl https://kalibr-intelligence.fly.dev/api/v1/intelligence/health
Returns {"status": "healthy", "clickhouse": "connected", "redis": "connected", "last_aggregation": "..."}.
| Parameter | Default | Description |
|---|---|---|
| min_samples | 20 | Outcomes needed per path before stable routing |
| success_when | None | When None, success is inferred by heuristic auto-scoring (response length, structure, finish reason) |
| score_when | None | When None, no custom score is computed; heuristic auto-scoring applies only when both callbacks are None |