API Reference

Python SDK v1.14.3 · TypeScript SDK v1.11.0

Router

What this does: Routes requests to the best model based on learned outcomes. Works across any modality.

When to use: Create one Router per goal. Reuse for multiple requests in the same thread/async context.

python
from kalibr import Router

router = Router(
    goal="extract_company",           # required
    paths=["gpt-4o", "claude-sonnet-4-20250514"],  # required
    success_when=lambda x: len(x) > 0,  # optional, bool
    score_when=lambda x: min(1.0, len(x) / 500),  # optional, float 0-1
    auto_register=True,               # optional, default True
)
typescript
import { Router } from '@kalibr/sdk';

const router = new Router({
  goal: 'extract_company',           // required
  paths: ['gpt-4o', 'claude-sonnet-4-20250514'], // required
  successWhen: (output) => output.length > 0, // optional, boolean
  scoreWhen: (output) => Math.min(1.0, output.length / 500), // optional, float 0-1
  autoRegister: true,                // optional, default true
});

Required arguments

| Argument | Type | Description |
| --- | --- | --- |
| goal | str / string | Name of the goal (e.g., "extract_company") |
| paths | list / array | List of models or path configs |

Optional arguments

| Python | TypeScript | Type | Default | Description |
| --- | --- | --- | --- | --- |
| success_when | successWhen | callable / function | None | Function that takes output string and returns bool. Auto-calls report(). |
| score_when | scoreWhen | callable / function | None | Function that takes output string and returns float (0.0-1.0). Provides continuous quality signal to routing. Takes priority over success_when when both are set. Success is derived as score >= 0.5. Score is clamped to [0, 1]. |
| session_id | sessionId | str / string | None | Optional session identifier for session-aware routing. When provided, the intelligence service reads recent session momentum and may escalate the model if the session is widening (user frustrated). Falls back to the KALIBR_SESSION_ID environment variable when not passed. |
| prefer_cached | preferCached | bool / boolean | False | When True, future routing decisions will prefer providers that have warm prompt caches for this goal. Wired in for forward compatibility. |
| judge_model | - | str | None | Python only. Model ID to use as a Gate 2 quality judge (e.g. "deepseek-chat"). When set, Kalibr runs an LLM quality check on each response; if the score falls below judge_threshold, it tries the next path (or repairs the prompt if repair_prompt=True). Requires DEEPSEEK_API_KEY or OPENAI_API_KEY in env. |
| judge_threshold | - | float | 0.7 | Python only. Quality score threshold for Gate 2. Responses scoring below this value trigger a model swap (or prompt repair). Range: 0.0–1.0. |
| repair_prompt | - | bool | False | Python only. When True and a Gate 2 quality check fails, Kalibr rewrites the user prompt using the judge model before trying the next path. The rewritten prompt is derived from the original request and the bad output. Only active when a judge_model is also set. |
| exploration_rate | explorationRate | float / number | None / 0.1 | Override the Thompson Sampling exploration rate (0.0–1.0). Higher values explore less-proven paths more aggressively. When None (Python default), the intelligence service controls exploration automatically. |
| auto_register | autoRegister | bool / boolean | True | Register paths on init |
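The precedence between success_when and score_when described in the table can be sketched in plain Python. This is an illustration of the documented rules (score_when wins, score clamped to [0, 1], success derived as score >= 0.5), not the SDK's internal code; `evaluate_output` is a hypothetical helper name.

```python
from typing import Callable, Optional, Tuple


def evaluate_output(
    output: str,
    success_when: Optional[Callable[[str], bool]] = None,
    score_when: Optional[Callable[[str], float]] = None,
) -> Tuple[Optional[bool], Optional[float]]:
    """Return (success, score) using the precedence described in the table."""
    if score_when is not None:
        # score_when takes priority: clamp to [0, 1], then derive success.
        score = max(0.0, min(1.0, float(score_when(output))))
        return score >= 0.5, score
    if success_when is not None:
        # Boolean-only signal: no continuous score is produced.
        return bool(success_when(output)), None
    # Neither callback is set: this helper has nothing to report.
    return None, None


if __name__ == "__main__":
    print(evaluate_output("x" * 300, score_when=lambda x: len(x) / 500))
```

A callback that returns a value above 1.0 is clamped rather than rejected, so a length-based score like the one in the constructor example is safe for long outputs.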

Thread Safety

Router is NOT thread-safe. Internal state (trace_id) will be corrupted if used across threads/async contexts.
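One way to respect this constraint is to keep a separate Router per thread. The sketch below uses `threading.local` for that; `StubRouter` is a hypothetical stand-in for `kalibr.Router` so the example runs without the SDK, and `get_router` is an illustrative helper, not part of the library.

```python
import threading


class StubRouter:
    """Hypothetical stand-in for kalibr.Router (assumption, for illustration)."""

    def __init__(self, goal: str) -> None:
        self.goal = goal


_local = threading.local()


def get_router(goal: str) -> StubRouter:
    """Return a router owned by the calling thread, creating it on first use."""
    routers = getattr(_local, "routers", None)
    if routers is None:
        routers = _local.routers = {}
    if goal not in routers:
        # With the real SDK this would be Router(goal=goal, paths=[...]).
        routers[goal] = StubRouter(goal)
    return routers[goal]
```

`threading.local` only separates OS threads. For asyncio tasks, `contextvars.ContextVar` would play the same role.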


Router.completion()

What this does: Makes a completion request with intelligent routing.

When to call: Every time your agent needs a model response for this goal.

Example

python
response = router.completion(
    messages=[
        {"role": "system", "content": "Extract company names."},
        {"role": "user", "content": "Hi, I'm Sarah from Stripe."}
    ],
    max_tokens=100
)
print(response.choices[0].message.content)
typescript
const response = await router.completion(
  [
    { role: 'system', content: 'Extract company names.' },
    { role: 'user', content: "Hi, I'm Sarah from Stripe." },
  ],
  { maxTokens: 100 }
);

console.log(response.choices[0].message.content);

Required arguments

| Argument | Type | Description |
| --- | --- | --- |
| messages | list / array | OpenAI-format messages |

Optional arguments

| Python | TypeScript | Type | Description |
| --- | --- | --- | --- |
| force_model | forceModel | str / string | Override routing, use this model |
| max_tokens | maxTokens | int / number | Maximum tokens in response |
| healing | healing | bool / boolean | Default False. When True, the Router runs the structural gate after each call, classifies failures, repairs the meta prompt or swaps to the next path, and retries inside the same call. New in v1.14. |
| heal_config | healConfig | HealConfig | Optional. Tunes retry budget and which gates run during healing. See HealConfig below. |
| pipeline_id | pipelineId | str / string | Optional. Scopes outcome learning to this pipeline so routing signals don't bleed between unrelated agents that share a goal. New in v1.14. |
| **kwargs | options | any / object | Passed to provider (temperature, etc.) |


Exceptions

Provider errors propagate to the caller.

Intelligence service failures do NOT raise -- Router falls back to first path.
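The two failure behaviours above can be illustrated with a small stand-alone sketch. It is not the Router's implementation: `pick_path` and `call_with_routing` are hypothetical helpers, and the routing service and provider are passed in as plain callables.

```python
from typing import Callable, Sequence


def pick_path(decide: Callable[[], str], paths: Sequence[str]) -> str:
    """Ask the routing service for a model; fall back to paths[0] if it fails."""
    try:
        return decide()
    except Exception:
        # A routing failure is swallowed and the first configured path is used.
        return paths[0]


def call_with_routing(
    decide: Callable[[], str],
    call_provider: Callable[[str], str],
    paths: Sequence[str],
) -> str:
    """Route, then call the provider without catching provider errors."""
    model = pick_path(decide, paths)
    # Provider errors are deliberately not caught here; they reach the caller.
    return call_provider(model)
```

In application code this means wrapping router.completion() in your own try/except for provider errors, while routing outages need no special handling.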

HealConfig

What this does: Configures how router.completion(healing=True) retries failed calls. Pass via heal_config=.

When to use: When the default heal budget or gate set does not fit the goal. For example: to enable an LLM-judge quality gate, disable meta-prompt repair, or change the retry count.

python
from kalibr import Router, HealConfig

config = HealConfig(
    max_retries=2,            # heal attempts before giving up
    gate2_enabled=True,       # LLM-judge quality gate
    meta_prompt_enabled=True, # repair meta prompt before model swap
)

response = router.completion(
    messages=[{"role": "user", "content": "Summarize: ..."}],
    healing=True,
    heal_config=config,
)

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | int | 2 | Maximum heal attempts before returning the last response. |
| gate2_enabled | bool | False | Run the LLM-judge quality gate in addition to the structural gate. Uses the caller's model/keys. |
| judge_model | str | "deepseek-chat" | Model used for the Gate 2 LLM judge when gate2_enabled=True. |
| repair_model | str or None | None | Optional override model for repair calls. When None, reuses the same model that produced the failing output. |
| meta_prompt_enabled | bool | False | Generate a task-specific system prompt via a cheap LLM before each heal step. Combined with repair prompts on retry. Fails open. |

Router.pipeline()

What this does: Runs a sequence of routed, gated, healed steps as a single call. Each step picks its own goal and messages and runs the full self-healing loop independently.

When to use: Multi-step agent workflows (research → outreach, extract → enrich → classify) where you want every intermediate step to be evaluated and healed without writing orchestration code.

python
result = router.pipeline(
    [
        {"goal": "research", "messages": [...]},
        {"goal": "outreach_generation", "messages": [...], "chain": True},
    ],
    healing=True,
    pipeline_id="sales-outreach-prod",
)

Required arguments

| Argument | Type | Description |
| --- | --- | --- |
| steps | list[dict] | Ordered list of step specs. Each step requires goal and messages. Set "chain": True to feed the previous step's output into the current step. |

Optional arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| healing | bool | False | Enable the self-healing loop for every step. |
| heal_config | HealConfig | None | Override default heal budget / gates for every step. |
| pipeline_id | str | None | Scope outcome learning to this pipeline so routing signals don't bleed between unrelated pipelines. |

Returns: Pipeline result with per-step outputs and metadata. If a step fails after exhausting retries, the pipeline returns a partial result with the failure attached.

as_langchain()

Returns a LangChain-compatible LLM that uses Kalibr for routing. Use this to integrate with frameworks like CrewAI and LangChain.

python
from kalibr import Router

router = Router(
    goal="my_task",
    paths=["gpt-4o", "claude-sonnet-4-20250514"]
)

# Returns a LangChain BaseChatModel
llm = router.as_langchain()

# Use with any LangChain chain or CrewAI agent
from langchain_core.prompts import ChatPromptTemplate
chain = ChatPromptTemplate.from_template("{input}") | llm

Note: You still need to call router.report() after the chain completes to report the outcome.

Router.execute()

What this does: Routes any HuggingFace task with the same outcome-learning loop as completion(). Works for transcription, image generation, embeddings, classification, and the rest of the 17 supported HuggingFace task types.

When to call: When your agent needs to run a non-chat model task (audio, image, embedding, classification).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| task | str | Yes | HuggingFace task type (e.g. "automatic_speech_recognition", "text_to_image") |
| input_data | Any | Yes | Task-appropriate input (audio bytes, text prompt, image, etc.) |
| **kwargs | - | No | Passed to HuggingFace provider |

Returns: Task-native response (transcription text, PIL image, embedding vector, classification labels, etc.)

Examples

python
# Transcription
result = router.execute(task="automatic_speech_recognition", input_data=audio_bytes)

# Image generation
result = router.execute(task="text_to_image", input_data="a product photo of a laptop")

# Embeddings
result = router.execute(task="feature_extraction", input_data="search query text")

# Classification
result = router.execute(task="text_classification", input_data="This product is amazing!")

Supported task types: chat_completion, text_generation, summarization, translation, fill_mask, table_question_answering, automatic_speech_recognition, text_to_speech, audio_classification, text_to_image, image_to_text, image_classification, image_segmentation, object_detection, feature_extraction, text_classification, token_classification

Router.synthesize()

What this does: Routes a text-to-speech (TTS) request to the best voice model, with the same outcome-learning loop as completion(). Supports ElevenLabs, OpenAI TTS, and Deepgram Aura.

python
router = Router(
    goal="narrate",
    paths=["tts-1", "eleven_multilingual_v2"],  # OpenAI TTS or ElevenLabs
)
result = router.synthesize("Hello world", voice="alloy")
# result.audio:           audio bytes
# result.cost_usd:        cost of this call
# result.model:           which model was selected
# result.kalibr_trace_id: for manual report() if needed

Provider detection: tts-* / whisper-* → OpenAI · eleven_* → ElevenLabs · nova-* / aura-* → Deepgram

Required env vars: OPENAI_API_KEY for OpenAI TTS · ELEVENLABS_API_KEY for ElevenLabs · DEEPGRAM_API_KEY for Deepgram

Install: pip install kalibr[voice] (includes ElevenLabs + Deepgram SDK)

Router.transcribe()

What this does: Routes a speech-to-text (STT) request to the best model. Supports OpenAI Whisper and Deepgram Nova.

python
router = Router(
    goal="transcribe_meeting",
    paths=["whisper-1", "nova-2"],  # OpenAI Whisper or Deepgram Nova
)
result = router.transcribe(audio_bytes, audio_duration_seconds=120.0)
# result.text:            transcribed text
# result.cost_usd:        cost of this call
# result.kalibr_trace_id: for manual report() if needed

Provider detection: whisper-* → OpenAI · nova-* / enhanced / base → Deepgram

Router.report()

What this does: Reports outcome for the last completion. This is how Kalibr learns.

When to call: After you know whether the task succeeded or failed.

Example

python
# Success
router.report(success=True)

# Failure with reason
router.report(success=False, reason="invalid_json")

# Success with quality score
router.report(success=True, score=0.8)

# Failure with structured category
router.report(success=False, failure_category="timeout", reason="Provider timed out after 30s")
typescript
// Success
await router.report(true);

// Failure with reason
await router.report(false, 'invalid_json');

// Success with score
await router.report(true, undefined, 0.8);

Required arguments

| Argument | Type | Description |
| --- | --- | --- |
| success | bool / boolean | Whether the task succeeded |

Optional arguments

| Argument | Type | Description |
| --- | --- | --- |
| reason | str / string | Failure reason (for debugging) |
| score | float / number | Quality score 0.0-1.0. Feeds directly into routing. A path that consistently scores 0.85 will be preferred over one scoring 0.6, even if both "succeed." When provided, score is used as a continuous signal in the routing engine, giving finer-grained path selection than boolean alone. A score of 0.85 counts as 0.85 successes and 0.15 failures. |
| failure_category | str / string | Structured failure category for clustering. Valid: timeout, context_exceeded, tool_error, rate_limited, validation_failed, hallucination_detected, user_unsatisfied, empty_response, malformed_output, auth_error, provider_error, healed, unknown. Raises ValueError if invalid. |
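The fractional accounting described for score ("0.85 counts as 0.85 successes and 0.15 failures") can be sketched as below. The `PathStats` class and its fields are hypothetical names used for illustration; how the routing engine actually stores these counts is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathStats:
    """Running success/failure mass for one path (illustrative only)."""

    successes: float = 0.0
    failures: float = 0.0

    def record(self, success: bool, score: Optional[float] = None) -> None:
        """Add one outcome; a score splits it between successes and failures."""
        if score is None:
            # Boolean-only report: the whole outcome goes to one side.
            score = 1.0 if success else 0.0
        score = max(0.0, min(1.0, score))
        self.successes += score
        self.failures += 1.0 - score

    @property
    def mean(self) -> float:
        """Observed success rate, or 0.0 when nothing has been recorded."""
        total = self.successes + self.failures
        return self.successes / total if total else 0.0
```

Under this accounting, a path that always reports score=0.85 converges to a higher mean than one that always reports score=0.6, even though both calls pass success=True.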


Validation

Calling report() without a prior completion() raises an error. Calling report() twice for the same completion logs a warning and ignores the second call.

get_policy()

Get the recommended path for a goal without making a completion call. Useful for inspecting what Kalibr would choose or for custom routing logic.

python
from kalibr import get_policy

policy = get_policy(goal="book_meeting")

print(policy["recommended_model"])       # "gpt-4o"
print(policy["recommended_tool"])        # "calendar_api"
print(policy["outcome_success_rate"])    # 0.87
print(policy["confidence"])              # 0.92
typescript
import { getPolicy } from '@kalibr/sdk';

const policy = await getPolicy({ goal: 'book_meeting' });

console.log(policy.recommendedModel);      // "gpt-4o"
console.log(policy.recommendedTool);       // "calendar_api"
console.log(policy.outcomeSuccessRate);    // 0.87
console.log(policy.confidence);            // 0.92

With Constraints

python
policy = get_policy(
    goal="book_meeting",
    constraints={
        "max_cost_usd": 0.05,
        "max_latency_ms": 2000,
        "min_quality": 0.8
    }
)
typescript
const policy = await getPolicy({
    goal: 'book_meeting',
    constraints: {
        maxCostUsd: 0.05,
        maxLatencyMs: 2000,
        minQuality: 0.8,
    },
});

Kalibr will only recommend paths that meet all constraints. If no paths meet the constraints, the response will indicate no recommendation is available.

Parameters

| Argument | Required | Description |
| --- | --- | --- |
| goal | Yes | Goal name |
| constraints | No | Object with max_cost_usd, max_latency_ms, min_quality |

Returns

| Field | Description |
| --- | --- |
| recommended_model | Best model for this goal |
| recommended_tool | Best tool (if tools are tracked) |
| recommended_params | Best parameters (if params are tracked) |
| outcome_success_rate | Historical success rate for this path |
| confidence | Statistical confidence (0-1) |
| alternatives | Other viable paths ranked by performance |

decide()

Returns the routing decision for a goal. This is what Router.completion() calls internally, exposed here for low-level control.

python
from kalibr import decide

decision = decide(goal="book_meeting", task_risk_level="low")

print(decision["model_id"])    # "gpt-4o"
print(decision["tool_id"])     # "calendar_api" or None
print(decision["params"])      # {"temperature": 0.3} or {}
print(decision["trace_id"])    # "abc123...", pass this to report_outcome
print(decision["confidence"])  # 0.85

Parameters

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| goal | Yes | - | Goal name |
| task_risk_level | No | "low" | Risk tolerance: "low", "medium", or "high" |

report_outcome()

Reports an execution outcome directly, without using Router. This is the feedback loop that teaches Kalibr what works.

python
from kalibr import report_outcome

report_outcome(
    trace_id="abc123",
    goal="book_meeting",
    success=True,
    score=0.95,                    # optional quality score 0-1
    failure_category="timeout",    # optional structured category
    model_id="gpt-4o",            # optional
)

Parameters

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| trace_id | Yes | - | Trace ID from decide() or completion |
| goal | Yes | - | Goal name |
| success | Yes | - | Whether the goal was achieved |
| score | No | None | Quality score 0-1 |
| failure_reason | No | None | Free-text failure reason |
| failure_category | No | None | Structured failure category (see FAILURE_CATEGORIES) |
| metadata | No | None | Additional context as dict |
| model_id | No | None | Model used |
| tool_id | No | None | Tool used |
| execution_params | No | None | Parameters used |

update_outcome()

Update an existing outcome with a late-arriving signal. Use it when the real success signal arrives after the initial report; for example, 48 hours later, when a customer reopens a ticket that was initially reported as resolved.

Only fields that are explicitly passed (not None) will be updated. Other fields retain their original values.

python
from kalibr import update_outcome

# Customer reopened ticket 48 hours after "resolution"
result = update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_category="user_unsatisfied",
)
print(result["fields_updated"])  # ["success", "failure_category"]

Parameters

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal (must match original outcome) |
| success | No | None | Updated success status |
| score | No | None | Updated quality score 0-1 |
| failure_reason | No | None | Updated failure reason |
| failure_category | No | None | Updated failure category |
| metadata | No | None | Additional metadata to merge |

Returns 404 if no outcome exists for the given trace_id + goal combination.
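The partial-update rule (only fields passed as non-None are changed, metadata is merged) can be sketched locally as follows. `apply_update` is a hypothetical helper that mimics the documented semantics on a plain dict; it does not call the service, and the way fields_updated is computed is an assumption.

```python
from typing import Any, Dict, List


def apply_update(stored: Dict[str, Any], **changes: Any) -> List[str]:
    """Overwrite only fields passed as non-None; return the names that changed."""
    updated: List[str] = []
    for field, value in changes.items():
        if value is None:
            # None means "not passed": the stored value is retained.
            continue
        if field == "metadata":
            # Metadata is merged into the existing dict rather than replaced.
            merged = {**(stored.get("metadata") or {}), **value}
            if merged != stored.get("metadata"):
                stored["metadata"] = merged
                updated.append(field)
        elif stored.get(field) != value:
            stored[field] = value
            updated.append(field)
    return updated
```

Applied to the ticket example above, passing success=False and failure_category="user_unsatisfied" leaves the original score untouched.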

get_insights()

Get structured diagnostics about what Kalibr has learned. Returns machine-readable intelligence per goal, designed for coding agents that need to decide what to improve.

Response includes schema_version: "1.0" for forward compatibility.

python
from kalibr import get_insights

# All goals
insights = get_insights()

# Single goal, custom window
insights = get_insights(goal="resolve_ticket", window_hours=48)

for goal in insights["goals"]:
    print(f"{goal['goal']}: {goal['status']} ({goal['success_rate']:.0%})")
    for signal in goal["actionable_signals"]:
        if signal["severity"] == "critical":
            print(f"  {signal['type']}: {signal['data']}")

Parameters

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| goal | No | None | Filter to a specific goal (returns all if None) |
| window_hours | No | 168 | Time window for analysis (default 1 week) |

Response structure (per goal)

| Field | Description |
| --- | --- |
| goal | Goal name |
| status | healthy, degrading, failing, or insufficient_data |
| success_rate | Overall success rate (0-1) |
| sample_count | Total outcomes in window |
| trend | improving, stable, or degrading |
| confidence | Statistical confidence (0-1) |
| top_failure_modes | Failure categories ranked by frequency |
| paths | Per-path performance (success_rate, trend, cost, latency) |
| param_sensitivity | Parameters that significantly affect outcomes |
| actionable_signals | Machine-readable signals (see below) |

Actionable signal types

| Signal | Description |
| --- | --- |
| path_underperforming | A path is >15pp below the best path |
| failure_mode_dominant | One failure category accounts for >50% of failures |
| param_sensitivity_detected | A parameter value significantly affects outcomes (>10pp spread) |
| drift_detected | Path performance is degrading over time |
| cost_inefficiency | A cheaper path has similar success rate (within 5pp) |
| low_confidence | Path has fewer than 20 samples |
| goal_healthy | No action needed, goal is performing well |
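To make three of the thresholds in the table concrete, the sketch below derives path_underperforming, cost_inefficiency, and low_confidence from per-path statistics. It is a local illustration only: the input field names (path, success_rate, samples, cost_usd) are assumptions, as is the choice to attach cost_inefficiency to the cheaper path. The service computes these signals itself.

```python
from typing import Dict, List, Tuple


def derive_signals(paths: List[Dict]) -> List[Tuple[str, str]]:
    """Return (signal_type, path) pairs using the thresholds from the table."""
    signals: List[Tuple[str, str]] = []
    best = max(paths, key=lambda p: p["success_rate"])
    for p in paths:
        gap = best["success_rate"] - p["success_rate"]
        if p["samples"] < 20:
            signals.append(("low_confidence", p["path"]))
        if gap > 0.15:  # more than 15 percentage points below the best path
            signals.append(("path_underperforming", p["path"]))
        if p is not best and p["cost_usd"] < best["cost_usd"] and gap <= 0.05:
            # Cheaper than the best path and within 5 percentage points of it.
            signals.append(("cost_inefficiency", p["path"]))
    return signals
```

A coding agent consuming get_insights() would normally read these signals from actionable_signals rather than recompute them.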

register_path()

Register a new routing path for a goal.

python
from kalibr import register_path

result = register_path(
    goal="book_meeting",
    model_id="gpt-4o",
    tool_id="calendar_api",              # optional
    params={"temperature": 0.3},         # optional
    risk_level="low",                    # optional: "low", "medium", "high"
)
print(result["path_id"])

FAILURE_CATEGORIES

Constant containing all valid failure category values. Import and use for client-side validation.

python
from kalibr import FAILURE_CATEGORIES

print(FAILURE_CATEGORIES)
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "healed", "unknown"]

# Used by report() and report_outcome(), raises ValueError if invalid category passed
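A minimal sketch of client-side validation before calling report() or report_outcome(). The tuple below is a local copy of the values listed above so the example runs without the SDK; in real code you would import FAILURE_CATEGORIES from kalibr instead. `validate_failure_category` is a hypothetical helper.

```python
from typing import Optional

# Local copy for illustration; prefer `from kalibr import FAILURE_CATEGORIES`.
FAILURE_CATEGORIES = (
    "timeout", "context_exceeded", "tool_error", "rate_limited",
    "validation_failed", "hallucination_detected", "user_unsatisfied",
    "empty_response", "malformed_output", "auth_error", "provider_error",
    "healed", "unknown",
)


def validate_failure_category(category: Optional[str]) -> Optional[str]:
    """Return the category unchanged, or raise ValueError if it is not valid."""
    if category is not None and category not in FAILURE_CATEGORIES:
        raise ValueError(
            f"invalid failure_category {category!r}; "
            f"expected one of {', '.join(FAILURE_CATEGORIES)}"
        )
    return category
```

Validating early keeps the ValueError close to the code that produced the bad category, instead of surfacing it inside the reporting call.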

Intelligence API (TypeScript)

The TypeScript SDK exports convenience functions for direct access to the Intelligence API.

typescript
import {
  KalibrIntelligence,
  getPolicy,
  reportOutcome,
  registerPath,
  decide,
  getRecommendation,
  listPaths,
  disablePath,
  setExplorationConfig,
  getExplorationConfig,
} from '@kalibr/sdk';

// Initialize singleton
KalibrIntelligence.init({
  apiKey: process.env.KALIBR_API_KEY!,
  tenantId: process.env.KALIBR_TENANT_ID!,
});

// Get routing decision
const decision = await decide('extract_company');
console.log(decision.model_id, decision.confidence);

// Report outcome directly
await reportOutcome(decision.trace_id, 'extract_company', true, {
  score: 0.95,
  modelId: 'gpt-4o',
});

// List registered paths
const { paths } = await listPaths({ goal: 'extract_company' });


Auto-Instrumentation (TypeScript)

Wrap OpenAI or Anthropic clients to automatically trace all LLM calls.

typescript
import { createTracedOpenAI, createTracedAnthropic } from '@kalibr/sdk';

// Wrap OpenAI client, all calls auto-traced
const openai = createTracedOpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Wrap Anthropic client
const anthropic = createTracedAnthropic();
const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

Context Management (TypeScript)

Run code within goal or trace contexts using async local storage.

typescript
import { withGoal, withTraceId, traceContext } from '@kalibr/sdk';

// Run code within a goal context
await withGoal('extract_company', async () => {
  // All Kalibr operations inherit this goal
  const response = await openai.chat.completions.create({...});
});

// Run code with a specific trace ID
await withTraceId('my-custom-trace-id', async () => {
  // All operations use this trace ID
});

// Combined trace context
await traceContext({ traceId: 'my-trace', goal: 'summarize' }, async () => {
  // Both trace ID and goal available
});

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| KALIBR_API_KEY | Yes | - | Your API key from dashboard |
| KALIBR_TENANT_ID | Yes | default | Your tenant ID |
| KALIBR_AUTO_INSTRUMENT | No | true | Auto-instrument OpenAI/Anthropic/Google SDKs |
| KALIBR_INTELLIGENCE_URL | No | https://kalibr-intelligence.fly.dev | Intelligence service endpoint |
| OPENAI_API_KEY | For OpenAI | - | OpenAI models (gpt-4o, tts-1, whisper-1) |
| ANTHROPIC_API_KEY | For Anthropic | - | Claude models |
| GOOGLE_API_KEY | For Google | - | Gemini models |
| DEEPSEEK_API_KEY | For DeepSeek | - | deepseek-chat, deepseek-reasoner, deepseek-coder |
| HF_API_TOKEN | For HuggingFace | - | Private models or free-tier rate limit bypass |
| ELEVENLABS_API_KEY | For ElevenLabs | - | ElevenLabs TTS (eleven_multilingual_v2, etc) |
| DEEPGRAM_API_KEY | For Deepgram | - | Deepgram STT (nova-2) and TTS (aura-*) |

REST API Endpoints

Intelligence service: https://kalibr-intelligence.fly.dev

All endpoints except GET /api/v1/intelligence/health require two headers: X-API-Key and X-Tenant-ID.

POST /api/v1/routing/decide

Get a routing decision for a goal. Selects the best registered path based on outcome history. Returns a trace_id that must be passed to report-outcome.

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/decide \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company"}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| goal | Yes | - | The goal to achieve (e.g. "extract_company") |
| task_risk_level | No | "low" | Risk level: "low", "medium", or "high" |

Response

| Field | Type | Description |
| --- | --- | --- |
| trace_id | string | Unique ID -- pass this to report-outcome |
| path_id | string | The selected path identifier |
| model_id | string | Model to use (e.g. "gpt-4o") |
| tool_id | string or null | Tool to use, if any |
| params | object | Execution parameters for this path |
| goal | string | Echo of the requested goal |
| reason | string | Why this path was chosen: "optimal" (best known path by success rate), "cost_optimized" (tied on quality, lower cost wins), or "fallback" (learning in progress, not enough data yet) |
| confidence | float | Confidence in this path (0-1) |
| exploration | bool | True if this is an exploration decision |
| success_rate | float | Historical success rate for this path |
POST /api/v1/intelligence/report-outcome

Report execution outcome. This is the feedback loop that teaches Kalibr what works. Updates both ClickHouse (durable) and Redis (real-time).

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/report-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "extract_company", "success": true}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| trace_id | Yes | - | Trace ID from decide() or Router.completion() |
| goal | Yes | - | The goal this execution was trying to achieve |
| success | Yes | - | Whether the goal was achieved (boolean) |
| score | No | null | Quality score 0.0-1.0 |
| model_id | No | null | Model that was used. If omitted, looked up from trace. |
| failure_reason | No | null | Free-text failure description |
| failure_category | No | null | Structured category -- must be a value from FAILURE_CATEGORIES |
| tool_id | No | null | Tool that was used |
| execution_params | No | null | Parameters used (e.g. {"temperature": 0.3}) |
| metadata | No | null | Additional context as an object |

Response

| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "accepted" on success |
| trace_id | string | Echo of the submitted trace ID |
| goal | string | Echo of the submitted goal |

POST /api/v1/intelligence/update-outcome

Update an existing outcome with a late-arriving signal. Only fields explicitly passed (not null) are updated. Use for async validation, user feedback, or downstream confirmation.

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/update-outcome \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "abc123", "goal": "resolve_ticket", "success": false, "failure_category": "user_unsatisfied"}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| trace_id | Yes | - | Trace ID of the outcome to update |
| goal | Yes | - | Goal name -- must match the original |
| success | No | null | Updated success status |
| score | No | null | Updated quality score 0.0-1.0 |
| failure_reason | No | null | Updated failure reason |
| failure_category | No | null | Updated failure category |
| metadata | No | null | Additional metadata to merge into existing |

Response

| Field | Type | Description |
| --- | --- | --- |
| status | string | "updated" or "no_changes" if nothing changed |
| trace_id | string | Echo of trace ID |
| goal | string | Echo of goal |
| fields_updated | string[] | List of field names that were actually changed |

GET /api/v1/intelligence/insights

Get structured diagnostics about what Kalibr has learned. Returns health status, failure mode breakdown, path comparisons, and actionable signals per goal.

curl
curl "https://kalibr-intelligence.fly.dev/api/v1/intelligence/insights?window_hours=168&goal=resolve_ticket" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

Query params

| Param | Default | Description |
| --- | --- | --- |
| window_hours | 168 | Time window in hours (default 1 week) |
| goal | null | Filter to a specific goal. Omit for all goals. |

Response shape

| Field | Type | Description |
| --- | --- | --- |
| schema_version | string | Always "1.0" |
| tenant_id | string | Your tenant ID |
| generated_at | string | ISO timestamp |
| goals | object[] | Per-goal insight objects |
| cross_goal_summary | object | Aggregate counts: total_goals, healthy, degrading, failing, insufficient_data, total_outcomes |

Each goals[] entry contains: goal, status ("healthy" / "degrading" / "failing" / "insufficient_data"), success_rate, sample_count, trend, trend_delta, confidence, top_failure_modes, paths, and actionable_signals.

POST /api/v1/intelligence/policy

Get the historically best-performing execution path for a goal. Unlike /decide, this is deterministic -- returns the historically best path with no sampling. Returns 404 if no execution data exists yet.

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/policy \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "summarize_ticket"}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| goal | Yes | - | The goal to get policy for |
| task_type | No | null | Optional task type filter |
| window_hours | No | 168 | Time window for pattern analysis |
| include_tools | No | true | Whether to include tool recommendations |
| include_params | No | [] | Parameter keys to include recommendations for |
| constraints | No | null | Cost/latency/quality constraints object |

Response

| Field | Type | Description |
| --- | --- | --- |
| goal | string | Echo of the goal |
| recommended_model | string | Best model ID based on outcomes |
| recommended_provider | string | Provider name |
| outcome_success_rate | float | Historical success rate for this path |
| outcome_sample_count | int | Number of outcome reports used |
| confidence | float | Statistical confidence 0-1 |
| risk_score | float | Risk score 0-1, lower is better |
| reasoning | string | Human-readable explanation |
| alternatives | object[] | Other viable paths |
| source | string | "realtime" or "historical" |
| recommended_tool | string or null | Best tool for this goal |
| recommended_params | object or null | Recommended parameter values |
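The difference between a deterministic policy lookup and a sampled routing decision can be illustrated with a textbook Beta-Bernoulli Thompson sampling sketch. This is not the service's algorithm: the per-path (successes, failures) counts, the uniform Beta(1, 1) prior, and both function names are assumptions chosen for illustration.

```python
import random
from typing import Dict, Tuple

Stats = Dict[str, Tuple[float, float]]  # model -> (successes, failures)


def policy_pick(stats: Stats) -> str:
    """Deterministic: always the path with the highest observed success rate.

    Assumes every path has at least one recorded outcome.
    """
    return max(stats, key=lambda m: stats[m][0] / (stats[m][0] + stats[m][1]))


def sampled_pick(stats: Stats, rng: random.Random) -> str:
    """Thompson-style: draw one Beta sample per path and take the largest."""
    return max(
        stats,
        key=lambda m: rng.betavariate(stats[m][0] + 1, stats[m][1] + 1),
    )


stats = {"gpt-4o": (45, 5), "claude-sonnet-4-20250514": (8, 2)}
print(policy_pick(stats))
print(sampled_pick(stats, random.Random()))
```

With few samples on a path, its Beta distribution is wide, so the sampled variant occasionally picks it; the deterministic variant never does until its observed rate overtakes the leader.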

POST /api/v1/intelligence/get-alternative

Get the next-best path after a primary model fails. Use for retry logic: call /decide, execute, fail, then call this with the failed models in exclude_models. Returns 404 if all registered paths are exhausted.

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/intelligence/get-alternative \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "exclude_models": ["gpt-4o"]}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| goal | Yes | - | The goal to get an alternative for |
| exclude_models | Yes | - | List of model IDs already tried |
| task_type | No | null | Optional task type filter |
| window_hours | No | 168 | Time window for pattern analysis |
| constraints | No | null | Cost/latency/quality constraints |

Response

| Field | Type | Description |
| --- | --- | --- |
| goal | string | Echo of the goal |
| recommended_model | string | Next-best model ID |
| recommended_provider | string | Provider name |
| outcome_success_rate | float | Historical success rate |
| confidence | float | Statistical confidence 0-1 |
| reasoning | string | Why this alternative was chosen |
| remaining_alternatives | int | Number of other alternatives still available |
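The retry flow described for this endpoint (decide, execute, fail, ask for an alternative with the failed models excluded) might be wired up as in the sketch below. The three callables stand in for your own HTTP calls and model execution; `NoAlternative`, `run_with_alternatives`, and the choice of RuntimeError as the execution failure are all hypothetical.

```python
from typing import Callable, List, Tuple


class NoAlternative(Exception):
    """Raised by the caller's get_alternative wrapper on a 404 response."""


def run_with_alternatives(
    decide: Callable[[], str],
    get_alternative: Callable[..., str],
    execute: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Try the decided model, then alternatives, until one succeeds."""
    tried: List[str] = []
    model = decide()
    for _ in range(max_attempts):
        try:
            return model, execute(model)
        except RuntimeError:
            # Execution failed: remember the model so it is excluded next time.
            tried.append(model)
        try:
            model = get_alternative(exclude_models=list(tried))
        except NoAlternative:
            break  # all registered paths are exhausted
    raise RuntimeError(f"all attempted paths failed: {tried}")
```

Each failed attempt should still be reported through report-outcome with success=false, so the failure feeds back into routing.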

POST /api/v1/routing/paths

Register a new execution path for a goal. Kalibr only routes to explicitly registered paths.

curl
curl -X POST https://kalibr-intelligence.fly.dev/api/v1/routing/paths \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant" \
  -H "Content-Type: application/json" \
  -d '{"goal": "extract_company", "model_id": "gpt-4o"}'

Request

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| goal | Yes | - | The goal this path achieves |
| model_id | Yes | - | Model to use (e.g. "gpt-4o") |
| tool_id | No | null | Tool to use |
| params | No | {} | Execution parameters |
| risk_level | No | "low" | Risk level: "low", "medium", or "high" |

GET /api/v1/routing/paths

List registered paths for a goal.

curl
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/paths?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

DELETE /api/v1/routing/paths/{path_id}

Disable a path. This is a soft delete: the path is marked disabled, traffic to it stops, and its outcome history is preserved.

curl
curl -X DELETE https://kalibr-intelligence.fly.dev/api/v1/routing/paths/path_abc123 \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/routing/stats

Get goal-level routing statistics including success rates, sample counts, and path performance.

curl
curl "https://kalibr-intelligence.fly.dev/api/v1/routing/stats?goal=extract_company" \
  -H "X-API-Key: your-key" \
  -H "X-Tenant-ID: your-tenant"

GET /api/v1/intelligence/health

Check service health. Does not require authentication.

curl
curl https://kalibr-intelligence.fly.dev/api/v1/intelligence/health

Returns {"status": "healthy", "clickhouse": "connected", "redis": "connected", "last_aggregation": "..."}.

Default Values

| Parameter | Default | Description |
| --- | --- | --- |
| min_samples | 20 | Outcomes needed per path before stable routing |
| success_when | None | Heuristic auto-scoring (response length, structure, finish reason) |
| score_when | None | Heuristic auto-scoring used when both callbacks are None |
