User signals

Every time a user accepts or rejects an agent output, Kalibr can learn from it. Connect that signal and future runs route to better models automatically — no redeployment, no config changes.

What Kalibr does with signals

Structural evals and provider outcomes tell Kalibr what succeeded technically. User signals tell Kalibr what actually worked for real users.

User accepts output — Kalibr records which model and prompt combination worked for this goal and user context.
User rejects or reprompts — Kalibr records what failed and routes differently on the next run.
Signals aggregate every 5 minutes. After 5+ signals per model/goal pair, routing shifts toward what your users prefer.
You do not need to redeploy or change any config. Kalibr handles it.

How to send signals

Three levels. Pick the one that matches how your app is built.

Level 1 — Explicit (you know the outcome)

Use this when your app has a clear thumbs up / thumbs down interaction, or when you can detect acceptance from user behavior.

python

from kalibr.feedback import user_accepted, user_rejected, track_run

# `router` is your already-configured Kalibr router (setup not shown on this page).
# After a pipeline run, store the run context so feedback can reference it:
result = router.completion(messages=[...])
track_run(result)

# When user approves the output:
user_accepted()

# When user pushes back or asks for a redo:
user_rejected(reason="output too short")
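
As a rough sketch of how a thumbs up / thumbs down control might be wired to these calls, the helper below forwards a UI event to one of two callbacks. The name `handle_thumbs`, the default reason string, and the recording stubs are hypothetical and exist only so the example runs without Kalibr installed; in a real app you would pass `user_accepted` and `user_rejected` from `kalibr.feedback` instead.

```python
from typing import Callable, List, Optional, Tuple


def handle_thumbs(
    thumbs_up: bool,
    on_accept: Callable[[], None],
    on_reject: Callable[..., None],
    reason: str = "no reason given",  # assumed default, not a Kalibr value
) -> str:
    """Forward a thumbs up / thumbs down event to the matching callback."""
    if thumbs_up:
        on_accept()
        return "accepted"
    # Rejections always carry a reason, mirroring user_rejected(reason=...).
    on_reject(reason=reason)
    return "rejected"


# Stand-in callbacks that record calls (hypothetical, for illustration only).
calls: List[Tuple[str, Optional[str]]] = []


def fake_accept() -> None:
    calls.append(("accepted", None))


def fake_reject(reason: Optional[str] = None) -> None:
    calls.append(("rejected", reason))


handle_thumbs(True, fake_accept, fake_reject)
handle_thumbs(False, fake_accept, fake_reject, reason="output too short")
print(calls)
```

Passing the callbacks in as arguments keeps the UI handler testable without a live Kalibr connection.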

Level 2 — Session-based (Kalibr classifies the signal automatically)

Use this when users send follow-up messages. Kalibr classifies whether the message is an acceptance, rejection, or continuation — you do not have to detect it yourself.

python

from kalibr.feedback import report_pipeline, report_user_turn

# After the pipeline runs, anchor this session.
# `system_prompt` and `response` are the prompt you sent and the output you got back:
report_pipeline(
    session_id="user-session-123",
    goal="research_summary",
    prompt=system_prompt,
    output=response,
    model="gpt-4o"
)

# When the user sends their next message:
report_user_turn(
    session_id="user-session-123",
    user_message=user_next_message  # Kalibr classifies this automatically
)

Note: report_user_turn uses a two-layer classifier: a heuristic check first, then an LLM fallback only if the heuristic's confidence is below the threshold. It runs in a background thread and never blocks your main loop.
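
To make the two-layer idea concrete, here is a minimal sketch of a heuristic-first classifier with a fallback, run on a background thread. It is not Kalibr's implementation: the keyword lists, the confidence values, and the names `heuristic_classify`, `classify`, `classify_in_background`, and `fake_llm` are all assumptions made for illustration. Only the 0.85 threshold comes from this page (see Privacy).

```python
import threading
from typing import Callable, List, Tuple

CONFIDENCE_THRESHOLD = 0.85  # value quoted in the Privacy section

# Assumed keyword cues; a real heuristic would be more careful.
REJECT_HINTS = ("try again", "wrong", "redo", "not what i")
ACCEPT_HINTS = ("thanks", "perfect", "looks good", "great")


def heuristic_classify(message: str) -> Tuple[str, float]:
    """Cheap keyword check returning (label, confidence)."""
    text = message.lower()
    if any(hint in text for hint in REJECT_HINTS):
        return "rejected", 0.9
    if any(hint in text for hint in ACCEPT_HINTS):
        return "accepted", 0.9
    return "continuation", 0.5  # no strong cue, so low confidence


def classify(message: str, llm_fallback: Callable[[str], str]) -> str:
    """Keep the heuristic label unless its confidence is below the threshold."""
    label, confidence = heuristic_classify(message)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label
    return llm_fallback(message)


def classify_in_background(
    message: str,
    llm_fallback: Callable[[str], str],
    on_done: Callable[[str], None],
) -> threading.Thread:
    """Run classification on a daemon thread so the caller is not blocked."""
    worker = threading.Thread(
        target=lambda: on_done(classify(message, llm_fallback)),
        daemon=True,
    )
    worker.start()
    return worker


def fake_llm(message: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return "continuation"


results: List[str] = []
thread = classify_in_background("Thanks, that looks good!", fake_llm, results.append)
thread.join()  # joined here only so the demo can print the result
print(results)
```

The fallback is reached only for messages the heuristic cannot label confidently, which keeps the common case cheap.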

Level 3 — Downstream actions (highest quality signal)

Use this when you can observe what the user did with the output: whether they used it verbatim, edited it, or discarded it. Downstream-action signals override classifier results and give Kalibr the most precise feedback.

python

from kalibr.feedback import report_action

# Output was used exactly as produced:
report_action(session_id="user-session-123", action_type="output_used_verbatim")

# Output was used but edited before use:
report_action(session_id="user-session-123", action_type="output_edited")

# Output was not used:
report_action(session_id="user-session-123", action_type="output_discarded")
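
One way to decide which action_type to report is to compare the text the pipeline produced with the text the user finally kept. The helper `infer_action_type` below is hypothetical and not part of Kalibr, and treating text that differs only in leading or trailing whitespace as verbatim is an assumption of this sketch.

```python
from typing import Optional


def infer_action_type(produced: str, final: Optional[str]) -> str:
    """Map what the user kept onto one of the three action types above."""
    # Nothing kept (None or only whitespace) counts as discarded.
    if final is None or not final.strip():
        return "output_discarded"
    # Ignore leading/trailing whitespace when checking for verbatim use.
    if final.strip() == produced.strip():
        return "output_used_verbatim"
    return "output_edited"


draft = "Q3 revenue grew 12% year over year."
print(infer_action_type(draft, draft))
print(infer_action_type(draft, "Q3 revenue grew 12% YoY."))
print(infer_action_type(draft, None))
```

The returned string can then be passed as the `action_type` argument of `report_action`.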

What changes after signals arrive

Signals are written immediately to the event log.
Every 5 minutes, Kalibr aggregates signals per tenant, goal, and model.
User signals blend with structural eval outcomes when routing decisions are made.
Once a model/goal pair has 5+ signals, routing shifts toward models your users prefer.
No action needed on your side.
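
The steps above can be pictured with a small sketch. This page does not describe how Kalibr actually blends user signals with eval outcomes, so the equal weighting, the `blended_score` formula, and the tenant and model names below are assumptions for illustration only. The threshold of 5 signals is the one stated above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_SIGNALS = 5  # threshold stated in the list above

Signal = Tuple[str, str, str, bool]  # (tenant, goal, model, accepted)
Key = Tuple[str, str, str]


def aggregate(signals: Iterable[Signal]) -> Dict[Key, Tuple[int, int]]:
    """Count (accepted, total) signals per (tenant, goal, model)."""
    counts: Dict[Key, List[int]] = defaultdict(lambda: [0, 0])
    for tenant, goal, model, accepted in signals:
        bucket = counts[(tenant, goal, model)]
        bucket[0] += int(accepted)
        bucket[1] += 1
    return {key: (acc, total) for key, (acc, total) in counts.items()}


def blended_score(
    eval_score: float, accepted: int, total: int, user_weight: float = 0.5
) -> float:
    """Mix the eval score with the user acceptance rate (assumed formula)."""
    if total < MIN_SIGNALS:
        return eval_score  # too few signals: user feedback has no effect yet
    acceptance_rate = accepted / total
    return (1 - user_weight) * eval_score + user_weight * acceptance_rate


# Hypothetical data: model-a has 5 signals, model-b only 3.
signals: List[Signal] = (
    [("acme", "research_summary", "model-a", True)] * 5
    + [("acme", "research_summary", "model-b", False)] * 3
)
eval_scores = {"model-a": 0.7, "model-b": 0.8}  # assumed structural eval scores

for (tenant, goal, model), (accepted, total) in aggregate(signals).items():
    print(model, round(blended_score(eval_scores[model], accepted, total), 2))
```

In this toy data, model-a overtakes model-b once it crosses the threshold, while model-b's three rejections do not yet count.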

Privacy

User messages are never sent to Kalibr. Only the classification result (accepted/rejected) and the trace ID are transmitted.
report_user_turn classifies the message locally first. It only calls an LLM if the heuristic confidence is below 0.85.
The raw_evidence field (optional) is capped at 500 characters if you choose to include it.
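
If you do include raw_evidence, you may want to trim it yourself so you know exactly what leaves your app. This page does not say how oversized values are handled beyond the 500-character cap, so the helper below, with the hypothetical name `cap_raw_evidence`, simply truncates on the client side.

```python
MAX_RAW_EVIDENCE_CHARS = 500  # cap stated above


def cap_raw_evidence(text: str, limit: int = MAX_RAW_EVIDENCE_CHARS) -> str:
    """Return at most `limit` characters of the evidence string."""
    return text[:limit]


long_note = "x" * 600
print(len(cap_raw_evidence(long_note)))
print(cap_raw_evidence("user asked for a longer summary"))
```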