Getting started

User signals

Every time a user accepts or rejects an agent output, Kalibr can learn from it. Connect that signal and future runs route to better models automatically — no redeployment, no config changes.

What Kalibr does with signals

Structural evals and provider outcomes tell Kalibr what succeeded technically. User signals tell Kalibr what actually worked for real users.

How to send signals

Three levels. Pick the one that matches how your app is built.

Level 1 — Explicit (you know the outcome)

Use this when your app has a clear thumbs up / thumbs down interaction, or when you can detect acceptance from user behavior.

python
from kalibr.feedback import user_accepted, user_rejected, track_run

# After a pipeline run, store context so feedback can reference it.
# `router` is assumed to be a Kalibr router you have already configured.
result = router.completion(messages=[...])
track_run(result)

# When user approves the output:
user_accepted()

# When user pushes back or asks for a redo:
user_rejected(reason="output too short")
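
Detecting acceptance from user behavior usually comes down to mapping UI events onto the two calls above. The following is a minimal sketch, not part of Kalibr: the event names (`copy`, `export`, `regenerate`, and so on) and the `dispatch_feedback` helper are hypothetical, and the accept and reject callables are passed in so that `user_accepted` and `user_rejected` can be supplied by the caller.

```python
from typing import Callable, Optional

# Hypothetical UI event names; replace with whatever your app emits.
ACCEPT_EVENTS = {"thumbs_up", "copy", "export"}
REJECT_EVENTS = {"thumbs_down", "regenerate"}


def dispatch_feedback(
    event: str,
    on_accept: Callable[[], None],
    on_reject: Callable[..., None],
    reason: Optional[str] = None,
) -> Optional[str]:
    """Map a UI event to an accept/reject signal; return which one fired."""
    if event in ACCEPT_EVENTS:
        on_accept()
        return "accepted"
    if event in REJECT_EVENTS:
        # Fall back to the event name when no explicit reason is given.
        on_reject(reason=reason or event)
        return "rejected"
    return None  # Neutral events (scrolling, typing) send no signal.
```

With the imports from the example above, a regenerate click could be forwarded as `dispatch_feedback("regenerate", user_accepted, user_rejected)`.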

Level 2 — Session-based (Kalibr classifies the signal automatically)

Use this when users send follow-up messages. Kalibr classifies whether the message is an acceptance, rejection, or continuation — you do not have to detect it yourself.

python
from kalibr.feedback import report_pipeline, report_user_turn

# After the pipeline runs — anchor this session:
report_pipeline(
    session_id="user-session-123",
    goal="research_summary",
    prompt=system_prompt,
    output=response,
    model="gpt-4o"
)

# When the user sends their next message:
report_user_turn(
    session_id="user-session-123",
    user_message=user_next_message  # Kalibr classifies this automatically
)

Note: report_user_turn uses a two-layer classifier: a heuristic check first, then an LLM fallback only if confidence is below the threshold. It runs in a background thread and never blocks your main loop.
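
To make the two-layer idea concrete, here is a rough sketch of the general pattern: a cheap heuristic first, an LLM call only when confidence is low, all on a background thread. It is not Kalibr's implementation. The keyword lists, the 0.8 threshold, and every function name below are assumptions chosen for illustration, and `llm_classify` stands in for whatever model call you would use.

```python
import threading
from typing import Callable, Tuple

# Hypothetical keyword lists and threshold, chosen only for illustration.
ACCEPT_HINTS = ("thanks", "perfect", "looks good")
REJECT_HINTS = ("wrong", "try again", "redo")
CONFIDENCE_THRESHOLD = 0.8


def heuristic_classify(message: str) -> Tuple[str, float]:
    """Cheap first layer: keyword match returning (label, confidence)."""
    text = message.lower()
    if any(hint in text for hint in REJECT_HINTS):
        return "rejection", 0.9
    if any(hint in text for hint in ACCEPT_HINTS):
        return "acceptance", 0.9
    return "continuation", 0.5  # No keyword matched: low confidence.


def classify(message: str, llm_classify: Callable[[str], str]) -> str:
    """Use the heuristic label unless confidence is below the threshold."""
    label, confidence = heuristic_classify(message)
    if confidence < CONFIDENCE_THRESHOLD:
        return llm_classify(message)  # Second layer, only when needed.
    return label


def classify_in_background(
    message: str,
    llm_classify: Callable[[str], str],
    on_result: Callable[[str], None],
) -> threading.Thread:
    """Run classification on a daemon thread so the caller is not blocked."""
    worker = threading.Thread(
        target=lambda: on_result(classify(message, llm_classify)),
        daemon=True,
    )
    worker.start()
    return worker
```

The caller gets the thread back immediately and receives the label through `on_result`, which is why the main loop never waits on the LLM fallback.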

Level 3 — Downstream actions (highest quality signal)

Use this when you can observe what the user did with the output — whether they used it verbatim, edited it, or discarded it. This overrides classifier results and gives Kalibr the most precise feedback.

python
from kalibr.feedback import report_action

# Output was used exactly as produced:
report_action(session_id="user-session-123", action_type="output_used_verbatim")

# Output was used but edited before use:
report_action(session_id="user-session-123", action_type="output_edited")

# Output was not used:
report_action(session_id="user-session-123", action_type="output_discarded")
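
If your app keeps both the produced output and the text the user finally kept, the action type can be derived by comparing the two. A minimal sketch, assuming a hypothetical helper `infer_action_type` and an arbitrary similarity cutoff of 0.3; neither comes from Kalibr, and the cutoff should be tuned to your content.

```python
from difflib import SequenceMatcher

# Assumed cutoff: below this similarity, treat the output as discarded.
DISCARD_BELOW = 0.3


def infer_action_type(produced: str, final: str) -> str:
    """Compare the agent output with what the user actually kept."""
    if final.strip() == produced.strip():
        return "output_used_verbatim"
    similarity = SequenceMatcher(None, produced, final).ratio()
    if similarity < DISCARD_BELOW:
        return "output_discarded"
    return "output_edited"
```

The returned string can be passed straight through as the `action_type` argument of `report_action` shown above.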

What changes after signals arrive

Privacy