Kalibr uses 12 goal types for classification and routing. A task's input type, output type, and cognitive load determine its goal_id. The goal_id in turn determines the default path order and the success contract.
| goal_id | Input to Output | Load | Default path order | Success contract |
|---|---|---|---|---|
| web_scraping | URL to rows | low | DeepSeek, Llama, Mixtral, gpt-4o-mini | field_completeness >= 0.8, min 1 row |
| data_enrichment | rows to rows | low | DeepSeek, Llama, Qwen, gpt-4o-mini | null_rate_after < null_rate_before |
| lead_scoring | text to score | low | DeepSeek, Llama, Mixtral, gpt-4o-mini | score numeric, in [0, 100] |
| classification | text to label | low | DeepSeek, Llama, Qwen, gpt-4o-mini | label in allowed_labels |
| summarization | text to prose | low | DeepSeek, Llama, Mixtral, claude-haiku | compression ratio 0.05 to 0.4 |
| data_pipeline | data to rows | low | DeepSeek, Llama, Qwen, gpt-4o-mini | rows_out > 0, no exception |
| research | text to synthesis | medium | Llama, DeepSeek, deepseek-r1, claude-sonnet | structural: min 200 chars, no error markers + float judge 20% |
| outreach_generation | rows to content | medium | Llama, DeepSeek, Mixtral, claude-sonnet | structural: subject + body present, 50-2000 chars + float judge 20% |
| code_generation | any to code | high | Sonnet, GPT-4o, o3-mini, deepseek-r1 | AST parse passes or tests_pass = True |
| code_review | code to prose | high | Sonnet, GPT-4o, deepseek-r1, o3-mini | min 50 chars of structured feedback |
| system_design | any to prose | high | Sonnet, deepseek-r1, GPT-4o, o3-mini | min 200 chars of structured output |
| agent_orchestration | multi to coordinates | high | Sonnet, GPT-4o, deepseek-r1, o3-mini | subtasks_completed = True, no timeout |
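
Several of the low-load success contracts in the table are simple enough to express as predicates. The following is a minimal sketch, not Kalibr's implementation: the function names are hypothetical, the compression ratio is assumed to be measured in characters (summary length divided by source length), and field_completeness is assumed to mean the fraction of expected fields that are non-empty across all rows.

```python
def check_lead_scoring(score):
    """lead_scoring contract: score is numeric and in [0, 100]."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return False
    return 0 <= score <= 100


def check_classification(label, allowed_labels):
    """classification contract: label is one of allowed_labels."""
    return label in allowed_labels


def check_summarization(source, summary):
    """summarization contract: compression ratio between 0.05 and 0.4.

    Assumption: the ratio is len(summary) / len(source) in characters.
    """
    if not source:
        return False
    ratio = len(summary) / len(source)
    return 0.05 <= ratio <= 0.4


def check_web_scraping(rows, expected_fields):
    """web_scraping contract: field_completeness >= 0.8 and at least 1 row.

    Assumption: field_completeness is the share of expected fields that
    are present and non-empty, counted over every row.
    """
    if len(rows) < 1 or not expected_fields:
        return False
    filled = sum(
        1
        for row in rows
        for field in expected_fields
        if row.get(field) not in (None, "")
    )
    return filled / (len(rows) * len(expected_fields)) >= 0.8
```

Keeping each contract as a plain function returning a boolean means that the outcome recorded for a run is unambiguous: the output either met the contract or it did not.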
The default path order listed for each goal type is the warm-start default: cheapest capable model first. It is used when your tenant has no outcome data yet for that goal type. As outcomes accumulate, routing shifts based on actual results: the cheapest model that is currently succeeding wins. The default order matters only in the first few runs.
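
The shift from warm-start defaults to outcome-driven routing can be sketched as follows. This is an illustration under stated assumptions, not Kalibr's routing code: the `WarmStartRouter` class, the `min_runs` and `min_success_rate` thresholds, and the sliding `window` of recent outcomes are all hypothetical. It also assumes the default order is sorted cheapest first, as the text describes.

```python
from collections import defaultdict, deque

# Default path orders copied from two rows of the goal type table.
DEFAULT_ORDER = {
    "classification": ["DeepSeek", "Llama", "Qwen", "gpt-4o-mini"],
    "code_review": ["Sonnet", "GPT-4o", "deepseek-r1", "o3-mini"],
}


class WarmStartRouter:
    """Hypothetical router: cheapest model that is currently succeeding wins."""

    def __init__(self, default_order, min_runs=3, min_success_rate=0.8, window=20):
        self.default_order = default_order
        self.min_runs = min_runs                  # assumed warm-start threshold
        self.min_success_rate = min_success_rate  # assumed success cutoff
        # Only the most recent `window` outcomes per (goal_id, model) are kept,
        # so "currently succeeding" reflects recent runs.
        self.outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, goal_id, model, success):
        """Store whether a run met the goal type's success contract."""
        self.outcomes[(goal_id, model)].append(bool(success))

    def pick(self, goal_id):
        """Walk the default order and return the first acceptable model."""
        order = self.default_order[goal_id]
        for model in order:
            recent = self.outcomes[(goal_id, model)]
            if len(recent) < self.min_runs:
                # Too little data: still in warm start, so try this model.
                return model
            if sum(recent) / len(recent) >= self.min_success_rate:
                return model
        # Every model has enough data and none is succeeding: use the last one.
        return order[-1]


router = WarmStartRouter(DEFAULT_ORDER)
first_choice = router.pick("classification")  # no data yet, so the default applies
for _ in range(3):
    router.record("classification", "DeepSeek", success=False)
after_failures = router.pick("classification")  # DeepSeek is skipped
```

With no outcome data, `pick` returns the first model in the default order. Once the first model has accumulated enough failures, the router moves on to the next cheapest candidate.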
Some traffic is not classified into a goal type: conversational replies, status checks, config changes, memory operations, and simple lookups. These carry no signal worth routing.