Reference

Goal taxonomy

Kalibr uses 12 goal types for classification and routing. Input type, output type, and cognitive load determine the goal_id, which in turn determines the default path order and the success contract.

Classification table

goal_id	Input to Output	Load	Default path order	Success contract
web_scraping	URL to rows	low	DeepSeek, Llama, Mixtral, gpt-4o-mini	field_completeness >= 0.8, min 1 row
data_enrichment	rows to rows	low	DeepSeek, Llama, Qwen, gpt-4o-mini	null_rate_after < null_rate_before
lead_scoring	text to score	low	DeepSeek, Llama, Mixtral, gpt-4o-mini	score numeric, in [0, 100]
classification	text to label	low	DeepSeek, Llama, Qwen, gpt-4o-mini	label in allowed_labels
summarization	text to prose	low	DeepSeek, Llama, Mixtral, claude-haiku	compression ratio 0.05 to 0.4
data_pipeline	data to rows	low	DeepSeek, Llama, Qwen, gpt-4o-mini	rows_out > 0, no exception
research	text to synthesis	medium	Llama, DeepSeek, deepseek-r1, claude-sonnet	structural: min 200 chars, no error markers + float judge 20%
outreach_generation	rows to content	medium	Llama, DeepSeek, Mixtral, claude-sonnet	structural: subject + body present, 50-2000 chars + float judge 20%
code_generation	any to code	high	Sonnet, GPT-4o, o3-mini, deepseek-r1	AST parse passes or tests_pass = True
code_review	code to prose	high	Sonnet, GPT-4o, deepseek-r1, o3-mini	min 50 chars of structured feedback
system_design	any to prose	high	Sonnet, deepseek-r1, GPT-4o, o3-mini	min 200 chars of structured output
agent_orchestration	multi to coordinates	high	Sonnet, GPT-4o, deepseek-r1, o3-mini	subtasks_completed = True, no timeout
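
To make the success contracts concrete, here is a minimal sketch of how four of the low-load contracts could be checked as Boolean structural evals. The function names are hypothetical and not part of Kalibr. The table does not say how compression ratio or field_completeness are measured, so this sketch assumes a character-based ratio and a "fraction of required fields that are non-empty" definition.

```python
from typing import Any, Mapping, Sequence


def check_lead_scoring(score: Any) -> bool:
    """lead_scoring contract: score numeric, in [0, 100]."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return False
    return 0 <= score <= 100


def check_classification(label: Any, allowed_labels: Sequence[str]) -> bool:
    """classification contract: label in allowed_labels."""
    return label in allowed_labels


def check_summarization(source: str, summary: str) -> bool:
    """summarization contract: compression ratio 0.05 to 0.4.

    Assumption: the ratio is len(summary) / len(source) in characters.
    """
    if not source:
        return False
    ratio = len(summary) / len(source)
    return 0.05 <= ratio <= 0.4


def check_web_scraping(
    rows: Sequence[Mapping[str, Any]], required_fields: Sequence[str]
) -> bool:
    """web_scraping contract: field_completeness >= 0.8, min 1 row.

    Assumption: completeness is the share of required fields, across all
    rows, that are neither None nor an empty string.
    """
    if not rows or not required_fields:
        return False
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return filled / (len(rows) * len(required_fields)) >= 0.8
```

Each checker returns a plain bool, which matches the Boolean pass/fail shape the structural eval reports.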

Default path ordering

The listed order is the warm-start default: cheapest capable model first. It is used when your tenant has no outcome data yet for a given goal type. As outcomes accumulate, routing shifts based on actual results: the cheapest model that is currently succeeding wins. The default order only matters in the first few runs.
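
The warm-start behavior can be illustrated with a small sketch. This is not Kalibr's routing algorithm, which the page does not specify. The function name pick_model, the (successes, runs) outcome format, and the min_runs and min_success_rate thresholds are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def pick_model(
    default_order: List[str],
    outcomes: Dict[str, Tuple[int, int]],
    min_runs: int = 5,               # hypothetical warm-up threshold
    min_success_rate: float = 0.8,   # hypothetical bar for "succeeding"
) -> str:
    """Return the cheapest model that is untested or currently succeeding.

    default_order is assumed to be sorted cheapest first, as in the table.
    outcomes maps model name to (successes, runs) for one goal_id.
    """
    if not default_order:
        raise ValueError("default_order must not be empty")
    for model in default_order:
        successes, runs = outcomes.get(model, (0, 0))
        # No (or too little) outcome data: fall back to the default order.
        if runs < max(min_runs, 1):
            return model
        # Enough data: keep the model only if it is actually succeeding.
        if successes / runs >= min_success_rate:
            return model
    # Every model is failing; use the most capable (last) one.
    return default_order[-1]


order = ["DeepSeek", "Llama", "Mixtral", "gpt-4o-mini"]
first_run = pick_model(order, {})
after_failures = pick_model(order, {"DeepSeek": (2, 10), "Llama": (9, 10)})
```

With no outcome data the sketch returns the first entry of the default order. Once DeepSeek has a poor record and Llama a good one, it moves on to Llama.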

Skip routing for

Conversational replies, status checks, config changes, memory operations, and simple lookups. These carry no signal worth routing.

Eval rules

Structural eval fires synchronously after every task and returns a Boolean pass/fail. The result goes to report_outcome(success=bool).
For research and outreach_generation, a float quality judge runs asynchronously on 20% of successful structural evals, using DeepSeek. It scores from 0.0 to 1.0 with the original request as context, and the result goes to report_outcome(success=True, score=float). All other task types are Boolean only.
score=float passed to report_outcome() should be derived from actual token cost in response.usage, not pre-weighted.
Adaptive sampling: 100% of runs until 50 outcomes per goal_id, then 25%.
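
The two sampling rules above can be sketched as follows. The helper names should_judge and adaptive_sample_rate are hypothetical, and the page does not state how the 20% judge rate and the adaptive rate interact, so this sketch keeps them as two independent decisions.

```python
import random

JUDGED_GOALS = {"research", "outreach_generation"}
JUDGE_SAMPLE_RATE = 0.20   # float judge runs on 20% of structural passes
WARMUP_OUTCOMES = 50       # full sampling until 50 outcomes per goal_id


def should_judge(goal_id: str, structural_pass: bool, rng: random.Random) -> bool:
    """Decide whether to queue the float quality judge for one task."""
    if not structural_pass or goal_id not in JUDGED_GOALS:
        return False
    return rng.random() < JUDGE_SAMPLE_RATE


def adaptive_sample_rate(outcomes_so_far: int) -> float:
    """Sampling rate for a goal_id: 100% until 50 outcomes, then 25%."""
    return 1.0 if outcomes_so_far < WARMUP_OUTCOMES else 0.25


rng = random.Random(0)  # seeded only to make the example repeatable
judged = sum(should_judge("research", True, rng) for _ in range(10_000))
judged_fraction = judged / 10_000
```

Over many successful research tasks, judged_fraction should land near 0.2, while goal types outside JUDGED_GOALS are never sent to the judge.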
