1. CJK token estimation: replace flat rune_count/3 with script-aware
counting — CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF)
count as 1 token each, non-CJK runes at /4. This fixes a 3x
underestimate for Chinese/Japanese/Korean text that could incorrectly
route complex CJK messages to the light model.
2. Routing observability: SelectModel now returns the computed score as
a third value. selectCandidates logs the score on both paths — Info
level for light model selection, Debug level for primary model
selection.
3. Added tests: TestExtractFeatures_TokenEstimate_Mixed (CJK+ASCII mix),
TestRouter_SelectModel_ReturnsScore.
Addresses review feedback from @mingmxren.
- classifier.go: s/honour/honor/ (American English per misspell)
- router.go: break SelectModel signature across lines (golines)
- router_test.go: break long Message literal (golines)
- router_test.go: replace CJK string literal with rune slice so
gosmopolitan does not flag the source file; behaviour is identical
Add three new files to pkg/routing/:
features.go — ExtractFeatures(msg, history) → Features
Computes five structural dimensions with zero keyword matching:
- TokenEstimate: rune_count/3 (CJK-safe token proxy)
- CodeBlockCount: ``` pairs in the message
- RecentToolCalls: tool call count in the last 6 history entries
- ConversationDepth: total messages in session
- HasAttachments: data URIs or media file extensions
classifier.go — Classifier interface + RuleClassifier
RuleClassifier uses a weighted sum that is capped at 1.0:
code block → +0.40 (triggers heavy model alone at 0.35 threshold)
token > 200 → +0.35 (triggers heavy model alone)
tool calls > 3 → +0.25
token 50-200 → +0.15
conversation depth > 10 → +0.10
attachment → 1.00 (hard gate, always heavy)
router.go — Router wraps config + Classifier
Router.SelectModel(msg, history, primaryModel) returns either the
configured light_model or the primary model depending on whether
the complexity score clears the threshold. Threshold defaults to
0.35 when zero/negative to prevent misconfiguration.
router_test.go — 34 tests covering all branches and edge cases
Introduce SenderInfo struct and pkg/identity package to standardize user
identification across all channels. Each channel now constructs structured
sender info (platform, platformID, canonicalID, username, displayName)
instead of ad-hoc string IDs. Allow-list matching supports all legacy
formats (numeric ID, @username, id|username) plus the new canonical
"platform:id" format. Session key resolution also handles canonical
peerIDs for backward-compatible identity link matching.