Session history only stores user/assistant/tool messages — the system
prompt is built dynamically by BuildMessages. Remove the incorrect
system message from TestAgentLoop_ContextExhaustionRetry test data
to match the real data model that forceCompression operates on.
When the entire history is a single Turn (one user message followed by
tool calls and responses, no subsequent user message), the only Turn
boundary is at index 0. Previously the fallback returned targetIndex,
which could land on a tool or assistant message — splitting the Turn.
Return 0 instead, so callers (forceCompression, summarizeSession) see
mid <= 0 and skip compression rather than cutting inside the Turn.
Two estimation bugs fixed:
1. Media tokens were added to the chars accumulator before the chars*2/5
conversion, resulting in 256*2/5=102 tokens per item instead of 256.
Fix: add media tokens directly to the final token count, bypassing
the character-based heuristic.
2. estimateMessageTokens counted both tc.Name and tc.Function.Name for
tool calls, but providers only send one (OpenAI-compat uses
function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name
when Function is present, fall back to tc.Name only otherwise.
Also fix i18n hint text: "auto-detect" was misleading — the backend
uses a 4x max_tokens heuristic, not actual model detection.
Introduce parseTurnBoundaries() which identifies each Turn start index
in the session history. A Turn is a complete "user input → LLM iterations
→ final response" cycle (as defined in the agent refactor design #1316).
findSafeBoundary now uses Turn boundaries instead of raw role-scanning,
making the intent explicit: "find the nearest Turn boundary."
forceCompression drops the oldest half of Turns (not arbitrary messages),
which is simpler and more intuitive. The Turn-based approach naturally
prevents splitting tool-call sequences since each Turn is atomic.
Add tests that reflect actual session data shape: history starts with
user messages (no system prompt), includes chained tool-call sequences,
reasoning content, and media items. Exercises the proactive budget check
path with BuildMessages-style assembled messages.
estimateMessageTokens now counts ReasoningContent (extended thinking /
chain-of-thought) which can be substantial and is persisted in session
history. Media items get a fixed per-item overhead (256 tokens) since
actual cost depends on provider-specific image tokenization.
Session history (GetHistory) contains only user/assistant/tool messages.
The system prompt is built dynamically by BuildMessages and is never
stored in session. The previous code incorrectly treated history[0] as
a system prompt, skipping the first user message and appending a
compression note to it.
Fix: operate on the full history slice, and record the compression
note in the session summary (which BuildMessages already injects into
the system prompt) rather than modifying any history message.
Separate context_window from max_tokens — they serve different purposes
(input capacity vs output generation limit). The previous conflation caused
premature summarization or missed compression triggers.
Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary)
into context_budget.go — pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences
(assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata,
not just Content — fixes systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop,
preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost
Closes#556, closes#665
Ref #1439
* make gateway aware of config.json change
* fix according to code review
* fix lint
* fix review comment
* fix for review
* refactor to fix review
* fix for review
* fix for review
* feat(web_search): add load balance and failover for api keys
* feat(web_search): add load balance and failover for api keys
* lint
* new iter to get api key
* deleted conflicts
* feat(session): add SessionStore interface and JSONL backend adapter
Extract a SessionStore interface from the methods the agent loop uses
(AddMessage, GetHistory, SetSummary, TruncateHistory, Save, etc.).
Both SessionManager and the new JSONLBackend satisfy this interface,
allowing the persistence layer to be swapped transparently.
JSONLBackend wraps memory.Store and maps its error-returning API to
the fire-and-forget contract that the agent loop expects — write
errors are logged, reads return empty defaults on failure. Save()
triggers compaction to reclaim space after logical truncation.
Part of #1169
* test(session): add JSONLBackend integration tests
8 tests covering the full SessionStore contract through the JSONL
backend: message roundtrip, tool calls, summary, truncation with
compaction, history replacement, empty sessions, session isolation,
and the complete summarization flow (SetSummary → TruncateHistory →
Save).
Includes compile-time interface satisfaction checks for both
SessionManager and JSONLBackend.
Part of #1169
* feat(agent): wire JSONL session store into agent loop
Replace the concrete *SessionManager field with the SessionStore
interface and initialize the JSONL backend by default. Legacy .json
session files are auto-migrated on first startup. Falls back to
SessionManager if the JSONL store cannot be initialized.
The agent loop code (loop.go) requires zero changes — all method
calls work identically through the interface.
Closes#1169
* fix(session): propagate compact error from Save
Save() was swallowing the error returned by Compact and always
returning nil. Callers checking Save's return value would never
see a compaction failure. Return the error directly so the agent
loop can log or handle it as needed.
* feat(session): add Close to SessionStore interface
Add Close() error to SessionStore so callers can release resources
through the interface. JSONLBackend already had Close; this adds
a no-op implementation to SessionManager for compatibility.
* fix(session): close session stores on shutdown and harden migration
- Add Close() to AgentInstance, AgentRegistry, and AgentLoop so JSONL
file handles are released during gateway shutdown and CLI exit.
- Fall back to SessionManager when migration fails, preventing a split
state where some sessions live in JSONL and others remain in JSON.
- Add defer agentLoop.Close() in the CLI agent command path.
- Document SessionStore interface methods (fire-and-forget contract).
* feat(feishu): implement SendMedia and add send_file tool
Add outbound media support for the Feishu channel so the agent can send
images and files to users via the MediaStore pipeline.
Feishu channel:
- SendMedia dispatches media parts as image or file uploads
- sendImage uploads via Image.Create then sends image message
- sendFile uploads via File.Create then sends file message
- feishuFileType maps extensions to Feishu file_type values
send_file tool:
- New tool lets the LLM send a local file to the current chat
- Validates path, registers file in MediaStore, returns media ref
- Agent loop wires tool registration, MediaStore propagation, and
context updates
Tested on Radxa Cubie A7A (arm64) with Feishu websocket channel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(agent): publish outbound media regardless of SendResponse flag
The SendResponse flag controls whether the agent loop publishes the
final text response (callers that publish it themselves set this to
false). However, the media publish path was also gated behind this
flag, which meant tool-produced media was silently dropped for normal
channel messages.
Media should be published immediately when a tool returns media refs,
independent of how the text response is delivered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tools): use magic-bytes MIME detection and add file size limit to send_file
- Replace hardcoded extension-to-MIME map with h2non/filetype (magic
bytes) + mime.TypeByExtension fallback, consistent with the vision
pipeline in resolveMediaRefs
- Add configurable max file size check (defaults to config.DefaultMaxMediaSize,
20 MB) to prevent oversized uploads
- Add tests for magic-bytes detection, extension fallback, size limit,
and default max size
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(agent): add ForEachTool to AgentRegistry for cross-agent tool lookup
Extract the pattern of iterating agents to find a named tool into
AgentRegistry.ForEachTool, simplifying SetMediaStore propagation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(agent,tools): adapt send_file to ctx-based channel injection after upstream refactor
Replace ContextualTool interface (removed upstream) with direct ctx
reading in SendFileTool.Execute, using ToolChannel/ToolChatID helpers.
Remove updateToolContexts which is no longer needed since ExecuteWithContext
already injects channel/chatID into ctx for all tools.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(tools): support toggling send_file tool via config
Add SendFileConfig with Enabled field to ToolsConfig, defaulting to
true. Wrap send_file tool registration in loop.go with the config
check, consistent with the pattern used by other tools.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(commands): Session management [Phase 1/2] command centralization and registration
* docs: add design for command registry post-review fixes
Documents the architecture decisions for fixing 5 Important issues
from code review: SubCommand pattern, Deps struct, command-group files,
Executor caching, and Telegram registration dedup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(commands): add SubCommand type and EffectiveUsage method
Introduce SubCommand struct for declaring sub-commands structurally
within a parent command Definition. The EffectiveUsage() method
auto-generates usage strings from sub-command names and args,
preventing drift between help text and actual handler behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(commands): add Deps struct and secondToken helper, remove dead contains()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(commands): add sub-command routing to Executor
Uses Registry.Lookup for O(1) command dispatch instead of iterating
all definitions. Definitions with SubCommands are routed to matching
sub-command handlers. Missing or unknown sub-commands reply with
auto-generated usage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): split into command-group files with Deps injection
Extract show/list/start/help into individual cmd_*.go files.
Replace config.Config parameter with Deps struct for runtime data.
Restore /show agents and /list agents sub-commands.
Use EffectiveUsage for auto-generated help text.
Bridge external callers (agent/loop.go, telegram.go) with Deps wrapper
until Task 5 fully wires the Deps fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf(commands): cache Executor in AgentLoop, wire Deps with runtime callbacks
Create Executor once in NewAgentLoop instead of per-message. Deps
closures capture AgentLoop pointer for late-bound access to
channelManager and runtime agent model.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(telegram): remove duplicate initBotCommands, keep async startCommandRegistration only
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore(commands): restore Outcome comments and annotate Deps.Config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): consolidate /switch into commands package, fix ! prefix
Move /switch model and /switch channel handling from inline loop.go
logic into cmd_switch.go using the SubCommand + Deps pattern. This
removes the OutcomePassthrough branch in handleCommand entirely.
Also replace the hardcoded "/" prefix check with commands.HasCommandPrefix
so that "!" prefixed commands are correctly routed to the Executor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add docs/plans to .gitignore and untrack existing files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): address code review findings
- Remove dead ExecuteResult.Reply field and unused branch in loop.go
- Extract shared agentsHandler for /show agents and /list agents
- Remove redundant firstToken/secondToken (use nthToken instead)
- Simplify Telegram startup: pass BuiltinDefinitions directly
- Centralize req.Reply nil guard in executeDefinition
- Extract unavailableMsg constant (was duplicated 5 times)
- Remove unused MessageID from Request
- Remove stale "reserved for Phase 2" comment on Deps.Config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): replace Deps with per-request Runtime
Separate stateless Registry (cached on AgentLoop) from per-request
Runtime (passed to handlers at execution time). This enables future
session management features to inject per-request context without
modifying the command registry.
- Rename Deps → Runtime, move to runtime.go
- Change Handler signature: func(ctx, req) error → func(ctx, req, rt *Runtime) error
- NewExecutor now takes (registry, runtime) — executor is created per-request
- BuiltinDefinitions() no longer takes parameters (stateless)
- AgentLoop caches cmdRegistry, builds Runtime via buildRuntime()
- Update all cmd_*.go handlers and tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix gci import grouping and godoc formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(onboard): skip legacy AGENT.md when copying embedded workspace templates
The workspace/ directory contains both AGENT.md (legacy) and AGENTS.md
(current). copyEmbeddedToTarget was copying both, causing the test
TestCopyEmbeddedToTargetUsesAgentsMarkdown to fail. Skip AGENT.md
during the walk to match the expected behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(agent): address self-review comments on loop.go
- Move cmdRegistry init into struct literal (review comment #11)
- Rename buildRuntime → buildCommandsRuntime for clarity (review comment #12)
- Add comment to default switch case explaining passthrough (review comment #13)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): address code review findings on naming and correctness
- Rename dispatcher.go → request.go (no Dispatcher type remains)
- Rename cmd_agents.go → handler_agents.go (shared handler, not a top-level command)
- Add modelMu to protect AgentInstance.Model writes in SwitchModel
- Add ListDefinitions to Runtime so /help uses registry instead of BuiltinDefinitions()
- Fix SwitchChannel message: validation-only callback should not say "Switched"
- Propagate Reply errors in executor instead of discarding with _ =
- Add HasCommandPrefix unit test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(onboard): extract legacy filename to constant
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(agent): handle commands before route error check
Move handleCommand() before the routeErr gate so global commands
(/help, /show, /switch) remain available even when routing fails.
Context-dependent commands that need a routed agent will report
"unavailable" through their nil-Runtime guards.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* revert: remove unnecessary AGENT.md skip in onboard
Reverts 02d0c04 and 74deae1. The test failure was caused by a local
leftover workspace/AGENT.md file (gitignored but embedded by go:embed).
Deleting the local file fixes the root cause; the code-level skip was
never needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: executeDefinition Unknown option
* fix(agent): use routed agent for model commands, restore Telegram command diff
- Remove modelMu: message processing is serial, no concurrent writes
- Pass routed agent to handleCommand/buildCommandsRuntime instead of
always using default agent
- GetModelInfo/SwitchModel are nil when agent is nil (route failed),
handlers reply "unavailable"
- Restore GetMyCommands + slices.Equal check before SetMyCommands to
avoid unnecessary Telegram API calls on restart
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(commands): remove unintended config mutation in SwitchModel
SwitchModel should only update the routed agent's runtime Model field.
Writing to cfg.Agents.Defaults.ModelName was a behavioral change that
corrupts the default agent config when switching a non-default agent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(commands): move /switch channel to /check channel
/switch channel only validates availability, not actually switching.
Rename to /check channel to match actual behavior. /switch channel
now shows a redirect message pointing users to the new command.
Addresses review feedback from yinwm on PR #959.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1. CJK token estimation: replace flat rune_count/3 with script-aware
counting — CJK runes (U+2E80–U+9FFF, U+F900–U+FAFF, U+AC00–U+D7AF)
count as 1 token each, non-CJK runes at /4. This fixes a 3x
underestimate for Chinese/Japanese/Korean text that could incorrectly
route complex CJK messages to the light model.
2. Routing observability: SelectModel now returns the computed score as
a third value. selectCandidates logs the score on both paths — Info
level for light model selection, Debug level for primary model
selection.
3. Added tests: TestExtractFeatures_TokenEstimate_Mixed (CJK+ASCII mix),
TestRouter_SelectModel_ReturnsScore.
Addresses review feedback from @mingmxren.
Upstream added ThinkingLevel, SummarizeMessageThreshold,
SummarizeTokenPercent, MaxMediaSize, and maybeSummarize.
Our branch added Router, LightCandidates, and selectCandidates.
Both sets of changes are kept. Dead updateToolContexts removed
(upstream deleted it; no callers exist).
* fix: eliminate data races on shared tool instances
Signed-off-by: Boris Bliznioukov <blib@mail.com>
* fix: remove unused indirect dependency on github.com/gdamore/tcell/v2
Signed-off-by: Boris Bliznioukov <blib@mail.com>
* fix: reviewer comments improve context handling for tool execution and ensure defaults for non-conversation callers
Signed-off-by: Boris Bliznioukov <blib@mail.com>
---------
Signed-off-by: Boris Bliznioukov <blib@mail.com>
* feat: add extended thinking support for Anthropic models
Support configurable thinking levels (off/low/medium/high/xhigh/adaptive)
via `agents.defaults.thinking_level` config field.
- "adaptive": uses Anthropic's adaptive thinking API (Claude 4.6+)
- "low/medium/high/xhigh": uses budget_tokens (all thinking-capable models)
- "off": disables thinking (default)
API constraints handled:
- Temperature cleared when thinking is enabled
- budget_tokens clamped to max_tokens-1
- Thinking response blocks parsed into Reasoning field
Relates to #645, #966
* fix: address PR review feedback for thinking support
- Add ThinkingCapable interface for provider capability detection
- Warn when thinking_level is set but provider doesn't support it
- Warn when temperature is cleared due to thinking enabled
- Adjust budget values per Anthropic best practices (medium=16K, xhigh=64K)
- Add budget clamp warning and 80% threshold warning
- Add parseResponse thinking block tests
- Add thinking_level field to config.example.json
* refactor: move ThinkingLevel from AgentDefaults to ModelConfig
Thinking is a model-level capability, not a global agent property.
Per-model config avoids silent ignoring on non-Anthropic providers
and eliminates spurious warning logs in multi-provider setups.
Addresses PR #1076 review feedback from @yinwm.
Resolve merge conflicts to keep both SearXNG and GLM Search
providers. Updated search priority order to:
Perplexity > Brave > SearXNG > Tavily > DuckDuckGo > GLM Search
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>