When the entire history is a single Turn (one user message followed by
tool calls and responses, no subsequent user message), the only Turn
boundary is at index 0. Previously the fallback returned targetIndex,
which could land on a tool or assistant message — splitting the Turn.
Return 0 instead, so callers (forceCompression, summarizeSession) see
mid <= 0 and skip compression rather than cutting inside the Turn.
Two estimation bugs fixed:
1. Media tokens were added to the chars accumulator before the chars*2/5
conversion, resulting in 256*2/5=102 tokens per item instead of 256.
Fix: add media tokens directly to the final token count, bypassing
the character-based heuristic.
2. estimateMessageTokens counted both tc.Name and tc.Function.Name for
tool calls, but providers only send one (OpenAI-compat uses
function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name
when Function is present, fall back to tc.Name only otherwise.
Also fix i18n hint text: "auto-detect" was misleading — the backend
uses a 4x max_tokens heuristic, not actual model detection.
Introduce parseTurnBoundaries() which identifies each Turn start index
in the session history. A Turn is a complete "user input → LLM iterations
→ final response" cycle (as defined in the agent refactor design #1316).
findSafeBoundary now uses Turn boundaries instead of raw role-scanning,
making the intent explicit: "find the nearest Turn boundary."
forceCompression drops the oldest half of Turns (not arbitrary messages),
which is simpler and more intuitive. The Turn-based approach naturally
prevents splitting tool-call sequences since each Turn is atomic.
estimateMessageTokens now counts ReasoningContent (extended thinking /
chain-of-thought) which can be substantial and is persisted in session
history. Media items get a fixed per-item overhead (256 tokens) since
actual cost depends on provider-specific image tokenization.
Separate context_window from max_tokens — they serve different purposes
(input capacity vs output generation limit). The previous conflation caused
premature summarization or missed compression triggers.
Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary)
into context_budget.go — pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences
(assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata,
not just Content — fixes systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop,
preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost
Closes#556, closes#665
Ref #1439