refactor(agent): context boundary detection, proactive budget check, and safe compression

Separate context_window from max_tokens — they serve different purposes
(input capacity vs output generation limit). The previous conflation caused
premature summarization or missed compression triggers.

Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary)
  into context_budget.go — pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences
  (assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata,
  not just Content — fixes systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop,
  preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost

Closes #556, closes #665
Ref #1439
This commit is contained in:
xiaoen
2026-03-13 14:20:24 +08:00
parent 021aa7d6d5
commit 9c82b0baa2
5 changed files with 727 additions and 14 deletions
+12 -1
View File
@@ -127,6 +127,17 @@ func NewAgentInstance(
maxTokens = 8192
}
contextWindow := defaults.ContextWindow
if contextWindow == 0 {
// Default heuristic: 4x the output token limit.
// Most models have context windows well above their output limits
// (e.g., GPT-4o 128k ctx / 16k out, Claude 200k ctx / 8k out).
// 4x is a conservative lower bound that avoids premature
// summarization while remaining safe — the reactive
// forceCompression handles any overshoot.
contextWindow = maxTokens * 4
}
temperature := 0.7
if defaults.Temperature != nil {
temperature = *defaults.Temperature
@@ -224,7 +235,7 @@ func NewAgentInstance(
MaxTokens: maxTokens,
Temperature: temperature,
ThinkingLevel: thinkingLevel,
ContextWindow: maxTokens,
ContextWindow: contextWindow,
SummarizeMessageThreshold: summarizeMessageThreshold,
SummarizeTokenPercent: summarizeTokenPercent,
Provider: provider,