* feat(seahorse): implement short-term memory engine of seahorse
Add pkg/seahorse/ module implementing a SQLite-backed DAG-based summary
hierarchy for context management, ported from lossless-claw's LCM design:
- types.go + short_constants.go: core types (Message, Summary, Conversation,
ContextItem) and configuration constants (fanout, token targets, thresholds)
- migration.go: idempotent DB schema with FTS5 trigram tokenizer for CJK
- store.go: full SQLite CRUD (conversations, messages, summaries DAG,
context_items with ordinal gap numbering, FTS5 search)
- short_engine.go: Engine lifecycle (NewEngine, Ingest, Assemble, Compact),
session pattern filtering (ignore/stateless glob→regex compilation),
per-session mutex via sync.Map
- short_assembler.go: budget-aware context assembly with fresh tail protection
(32 messages), oldest-first eviction, summary XML formatting, RebuildContextItems
- short_compaction.go: leaf compaction (messages→summary) and condensed
compaction (summaries→higher-level summary), 3-level LLM escalation,
CompactUntilUnder for emergency overflow
- short_retrieval.go: lookupByID, FTS5/LIKE search, recursive expand with
token cap
- context_seahorse.go: agent.ContextManager adapter, registered as "seahorse",
provider↔seahorse message type conversion (ToolCalls, tool_result)
* fix(seahorse): correct 3 adapter bugs in context management
- TokenCount: use full message (Content+ToolCalls+Media) instead of Content-only
- Empty Content: rebuild Content from tool_result Parts when stored empty
- Duplicate summaries: summaries only in Summary field, not in History messages
- Grep: fix SearchResult.Snippet→Content for summaries
- Schema: fix FTS5 SQL uses VIRTUAL TABLE not TEMP TABLE
- TestFTS5SQLConstants: verify FTS5 SQL syntax correctness
- Test: fix flaky TestCompactLeaf
* fix(agent): ingest steering messages into seahorse SQLite
Steering messages were only persisted to session JSONL but not ingested
into seahorse SQLite, causing them to be missing from context assembly.
Added `ts.ingestMessage(turnCtx, al, pm)` call in the steering message
injection block alongside the existing JSONL persistence.
Test: TestSeahorseSteeringMessageIngested verifies steering messages
appear in seahorse SQLite DB after being processed.
* fix(seahorse): address 3 blocking bugs from code review
- Fix resequenceContextItemsTx scan error handling (store.go:850)
Changed `return err` to `return scanErr` to properly propagate scan errors
instead of returning nil (which silently corrupts data)
- Fix sql.NullString for INTEGER column (store.go:847)
Changed `mid` from sql.NullString to sql.NullInt64 since message_id
is INTEGER in schema. Removed unnecessary strconv.ParseInt call.
- Fix compactCondensed fallback deleting non-candidate items
Added ReplaceContextItemsWithSummary method for per-item deletion
when candidates are not contiguous in ordinal space.
Optimized to use range deletion when candidates are consecutive.
* fix(seahorse): pass Budget to Compact for correct condensed threshold
Issue #4 from PR review: When Budget was not passed to seahorse.Compact,
it defaulted to `tokensBefore * 0.75`, making `tokensBefore > budget`
always true and causing condensed compaction to trigger unnecessarily.
Changes:
- context_seahorse.go: Forward Budget from CompactRequest to CompactInput
- loop.go: Pass Budget (ContextWindow) in all 3 Compact calls
- Add test verifying condensed is skipped when tokens < threshold
- Fix lint issues in store.go and store_test.go
* fix(seahorse): add mutex for assembler lazy initialization
Issue #5 from PR review: The check-then-create pattern for e.assembler
was a data race when multiple goroutines called Assemble() concurrently:
if e.assembler == nil {
e.assembler = &Assembler{...}
}
Changes:
- Add assemblerMu sync.Mutex to Engine struct
- Add initAssemblerOnce() using double-checked locking (same pattern as initCompactionOnce)
- Add TestAssemblerLazyInitRace to verify thread-safety
* fix(seahorse): handle non-consecutive depths in selectShallowestCondensationCandidate
Issue #8 from PR review: the loop iterated depth 0, 1, 2... assuming
consecutive keys, but break when key was missing caused deeper depths
to never be checked.
Fix: collect all existing depth keys, sort, then iterate in order.
* fix(seahorse): wrap DeleteMessagesAfterID and appendContextItems in transactions
- DeleteMessagesAfterID: wrap all DELETE operations in a transaction for
atomicity, remove redundant manual FTS delete (handled by trigger)
- appendContextItems: use transaction to fix read-then-write race condition
- Add GetMaxOrdinalTx and resolveItemTokenCountTx for transaction-scoped queries
- Remove unused resolveItemTokenCount function
Fixes PR review issues 6 and 7.
* fix(seahorse): derive readable content from Parts and cap CompactUntilUnder iterations
- Derive readable content from MessageParts in AddMessageWithParts so
FTS5 indexing and summary formatting can access tool call information
- formatMessagesForSummary and truncateSummary now fall back to Parts
when Content is empty, fixing blank summaries for Part-based messages
- Add MaxCompactIterations (20) to prevent CompactUntilUnder infinite
loops; exceeded iterations are logged as warnings
Treat SystemParts as an alternative representation of message Content
rather than an additive one. This prevents systematic overestimation
of system message tokens which could trigger premature context
pruning or summarization.
- Picks the maximum of Content vs. SystemParts to stay conservative.
- Adds a per-part overhead (20 chars) to account for JSON metadata.
- Streamlines the ReasoningContent counting logic.
Fixes a deficiency where structured blocks for cache-aware adapters
caused overestimated budgets or hidden overflows.
When the entire history is a single Turn (one user message followed by
tool calls and responses, no subsequent user message), the only Turn
boundary is at index 0. Previously the fallback returned targetIndex,
which could land on a tool or assistant message — splitting the Turn.
Return 0 instead, so callers (forceCompression, summarizeSession) see
mid <= 0 and skip compression rather than cutting inside the Turn.
Two estimation bugs fixed:
1. Media tokens were added to the chars accumulator before the chars*2/5
conversion, resulting in 256*2/5=102 tokens per item instead of 256.
Fix: add media tokens directly to the final token count, bypassing
the character-based heuristic.
2. estimateMessageTokens counted both tc.Name and tc.Function.Name for
tool calls, but providers only send one (OpenAI-compat uses
function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name
when Function is present, fall back to tc.Name only otherwise.
Also fix i18n hint text: "auto-detect" was misleading — the backend
uses a 4x max_tokens heuristic, not actual model detection.
Introduce parseTurnBoundaries() which identifies each Turn start index
in the session history. A Turn is a complete "user input → LLM iterations
→ final response" cycle (as defined in the agent refactor design #1316).
findSafeBoundary now uses Turn boundaries instead of raw role-scanning,
making the intent explicit: "find the nearest Turn boundary."
forceCompression drops the oldest half of Turns (not arbitrary messages),
which is simpler and more intuitive. The Turn-based approach naturally
prevents splitting tool-call sequences since each Turn is atomic.
estimateMessageTokens now counts ReasoningContent (extended thinking /
chain-of-thought) which can be substantial and is persisted in session
history. Media items get a fixed per-item overhead (256 tokens) since
actual cost depends on provider-specific image tokenization.
Separate context_window from max_tokens — they serve different purposes
(input capacity vs output generation limit). The previous conflation caused
premature summarization or missed compression triggers.
Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary)
into context_budget.go — pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences
(assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata,
not just Content — fixes systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop,
preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost
Closes#556, closes#665
Ref #1439