Add ActualSystemPrompt and InitialMessages fields to SubTurnConfig to enable
stateful worker context passing across multiple evaluation iterations.
Changes:
- Add ActualSystemPrompt field to separate system role from user task description
- Add InitialMessages field to preload ephemeral session history before agent loop starts
- Add Messages field to ToolResult for carrying session history (internal use, not serialized)
- Update runTurn to inject system prompt and preload history from InitialMessages
- Update AgentLoopSpawner to map new fields from tools.SubTurnConfig to agent.SubTurnConfig
This enables the evaluator-optimizer execution strategy in team tool to maintain
worker context across iterations while keeping SubTurn isolation intact.
* feat(config): support multiple API keys for failover
Add api_keys field to ModelConfig to support multiple API keys with
automatic failover. When multiple keys are configured, they are expanded
into separate model entries with fallbacks set up for key-level failover.
Example config:
{
"model_name": "glm-4.7",
"model": "zhipu/glm-4.7",
"api_keys": ["key1", "key2", "key3"]
}
Expands internally to:
- glm-4.7 (key1) -> fallbacks: [glm-4.7__key_1, glm-4.7__key_2]
- glm-4.7__key_1 (key2)
- glm-4.7__key_2 (key3)
Backward compatible: single api_key still works as before.
* fix(providers): change cooldown tracking from provider to ModelKey
This enables proper key-switching when multiple API keys share the same
provider. Previously, when one key failed, all keys were blocked because
cooldown was tracked per-provider.
Now each (provider, model) combination has independent cooldown, allowing
fallback to alternate keys when one is rate limited.
Includes TestMultiKeyWithModelFallback and related failover tests.
* feat(feishu): add Lark (international) support via IsLark config field
Add IsLark field to FeishuConfig to switch between Feishu and Lark
domains. Also fix domain inconsistency where WS client defaulted to
LarkBaseUrl while HTTP client used FeishuBaseUrl.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update documentation and web UI for Lark support
Add is_lark field to config example, feishu docs, i18n translations,
and web frontend form.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tools): propagate tool registry to subagents via Clone
SubagentManager was created with an empty ToolRegistry and SetTools()
was never called, causing all subagent tool invocations to fail with
"tool not found". This was a regression from the multi-agent refactor.
Fix: clone the parent agent's tool registry into the subagent manager
after creation but before spawn/spawn_status registration — giving
subagents access to file, exec, web, and other tools while preventing
recursive subagent spawning.
- Add ToolRegistry.Clone() for independent shallow copies
- Call subagentManager.SetTools(agent.Tools.Clone()) in registerSharedTools
- Add tests for Clone isolation, empty clone, and hidden tool state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tools): fix cron_test build error and add TTL clone test
- Fix cron_test.go:229 — replace non-existent SubscribeOutbound(ctx)
with select on OutboundChan(), matching the MessageBus channel API
- Add TestToolRegistry_Clone_PreservesTTLValue per reviewer feedback
- Add version reset note to Clone() doc comment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When building parameters for Anthropic API calls, tool calls with empty
names would cause 400 Bad Request errors with the message:
'tool_use.name: String should have at least 1 character'
This fix adds a check to skip tool calls that have empty names, preventing
the API error and allowing the conversation to continue normally.
Fixes#1658
The Lark SDK v3's built-in token retry loop does not clear stale tokens
from cache when the server returns error 99991663 (tenant_access_token
invalid), causing all API calls to fail until the token naturally
expires (~2 hours).
- Add tokenCache struct (implementing larkcore.Cache) with
Get/Set/InvalidateAll methods and proper expired-entry cleanup
- Wire custom cache into lark.NewClient via WithTokenCache()
- Add invalidateTokenOnAuthError helper called in all API methods
* Add Novita provider support
- Add 'novita' prefix to normalizeModel switch in openai_compat provider
- Add Novita provider to all_supported_vendors table in README.md
- Add test cases for Novita model prefix stripping
Novita endpoint: https://api.novita.ai/openai
Default models: deepseek/deepseek-v3.2, zai-org/glm-5, minimax/minimax-m2.5
* feat: complete Novita provider integration
* chore: drop README changes from Novita PR
* fix: remove duplicate function declarations in openai_compat provider
The functions buildToolsList, SupportsNativeSearch, and isNativeSearchHost
were declared twice, causing compilation failures in all CI checks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: break long line in novita test to satisfy golines linter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Critical flag was declared but never acted on; non-critical SubTurns
now break out of the iteration loop when IsParentEnded() returns true
- tools.SubTurnConfig was missing Critical/Timeout/MaxContextRunes,
making those fields unreachable from the tools layer; added fields and
wired them through AgentLoopSpawner.SpawnSubTurn
- Removed subTurnResults sync.Map from AgentLoop — it was a redundant
alias for the same channel already stored in turnState.pendingResults;
dequeuePendingSubTurnResults now reads directly via activeTurnStates
- Replace hardcoded concurrencySem size 5 with maxConcurrentSubTurns constant
- Update affected tests to match new dequeuePendingSubTurnResults API
* fix(telegram): improve HTML chunking and preserve word boundaries
* fix(telegram): address copilot feedback, filter empty chunks and add word-boundary regression test
* style(telegram): fix gofmt and gci lint errors in tests
* fix to feedback
- Added `/subagents` platform command to visualize the active task tree.
- Implemented GetAllActiveTurns and FormatTree in AgentLoop to support cross-session observability.
- Fixed a bug where sub-turns spawned via tools were not registered in the global `activeTurnStates` map, making them invisible to system queries.
- Enhanced tree rendering logic to identify and display "orphaned" subagents (children that outlive their parent turns).
- Registered the new command in `builtin.go` and injected the turn state provider into the commands runtime.
Modified Files:
- pkg/agent/turn_state.go: Added TurnInfo snapshotting and recursive tree formatting.
- pkg/agent/loop.go: Injected GetActiveTurn hook and implemented multi-root forest rendering.
- pkg/agent/subturn.go: Added child turn registration into activeTurnStates.
- pkg/commands/cmd_subagents.go: New command implementation.
- pkg/commands/builtin.go: Command registration.
This commit addresses several critical concurrency and state management bugs within the SubTurn execution and delivery logic.
1. Fix Goroutine Leak & Deadlock in deliverSubTurnResult:
- Replaced non-blocking select with a safe blocking select that listens to `resultChan` and a new `<-parentTS.Finished()` channel.
- This ensures results are not arbitrarily dropped when the channel is full (preventing orphaned valid results), while also guaranteeing the child goroutine safely unblocks and exits if the parent finishes execution early.
2. Prevent "Send on Closed Channel" Fatal Panics:
- Removed `close(pendingResults)` and `drainPendingResults` from `turnState.Finish()`.
- The pendingResults channel is now naturally garbage collected, completely eliminating the race condition panic when a child attempts delivery at the exact moment the parent finishes.
- Added a `defer recover()` failsafe inside deliverSubTurnResult to gracefully emit Orphan events in extreme edge cases.
3. Fix Truncation Recovery Prompt Drop:
- Fixed the runTurn truncation retry logic by introducing an explicit `promptAlreadyAdded` boolean.
- Ensures that the dynamically generated `recoveryPrompt` is correctly injected into the LLM history sequence on subsequent iterations, adhering to API roles without duplicating arrays.
4. Test Suite Stabilization:
- Fixed TestDeliverSubTurnResultNoDeadlock to accurately wait for deterministic deliveries instead of racing timeouts.
- Replaced defunct closed-channel tests with TestFinishedChannelClosedState matching the new Finished() mechanism.
- Fixed the Finish(true) parameter in TestGrandchildAbort_CascadingCancellation to correctly validate Context cascade behavior.
- All tests now pass cleanly without hanging or emitting false positives.
Problem:
During subturn context limit or truncation recoveries, the recovery loops repeatedly
called `runAgentLoop` with the same or modified `UserMessage`. Because `runAgentLoop`
unconditionally adds the `UserMessage` to the session history, this resulted in:
1. Duplicate User Messages polluting the history upon `context_length_exceeded` retries.
2. The possibility of injecting empty User Messages if `opts.UserMessage` was artificially blanked out to work around the duplication.
3. Messy or duplicate entries during `finish_reason="truncated"` recovery injections.
Solution:
- Introduce `SkipAddUserMessage` boolean to `processOptions` to explicitly control whether the agent loop should write the user prompt to history.
- Add an explicit `opts.UserMessage != ""` check in `runAgentLoop` to prevent polluting history with empty message content.
- In `subturn.go`'s recovery loop, set `SkipAddUserMessage: contextRetryCount > 0` to skip writing the user message on context
* config: add prefer_native and NativeSearchCapable for model-native search
* providers: implement native web search for OpenAI and Codex
* agent: use provider-native search when prefer_native and supported
* tests: add coverage for model-native search
* fix: Golang lint errors
* fix: update the code based on the review
* fix: update codex_provider_test
* fix: Fixed the bug where the bus was closed and consumers had unfinished messages.
* fix: remove unnecessary blank line in Close method
* fix: refactor message bus and channel handling for improved performance and reliability
* fix: improve message handling and bus closure logic for better reliability
* fix: reduce sleep duration in agent loop for improved responsiveness
* fix the test case
* feat(cron): enhance CronService with wake channel and improve job scheduling logic
* fix(cron): update file permission mode to use octal notation in test and fix some lint errors
* fix(cron): improve wake channel handling and enhance concurrency in tests
Problem:
When parent turn finishes early, all child SubTurns receive "context canceled"
error,because child context was derived from parent context.
Solution:
Implement a lifecycle management system that distinguishes between:
- Graceful finish (Finish(false)): signals parentEnded, children continue
- Hard abort (Finish(true)): immediately cancels all children
Changes:
- turn_state.go:
- Add parentEnded atomic.Bool to signal parent completion
- Add parentTurnState reference for IsParentEnded() checks
- Modify Finish(isHardAbort bool) to distinguish abort types
- subturn.go:
- Add Critical bool to SubTurnConfig (Critical SubTurns continue after parent ends)
- Add Timeout time.Duration for SubTurn self-protection
- Use independent context (context.Background()) instead of derived context
- SubTurns check IsParentEnded() to decide whether to continue or exit
- loop.go:
- Call Finish(false) for normal completion (graceful)
- Add IsParentEnded() check in LLM iteration loop
- steering.go:
- HardAbort calls Finish(true) to immediately cancel children
Behavior:
- Normal finish: parentEnded=true, children continue, orphan results delivered
- Hard abort: all children cancelled immediately via context
- Critical SubTurns: continue running after parent finishes gracefully
- Non-Critical SubTurns: can exit gracefully when IsParentEnded() returns true
Includes JSONL session persistence (#1170), spawn_status tool, Azure provider,
credential encryption, and various fixes. SubTurn features preserved and
integrated with new spawn_status functionality.
- add a dedicated exec settings section in the config page
- support timeout and custom allow/deny regex patterns for exec
- validate custom exec regex patterns in the config API
- block cron command scheduling and execution when exec is disabled
- update tests and i18n strings for the new command settings
* feat(gateway): support hot reload and empty startup
- extract gateway runtime into pkg/gateway
- add gateway.hot_reload config with default and example values
- allow starting the gateway without a default model via --allow-empty
- stop treating missing enabled channels as a startup error
- update related tests
* feat: replace gateway SSE updates with polling-based state sync
- remove gateway SSE broadcasting and event endpoint
- add polling-based gateway status refresh with stopping state handling
- detect when gateway restart is required after default model changes
- resolve gateway health and websocket proxy targets from configured host
- update gateway UI labels and add backend/frontend test coverage
* feat(tools): add SpawnStatusTool for reporting subagent statuses
* feat(tools): enhance SpawnStatusTool to restrict task visibility by conversation context
* feat(tests): add Unicode result truncation and channel filtering tests for SpawnStatusTool
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* feat(tools): enhance SpawnStatusTool with task ID validation and sorting by creation timestamp
* feat(tools): update SpawnStatusTool description and parameter documentation for clarity
* refactor(tests): improve comments for clarity in ChannelFiltering test case
* fix(tools): update no subagents message for clarity and remove unnecessary locking in runTask
* fix(tools): improve description clarity for SpawnStatusTool regarding task context
* feat(tools): add spawn_status tool configuration and registration
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(agent): improve subagent management for spawn and spawn_status tools
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(tests): update ResultTruncation_Unicode test to use valid CJK character
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: lxowalle <83055338+lxowalle@users.noreply.github.com>
- add defensive nil check for tool call Arguments field
- replace nil input with empty object to comply with Anthropic spec
- prevent API errors when GLM models return null input in tool_use blocks
Zhipu AI's GLM series models may return tool_use blocks with null input field,
which causes their API to reject subsequent requests with error:
"ClaudeContentBlockToolResult object has no attribute id"
This fix ensures compatibility by converting nil inputs to empty objects {},
matching the Anthropic Messages API specification while maintaining backward
compatibility with other providers.
- add tools.cron.allow_command config with a default value of true
- require command_confirm only when cron command execution is disabled
- expose cron command permission and timeout settings in the config UI
- add backend tests and update i18n strings