- Added `/subagents` platform command to visualize the active task tree.
- Implemented GetAllActiveTurns and FormatTree in AgentLoop to support cross-session observability.
- Fixed a bug where sub-turns spawned via tools were not registered in the global `activeTurnStates` map, making them invisible to system queries.
- Enhanced tree rendering logic to identify and display "orphaned" subagents (children that outlive their parent turns).
- Registered the new command in `builtin.go` and injected the turn state provider into the commands runtime.
Modified Files:
- pkg/agent/turn_state.go: Added TurnInfo snapshotting and recursive tree formatting.
- pkg/agent/loop.go: Injected GetActiveTurn hook and implemented multi-root forest rendering.
- pkg/agent/subturn.go: Added child turn registration into activeTurnStates.
- pkg/commands/cmd_subagents.go: New command implementation.
- pkg/commands/builtin.go: Command registration.
This commit addresses several critical concurrency and state management bugs within the SubTurn execution and delivery logic.
1. Fix Goroutine Leak & Deadlock in deliverSubTurnResult:
- Replaced non-blocking select with a safe blocking select that listens to `resultChan` and a new `<-parentTS.Finished()` channel.
- This ensures results are not arbitrarily dropped when the channel is full (preventing orphaned valid results), while also guaranteeing the child goroutine safely unblocks and exits if the parent finishes execution early.
2. Prevent "Send on Closed Channel" Fatal Panics:
- Removed `close(pendingResults)` and `drainPendingResults` from `turnState.Finish()`.
- The pendingResults channel is now naturally garbage collected, completely eliminating the race condition panic when a child attempts delivery at the exact moment the parent finishes.
- Added a `defer recover()` failsafe inside deliverSubTurnResult to gracefully emit Orphan events in extreme edge cases.
3. Fix Truncation Recovery Prompt Drop:
- Fixed the runTurn truncation retry logic by introducing an explicit `promptAlreadyAdded` boolean.
- Ensures that the dynamically generated `recoveryPrompt` is correctly injected into the LLM history sequence on subsequent iterations, adhering to API roles without duplicating arrays.
4. Test Suite Stabilization:
- Fixed TestDeliverSubTurnResultNoDeadlock to accurately wait for deterministic deliveries instead of racing timeouts.
- Replaced defunct closed-channel tests with TestFinishedChannelClosedState matching the new Finished() mechanism.
- Fixed the Finish(true) parameter in TestGrandchildAbort_CascadingCancellation to correctly validate Context cascade behavior.
- All tests now pass cleanly without hanging or emitting false positives.
Problem:
During subturn context limit or truncation recoveries, the recovery loops repeatedly
called `runAgentLoop` with the same or modified `UserMessage`. Because `runAgentLoop`
unconditionally adds the `UserMessage` to the session history, this resulted in:
1. Duplicate User Messages polluting the history upon `context_length_exceeded` retries.
2. The possibility of injecting empty User Messages if `opts.UserMessage` was artificially blanked out to work around the duplication.
3. Messy or duplicate entries during `finish_reason="truncated"` recovery injections.
Solution:
- Introduce `SkipAddUserMessage` boolean to `processOptions` to explicitly control whether the agent loop should write the user prompt to history.
- Add an explicit `opts.UserMessage != ""` check in `runAgentLoop` to prevent polluting history with empty message content.
- In `subturn.go`'s recovery loop, set `SkipAddUserMessage: contextRetryCount > 0` to skip writing the user message on context
Problem:
When parent turn finishes early, all child SubTurns receive "context canceled"
error,because child context was derived from parent context.
Solution:
Implement a lifecycle management system that distinguishes between:
- Graceful finish (Finish(false)): signals parentEnded, children continue
- Hard abort (Finish(true)): immediately cancels all children
Changes:
- turn_state.go:
- Add parentEnded atomic.Bool to signal parent completion
- Add parentTurnState reference for IsParentEnded() checks
- Modify Finish(isHardAbort bool) to distinguish abort types
- subturn.go:
- Add Critical bool to SubTurnConfig (Critical SubTurns continue after parent ends)
- Add Timeout time.Duration for SubTurn self-protection
- Use independent context (context.Background()) instead of derived context
- SubTurns check IsParentEnded() to decide whether to continue or exit
- loop.go:
- Call Finish(false) for normal completion (graceful)
- Add IsParentEnded() check in LLM iteration loop
- steering.go:
- HardAbort calls Finish(true) to immediately cancel children
Behavior:
- Normal finish: parentEnded=true, children continue, orphan results delivered
- Hard abort: all children cancelled immediately via context
- Critical SubTurns: continue running after parent finishes gracefully
- Non-Critical SubTurns: can exit gracefully when IsParentEnded() returns true
Includes JSONL session persistence (#1170), spawn_status tool, Azure provider,
credential encryption, and various fixes. SubTurn features preserved and
integrated with new spawn_status functionality.
Add a clear identity statement to all 6 README files clarifying that
PicoClaw is an independent open-source project by Sipeed, written
entirely in Go, and not a fork of OpenClaw, NanoBot, or any other
project. This addresses common AI hallucinations found during testing
of 11 AI tools. Also normalizes [nanobot] to [NanoBot] for consistent
capitalization.
Co-authored-by: BeaconCat <BeaconCat@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- add a dedicated exec settings section in the config page
- support timeout and custom allow/deny regex patterns for exec
- validate custom exec regex patterns in the config API
- block cron command scheduling and execution when exec is disabled
- update tests and i18n strings for the new command settings
* feat(gateway): support hot reload and empty startup
- extract gateway runtime into pkg/gateway
- add gateway.hot_reload config with default and example values
- allow starting the gateway without a default model via --allow-empty
- stop treating missing enabled channels as a startup error
- update related tests
* feat: replace gateway SSE updates with polling-based state sync
- remove gateway SSE broadcasting and event endpoint
- add polling-based gateway status refresh with stopping state handling
- detect when gateway restart is required after default model changes
- resolve gateway health and websocket proxy targets from configured host
- update gateway UI labels and add backend/frontend test coverage
- Modify buildWsURL to use web server port (18800) instead of gateway port (18790)
- Add WebSocket proxy handler to forward /pico/ws to gateway
- Gateway port is read from config (cfg.Gateway.Port), defaults to 18790
- This allows WebSocket connections through the same port as the web UI,
avoiding the need to expose extra ports for Tailscale/Docker
Separate web Go commands from the default Go toolchain so web builds,
tests, and vet can enable CGO on Darwin without affecting the rest of
the project. Also ensure frontend backend builds recreate backend/dist
with a .gitkeep file so the embedded output directory remains tracked.
* feat(tools): add SpawnStatusTool for reporting subagent statuses
* feat(tools): enhance SpawnStatusTool to restrict task visibility by conversation context
* feat(tests): add Unicode result truncation and channel filtering tests for SpawnStatusTool
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* feat(tools): enhance SpawnStatusTool with task ID validation and sorting by creation timestamp
* feat(tools): update SpawnStatusTool description and parameter documentation for clarity
* refactor(tests): improve comments for clarity in ChannelFiltering test case
* fix(tools): update no subagents message for clarity and remove unnecessary locking in runTask
* fix(tools): improve description clarity for SpawnStatusTool regarding task context
* feat(tools): add spawn_status tool configuration and registration
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(agent): improve subagent management for spawn and spawn_status tools
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(tests): update ResultTruncation_Unicode test to use valid CJK character
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: lxowalle <83055338+lxowalle@users.noreply.github.com>
- add defensive nil check for tool call Arguments field
- replace nil input with empty object to comply with Anthropic spec
- prevent API errors when GLM models return null input in tool_use blocks
Zhipu AI's GLM series models may return tool_use blocks with null input field,
which causes their API to reject subsequent requests with error:
"ClaudeContentBlockToolResult object has no attribute id"
This fix ensures compatibility by converting nil inputs to empty objects {},
matching the Anthropic Messages API specification while maintaining backward
compatibility with other providers.
- add tools.cron.allow_command config with a default value of true
- require command_confirm only when cron command execution is disabled
- expose cron command permission and timeout settings in the config UI
- add backend tests and update i18n strings
- Fix synchronous SubTurn calls placing results in pendingResults channel,
causing double delivery. Now only async calls (Async=true) use the channel.
- Move deliverSubTurnResult into defer to ensure result delivery even when
runTurn panics. Add TestSpawnSubTurn_PanicRecovery to verify.
- Fix ContextWindow incorrectly set to MaxTokens; now inherits from
parentAgent.ContextWindow.
- Add TestSpawnSubTurn_ResultDeliverySync to verify sync behavior.
Critical fixes (5):
- Fix turnState hierarchy corruption in nested SubTurns by checking context
before creating new root turnState in runAgentLoop
- Fix deadlock risk in deliverSubTurnResult by separating lock and channel ops
- Fix session rollback race in HardAbort by calling Finish() before rollback
- Fix resource leak by closing pendingResults channel in Finish() with recovery
- Add thread-safety docs for childTurnIDs and isFinished fields
Medium priority fixes (5):
- Move globalTurnCounter to AgentLoop.subTurnCounter to prevent ID conflicts
- Improve semaphore acquisition to ensure release even on early validation failures
- Document design choice: ephemeral sessions start empty for complete isolation
- Add final poll before Finish() to capture late-arriving SubTurn results
- Remove duplicate channel registration in spawnSubTurn to fix timing issues
Testing:
- Add 6 new tests covering hierarchy, deadlock, ordering, channel lifecycle,
final poll, and semaphore behavior
- All 12 SubTurn tests passing with race detector
This resolves 10 critical and medium issues (5 race conditions, 2 resource leaks,
3 timing issues) identified in code review, bringing SubTurn to production-ready state.
- Fix turnState hierarchy corruption when SubTurns recursively call runAgentLoop
by checking context for existing turnState before creating new root
- Fix deadlock risk in deliverSubTurnResult by separating lock and channel operations
- Fix session rollback race in HardAbort by calling Finish() before rollback
- Fix resource leak by closing pendingResults channel in Finish() with panic recovery
- Add thread-safety documentation for childTurnIDs and isFinished fields
- Move globalTurnCounter to AgentLoop.subTurnCounter to prevent ID conflicts
- Improve semaphore acquisition to ensure release even on early validation failures
- Document design choice: ephemeral sessions start empty for complete isolation
- Add 5 new tests: hierarchy, deadlock, order, channel close, and semaphore
- Add initialHistoryLength field to turnState to snapshot session state at turn start
- Save initial history length in runAgentLoop when creating root turnState
- Implement session rollback in HardAbort via SetHistory, truncating to initial length
- Add TestHardAbortSessionRollback to verify history rollback after abort
- Import providers package in subturn_test.go for Message type
This ensures that when a user triggers hard abort, all messages added during
the aborted turn are discarded, restoring the session to its pre-turn state.
- Add maxConcurrentSubTurns constant (5) and concurrencySem channel to turnState
- Acquire/release semaphore in spawnSubTurn to limit concurrent child turns per parent
- Add activeTurnStates sync.Map to AgentLoop for tracking root turn states by session
- Implement HardAbort(sessionKey) method to trigger cascading cancellation via turnState.Finish()
- Register/unregister root turnState in runAgentLoop for hard abort lookup
- Add TestSubTurnConcurrencySemaphore to verify semaphore capacity enforcement
- Add TestHardAbortCascading to verify context cancellation propagates to child turns
- Add subTurnResults sync.Map to AgentLoop for per-session channel tracking
- Add register/unregister/dequeue methods in steering.go
- Poll SubTurn results in runLLMIteration at loop start and after each tool,
injecting results as [SubTurn Result] messages into parent conversation
- Initialize root turnState in runAgentLoop, propagate via context
(withTurnState/turnStateFromContext), call rootTS.Finish() on completion
- Wire Spawn Tool to spawnSubTurn via SetSpawner in registerSharedTools,
recovering parentTS from context for proper turn hierarchy
- Refactor subagent.go to use SetSpawner pattern
- Add TestSubTurnResultChannelRegistration and TestDequeuePendingSubTurnResults
- move chat controller, state, protocol, history, and websocket logic into a dedicated chat feature module
- improve chat reconnection, session hydration, and send gating based on actual websocket state
- preserve gateway status during transient SSE disconnects and update stop state immediately
- generate wss websocket URLs behind HTTPS proxies and add backend tests for forwarded proto handling
- Replace duplicate types (ToolResult/Session/Message) with real project types
- Implement ephemeralSessionStore satisfying session.SessionStore interface
- Connect runTurn to real AgentLoop via runAgentLoop + AgentInstance
- Fix subturn_test.go to match updated signatures and types
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
* feat(credential): add AES-GCM encryption, SecureStore, and onboard keygen
- pkg/credential: new package with AES-256-GCM enc:// credential format,
HKDF-SHA256 key derivation (passphrase + optional SSH key binding),
ErrPassphraseRequired / ErrDecryptionFailed sentinel errors,
and PassphraseProvider hook for runtime passphrase injection
- pkg/credential/store: lock-free SecureStore via atomic.Pointer[string];
passphrase never written to disk or os.Environ
- pkg/credential/keygen: ed25519 SSH key generation helper used by onboard
- pkg/config: replace os.Getenv(PassphraseEnvVar) with
credential.PassphraseProvider() at all three call sites so that
LoadConfig and SaveConfig use whatever passphrase source is active
- cmd/picoclaw/onboard: prompt for passphrase with echo-off, generate
picoclaw-specific SSH key, re-encrypt existing config on re-onboard
- docs/credential_encryption.md: design doc for the enc:// format
* fix(credential): address Copilot review comments on PR #1521
- credential.go: decouple ErrPassphraseRequired from env var name;
message is now 'enc:// passphrase required' since PassphraseProvider
may come from any source, not just os.Environ
- credential.go: Resolver resolves symlinks via EvalSymlinks before the
isWithinDir containment check, preventing symlink-based path traversal
for file:// credential references
- store.go: tighten comment to describe only what SecureStore guarantees
(in-memory only); remove claims about how callers transport the value
- store_test.go: replace the meaningless GetReturnsCopy test (Go strings
are immutable, equality across two calls proves nothing) with
TestSecureStore_ConcurrentSetGet that exercises atomic.Pointer under
10-goroutine concurrent Set/Get load
- config_test.go: update error-message assertion to match new sentinel text
- docs/credential_encryption.md: remove reference to non-existent
'picoclaw encrypt' subcommand; describe the onboard flow instead
* fix(config): encryptPlaintextAPIKeys: struct-based encryption, fail-fast, remove raw []byte
* fix(credential): require SSH private key for encryption/decryption, remove passphrase-only mode
* lint: fix credential keygen lint, fix test keygen
* onboard: make encryption opt-in via --enc flag
Encryption (passphrase prompt + SSH key generation) is now only
triggered when the user passes --enc to 'picoclaw onboard'.
Without the flag, onboard skips the credential-encryption setup and
writes a plain config + workspace templates directly.
- Add --enc BoolFlag in NewOnboardCommand()
- Pass encrypt bool into onboard()
- Guard passphrase prompt, SSH key generation, and related env-var
setup behind the encrypt branch
- Adjust 'Next steps' output so the passphrase reminder only appears
when --enc was used