- Critical flag was declared but never acted on; non-critical SubTurns
now break out of the iteration loop when IsParentEnded() returns true
- tools.SubTurnConfig was missing Critical/Timeout/MaxContextRunes,
making those fields unreachable from the tools layer; added fields and
wired them through AgentLoopSpawner.SpawnSubTurn
- Removed subTurnResults sync.Map from AgentLoop — it was a redundant
alias for the same channel already stored in turnState.pendingResults;
dequeuePendingSubTurnResults now reads directly via activeTurnStates
- Replace hardcoded concurrencySem size 5 with maxConcurrentSubTurns constant
- Update affected tests to match new dequeuePendingSubTurnResults API
- Added `/subagents` platform command to visualize the active task tree.
- Implemented GetAllActiveTurns and FormatTree in AgentLoop to support cross-session observability.
- Fixed a bug where sub-turns spawned via tools were not registered in the global `activeTurnStates` map, making them invisible to system queries.
- Enhanced tree rendering logic to identify and display "orphaned" subagents (children that outlive their parent turns).
- Registered the new command in `builtin.go` and injected the turn state provider into the commands runtime.
Modified Files:
- pkg/agent/turn_state.go: Added TurnInfo snapshotting and recursive tree formatting.
- pkg/agent/loop.go: Injected GetActiveTurn hook and implemented multi-root forest rendering.
- pkg/agent/subturn.go: Added child turn registration into activeTurnStates.
- pkg/commands/cmd_subagents.go: New command implementation.
- pkg/commands/builtin.go: Command registration.
This commit addresses several critical concurrency and state management bugs within the SubTurn execution and delivery logic.
1. Fix Goroutine Leak & Deadlock in deliverSubTurnResult:
- Replaced non-blocking select with a safe blocking select that listens to `resultChan` and a new `<-parentTS.Finished()` channel.
- This ensures results are not arbitrarily dropped when the channel is full (preventing orphaned valid results), while also guaranteeing the child goroutine safely unblocks and exits if the parent finishes execution early.
2. Prevent "Send on Closed Channel" Fatal Panics:
- Removed `close(pendingResults)` and `drainPendingResults` from `turnState.Finish()`.
- The pendingResults channel is now naturally garbage collected, completely eliminating the race condition panic when a child attempts delivery at the exact moment the parent finishes.
- Added a `defer recover()` failsafe inside deliverSubTurnResult to gracefully emit Orphan events in extreme edge cases.
3. Fix Truncation Recovery Prompt Drop:
- Fixed the runTurn truncation retry logic by introducing an explicit `promptAlreadyAdded` boolean.
- Ensures that the dynamically generated `recoveryPrompt` is correctly injected into the LLM history sequence on subsequent iterations, adhering to API roles without duplicating arrays.
4. Test Suite Stabilization:
- Fixed TestDeliverSubTurnResultNoDeadlock to accurately wait for deterministic deliveries instead of racing timeouts.
- Replaced defunct closed-channel tests with TestFinishedChannelClosedState matching the new Finished() mechanism.
- Fixed the Finish(true) parameter in TestGrandchildAbort_CascadingCancellation to correctly validate Context cascade behavior.
- All tests now pass cleanly without hanging or emitting false positives.
Problem:
During subturn context limit or truncation recoveries, the recovery loops repeatedly
called `runAgentLoop` with the same or modified `UserMessage`. Because `runAgentLoop`
unconditionally adds the `UserMessage` to the session history, this resulted in:
1. Duplicate User Messages polluting the history upon `context_length_exceeded` retries.
2. The possibility of injecting empty User Messages if `opts.UserMessage` was artificially blanked out to work around the duplication.
3. Messy or duplicate entries during `finish_reason="truncated"` recovery injections.
Solution:
- Introduce `SkipAddUserMessage` boolean to `processOptions` to explicitly control whether the agent loop should write the user prompt to history.
- Add an explicit `opts.UserMessage != ""` check in `runAgentLoop` to prevent polluting history with empty message content.
- In `subturn.go`'s recovery loop, set `SkipAddUserMessage: contextRetryCount > 0` to skip writing the user message on context
Problem:
When parent turn finishes early, all child SubTurns receive "context canceled"
error,because child context was derived from parent context.
Solution:
Implement a lifecycle management system that distinguishes between:
- Graceful finish (Finish(false)): signals parentEnded, children continue
- Hard abort (Finish(true)): immediately cancels all children
Changes:
- turn_state.go:
- Add parentEnded atomic.Bool to signal parent completion
- Add parentTurnState reference for IsParentEnded() checks
- Modify Finish(isHardAbort bool) to distinguish abort types
- subturn.go:
- Add Critical bool to SubTurnConfig (Critical SubTurns continue after parent ends)
- Add Timeout time.Duration for SubTurn self-protection
- Use independent context (context.Background()) instead of derived context
- SubTurns check IsParentEnded() to decide whether to continue or exit
- loop.go:
- Call Finish(false) for normal completion (graceful)
- Add IsParentEnded() check in LLM iteration loop
- steering.go:
- HardAbort calls Finish(true) to immediately cancel children
Behavior:
- Normal finish: parentEnded=true, children continue, orphan results delivered
- Hard abort: all children cancelled immediately via context
- Critical SubTurns: continue running after parent finishes gracefully
- Non-Critical SubTurns: can exit gracefully when IsParentEnded() returns true