mirror of
https://github.com/sipeed/picoclaw.git
synced 2026-06-12 18:08:54 +00:00
021aa7d6d5
* feat(agent): steering * fix loop * fix lint * fix lint
167 lines
7.2 KiB
Markdown
167 lines
7.2 KiB
Markdown
# Steering
|
|
|
|
Steering allows injecting messages into an already-running agent loop, interrupting it between tool calls without waiting for the entire cycle to complete.
|
|
|
|
## How it works
|
|
|
|
When the agent is executing a sequence of tool calls (e.g. the model requested 3 tools in a single turn), steering checks the queue **after each tool** completes. If it finds queued messages:
|
|
|
|
1. The remaining tools are **skipped** and receive `"Skipped due to queued user message."` as their result
|
|
2. The steering messages are **injected into the conversation context**
|
|
3. The model is called again with the updated context, including the user's steering message
|
|
|
|
```
|
|
User ──► Steer("change approach")
|
|
│
|
|
Agent Loop ▼
|
|
├─ tool[0] ✔ (executed)
|
|
├─ [polling] → steering found!
|
|
├─ tool[1] ✘ (skipped)
|
|
├─ tool[2] ✘ (skipped)
|
|
└─ new LLM turn with steering message
|
|
```
|
|
|
|
## Configuration
|
|
|
|
In `config.json`, under `agents.defaults`:
|
|
|
|
```json
|
|
{
|
|
"agents": {
|
|
"defaults": {
|
|
"steering_mode": "one-at-a-time"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Modes
|
|
|
|
| Value | Behavior |
|
|
|-------|----------|
|
|
| `"one-at-a-time"` | **(default)** Dequeues only one message per polling cycle. If there are 3 messages in the queue, they are processed one at a time across 3 successive iterations. |
|
|
| `"all"` | Drains the entire queue in a single poll. All pending messages are injected into the context together. |
|
|
|
|
The environment variable `PICOCLAW_AGENTS_DEFAULTS_STEERING_MODE` can be used as an alternative.
|
|
|
|
## Go API
|
|
|
|
### Steer — Send a steering message
|
|
|
|
```go
|
|
err := agentLoop.Steer(providers.Message{
|
|
Role: "user",
|
|
Content: "change direction, focus on X instead",
|
|
})
|
|
if err != nil {
|
|
// Queue is full (MaxQueueSize=10) or not initialized
|
|
}
|
|
```
|
|
|
|
The message is enqueued in a thread-safe manner. Returns an error if the queue is full or not initialized. It will be picked up at the next polling point (after the current tool finishes).
|
|
|
|
### SteeringMode / SetSteeringMode
|
|
|
|
```go
|
|
// Read the current mode
|
|
mode := agentLoop.SteeringMode() // SteeringOneAtATime | SteeringAll
|
|
|
|
// Change it at runtime
|
|
agentLoop.SetSteeringMode(agent.SteeringAll)
|
|
```
|
|
|
|
### Continue — Resume an idle agent
|
|
|
|
When the agent is idle (it has finished processing and its last message was from the assistant), `Continue` checks if there are steering messages in the queue and uses them to start a new cycle:
|
|
|
|
```go
|
|
response, err := agentLoop.Continue(ctx, sessionKey, channel, chatID)
|
|
if err != nil {
|
|
// Error (e.g. "no default agent available")
|
|
}
|
|
if response == "" {
|
|
// No steering messages in queue, the agent stays idle
|
|
}
|
|
```
|
|
|
|
`Continue` internally uses `SkipInitialSteeringPoll: true` to avoid double-dequeuing the same messages (since it already extracted them and passes them directly as input).
|
|
|
|
## Polling points in the loop
|
|
|
|
Steering is checked at **two points** in the agent cycle:
|
|
|
|
1. **At loop start** — before the first LLM call, to catch messages enqueued during setup
|
|
2. **After every tool completes** — including the first and the last. If steering is found and there are remaining tools, they are all skipped immediately
|
|
|
|
## Why remaining tools are skipped
|
|
|
|
When a steering message is detected, all remaining tools in the batch are skipped rather than executed. The alternative — let all tools finish and inject the steering message afterwards — was considered and rejected. Here is why.
|
|
|
|
### Preventing unwanted side effects
|
|
|
|
Tools can have **irreversible side effects**. If the user says "no, wait" while the agent is mid-batch, executing the remaining tools means those side effects happen anyway:
|
|
|
|
| Tool batch | Steering message | With skip | Without skip |
|
|
|---|---|---|---|
|
|
| `[web_search, send_email]` | "don't send it" | Email **not** sent | Email sent, damage done |
|
|
| `[query_db, write_file, spawn_agent]` | "use another database" | Only the query runs | File written + subagent spawned, all wasted |
|
|
| `[search₁, search₂, search₃, write_file]` | user changes topic entirely | 1 search | 3 searches + file write, all irrelevant |
|
|
|
|
### Avoiding wasted time
|
|
|
|
Tools that take seconds (web fetches, API calls, database queries) would all run to completion before the agent sees the user's correction. In a batch of 3 tools each taking 3-4 seconds, that's 10+ seconds of work that will be discarded.
|
|
|
|
With skipping, the agent reacts as soon as the current tool finishes — typically within a few seconds instead of waiting for the entire batch.
|
|
|
|
### The LLM gets full context
|
|
|
|
Skipped tools receive an explicit error result (`"Skipped due to queued user message."`), so the model knows exactly which actions were not performed. It can then decide whether to re-execute them with the new context, or take a different path entirely.
|
|
|
|
### Trade-off: sequential execution
|
|
|
|
Skipping requires tools to run **sequentially** (the previous implementation ran them in parallel). This introduces latency when the LLM requests multiple independent tools in a single turn. In practice, most batches contain 1-2 tools, so the impact is minimal compared to the benefit of being able to stop unwanted actions.
|
|
|
|
## Skipped tool result format
|
|
|
|
When steering interrupts a batch, each tool that was not executed receives a `tool` result with:
|
|
|
|
```
|
|
Content: "Skipped due to queued user message."
|
|
```
|
|
|
|
This is saved to the session via `AddFullMessage` and sent to the model, so it is aware that some requested actions were not performed.
|
|
|
|
## Full flow example
|
|
|
|
```
|
|
1. User: "search for info on X, write a file, and send me a message"
|
|
|
|
2. LLM responds with 3 tool calls: [web_search, write_file, message]
|
|
|
|
3. web_search is executed → result saved
|
|
|
|
4. [polling] → User called Steer("no, search for Y instead")
|
|
|
|
5. write_file is skipped → "Skipped due to queued user message."
|
|
message is skipped → "Skipped due to queued user message."
|
|
|
|
6. Message "search for Y instead" injected into context
|
|
|
|
7. LLM receives the full updated context and responds accordingly
|
|
```
|
|
|
|
## Automatic bus drain
|
|
|
|
When the agent loop (`Run()`) starts processing a message, it spawns a background goroutine that keeps consuming new inbound messages from the bus. These messages are automatically redirected into the steering queue via `Steer()`. This means:
|
|
|
|
- Users on any channel (Telegram, Discord, etc.) don't need to do anything special — their messages are automatically captured as steering when the agent is busy
|
|
- Audio messages are transcribed before being steered, so the agent receives text. If transcription fails, the original (non-transcribed) message is steered as-is
|
|
- When `processMessage` finishes, the drain goroutine is canceled and normal message consumption resumes
|
|
|
|
## Notes
|
|
|
|
- Steering **does not interrupt** a tool that is currently executing. It waits for the current tool to finish, then checks the queue.
|
|
- With `one-at-a-time` mode, if multiple messages are enqueued rapidly, they will be processed one per iteration. This gives the model the opportunity to react to each message individually.
|
|
- With `all` mode, all pending messages are combined into a single injection. Useful when you want the agent to receive all the context at once.
|
|
- The steering queue has a maximum capacity of 10 messages (`MaxQueueSize`). `Steer()` returns an error when the queue is full. In the bus drain path, the error is logged as a warning and the message is effectively dropped.
|