feat: support streaming (#2892)

* Support streaming
* fix: stream pico reasoning updates

  Route Pico reasoning through the active streamer and hide empty thought placeholders.
* fix: harden configured streaming delivery
* fix ci
* fix split issue
2026-05-25 16:00:35 +00:00 · 2026-05-19 16:38:47 +08:00
parent 941bac2332
commit 639b32703a
74 changed files with 6197 additions and 202 deletions
@@ -364,6 +364,55 @@ Configurez plusieurs endpoints pour le même nom de modèle — PicoClaw effectu

 L'ancienne configuration `providers` est **dépréciée** et a été supprimée dans V2. Les configs V0/V1 existantes sont auto-migrées. Voir [docs/migration/model-list-migration.md](../migration/model-list-migration.md).

+#### Configuration du Streaming
+
+Le streaming provider utilise un double opt-in et est désactivé par défaut. L'agent ne tente le streaming que lorsque le channel courant a `settings.streaming.enabled: true`, que l'entrée de modèle active a `streaming.enabled: true`, et que le provider comme le channel prennent en charge le streaming. Si une condition manque, PicoClaw utilise le chemin de requête non-streaming normal.
+
+Pico WebUI est le premier channel entièrement câblé. Pico crée le premier message assistant avec le message wire existant `message.create`, puis met à jour ce même message avec `message.update`; aucun nouveau type de message Pico n'est introduit.
+
+Laissez `streaming` absent si vous ne voulez pas de streaming. Un bloc `streaming` omis signifie désactivé; il n'est pas nécessaire d'écrire `"streaming": {"enabled": false}`.
+
+Exemple d'activation :
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Champ | Type | Défaut | Description |
+| ----- | ---- | ------ | ----------- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Autorise ce channel à afficher la sortie streaming du provider |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Défaut Pico après activation : `0` | Intervalle minimal entre les mises à jour intermédiaires; le contenu final est toujours envoyé |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Défaut Pico après activation : `1` | Croissance minimale du texte avant une mise à jour intermédiaire; le contenu final est toujours envoyé |
+| `model_list[].streaming.enabled` | bool | `false` | Autorise cette entrée de modèle à tenter des requêtes provider en streaming |
+
+Les anciennes variables d'environnement Telegram restent compatibles : `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS` et `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. Elles s'appliquent uniquement aux settings Telegram et n'activent ni ne modifient `settings.streaming` de Pico.
+
+Le comportement d'échec est volontairement conservateur : si le streaming échoue avant l'envoi d'un chunk visible, PicoClaw réessaie une fois via le chemin `Chat()` normal. Si un chunk a déjà été affiché à l'utilisateur, PicoClaw n'envoie pas une deuxième réponse non-streaming, afin d'éviter une sortie dupliquée.
+
 ### Architecture des Providers

 PicoClaw route les providers par famille de protocole :
@@ -31,6 +31,55 @@ PICOCLAW_HOME=/opt/picoclaw picoclaw agent
 PICOCLAW_HOME=/srv/picoclaw PICOCLAW_CONFIG=/srv/picoclaw/main.json picoclaw gateway
 ```

+### Configurazione Streaming
+
+Lo streaming del provider usa un double opt-in ed è disattivato per impostazione predefinita. L'agent prova lo streaming solo quando il canale corrente ha `settings.streaming.enabled: true`, l'entry del modello attivo ha `streaming.enabled: true`, e sia il provider sia il canale supportano lo streaming. Se manca una qualsiasi condizione, PicoClaw usa il normale percorso di richiesta non streaming.
+
+Pico WebUI è il primo canale completamente collegato. Pico crea il primo messaggio assistant con il wire message esistente `message.create`, poi aggiorna lo stesso messaggio con `message.update`; non viene introdotto alcun nuovo tipo di wire message Pico.
+
+Lascia `streaming` assente quando non vuoi usare lo streaming. Un blocco `streaming` omesso significa disattivato; non è necessario scrivere `"streaming": {"enabled": false}`.
+
+Esempio di attivazione:
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Campo | Tipo | Predefinito | Descrizione |
+| ----- | ---- | ----------- | ----------- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Permette a questo canale di mostrare l'output streaming del provider |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Predefinito Pico dopo l'attivazione: `0` | Intervallo minimo tra aggiornamenti intermedi; il contenuto finale viene sempre inviato |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Predefinito Pico dopo l'attivazione: `1` | Crescita minima del testo prima di inviare un aggiornamento intermedio; il contenuto finale viene sempre inviato |
+| `model_list[].streaming.enabled` | bool | `false` | Permette a questa entry di modello di provare richieste provider streaming |
+
+Le variabili d'ambiente legacy di Telegram restano compatibili: `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS` e `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. Si applicano solo alle settings Telegram e non attivano né modificano `settings.streaming` di Pico.
+
+Il comportamento in caso di errore è intenzionalmente conservativo: se lo streaming fallisce prima che venga inviato un chunk visibile, PicoClaw riprova una volta tramite il normale percorso `Chat()`. Se un chunk è già stato mostrato all'utente, PicoClaw non invia una seconda risposta non streaming, evitando output duplicato.
+
 ### Struttura del Workspace

 PicoClaw salva i dati nel workspace configurato (predefinito: `~/.picoclaw/workspace`):
@@ -365,6 +365,55 @@ HEARTBEAT_OK を返信        ユーザーが直接結果を受信

 旧 `providers` 設定は**非推奨**となり、V2 で削除されました。既存の V0/V1 設定は自動的に移行されます。[docs/migration/model-list-migration.md](../migration/model-list-migration.md) を参照してください。

+#### ストリーミング設定
+
+Provider ストリーミングは二重の opt-in 方式で、デフォルトでは無効です。現在の channel に `settings.streaming.enabled: true` があり、アクティブなモデルエントリに `streaming.enabled: true` があり、さらに provider と channel の両方がストリーミングをサポートしている場合にのみ、agent はストリーミングリクエストを試行します。いずれかの条件が欠ける場合、PicoClaw は通常の非ストリーミングリクエスト経路を使います。
+
+Pico WebUI が最初に完全対応した channel です。Pico は既存の `message.create` wire message で最初の assistant メッセージを作成し、その後 `message.update` で同じメッセージを更新します。新しい Pico wire message type は追加されません。
+
+ストリーミングを使わない場合は `streaming` を省略してください。`streaming` ブロックの省略は無効を意味するため、`"streaming": {"enabled": false}` を書く必要はありません。
+
+有効化例：
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| フィールド | 型 | デフォルト | 説明 |
+| ---------- | -- | ---------- | ---- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | この channel で provider のストリーミング出力を表示できるようにします |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Pico で有効化後のデフォルト：`0` | 中間更新の最小間隔。最終内容は常に flush されます |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Pico で有効化後のデフォルト：`1` | 次の中間更新を送るために必要な最小文字増加数。最終内容は常に flush されます |
+| `model_list[].streaming.enabled` | bool | `false` | このモデルエントリで provider ストリーミングリクエストを試行できるようにします |
+
+既存の Telegram 環境変数 `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`、`PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS`、`PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS` は互換性のため引き続き使えます。これらは Telegram settings にのみ適用され、Pico の `settings.streaming` を有効化または変更しません。
+
+失敗時の動作は保守的です。可視 chunk が送信される前にストリーミングが失敗した場合、PicoClaw は通常の `Chat()` 経路で一度だけ再試行します。すでに chunk がユーザーに表示されている場合は、表示済み出力の重複を避けるため、二つ目の非ストリーミング回答は送信しません。
+
 ### Provider アーキテクチャ

 PicoClaw はプロトコルファミリーで Provider をルーティングします：
@@ -744,6 +744,55 @@ Resolution rules:
 - If `provider` is omitted, PicoClaw treats the first `/` segment in `model` as the provider and everything after that first `/` as the runtime model ID.
 - This means `"model": "openrouter/openai/gpt-5.4"` still works as a compatibility form and sends `openai/gpt-5.4` to OpenRouter.

+#### Streaming Configuration
+
+Provider streaming uses a double opt-in and is disabled by default. The agent only tries streaming when the current channel has `settings.streaming.enabled: true`, the active model entry has `streaming.enabled: true`, and both the provider and channel support streaming. If any condition is missing, PicoClaw uses the normal non-streaming request path.
+
+Pico WebUI is the first fully wired channel. Pico creates the first assistant message with the existing `message.create` wire message, then updates that same message with `message.update`; no new Pico wire message type is introduced.
+
+Leave `streaming` unset when you do not want streaming. An omitted `streaming` block means disabled; you do not need to write `"streaming": {"enabled": false}`.
+
+Opt-in example:
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Field | Type | Default | Description |
+| ----- | ---- | ------- | ----------- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Allows this channel to display provider streaming output |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Pico default after enabling: `0` | Minimum interval for intermediate updates; final content is always flushed |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Pico default after enabling: `1` | Minimum character growth before sending an intermediate update; final content is always flushed |
+| `model_list[].streaming.enabled` | bool | `false` | Allows this model entry to try provider streaming requests |
+
+Legacy Telegram environment variables remain compatible: `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS`, and `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. They only apply to Telegram settings and do not enable or modify Pico `settings.streaming`.
+
+Failure behavior is intentionally conservative: if streaming fails before any visible chunk is sent, PicoClaw retries once through the normal `Chat()` path. If a chunk has already been shown to the user, PicoClaw does not send a second non-streaming answer, because that would duplicate visible output.
+
 #### Vendor-Specific Examples

 > **Tip**: You can omit `api_key` fields and store them in `.security.yml` for better security. See [Security Configuration](#-security-configuration-recommended).
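
Reviewer sketch (not part of this commit): the double opt-in rule and the conservative failure behavior described in the Streaming Configuration hunk above can be read as the small decision procedure below. This is a minimal illustration under stated assumptions. `streamingGate`, `respond`, and `errStreamFailed` are hypothetical names invented for this note and are not PicoClaw's actual types or functions; only the four conditions and the "retry once unless a chunk was already visible" rule come from the documentation text.

```go
package main

import (
	"errors"
	"fmt"
)

// errStreamFailed is a hypothetical error used only for this illustration.
var errStreamFailed = errors.New("stream failed")

// streamingGate collects the four documented conditions.
// The type and its field names are assumptions, not PicoClaw's real types.
type streamingGate struct {
	ChannelEnabled   bool // channel_list.<name>.settings.streaming.enabled
	ModelEnabled     bool // model_list[].streaming.enabled
	ProviderSupports bool // provider implements streaming
	ChannelSupports  bool // channel can render streamed updates
}

// shouldStream is true only when every condition holds (double opt-in
// plus capability on both sides).
func (g streamingGate) shouldStream() bool {
	return g.ChannelEnabled && g.ModelEnabled && g.ProviderSupports && g.ChannelSupports
}

// respond picks the request path. stream reports how many chunks were
// already visible to the user when it returned; chat is the normal
// non-streaming request.
func respond(g streamingGate, stream func() (int, error), chat func() error) (string, error) {
	if !g.shouldStream() {
		return "chat", chat()
	}
	visible, err := stream()
	if err == nil {
		return "stream", nil
	}
	if visible > 0 {
		// Output was already shown: do not send a second, duplicate answer.
		return "stream", err
	}
	// Nothing visible yet: retry once through the non-streaming path.
	return "chat-fallback", chat()
}

func main() {
	gate := streamingGate{ChannelEnabled: true, ModelEnabled: true, ProviderSupports: true, ChannelSupports: true}
	chat := func() error { return nil }
	failEarly := func() (int, error) { return 0, errStreamFailed }
	failLate := func() (int, error) { return 3, errStreamFailed }

	path, err := respond(gate, failEarly, chat)
	fmt.Println(path, err)

	path, err = respond(gate, failLate, chat)
	fmt.Println(path, err)

	gate.ModelEnabled = false // one opt-in missing: streaming is never tried
	path, err = respond(gate, failLate, chat)
	fmt.Println(path, err)
}
```

The sketch returns the stream error unchanged when a chunk was already visible; how the real agent loop surfaces that error to the user is not stated in the documentation and is left open here.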
@@ -31,6 +31,55 @@ PICOCLAW_HOME=/opt/picoclaw picoclaw agent
 PICOCLAW_HOME=/srv/picoclaw PICOCLAW_CONFIG=/srv/picoclaw/main.json picoclaw gateway
 ```

+### Konfigurasi Streaming
+
+Provider streaming menggunakan double opt-in dan dimatikan secara lalai. Agent hanya mencuba streaming apabila saluran semasa mempunyai `settings.streaming.enabled: true`, entry model aktif mempunyai `streaming.enabled: true`, dan kedua-dua provider serta saluran menyokong streaming. Jika mana-mana syarat tiada, PicoClaw menggunakan laluan permintaan bukan streaming biasa.
+
+Pico WebUI ialah saluran pertama yang disambungkan sepenuhnya. Pico mencipta mesej assistant pertama dengan wire message sedia ada `message.create`, kemudian mengemas kini mesej yang sama dengan `message.update`; tiada jenis wire message Pico baharu ditambah.
+
+Biarkan `streaming` tidak ditetapkan jika anda tidak mahu streaming. Blok `streaming` yang tiada bermaksud dimatikan; anda tidak perlu menulis `"streaming": {"enabled": false}`.
+
+Contoh mengaktifkan streaming:
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Kunci | Jenis | Lalai | Penerangan |
+| ----- | ----- | ----- | ---------- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Membenarkan saluran ini memaparkan output streaming provider |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Lalai Pico selepas diaktifkan: `0` | Jarak masa minimum antara kemas kini pertengahan; kandungan akhir sentiasa dihantar |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Lalai Pico selepas diaktifkan: `1` | Pertambahan aksara minimum sebelum menghantar kemas kini pertengahan; kandungan akhir sentiasa dihantar |
+| `model_list[].streaming.enabled` | bool | `false` | Membenarkan entry model ini mencuba permintaan provider streaming |
+
+Pemboleh ubah persekitaran Telegram lama masih serasi: `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS`, dan `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. Ia hanya digunakan untuk settings Telegram dan tidak mengaktifkan atau mengubah `settings.streaming` Pico.
+
+Tingkah laku kegagalan adalah konservatif: jika streaming gagal sebelum mana-mana chunk kelihatan dihantar, PicoClaw mencuba semula sekali melalui laluan `Chat()` biasa. Jika chunk sudah dipaparkan kepada pengguna, PicoClaw tidak menghantar jawapan bukan streaming kedua untuk mengelakkan output berganda.
+
 ### Susun Atur Workspace

 PicoClaw menyimpan data dalam workspace yang dikonfigurasikan (lalai: `~/.picoclaw/workspace`):
@@ -365,6 +365,55 @@ Configure múltiplos endpoints para o mesmo nome de modelo — PicoClaw fará ro

 A configuração antiga `providers` está **depreciada** e foi removida no V2. Configs V0/V1 existentes são auto-migradas. Veja [docs/migration/model-list-migration.md](../migration/model-list-migration.md).

+#### Configuração de Streaming
+
+O streaming do provider usa double opt-in e fica desativado por padrão. O agent só tenta streaming quando o canal atual tem `settings.streaming.enabled: true`, a entrada de modelo ativa tem `streaming.enabled: true`, e tanto o provider quanto o canal suportam streaming. Se qualquer condição estiver ausente, o PicoClaw usa o caminho normal de requisição sem streaming.
+
+O Pico WebUI é o primeiro canal totalmente integrado. O Pico cria a primeira mensagem assistant com o wire message existente `message.create` e depois atualiza a mesma mensagem com `message.update`; nenhum novo tipo de wire message do Pico é introduzido.
+
+Deixe `streaming` ausente quando não quiser streaming. Um bloco `streaming` omitido significa desativado; você não precisa escrever `"streaming": {"enabled": false}`.
+
+Exemplo de ativação:
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Campo | Tipo | Padrão | Descrição |
+| ----- | ---- | ------ | --------- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Permite que este canal exiba output streaming do provider |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Padrão do Pico após ativar: `0` | Intervalo mínimo entre atualizações intermediárias; o conteúdo final sempre é enviado |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Padrão do Pico após ativar: `1` | Crescimento mínimo de texto antes de enviar outra atualização intermediária; o conteúdo final sempre é enviado |
+| `model_list[].streaming.enabled` | bool | `false` | Permite que esta entrada de modelo tente requisições de provider streaming |
+
+As variáveis de ambiente legadas do Telegram continuam compatíveis: `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS` e `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. Elas se aplicam apenas às settings do Telegram e não ativam nem modificam `settings.streaming` do Pico.
+
+O comportamento de falha é intencionalmente conservador: se o streaming falhar antes de qualquer chunk visível ser enviado, o PicoClaw tenta novamente uma vez pelo caminho normal `Chat()`. Se um chunk já foi mostrado ao usuário, o PicoClaw não envia uma segunda resposta sem streaming, evitando output duplicado.
+
 ### Arquitetura de Providers

 PicoClaw roteia providers por família de protocolo:
@@ -365,6 +365,55 @@ Cấu hình nhiều endpoint cho cùng tên mô hình — PicoClaw sẽ tự đ

 Cấu hình `providers` cũ đã **bị deprecated** và đã được loại bỏ trong V2. Các cấu hình V0/V1 hiện có sẽ được tự động migrate. Xem [docs/migration/model-list-migration.md](../migration/model-list-migration.md).

+#### Cấu Hình Streaming
+
+Provider streaming dùng cơ chế double opt-in và bị tắt theo mặc định. Agent chỉ thử streaming khi channel hiện tại có `settings.streaming.enabled: true`, entry model đang dùng có `streaming.enabled: true`, và cả provider lẫn channel đều hỗ trợ streaming. Nếu thiếu bất kỳ điều kiện nào, PicoClaw dùng đường dẫn yêu cầu không streaming thông thường.
+
+Pico WebUI là channel đầu tiên được nối đầy đủ. Pico tạo message assistant đầu tiên bằng wire message hiện có `message.create`, sau đó cập nhật chính message đó bằng `message.update`; không thêm loại wire message Pico mới.
+
+Hãy để trống `streaming` khi bạn không muốn dùng streaming. Bỏ qua block `streaming` nghĩa là đã tắt; bạn không cần viết `"streaming": {"enabled": false}`.
+
+Ví dụ bật streaming:
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| Trường | Kiểu | Mặc định | Mô tả |
+| ------ | ---- | -------- | ----- |
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | Cho phép channel này hiển thị output streaming từ provider |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Mặc định Pico sau khi bật: `0` | Khoảng cách tối thiểu giữa các cập nhật trung gian; nội dung cuối luôn được flush |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Mặc định Pico sau khi bật: `1` | Số ký tự tăng tối thiểu trước khi gửi cập nhật trung gian; nội dung cuối luôn được flush |
+| `model_list[].streaming.enabled` | bool | `false` | Cho phép entry model này thử yêu cầu provider streaming |
+
+Các biến môi trường Telegram cũ vẫn tương thích: `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`, `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS`, và `PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`. Chúng chỉ áp dụng cho Telegram settings và không bật hoặc thay đổi `settings.streaming` của Pico.
+
+Hành vi lỗi được giữ thận trọng: nếu streaming lỗi trước khi gửi bất kỳ chunk hiển thị nào, PicoClaw thử lại một lần qua đường dẫn `Chat()` thông thường. Nếu đã có chunk hiển thị cho người dùng, PicoClaw không gửi thêm một câu trả lời non-streaming thứ hai để tránh lặp output.
+
 ### Kiến Trúc Provider

 PicoClaw định tuyến provider theo họ giao thức:
@@ -293,7 +293,7 @@ PicoClaw 默认在沙箱环境中运行。Agent 只能访问配置的工作区
 | `tools.exec.custom_deny_patterns` | string[] | `[]` | 自定义阻止的正则表达式模式 |
 | `tools.exec.custom_allow_patterns` | string[] | `[]` | 自定义允许的正则表达式模式 |

-> **安全提示：** Symlink 保护默认启用——所有文件路径在白名单匹配前都会通过 `filepath.EvalSymlinks` 解析，防止符号链接逃逸攻击。
+> **安全提示：** Symlink 保护默认启用——所有文件路径在允许列表匹配前都会通过 `filepath.EvalSymlinks` 解析，防止符号链接逃逸攻击。

 #### 已知限制：构建工具的子进程

@@ -537,6 +537,55 @@ Agent 读取 HEARTBEAT.md
 - 如果未设置 `provider`，PicoClaw 会把 `model` 第一个 `/` 之前的字段当作 provider，并把第一个 `/` 之后的全部内容当作最终模型 ID。
 - 这意味着 `"model": "openrouter/openai/gpt-5.4"` 这样的兼容写法仍然可用，并会把 `openai/gpt-5.4` 发送给 OpenRouter。

+#### 流式输出配置
+
+Provider 流式输出采用双开关，默认关闭。只有当前 channel 的 `settings.streaming.enabled` 和当前模型条目的 `streaming.enabled` 都为 `true`，并且 provider 与 channel 都支持流式能力时，Agent 才会尝试流式请求；任一条件不满足时仍使用普通非流式请求。
+
+当前完整落地的是 Pico WebUI。Pico 使用已有的 `message.create` 创建第一条 assistant 消息，随后用 `message.update` 更新同一条消息，不新增协议消息类型。
+
+不需要流式时请省略 `streaming` 配置块。省略表示关闭，不需要写 `"streaming": {"enabled": false}`。
+
+开启示例：
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "gpt-5.4",
+      "provider": "openai",
+      "model": "gpt-5.4",
+      "api_keys": ["sk-your-openai-key"],
+      "streaming": {
+        "enabled": true
+      }
+    }
+  ],
+  "channel_list": {
+    "pico": {
+      "enabled": true,
+      "type": "pico",
+      "settings": {
+        "token": "YOUR_PICO_TOKEN",
+        "streaming": {
+          "enabled": true
+        }
+      }
+    }
+  }
+}
+```
+
+| 字段 | 类型 | 默认值 | 说明 |
+|------|------|--------|------|
+| `channel_list.<name>.settings.streaming.enabled` | bool | `false` | 是否允许该 channel 尝试展示 provider 流式输出 |
+| `channel_list.<name>.settings.streaming.throttle_seconds` | int | Pico 开启后默认 `0` | 中间更新的最小时间间隔，最终内容不受此限制 |
+| `channel_list.<name>.settings.streaming.min_growth_chars` | int | Pico 开启后默认 `1` | 中间更新相比上次发送至少增长的字符数，最终内容不受此限制 |
+| `model_list[].streaming.enabled` | bool | `false` | 是否允许该模型条目尝试 provider 流式请求 |
+
+Telegram 旧环境变量仍兼容：`PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED`、`PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS`、`PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS`。这些环境变量只作用于 Telegram settings，不会开启或修改 Pico 的 `settings.streaming`。
+
+失败处理保持保守：如果还没有任何可见 chunk 就失败，PicoClaw 会回退到普通 `Chat()` 路径重试一次；如果已经有 chunk 展示给用户，则不会再发送一条非流式最终答案，避免界面重复输出。
+
 #### 各厂商配置示例

 <details>
@@ -113,10 +113,13 @@ Cette conception permet également le **support multi-agents** avec une sélecti
 | `max_tokens_field` | string | Non | Remplace le nom du champ max tokens dans le corps de la requête (ex : `max_completion_tokens` pour les modèles o1) |
 | `thinking_level` | string | Non | Niveau de pensée étendue : `off`, `low`, `medium`, `high`, `xhigh` ou `adaptive` |
 | `extra_body` | object | Non | Champs supplémentaires à injecter dans chaque corps de requête |
+| `streaming.enabled` | bool | Non | Opt-in pour le streaming provider sur cette entrée de modèle. Par défaut `false`, et le channel actif doit aussi avoir `settings.streaming.enabled` à `true` |
 | `rpm` | int | Non | Limite de requêtes par minute |
 | `fallbacks` | string[] | Non | Noms des modèles de secours pour le basculement automatique |
 | `enabled` | bool | Non | Activer ou désactiver cette entrée de modèle (par défaut : `true`) |

+Lorsque le streaming est désactivé, omettez le bloc `streaming`. Écrire `"streaming": {"enabled": false}` est optionnel et n'est pas nécessaire.
+
 #### Exemples par Vendor

 **OpenAI**
@@ -114,10 +114,13 @@
 | `max_tokens_field` | string | いいえ | リクエストボディの max tokens フィールド名を上書き（例：o1 モデルでは `max_completion_tokens`） |
 | `thinking_level` | string | いいえ | 拡張思考レベル：`off`、`low`、`medium`、`high`、`xhigh`、`adaptive` |
 | `extra_body` | object | いいえ | 各リクエストボディに注入する追加フィールド |
+| `streaming.enabled` | bool | いいえ | このモデルエントリで provider ストリーミングを試行するための opt-in。デフォルトは `false` で、アクティブな channel の `settings.streaming.enabled` も `true` である必要があります |
 | `rpm` | int | いいえ | 1 分あたりのリクエストレート制限 |
 | `fallbacks` | string[] | いいえ | 自動フェイルオーバーのフォールバックモデル名 |
 | `enabled` | bool | いいえ | このモデルエントリを有効にするかどうか（デフォルト：`true`） |

+ストリーミングを無効にする場合は `streaming` ブロックを省略してください。`"streaming": {"enabled": false}` を書くことは任意であり、必須ではありません。
+
 #### ベンダー別設定例

 **OpenAI**
@@ -131,10 +131,13 @@ This design also enables **multi-agent support** with flexible provider selectio
 | `tool_schema_transform` | string | No | Optional compatibility transform for tool parameter schemas. Default: disabled. Supported values: `simple`.                                                                                             |
 | `extra_body` | object | No | Additional fields to inject into every request body                                                                                                                                                                                         |
 | `custom_headers` | object | No | Additional HTTP headers to inject into every request (e.g., `{"X-Source":"coding-plan"}`). If a key matches a built-in header, the custom value overrides the built-in one (e.g., `Authorization`, `User-Agent`, `Content-Type`, `Accept`). |
+| `streaming.enabled` | bool | No | Opt-in for provider streaming on this model entry. Defaults to `false` and also requires the active channel's `settings.streaming.enabled` to be `true`. |
 | `rpm` | int | No | Per-minute request rate limit                                                                                                                                                                                                               |
 | `fallbacks` | string[] | No | Fallback model names for automatic failover                                                                                                                                                                                                 |
 | `enabled` | bool | No | Whether this model entry is active (default: `true`)                                                                                                                                                                                        |

+When streaming is disabled, omit the `streaming` block. Writing `"streaming": {"enabled": false}` is optional and not needed in generated or hand-written config.
+
 #### Tool Schema Compatibility

 By default, PicoClaw now forwards tool JSON Schemas unchanged.
@@ -113,10 +113,13 @@ Este design também permite **suporte multi-agente** com seleção flexível de
 | `max_tokens_field` | string | Não | Substitui o nome do campo max tokens no corpo da requisição (ex: `max_completion_tokens` para modelos o1) |
 | `thinking_level` | string | Não | Nível de pensamento estendido: `off`, `low`, `medium`, `high`, `xhigh` ou `adaptive` |
 | `extra_body` | object | Não | Campos adicionais para injetar em cada corpo de requisição |
+| `streaming.enabled` | bool | Não | Opt-in para provider streaming nesta entrada de modelo. O padrão é `false` e o canal ativo também precisa de `settings.streaming.enabled` como `true` |
 | `rpm` | int | Não | Limite de requisições por minuto |
 | `fallbacks` | string[] | Não | Nomes dos modelos de fallback para failover automático |
 | `enabled` | bool | Não | Ativar ou desativar esta entrada de modelo (padrão: `true`) |

+Quando streaming estiver desativado, omita o bloco `streaming`. Escrever `"streaming": {"enabled": false}` é opcional e não é necessário.
+
 #### Exemplos por Vendor

 **OpenAI**
@@ -113,10 +113,13 @@ Thiết kế này cũng cho phép **hỗ trợ đa agent** với lựa chọn pr
 | `max_tokens_field` | string | Không | Ghi đè tên trường max tokens trong request body (ví dụ: `max_completion_tokens` cho model o1) |
 | `thinking_level` | string | Không | Mức độ tư duy mở rộng: `off`, `low`, `medium`, `high`, `xhigh` hoặc `adaptive` |
 | `extra_body` | object | Không | Các trường bổ sung để chèn vào mỗi request body |
+| `streaming.enabled` | bool | Không | Opt-in cho provider streaming trên entry model này. Mặc định là `false` và channel đang hoạt động cũng cần `settings.streaming.enabled` là `true` |
 | `rpm` | int | Không | Giới hạn tốc độ yêu cầu mỗi phút |
 | `fallbacks` | string[] | Không | Tên model dự phòng cho failover tự động |
 | `enabled` | bool | Không | Kích hoạt hay vô hiệu hóa entry model này (mặc định: `true`) |

+Khi không dùng streaming, hãy bỏ qua block `streaming`. Viết `"streaming": {"enabled": false}` là tùy chọn và không cần thiết.
+
 #### Ví Dụ Theo Vendor

 **OpenAI**
@@ -123,6 +123,7 @@
 | `proxy` | string | 否 | 此模型条目的 HTTP 代理 URL |
 | `user_agent` | string | 否 | 自定义 `User-Agent` 请求头（支持 OpenAI 兼容、Gemini、Anthropic 和 Azure provider） |
 | `request_timeout` | int | 否 | 请求超时时间（秒），默认值因 provider 而异 |
+| `streaming.enabled` | bool | 否 | 是否允许此模型条目尝试 provider 流式请求，默认 `false`。它只表达模型/端点能力 opt-in，实际还需要当前 channel 的 `settings.streaming.enabled` 同时开启 |
 | `max_tokens_field` | string | 否 | 覆盖请求体中 max tokens 的字段名（如 o1 模型使用 `max_completion_tokens`） |
 | `thinking_level` | string | 否 | 扩展思考级别：`off`、`low`、`medium`、`high`、`xhigh` 或 `adaptive` |
 | `extra_body` | object | 否 | 注入到每个请求体中的额外字段 |
@@ -131,6 +132,8 @@
 | `fallbacks` | string[] | 否 | 自动故障转移的备用模型名称 |
 | `enabled` | bool | 否 | 是否启用此模型条目（默认：`true`） |

+不需要流式时请省略 `streaming` 配置块。写 `"streaming": {"enabled": false}` 是可选的，手写或生成配置时都不需要。
+
 #### `provider` / `model` 解析规则

 PicoClaw 按下面的规则解析 `provider` 和最终发给上游的模型 ID：
@@ -31,6 +31,10 @@ func (a *messageBusAdapter) PublishOutboundMedia(ctx context.Context, msg bus.Ou
 	return a.inner.PublishOutboundMedia(ctx, msg)
 }

+func (a *messageBusAdapter) GetStreamer(ctx context.Context, channel, chatID, sessionKey string) (bus.Streamer, bool) {
+	return a.inner.GetStreamer(ctx, channel, chatID, sessionKey)
+}
+
 func (a *messageBusAdapter) InboundChan() <-chan bus.InboundMessage {
 	return a.inner.InboundChan()
 }
@@ -119,9 +119,11 @@ const (
 	pendingTurnPrefix          = "pending-"
 	metadataKeyMessageKind     = "message_kind"
 	metadataKeyToolCalls       = "tool_calls"
+	metadataKeyOutboundKind    = "outbound_kind"
 	messageKindThought         = "thought"
 	messageKindToolFeedback    = "tool_feedback"
 	messageKindToolCalls       = "tool_calls"
+	outboundKindFinal          = "final"
 	metadataKeyAccountID       = "account_id"
 	metadataKeyGuildID         = "guild_id"
 	metadataKeyTeamID          = "team_id"
@@ -585,7 +587,7 @@ func (al *AgentLoop) runAgentLoop(
 			opts.Dispatch.SessionKey,
 			opts.Dispatch.SessionScope,
 		)
-		al.bus.PublishOutbound(ctx, bus.OutboundMessage{
+		msg := bus.OutboundMessage{
 			Context: outboundContextFromInbound(
 				opts.Dispatch.InboundContext,
 				opts.Dispatch.Channel(),
@@ -597,7 +599,9 @@ func (al *AgentLoop) runAgentLoop(
 			Scope:        scope,
 			Content:      result.finalContent,
 			ContextUsage: computeContextUsage(agent, opts.Dispatch.SessionKey),
-		})
+		}
+		markFinalOutbound(&msg)
+		al.bus.PublishOutbound(ctx, msg)
 	}

 	if result.finalContent != "" {
@@ -75,12 +75,14 @@ func (al *AgentLoop) PublishResponseIfNeeded(ctx context.Context, channel, chatI
 	}

 	msg := bus.OutboundMessage{
-		Context: bus.NewOutboundContext(channel, chatID, ""),
-		Content: response,
+		Context:    bus.NewOutboundContext(channel, chatID, ""),
+		SessionKey: sessionKey,
+		Content:    response,
 	}
 	if sessionKey != "" {
 		msg.ContextUsage = computeContextUsage(al.agentForSession(sessionKey), sessionKey)
 	}
+	markFinalOutbound(&msg)
 	al.bus.PublishOutbound(ctx, msg)
 	logger.InfoCF("agent", "Published outbound response",
 		map[string]any{
@@ -100,7 +102,7 @@ func (al *AgentLoop) targetReasoningChannelID(channelName string) (chatID string
 	return ""
 }

-func (al *AgentLoop) publishPicoReasoning(ctx context.Context, reasoningContent, chatID string) {
+func (al *AgentLoop) publishPicoReasoning(ctx context.Context, reasoningContent, chatID, sessionKey string) {
 	if reasoningContent == "" || chatID == "" {
 		return
 	}
@@ -120,7 +122,8 @@ func (al *AgentLoop) publishPicoReasoning(ctx context.Context, reasoningContent,
 				metadataKeyMessageKind: messageKindThought,
 			},
 		},
-		Content: reasoningContent,
+		SessionKey: sessionKey,
+		Content:    reasoningContent,
 	}); err != nil {
 		if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) ||
 			errors.Is(err, bus.ErrBusClosed) {
@@ -284,6 +284,55 @@ func TestPublishResponseIfNeeded_DismissesToolFeedbackWhenMessageToolAlreadySent
 	}
 }

+func TestPublishResponseIfNeeded_MarksFinalOutbound(t *testing.T) {
+	al, _, msgBus, provider, cleanup := newTestAgentLoop(t)
+	defer cleanup()
+	_ = provider
+
+	al.PublishResponseIfNeeded(context.Background(), "pico", "pico:session-1", "session-1", "final reply")
+
+	select {
+	case outbound := <-msgBus.OutboundChan():
+		if outbound.Content != "final reply" {
+			t.Fatalf("outbound content = %q, want final reply", outbound.Content)
+		}
+		if outbound.Context.Raw[metadataKeyOutboundKind] != outboundKindFinal {
+			t.Fatalf("outbound kind = %q, want %q", outbound.Context.Raw[metadataKeyOutboundKind], outboundKindFinal)
+		}
+		if outbound.SessionKey != "session-1" {
+			t.Fatalf("outbound session key = %q, want session-1", outbound.SessionKey)
+		}
+	case <-time.After(time.Second):
+		t.Fatal("expected final outbound")
+	}
+}
+
+func TestPublishPicoReasoningIncludesSessionKey(t *testing.T) {
+	al, _, msgBus, provider, cleanup := newTestAgentLoop(t)
+	defer cleanup()
+	_ = provider
+
+	al.publishPicoReasoning(context.Background(), "reasoning", "pico-chat", "session-1")
+
+	select {
+	case outbound := <-msgBus.OutboundChan():
+		if outbound.Channel != "pico" || outbound.ChatID != "pico-chat" {
+			t.Fatalf("unexpected outbound target: %+v", outbound)
+		}
+		if outbound.Content != "reasoning" {
+			t.Fatalf("outbound content = %q, want reasoning", outbound.Content)
+		}
+		if outbound.SessionKey != "session-1" {
+			t.Fatalf("outbound session key = %q, want session-1", outbound.SessionKey)
+		}
+		if outbound.Context.Raw[metadataKeyMessageKind] != messageKindThought {
+			t.Fatalf("message kind = %q, want %q", outbound.Context.Raw[metadataKeyMessageKind], messageKindThought)
+		}
+	case <-time.After(time.Second):
+		t.Fatal("expected pico reasoning outbound")
+	}
+}
+
 func TestProcessMessage_IncludesCurrentSenderInDynamicContext(t *testing.T) {
 	tmpDir, err := os.MkdirTemp("", "agent-test-*")
 	if err != nil {
@@ -86,6 +86,16 @@ func outboundMessageForTurn(ts *turnState, content string) bus.OutboundMessage {
 	}
 }

+func markFinalOutbound(msg *bus.OutboundMessage) {
+	if msg == nil {
+		return
+	}
+	if msg.Context.Raw == nil {
+		msg.Context.Raw = make(map[string]string, 1)
+	}
+	msg.Context.Raw[metadataKeyOutboundKind] = outboundKindFinal
+}
+
 func outboundMessageForTurnWithKind(ts *turnState, content, kind string) bus.OutboundMessage {
 	msg := outboundMessageForTurn(ts, content)
 	if strings.TrimSpace(kind) == "" {
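The `markFinalOutbound` helper added in this hunk can be tried in isolation. The sketch below is a minimal, hedged reconstruction: `outboundContext` and `outboundMessage` are hypothetical stand-ins for the real `bus` types, whose full definitions are not shown in this diff, and only the fields the helper touches are modeled.

```go
package main

import "fmt"

// outboundContext and outboundMessage are stand-ins (assumption) for the
// bus package types; only the fields used by the helper are modeled.
type outboundContext struct {
	Raw map[string]string
}

type outboundMessage struct {
	Context outboundContext
	Content string
}

const (
	metadataKeyOutboundKind = "outbound_kind"
	outboundKindFinal       = "final"
)

// markFinalOutbound tags a message as the final reply of a turn. It
// allocates the metadata map lazily so callers may pass a zero-value context.
func markFinalOutbound(msg *outboundMessage) {
	if msg == nil {
		return
	}
	if msg.Context.Raw == nil {
		msg.Context.Raw = make(map[string]string, 1)
	}
	msg.Context.Raw[metadataKeyOutboundKind] = outboundKindFinal
}

func main() {
	msg := outboundMessage{Content: "final reply"}
	markFinalOutbound(&msg)
	fmt.Println(msg.Context.Raw[metadataKeyOutboundKind])

	// A nil message is a no-op rather than a panic.
	markFinalOutbound(nil)
}
```

Because the map is only allocated when it is nil, any metadata already present on the message (for example a `message_kind` entry) is kept when the final marker is added.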
@@ -21,6 +21,9 @@ type MessageBus interface {
 	// PublishOutboundMedia sends an outbound media message.
 	PublishOutboundMedia(ctx context.Context, msg bus.OutboundMediaMessage) error

+	// GetStreamer returns a channel streamer when the active channel supports streaming.
+	GetStreamer(ctx context.Context, channel, chatID, sessionKey string) (bus.Streamer, bool)
+
 	// InboundChan returns the channel for receiving inbound messages.
 	InboundChan() <-chan bus.InboundMessage
 }
@@ -29,6 +29,36 @@ func modelConfigIdentityKey(mc *config.ModelConfig) string {
 	return ""
 }

+func effectiveDefaultProvider(defaultProvider string) string {
+	defaultProvider = strings.TrimSpace(defaultProvider)
+	if defaultProvider == "" {
+		return "openai"
+	}
+	return providers.NormalizeProvider(defaultProvider)
+}
+
+func modelProviderAndIDForResolution(defaultProvider string, mc *config.ModelConfig) (provider string, modelID string) {
+	if mc == nil {
+		return "", ""
+	}
+	return providers.ExtractProtocol(mc)
+}
+
+func cloneModelConfigForResolution(
+	defaultProvider string,
+	mc *config.ModelConfig,
+	workspace string,
+) *config.ModelConfig {
+	if mc == nil {
+		return nil
+	}
+	clone := *mc
+	if clone.Workspace == "" {
+		clone.Workspace = workspace
+	}
+	return &clone
+}
+
 func candidateFromModelConfig(
 	defaultProvider string,
 	mc *config.ModelConfig,
@@ -37,7 +67,7 @@ func candidateFromModelConfig(
 		return providers.FallbackCandidate{}, false
 	}

-	protocol, modelID := providers.ExtractProtocol(mc)
+	protocol, modelID := modelProviderAndIDForResolution(defaultProvider, mc)
 	if strings.TrimSpace(modelID) == "" {
 		return providers.FallbackCandidate{}, false
 	}
@@ -50,7 +80,7 @@ func candidateFromModelConfig(
 	}, true
 }

-func lookupModelConfigByRef(cfg *config.Config, raw string) *config.ModelConfig {
+func lookupModelConfigByRef(cfg *config.Config, raw string, defaultProvider ...string) *config.ModelConfig {
 	raw = strings.TrimSpace(raw)
 	if raw == "" || cfg == nil {
 		return nil
@@ -66,6 +96,10 @@ func lookupModelConfigByRef(cfg *config.Config, raw string) *config.ModelConfig
 		rawKey = providers.ModelKey(rawRef.Provider, rawRef.Model)
 	}

+	fallbackProvider := ""
+	if len(defaultProvider) > 0 {
+		fallbackProvider = effectiveDefaultProvider(defaultProvider[0])
+	}
 	for i := range cfg.ModelList {
 		mc := cfg.ModelList[i]
 		if mc == nil {
@@ -75,12 +109,14 @@ func lookupModelConfigByRef(cfg *config.Config, raw string) *config.ModelConfig
 		if fullModel == "" {
 			continue
 		}
+		protocol, modelID := modelProviderAndIDForResolution(fallbackProvider, mc)
 		if fullModel == raw {
 			return mc
 		}
-		protocol, modelID := providers.ExtractProtocol(mc)
 		if modelID == raw {
-			return mc
+			if fallbackProvider == "" || providers.NormalizeProvider(protocol) == fallbackProvider {
+				return mc
+			}
 		}
 		if rawKey != "" && providers.ModelKey(protocol, modelID) == rawKey {
 			return mc
@@ -99,8 +135,9 @@ func resolveModelCandidate(
 	if raw == "" {
 		return providers.FallbackCandidate{}, false
 	}
+	defaultProvider = effectiveDefaultProvider(defaultProvider)

-	if mc := lookupModelConfigByRef(cfg, raw); mc != nil {
+	if mc := lookupModelConfigByRef(cfg, raw, defaultProvider); mc != nil {
 		return candidateFromModelConfig(defaultProvider, mc)
 	}

@@ -177,3 +214,48 @@ func resolvedModelConfig(cfg *config.Config, modelName, workspace string) (*conf

 	return &clone, nil
 }
+
+func resolveActiveModelConfig(
+	cfg *config.Config,
+	workspace string,
+	candidates []providers.FallbackCandidate,
+	activeModel string,
+	defaultProvider string,
+) *config.ModelConfig {
+	if cfg == nil {
+		return nil
+	}
+	defaultProvider = effectiveDefaultProvider(defaultProvider)
+
+	if len(candidates) > 0 {
+		candidate := candidates[0]
+		identityKey := strings.TrimSpace(candidate.IdentityKey)
+		if identityKey != "" {
+			for _, mc := range cfg.ModelList {
+				if mc == nil || modelConfigIdentityKey(mc) != identityKey {
+					continue
+				}
+				protocol, modelID := modelProviderAndIDForResolution(defaultProvider, mc)
+				if providers.ModelKey(protocol, modelID) == providers.ModelKey(candidate.Provider, candidate.Model) {
+					return cloneModelConfigForResolution(defaultProvider, mc, workspace)
+				}
+			}
+		}
+		for _, mc := range cfg.ModelList {
+			if mc == nil {
+				continue
+			}
+			protocol, modelID := modelProviderAndIDForResolution(defaultProvider, mc)
+			if providers.ModelKey(protocol, modelID) == providers.ModelKey(candidate.Provider, candidate.Model) {
+				return cloneModelConfigForResolution(defaultProvider, mc, workspace)
+			}
+		}
+		return nil
+	}
+
+	if mc := lookupModelConfigByRef(cfg, activeModel, defaultProvider); mc != nil {
+		return cloneModelConfigForResolution(defaultProvider, mc, workspace)
+	}
+
+	return nil
+}
@@ -0,0 +1,116 @@
+package agent
+
+import (
+	"testing"
+
+	"github.com/sipeed/picoclaw/pkg/config"
+	"github.com/sipeed/picoclaw/pkg/providers"
+)
+
+func TestResolveActiveModelConfig_PrefersCandidateIdentityKey(t *testing.T) {
+	cfg := &config.Config{
+		ModelList: []*config.ModelConfig{
+			{
+				ModelName: "glm-4.7",
+				Provider:  "zhipu",
+				Model:     "glm-4.7",
+				Streaming: config.ModelStreamingConfig{Enabled: false},
+			},
+			{
+				ModelName: "suanneng-glm-4.7",
+				Provider:  "zhipu",
+				Model:     "glm-4.7",
+				Streaming: config.ModelStreamingConfig{Enabled: true},
+			},
+		},
+	}
+
+	got := resolveActiveModelConfig(
+		cfg,
+		"/workspace",
+		[]providers.FallbackCandidate{{
+			Provider:    "zhipu",
+			Model:       "glm-4.7",
+			IdentityKey: "model_name:suanneng-glm-4.7",
+		}},
+		"glm-4.7",
+		"openai",
+	)
+
+	if got == nil {
+		t.Fatal("resolveActiveModelConfig() = nil, want model config")
+	}
+	if got.ModelName != "suanneng-glm-4.7" {
+		t.Fatalf("model_name = %q, want %q", got.ModelName, "suanneng-glm-4.7")
+	}
+	if !got.Streaming.Enabled {
+		t.Fatal("streaming.enabled = false, want true from identity-matched model config")
+	}
+}
+
+func TestResolveActiveModelConfig_LoadBalancedAliasUsesSelectedCandidate(t *testing.T) {
+	cfg := &config.Config{
+		ModelList: []*config.ModelConfig{
+			{
+				ModelName: "lb-model",
+				Model:     "openai/primary",
+				Streaming: config.ModelStreamingConfig{Enabled: false},
+			},
+			{
+				ModelName: "lb-model",
+				Model:     "openai/secondary",
+				Streaming: config.ModelStreamingConfig{Enabled: true},
+			},
+		},
+	}
+
+	got := resolveActiveModelConfig(
+		cfg,
+		"/workspace",
+		[]providers.FallbackCandidate{{
+			Provider:    "openai",
+			Model:       "secondary",
+			IdentityKey: "model_name:lb-model",
+		}},
+		"lb-model",
+		"openai",
+	)
+
+	if got == nil {
+		t.Fatal("resolveActiveModelConfig() = nil, want model config")
+	}
+	if got.Model != "openai/secondary" {
+		t.Fatalf("model = %q, want openai/secondary", got.Model)
+	}
+	if !got.Streaming.Enabled {
+		t.Fatal("streaming.enabled = false, want true from selected load-balanced entry")
+	}
+}
+
+func TestResolveActiveModelConfig_DoesNotFallbackToOpenAIForDefaultProviderCandidate(t *testing.T) {
+	cfg := &config.Config{
+		ModelList: []*config.ModelConfig{
+			{
+				ModelName: "openai-gpt",
+				Provider:  "openai",
+				Model:     "gpt-4o",
+				Streaming: config.ModelStreamingConfig{Enabled: true},
+			},
+		},
+	}
+
+	got := resolveActiveModelConfig(
+		cfg,
+		"/workspace",
+		[]providers.FallbackCandidate{{
+			Provider: "nvidia",
+			Model:    "gpt-4o",
+		}},
+		"gpt-4o",
+		"nvidia",
+	)
+
+	if got != nil {
+		t.Fatalf("resolveActiveModelConfig() = %#v, want nil for non-active provider config", got)
+	}
+}
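The lookup order these tests exercise (identity key first, then provider/model key, otherwise nil) can be sketched on its own. This is a simplified illustration, not the project's implementation: `modelConfig`, `candidate`, `identityKey`, and `modelKey` are hypothetical stand-ins, and the `model_name:` key format is assumed from the test fixtures above.

```go
package main

import "fmt"

// modelConfig and candidate are simplified stand-ins (assumption) for
// config.ModelConfig and providers.FallbackCandidate.
type modelConfig struct {
	ModelName string
	Provider  string
	Model     string
	Streaming bool
}

type candidate struct {
	Provider    string
	Model       string
	IdentityKey string
}

// identityKey and modelKey are hypothetical helpers; the real key formats
// live in the agent and providers packages.
func identityKey(mc modelConfig) string      { return "model_name:" + mc.ModelName }
func modelKey(provider, model string) string { return provider + "/" + model }

// resolveActive returns a copy of the entry matching the candidate.
// Pass 1 requires both the identity key and the provider/model key to match;
// pass 2 falls back to the provider/model key alone.
func resolveActive(list []modelConfig, c candidate) *modelConfig {
	want := modelKey(c.Provider, c.Model)
	if c.IdentityKey != "" {
		for i := range list {
			if identityKey(list[i]) == c.IdentityKey &&
				modelKey(list[i].Provider, list[i].Model) == want {
				clone := list[i]
				return &clone
			}
		}
	}
	for i := range list {
		if modelKey(list[i].Provider, list[i].Model) == want {
			clone := list[i]
			return &clone
		}
	}
	return nil
}

func main() {
	list := []modelConfig{
		{ModelName: "glm-4.7", Provider: "zhipu", Model: "glm-4.7"},
		{ModelName: "suanneng-glm-4.7", Provider: "zhipu", Model: "glm-4.7", Streaming: true},
	}
	got := resolveActive(list, candidate{
		Provider:    "zhipu",
		Model:       "glm-4.7",
		IdentityKey: "model_name:suanneng-glm-4.7",
	})
	fmt.Println(got.ModelName, got.Streaming)
}
```

Matching on the identity key first matters when two entries share the same provider and model but differ in their `streaming` setting, which is the situation the first test covers.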
@@ -58,6 +58,7 @@ func (p *Pipeline) Finalize(
 					Message: err.Error(),
 				},
 			)
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			return turnResult{status: TurnEndStatusError}, err
 		}
 	}
@@ -73,6 +74,41 @@ func (p *Pipeline) Finalize(
 		)
 	}

+	contextUsage := computeContextUsage(ts.agent, ts.sessionKey)
+	streamErr := finalizeConfiguredStreamingLLM(turnCtx, ts, exec, finalContent, contextUsage)
+	// If streaming never became visible, keep the legacy Pico interim publish path
+	// so the final answer is still delivered outside normal SendResponse.
+	if ((streamErr != nil && !isConfiguredStreamingVisibleError(streamErr)) || exec.streamingFallback) &&
+		!ts.opts.SendResponse && ts.opts.AllowInterimPicoPublish && finalContent != "" {
+		agentID, sessionKey, scope := outboundTurnMetadata(
+			ts.agent.ID,
+			ts.opts.Dispatch.SessionKey,
+			ts.opts.Dispatch.SessionScope,
+		)
+		msg := bus.OutboundMessage{
+			Context: outboundContextFromInbound(
+				ts.opts.Dispatch.InboundContext,
+				ts.opts.Dispatch.Channel(),
+				ts.opts.Dispatch.ChatID(),
+				ts.opts.Dispatch.ReplyToMessageID(),
+			),
+			AgentID:      agentID,
+			SessionKey:   sessionKey,
+			Scope:        scope,
+			Content:      finalContent,
+			ContextUsage: contextUsage,
+		}
+		markFinalOutbound(&msg)
+		_ = al.bus.PublishOutbound(turnCtx, msg)
+	}
+	if streamErr != nil && isConfiguredStreamingVisibleError(streamErr) {
+		ts.setPhase(TurnPhaseCompleted)
+		return turnResult{
+			finalContent: finalContent,
+			status:       TurnEndStatusError,
+			followUps:    append([]bus.InboundMessage(nil), ts.followUps...),
+		}, streamErr
+	}
 	ts.setPhase(TurnPhaseCompleted)
 	return turnResult{
 		finalContent: finalContent,
@@ -98,15 +98,21 @@ func (p *Pipeline) CallLLM(
 		switch decision.normalizedAction() {
 		case HookActionContinue, HookActionModify:
 			if llmReq != nil {
+				prevModel := exec.llmModel
 				exec.llmModel = llmReq.Model
 				exec.callMessages = llmReq.Messages
 				exec.providerToolDefs = llmReq.Tools
 				exec.llmOpts = llmReq.Options
+				if strings.TrimSpace(exec.llmModel) != "" && exec.llmModel != prevModel {
+					p.applyBeforeLLMModelRewrite(ts, exec)
+				}
 			}
 		case HookActionAbortTurn:
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			exec.abortedByHook = true
 			return ControlBreak, nil
 		case HookActionHardAbort:
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			_ = ts.requestHardAbort()
 			exec.abortedByHardAbort = true
 			return ControlBreak, nil
@@ -155,14 +161,30 @@ func (p *Pipeline) CallLLM(
 		al.activeRequests.Add(1)
 		defer al.activeRequests.Done()

+		if response, handled, streamErr := p.tryConfiguredStreamingLLM(
+			providerCtx,
+			ts,
+			exec,
+			messagesForCall,
+			toolDefsForCall,
+		); handled {
+			return response, streamErr
+		}
+
 		if len(exec.activeCandidates) > 1 && p.Fallback != nil {
 			fbResult, fbErr := p.Fallback.Execute(
 				providerCtx,
 				exec.activeCandidates,
 				func(ctx context.Context, provider, model string) (*providers.LLMResponse, error) {
-					candidateProvider := exec.activeProvider
-					if cp, ok := ts.agent.CandidateProviders[providers.ModelKey(provider, model)]; ok {
-						candidateProvider = cp
+					candidateProvider, err := providerForFallbackCandidate(
+						ts.agent,
+						exec.activeProvider,
+						exec.activeCandidates,
+						provider,
+						model,
+					)
+					if err != nil {
+						return nil, err
 					}
 					return candidateProvider.Chat(ctx, messagesForCall, toolDefsForCall, model, exec.llmOpts)
 				},
@@ -203,6 +225,9 @@ func (p *Pipeline) CallLLM(
 			exec.abortedByHardAbort = true
 			return ControlBreak, nil
 		}
+		if isConfiguredStreamingVisibleError(err) {
+			break
+		}

 		// Retry without media if vision is unsupported
 		if hasMediaRefs(exec.callMessages) && isVisionUnsupportedError(err) && retry < maxRetries {
@@ -415,9 +440,11 @@ func (p *Pipeline) CallLLM(
 				exec.response = llmResp.Response
 			}
 		case HookActionAbortTurn:
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			exec.abortedByHook = true
 			return ControlBreak, nil
 		case HookActionHardAbort:
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			_ = ts.requestHardAbort()
 			exec.abortedByHardAbort = true
 			return ControlBreak, nil
@@ -438,10 +465,20 @@ func (p *Pipeline) CallLLM(
 		// Pico tool-call turns publish their reasoning/content/tool summary as a
 		// structured sequence after the tool-call payload is normalized below.
 	} else if ts.channel == "pico" {
-		// Publish pico thoughts before the turn context is canceled at return time.
-		// The async variant can race with turn teardown and intermittently drop the
-		// thought message in CI even though the LLM produced reasoning content.
-		al.publishPicoReasoning(turnCtx, reasoningContent, ts.chatID)
+		if exec.streamingPublisher != nil && exec.streamingPublisher.ReasoningPublished() {
+			if err := exec.streamingPublisher.FinalizeReasoning(turnCtx, reasoningContent); err != nil {
+				logger.WarnCF("agent", "Failed to finalize streamed pico reasoning", map[string]any{
+					"channel": ts.channel,
+					"chat_id": ts.chatID,
+					"error":   err.Error(),
+				})
+			}
+		} else {
+			// Publish pico thoughts before the turn context is canceled at return time.
+			// The async variant can race with turn teardown and intermittently drop the
+			// thought message in CI even though the LLM produced reasoning content.
+			al.publishPicoReasoning(turnCtx, reasoningContent, ts.chatID, ts.sessionKey)
+		}
 	} else {
 		go al.handleReasoning(
 			turnCtx,
@@ -483,6 +520,7 @@ func (p *Pipeline) CallLLM(
 			responseContent = exec.response.ReasoningContent
 		}
 		if steerMsgs := al.dequeueSteeringMessagesForScope(ts.sessionKey); len(steerMsgs) > 0 {
+			cancelConfiguredStreamingLLM(turnCtx, exec)
 			logger.InfoCF("agent", "Steering arrived after direct LLM response; continuing turn",
 				map[string]any{
 					"agent_id":       ts.agent.ID,
@@ -492,6 +530,7 @@ func (p *Pipeline) CallLLM(
 			exec.pendingMessages = append(exec.pendingMessages, steerMsgs...)
 			return ControlContinue, nil
 		}
+
 		exec.finalContent = responseContent
 		logger.InfoCF("agent", "LLM response without tool calls (direct answer)",
 			map[string]any{
@@ -501,6 +540,7 @@ func (p *Pipeline) CallLLM(
 			})
 		return ControlBreak, nil
 	}
+	cancelConfiguredStreamingLLM(turnCtx, exec)

 	// Tool-call path: normalize and prepare for tool execution
 	exec.normalizedToolCalls = make([]providers.ToolCall, 0, len(exec.response.ToolCalls))
@@ -575,3 +615,44 @@ func (p *Pipeline) CallLLM(

 	return ControlToolLoop, nil
 }
+
+func (p *Pipeline) applyBeforeLLMModelRewrite(ts *turnState, exec *turnExecution) {
+	if p == nil || ts == nil || ts.agent == nil || exec == nil {
+		return
+	}
+	rawModel := strings.TrimSpace(exec.llmModel)
+	if rawModel == "" {
+		return
+	}
+
+	defaultProvider := "openai"
+	if p.Cfg != nil {
+		if provider := strings.TrimSpace(p.Cfg.Agents.Defaults.Provider); provider != "" {
+			defaultProvider = provider
+		}
+	}
+	defaultProvider = effectiveDefaultProvider(defaultProvider)
+	candidates := resolveModelCandidates(p.Cfg, defaultProvider, rawModel, nil)
+	exec.activeCandidates = candidates
+	exec.activeModel = resolvedCandidateModel(candidates, rawModel)
+	exec.llmModel = exec.activeModel
+	exec.activeModelConfig = resolveActiveModelConfig(p.Cfg, ts.agent.Workspace, candidates, rawModel, defaultProvider)
+}
+
+func providerForFallbackCandidate(
+	agent *AgentInstance,
+	activeProvider providers.LLMProvider,
+	activeCandidates []providers.FallbackCandidate,
+	provider string,
+	model string,
+) (providers.LLMProvider, error) {
+	if agent != nil {
+		if cp, ok := agent.CandidateProviders[providers.ModelKey(provider, model)]; ok && cp != nil {
+			return cp, nil
+		}
+	}
+	if activeProvider == nil {
+		return nil, fmt.Errorf("fallback model %q has no active provider", model)
+	}
+	return activeProvider, nil
+}
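The selection rule in `providerForFallbackCandidate` (prefer a non-nil per-candidate provider, otherwise the active provider, otherwise an error) can be illustrated with a small sketch. The `llmProvider` interface, `namedProvider` type, and `pickProvider` function below are hypothetical stand-ins, and the string key stands in for whatever `providers.ModelKey` returns.

```go
package main

import "fmt"

// llmProvider is a stand-in (assumption) for providers.LLMProvider.
type llmProvider interface {
	Name() string
}

type namedProvider string

func (n namedProvider) Name() string { return string(n) }

// pickProvider prefers a dedicated, non-nil provider registered for the
// candidate key, then the active provider, and fails if neither exists.
func pickProvider(perCandidate map[string]llmProvider, active llmProvider, key string) (llmProvider, error) {
	// Indexing a nil map is safe in Go and simply reports ok == false.
	if cp, ok := perCandidate[key]; ok && cp != nil {
		return cp, nil
	}
	if active == nil {
		return nil, fmt.Errorf("fallback model %q has no active provider", key)
	}
	return active, nil
}

func main() {
	registry := map[string]llmProvider{
		"openai/gpt-4o": namedProvider("dedicated"),
	}
	p, err := pickProvider(registry, namedProvider("active"), "openai/gpt-4o")
	fmt.Println(p.Name(), err)

	_, err = pickProvider(nil, nil, "zhipu/glm-4.7")
	fmt.Println(err)
}
```

Returning an error instead of calling a method on a nil provider turns a potential panic inside the fallback callback into an ordinary failed attempt.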
@@ -99,6 +99,13 @@ func (p *Pipeline) SetupTurn(ctx context.Context, ts *turnState) (*turnExecution
 	)
 	exec.activeCandidates = activeCandidates
 	exec.activeModel = activeModel
+	exec.activeModelConfig = resolveActiveModelConfig(
+		p.Cfg,
+		ts.agent.Workspace,
+		activeCandidates,
+		activeModel,
+		p.Cfg.Agents.Defaults.Provider,
+	)
 	exec.activeProvider = activeProvider
 	exec.usedLight = usedLight

@@ -0,0 +1,478 @@
+package agent
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"reflect"
+	"strings"
+	"time"
+
+	"github.com/sipeed/picoclaw/pkg/bus"
+	"github.com/sipeed/picoclaw/pkg/config"
+	"github.com/sipeed/picoclaw/pkg/logger"
+	"github.com/sipeed/picoclaw/pkg/providers"
+)
+
+func (p *Pipeline) tryConfiguredStreamingLLM(
+	ctx context.Context,
+	ts *turnState,
+	exec *turnExecution,
+	messagesForCall []providers.Message,
+	toolDefsForCall []providers.ToolDefinition,
+) (*providers.LLMResponse, bool, error) {
+	exec.streamingPublisher = nil
+	exec.streamingFallback = false
+	if !p.configuredStreamingEligible(ts, exec) {
+		return nil, false, nil
+	}
+	streamProvider, ok := exec.activeProvider.(providers.StreamingProvider)
+	if !ok {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id": ts.agent.ID,
+			"channel":  ts.channel,
+			"model":    exec.activeModel,
+			"reason":   "provider_not_streaming",
+		})
+		return nil, false, nil
+	}
+
+	streamer, ok := p.Bus.GetStreamer(ctx, ts.channel, ts.chatID, ts.sessionKey)
+	if !ok || streamer == nil {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id": ts.agent.ID,
+			"channel":  ts.channel,
+			"chat_id":  ts.chatID,
+			"model":    exec.activeModel,
+			"reason":   "streamer_unavailable",
+		})
+		return nil, false, nil
+	}
+
+	publisher := &streamingChunkPublisher{
+		streamer: streamer,
+		channel:  ts.channel,
+		chatID:   ts.chatID,
+	}
+
+	logger.DebugCF("agent", "configured streaming enabled", map[string]any{
+		"agent_id": ts.agent.ID,
+		"channel":  ts.channel,
+		"chat_id":  ts.chatID,
+		"model":    exec.llmModel,
+	})
+
+	chunkCount := 0
+	firstChunkAt := time.Time{}
+	lastChunkAt := time.Time{}
+	recordChunk := func() {
+		now := time.Now()
+		chunkCount++
+		if firstChunkAt.IsZero() {
+			firstChunkAt = now
+		}
+		lastChunkAt = now
+	}
+	var response *providers.LLMResponse
+	var streamErr error
+	if eventProvider, ok := exec.activeProvider.(providers.StreamingEventProvider); ok {
+		response, streamErr = eventProvider.ChatStreamEvents(
+			ctx,
+			messagesForCall,
+			toolDefsForCall,
+			exec.llmModel,
+			exec.llmOpts,
+			func(chunk providers.StreamChunk) {
+				recordChunk()
+				if strings.TrimSpace(chunk.ReasoningContent) != "" {
+					publisher.UpdateReasoning(ctx, chunk.ReasoningContent)
+				}
+				if strings.TrimSpace(chunk.Content) != "" {
+					publisher.Update(ctx, chunk.Content)
+				}
+			},
+		)
+	} else {
+		response, streamErr = streamProvider.ChatStream(
+			ctx,
+			messagesForCall,
+			toolDefsForCall,
+			exec.llmModel,
+			exec.llmOpts,
+			func(accumulated string) {
+				recordChunk()
+				publisher.Update(ctx, accumulated)
+			},
+		)
+	}
+	logConfiguredStreamingSummary(ts, exec, chunkCount, firstChunkAt, lastChunkAt, streamErr)
+	if streamErr == nil {
+		if updateErr := publisher.Err(); updateErr != nil {
+			logFields := map[string]any{
+				"agent_id": ts.agent.ID,
+				"channel":  ts.channel,
+				"model":    exec.llmModel,
+				"error":    updateErr.Error(),
+			}
+			if publisher.Published() {
+				logger.WarnCF("agent", "ChatStream update failed after visible output", logFields)
+				return nil, true, configuredStreamingVisibleError{err: updateErr}
+			}
+			logger.WarnCF("agent", "ChatStream update failed before visible output; retrying with Chat", logFields)
+			publisher.Cancel(ctx)
+			fallbackResponse, err := exec.activeProvider.Chat(
+				ctx,
+				messagesForCall,
+				toolDefsForCall,
+				exec.llmModel,
+				exec.llmOpts,
+			)
+			if err == nil && fallbackResponse != nil {
+				exec.streamingFallback = true
+			}
+			return fallbackResponse, true, err
+		}
+	}
+	if streamErr != nil {
+		if !publisher.Published() {
+			logger.WarnCF("agent", "ChatStream failed before visible output; retrying with Chat", map[string]any{
+				"agent_id": ts.agent.ID,
+				"channel":  ts.channel,
+				"model":    exec.llmModel,
+				"error":    streamErr.Error(),
+			})
+			publisher.Cancel(ctx)
+			fallbackResponse, err := exec.activeProvider.Chat(
+				ctx,
+				messagesForCall,
+				toolDefsForCall,
+				exec.llmModel,
+				exec.llmOpts,
+			)
+			if err == nil && fallbackResponse != nil {
+				exec.streamingFallback = true
+			}
+			return fallbackResponse, true, err
+		}
+		return nil, true, configuredStreamingVisibleError{err: streamErr}
+	}
+
+	if response != nil {
+		exec.streamingPublisher = publisher
+	}
+
+	return response, true, nil
+}
+
+func logConfiguredStreamingSummary(
+	ts *turnState,
+	exec *turnExecution,
+	chunkCount int,
+	firstChunkAt time.Time,
+	lastChunkAt time.Time,
+	streamErr error,
+) {
+	fields := map[string]any{
+		"chunks": chunkCount,
+	}
+	if ts != nil {
+		fields["agent_id"] = ts.agent.ID
+		fields["channel"] = ts.channel
+	}
+	if exec != nil {
+		fields["model"] = exec.llmModel
+	}
+	if !firstChunkAt.IsZero() && !lastChunkAt.IsZero() {
+		fields["chunk_span_ms"] = lastChunkAt.Sub(firstChunkAt).Milliseconds()
+	}
+	if streamErr != nil {
+		fields["error"] = streamErr.Error()
+	}
+	logger.DebugCF("agent", "configured streaming completed", fields)
+}
+
+type configuredStreamingVisibleError struct {
+	err error
+}
+
+func (e configuredStreamingVisibleError) Error() string {
+	if e.err == nil {
+		return "configured streaming failed after visible output"
+	}
+	return e.err.Error()
+}
+
+func (e configuredStreamingVisibleError) Unwrap() error {
+	return e.err
+}
+
+func isConfiguredStreamingVisibleError(err error) bool {
+	var visibleErr configuredStreamingVisibleError
+	return errors.As(err, &visibleErr)
+}
+
+func finalizeConfiguredStreamingLLM(
+	ctx context.Context,
+	ts *turnState,
+	exec *turnExecution,
+	content string,
+	contextUsage *bus.ContextUsage,
+) error {
+	if exec == nil || exec.streamingPublisher == nil {
+		return nil
+	}
+	publisher := exec.streamingPublisher
+	exec.streamingPublisher = nil
+	visibleBeforeFinalize := publisher.Published()
+	if err := publisher.Finalize(ctx, content, contextUsage); err != nil {
+		if visibleBeforeFinalize {
+			logger.WarnCF("agent", "stream final flush failed after visible output", map[string]any{
+				"agent_id": ts.agent.ID,
+				"channel":  ts.channel,
+				"model":    exec.llmModel,
+				"error":    err.Error(),
+			})
+			return configuredStreamingVisibleError{err: err}
+		}
+		publisher.Cancel(ctx)
+		logger.WarnCF("agent", "stream final flush failed", map[string]any{
+			"agent_id": ts.agent.ID,
+			"channel":  ts.channel,
+			"model":    exec.llmModel,
+			"error":    err.Error(),
+		})
+		return err
+	}
+	return nil
+}
+
+func cancelConfiguredStreamingLLM(ctx context.Context, exec *turnExecution) {
+	if exec == nil || exec.streamingPublisher == nil {
+		return
+	}
+	publisher := exec.streamingPublisher
+	exec.streamingPublisher = nil
+	publisher.Cancel(ctx)
+}
+
+func (p *Pipeline) configuredStreamingEligible(ts *turnState, exec *turnExecution) bool {
+	if p == nil || ts == nil || exec == nil || p.Bus == nil {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"reason": "missing_pipeline_state",
+		})
+		return false
+	}
+	if strings.TrimSpace(ts.channel) == "" || strings.TrimSpace(ts.chatID) == "" {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id": ts.agent.ID,
+			"channel":  ts.channel,
+			"chat_id":  ts.chatID,
+			"model":    exec.activeModel,
+			"reason":   "missing_channel_context",
+		})
+		return false
+	}
+	if !ts.opts.SendResponse && !ts.opts.AllowInterimPicoPublish {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id": ts.agent.ID,
+			"channel":  ts.channel,
+			"chat_id":  ts.chatID,
+			"model":    exec.activeModel,
+			"reason":   "turn_output_disabled",
+		})
+		return false
+	}
+	if len(exec.activeCandidates) != 1 {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id":   ts.agent.ID,
+			"channel":    ts.channel,
+			"model":      exec.activeModel,
+			"candidates": len(exec.activeCandidates),
+			"reason":     "fallback_candidates_enabled",
+		})
+		return false
+	}
+	if exec.activeModelConfig == nil || !exec.activeModelConfig.Streaming.Enabled {
+		modelName := ""
+		modelStreaming := false
+		if exec.activeModelConfig != nil {
+			modelName = exec.activeModelConfig.ModelName
+			modelStreaming = exec.activeModelConfig.Streaming.Enabled
+		}
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id":         ts.agent.ID,
+			"channel":          ts.channel,
+			"model":            exec.activeModel,
+			"model_name":       modelName,
+			"model_streaming":  modelStreaming,
+			"has_model_config": exec.activeModelConfig != nil,
+			"reason":           "model_streaming_disabled",
+		})
+		return false
+	}
+	channelStreaming, ok := p.channelStreamingConfig(ts.channel)
+	if !ok || !channelStreaming.Enabled {
+		logger.DebugCF("agent", "configured streaming not used", map[string]any{
+			"agent_id":           ts.agent.ID,
+			"channel":            ts.channel,
+			"model":              exec.activeModel,
+			"channel_streaming":  channelStreaming.Enabled,
+			"has_channel_config": ok,
+			"reason":             "channel_streaming_disabled",
+		})
+		return false
+	}
+	return true
+}
+
+func (p *Pipeline) channelStreamingConfig(channelName string) (config.StreamingConfig, bool) {
+	if p == nil || p.Cfg == nil || p.Cfg.Channels == nil {
+		return config.StreamingConfig{}, false
+	}
+	ch := p.Cfg.Channels[channelName]
+	if ch == nil {
+		return config.StreamingConfig{}, false
+	}
+	decoded, err := ch.GetDecoded()
+	if err != nil {
+		logger.WarnCF("agent", "channel streaming config decode failed", map[string]any{
+			"channel": channelName,
+			"error":   err.Error(),
+		})
+		return config.StreamingConfig{}, false
+	}
+	return streamingConfigFromDecodedSettings(decoded)
+}
+
+func streamingConfigFromDecodedSettings(decoded any) (config.StreamingConfig, bool) {
+	value := reflect.ValueOf(decoded)
+	if !value.IsValid() {
+		return config.StreamingConfig{}, false
+	}
+	if value.Kind() == reflect.Ptr {
+		if value.IsNil() {
+			return config.StreamingConfig{}, false
+		}
+		value = value.Elem()
+	}
+	if value.Kind() != reflect.Struct {
+		return config.StreamingConfig{}, false
+	}
+
+	field := value.FieldByName("Streaming")
+	if !field.IsValid() || !field.CanInterface() {
+		return config.StreamingConfig{}, false
+	}
+	streaming, ok := field.Interface().(config.StreamingConfig)
+	return streaming, ok
+}
+
+type streamingChunkPublisher struct {
+	streamer           bus.Streamer
+	channel            string
+	chatID             string
+	published          bool
+	reasoningPublished bool
+	err                error
+}
+
+func (p *streamingChunkPublisher) Update(ctx context.Context, accumulated string) {
+	if p == nil || p.streamer == nil || strings.TrimSpace(accumulated) == "" {
+		return
+	}
+	if err := p.streamer.Update(ctx, accumulated); err != nil {
+		p.err = err
+		logger.WarnCF("agent", "stream update failed", map[string]any{
+			"channel": p.channel,
+			"chat_id": p.chatID,
+			"error":   err.Error(),
+		})
+		return
+	}
+	p.published = true
+}
+
+func (p *streamingChunkPublisher) UpdateReasoning(ctx context.Context, accumulated string) {
+	if p == nil || p.streamer == nil || strings.TrimSpace(accumulated) == "" {
+		return
+	}
+	reasoningStreamer, ok := p.streamer.(bus.ReasoningStreamer)
+	if !ok {
+		return
+	}
+	if err := reasoningStreamer.UpdateReasoning(ctx, accumulated); err != nil {
+		p.err = err
+		logger.WarnCF("agent", "stream reasoning update failed", map[string]any{
+			"channel": p.channel,
+			"chat_id": p.chatID,
+			"error":   err.Error(),
+		})
+		return
+	}
+	p.reasoningPublished = true
+}
+
+func (p *streamingChunkPublisher) Published() bool {
+	return p != nil && p.published
+}
+
+func (p *streamingChunkPublisher) ReasoningPublished() bool {
+	return p != nil && p.reasoningPublished
+}
+
+func (p *streamingChunkPublisher) Err() error {
+	if p == nil {
+		return nil
+	}
+	return p.err
+}
+
+func (p *streamingChunkPublisher) Finalize(ctx context.Context, content string, contextUsage *bus.ContextUsage) error {
+	if p == nil || p.streamer == nil {
+		return nil
+	}
+	if strings.TrimSpace(content) == "" && !p.published {
+		return nil
+	}
+	var err error
+	if streamer, ok := p.streamer.(bus.ContextUsageStreamer); ok {
+		err = streamer.FinalizeWithContext(ctx, content, contextUsage)
+	} else {
+		err = p.streamer.Finalize(ctx, content)
+	}
+	if err != nil {
+		return fmt.Errorf("stream finalize: %w", err)
+	}
+	p.published = true
+	return nil
+}
+
+func (p *streamingChunkPublisher) FinalizeReasoning(ctx context.Context, content string) error {
+	if p == nil || p.streamer == nil || !p.reasoningPublished || strings.TrimSpace(content) == "" {
+		return nil
+	}
+	reasoningStreamer, ok := p.streamer.(bus.ReasoningStreamer)
+	if !ok {
+		return nil
+	}
+	if err := reasoningStreamer.FinalizeReasoning(ctx, content); err != nil {
+		return fmt.Errorf("stream reasoning finalize: %w", err)
+	}
+	return nil
+}
+
+func (p *streamingChunkPublisher) ClearFinalizedStreamMarker() {
+	if p == nil || p.streamer == nil {
+		return
+	}
+	if cleaner, ok := p.streamer.(interface{ ClearFinalizedStreamMarker() }); ok {
+		cleaner.ClearFinalizedStreamMarker()
+	}
+}
+
+func (p *streamingChunkPublisher) Cancel(ctx context.Context) {
+	if p == nil || p.streamer == nil {
+		return
+	}
+	p.streamer.Cancel(ctx)
+}
@@ -11,6 +11,7 @@ import (
 	"time"

 	"github.com/sipeed/picoclaw/pkg/bus"
+	"github.com/sipeed/picoclaw/pkg/config"
 	"github.com/sipeed/picoclaw/pkg/logger"
 	"github.com/sipeed/picoclaw/pkg/providers"
 	"github.com/sipeed/picoclaw/pkg/session"
@@ -124,15 +125,18 @@ type turnExecution struct {
 	iteration int

 	// Per-iteration state set by Pipeline.PreLLM
-	activeCandidates []providers.FallbackCandidate
-	activeModel      string
-	activeProvider   providers.LLMProvider
-	usedLight        bool
+	activeCandidates  []providers.FallbackCandidate
+	activeModel       string
+	activeModelConfig *config.ModelConfig
+	activeProvider    providers.LLMProvider
+	usedLight         bool

 	// LLM call per-iteration state
 	response            *providers.LLMResponse
 	normalizedToolCalls []providers.ToolCall
 	allResponsesHandled bool
+	streamingPublisher  *streamingChunkPublisher
+	streamingFallback   bool
 	callMessages        []providers.Message
 	providerToolDefs    []providers.ToolDefinition
 	llmModel            string
@@ -26,7 +26,7 @@ const defaultBusBufferSize = 64
 type StreamDelegate interface {
 	// GetStreamer returns a Streamer for the given channel+chatID if the channel
 	// supports streaming. Returns nil, false if streaming is unavailable.
-	GetStreamer(ctx context.Context, channel, chatID string) (Streamer, bool)
+	GetStreamer(ctx context.Context, channel, chatID, sessionKey string) (Streamer, bool)
 }

 // Streamer pushes incremental content to a streaming-capable channel.
@@ -37,6 +37,20 @@ type Streamer interface {
 	Cancel(ctx context.Context)
 }

+// ContextUsageStreamer can attach final context-window usage metadata when a
+// streaming channel's final message replaces the normal outbound response.
+type ContextUsageStreamer interface {
+	Streamer
+	FinalizeWithContext(ctx context.Context, content string, usage *ContextUsage) error
+}
+
+// ReasoningStreamer can show incremental model reasoning/thought content
+// separately from the final user-visible answer stream.
+type ReasoningStreamer interface {
+	UpdateReasoning(ctx context.Context, content string) error
+	FinalizeReasoning(ctx context.Context, content string) error
+}
+
 type MessageBus struct {
 	inbound       chan InboundMessage
 	outbound      chan OutboundMessage
@@ -182,10 +196,10 @@ func (mb *MessageBus) SetEventPublisher(p EventPublisher) {
 	mb.eventPublisher.Store(p)
 }

-// GetStreamer returns a Streamer for the given channel+chatID via the delegate.
-func (mb *MessageBus) GetStreamer(ctx context.Context, channel, chatID string) (Streamer, bool) {
+// GetStreamer returns a Streamer for the given channel+chatID+session via the delegate.
+func (mb *MessageBus) GetStreamer(ctx context.Context, channel, chatID, sessionKey string) (Streamer, bool) {
 	if d, ok := mb.streamDelegate.Load().(StreamDelegate); ok && d != nil {
-		return d.GetStreamer(ctx, channel, chatID)
+		return d.GetStreamer(ctx, channel, chatID, sessionKey)
 	}
 	return nil, false
 }
@@ -41,6 +41,8 @@ const (
 	janitorInterval = 10 * time.Second
 	typingStopTTL   = 5 * time.Minute
 	placeholderTTL  = 10 * time.Minute
+
+	streamAuxiliaryTombstoneTTL = 30 * time.Second
 )

 // typingEntry wraps a typing stop function with a creation timestamp for TTL eviction.
@@ -82,22 +84,23 @@ type channelWorker struct {
 }

 type Manager struct {
-	channels      map[string]Channel
-	workers       map[string]*channelWorker
-	bus           *bus.MessageBus
-	runtimeEvents runtimeevents.Bus
-	config        *config.Config
-	mediaStore    media.MediaStore
-	dispatchTask  *asyncTask
-	mux           *dynamicServeMux
-	httpServer    *http.Server
-	httpListeners []net.Listener
-	mu            sync.RWMutex
-	placeholders  sync.Map          // "channel:chatID" → placeholderID (string)
-	typingStops   sync.Map          // "channel:chatID" → func()
-	reactionUndos sync.Map          // "channel:chatID" → reactionEntry
-	streamActive  sync.Map          // "channel:chatID" → true (set when streamer.Finalize sent the message)
-	channelHashes map[string]string // channel name → config hash
+	channels                  map[string]Channel
+	workers                   map[string]*channelWorker
+	bus                       *bus.MessageBus
+	runtimeEvents             runtimeevents.Bus
+	config                    *config.Config
+	mediaStore                media.MediaStore
+	dispatchTask              *asyncTask
+	mux                       *dynamicServeMux
+	httpServer                *http.Server
+	httpListeners             []net.Listener
+	mu                        sync.RWMutex
+	placeholders              sync.Map          // "channel:chatID" → placeholderID (string)
+	typingStops               sync.Map          // "channel:chatID" → func()
+	reactionUndos             sync.Map          // "channel:chatID" → reactionEntry
+	streamActive              sync.Map          // streamSuppressionKey → true (set when streamer.Finalize sent the message)
+	streamAuxiliaryTombstones sync.Map          // streamSuppressionKey → time.Time (used to drop late auxiliary messages after the stream's final message)
+	channelHashes             map[string]string // channel name → config hash
 }

 type mediaStoreSetter interface {
@@ -166,6 +169,20 @@ func outboundMessageIsToolFeedback(msg bus.OutboundMessage) bool {
 	return strings.EqualFold(strings.TrimSpace(msg.Context.Raw["message_kind"]), "tool_feedback")
 }

+func outboundMessageHasAuxiliaryKind(msg bus.OutboundMessage) bool {
+	if len(msg.Context.Raw) == 0 {
+		return false
+	}
+	return strings.TrimSpace(msg.Context.Raw["message_kind"]) != ""
+}
+
+func outboundMessageIsFinal(msg bus.OutboundMessage) bool {
+	if len(msg.Context.Raw) == 0 {
+		return false
+	}
+	return strings.EqualFold(strings.TrimSpace(msg.Context.Raw["outbound_kind"]), "final")
+}
+
 func outboundMessageBypassesPlaceholderEdit(msg bus.OutboundMessage) bool {
 	if len(msg.Context.Raw) == 0 {
 		return false
@@ -182,6 +199,14 @@ func outboundMediaChatID(msg bus.OutboundMediaMessage) string {
 	return msg.ChatID
 }

+func streamSuppressionKey(channel, chatID, sessionKey string) string {
+	key := channel + ":" + chatID
+	if strings.TrimSpace(sessionKey) == "" {
+		return key
+	}
+	return key + ":" + sessionKey
+}
+
 func trackedToolFeedbackMessageChatID(ch Channel, chatID string, outboundCtx *bus.InboundContext) string {
 	if resolver, ok := ch.(toolFeedbackMessageTargetResolver); ok {
 		if resolved := strings.TrimSpace(resolver.ToolFeedbackMessageChatID(chatID, outboundCtx)); resolved != "" {
@@ -324,6 +349,7 @@ func (m *Manager) RecordReactionUndo(channel, chatID string, undo func()) {
 func (m *Manager) preSend(ctx context.Context, name string, msg bus.OutboundMessage, ch Channel) ([]string, bool) {
 	chatID := outboundMessageChatID(msg)
 	key := name + ":" + chatID
+	streamKey := streamSuppressionKey(name, chatID, msg.SessionKey)

 	// 1. Stop typing
 	if v, loaded := m.typingStops.LoadAndDelete(key); loaded {
@@ -340,38 +366,58 @@ func (m *Manager) preSend(ctx context.Context, name string, msg bus.OutboundMess
 	}

 	isToolFeedback := outboundMessageIsToolFeedback(msg)
+	isAuxiliaryMessage := outboundMessageHasAuxiliaryKind(msg)
+	isFinalMessage := outboundMessageIsFinal(msg)
 	separateToolFeedbackMessages := m.toolFeedbackSeparateMessagesEnabled()

-	// 3. If a stream already finalized this chat, stale tool feedback must be
-	// dropped without consuming the final-response marker. Streaming finalization
-	// bypasses the worker queue, so older queued feedback can arrive before the
-	// normal final outbound message that cleans up the marker and placeholder.
-	if isToolFeedback {
-		if _, loaded := m.streamActive.Load(key); loaded {
+	// 3. If a stream already finalized this chat, stale auxiliary messages must
+	// be dropped without consuming the final-response marker. Streaming
+	// finalization bypasses the worker queue, so older queued feedback/thoughts
+	// can arrive before the normal final outbound message that cleans up the
+	// marker and placeholder.
+	if isAuxiliaryMessage {
+		if _, loaded := m.streamActive.Load(streamKey); loaded {
+			return nil, true
+		}
+		if m.streamAuxiliaryTombstoneActive(streamKey) {
 			return nil, true
 		}
 	}

-	// 4. If a stream already finalized this message, delete the placeholder and skip send
-	if _, loaded := m.streamActive.LoadAndDelete(key); loaded {
-		if v, loaded := m.placeholders.LoadAndDelete(key); loaded {
-			if entry, ok := v.(placeholderEntry); ok && entry.id != "" {
-				// Prefer deleting the placeholder (cleaner UX than editing to same content)
-				if deleter, ok := ch.(MessageDeleter); ok {
-					deleter.DeleteMessage(ctx, chatID, entry.id) // best effort
-				} else if editor, ok := ch.(MessageEditor); ok {
-					editor.EditMessage(ctx, chatID, entry.id, msg.Content) // fallback
+	// 4. If a stream already finalized this turn, skip only the duplicate final
+	// outbound message. Earlier queued visible messages must still be delivered.
+	if isFinalMessage {
+		if _, loaded := m.streamActive.LoadAndDelete(streamKey); loaded {
+			if v, loaded := m.placeholders.LoadAndDelete(key); loaded {
+				if entry, ok := v.(placeholderEntry); ok && entry.id != "" {
+					// Prefer deleting the placeholder (cleaner UX than editing to same content)
+					if deleter, ok := ch.(MessageDeleter); ok {
+						deleter.DeleteMessage(ctx, chatID, entry.id) // best effort
+					} else if editor, ok := ch.(MessageEditor); ok {
+						editor.EditMessage(ctx, chatID, entry.id, msg.Content) // fallback
+					}
 				}
 			}
-		}
-		if !isToolFeedback {
-			if separateToolFeedbackMessages {
-				clearTrackedToolFeedbackMessage(ch, chatID, &msg.Context)
-			} else {
-				dismissTrackedToolFeedbackMessage(ctx, ch, chatID, &msg.Context)
+			if !isToolFeedback {
+				if separateToolFeedbackMessages {
+					clearTrackedToolFeedbackMessage(ch, chatID, &msg.Context)
+				} else {
+					dismissTrackedToolFeedbackMessage(ctx, ch, chatID, &msg.Context)
+				}
 			}
+			return nil, true
 		}
-		return nil, true
+	}
+
+	if _, loaded := m.streamActive.Load(streamKey); loaded {
+		return nil, false
+	}
+	if m.streamActiveForChat(name, chatID) {
+		return nil, false
+	}
+
+	if !isAuxiliaryMessage {
+		m.streamAuxiliaryTombstones.Delete(streamKey)
 	}

 	if separateToolFeedbackMessages {
@@ -424,6 +470,7 @@ func (m *Manager) preSend(ctx context.Context, name string, msg bus.OutboundMess
 func (m *Manager) preSendMedia(ctx context.Context, name string, msg bus.OutboundMediaMessage, ch Channel) {
 	chatID := outboundMediaChatID(msg)
 	key := name + ":" + chatID
+	streamKey := streamSuppressionKey(name, chatID, msg.SessionKey)

 	// 1. Stop typing
 	if v, loaded := m.typingStops.LoadAndDelete(key); loaded {
@@ -439,8 +486,9 @@ func (m *Manager) preSendMedia(ctx context.Context, name string, msg bus.Outboun
 		}
 	}

-	// 3. Clear any finalized stream marker for this chat before media delivery.
-	m.streamActive.LoadAndDelete(key)
+	// 3. Clear any finalized stream markers for this chat before media delivery.
+	m.streamActive.LoadAndDelete(streamKey)
+	m.streamAuxiliaryTombstones.Delete(streamKey)

 	if m.toolFeedbackSeparateMessagesEnabled() {
 		clearTrackedToolFeedbackMessage(ch, chatID, &msg.Context)
@@ -507,7 +555,7 @@ func (m *Manager) SetMediaStore(store media.MediaStore) {

 // GetStreamer implements bus.StreamDelegate.
 // It checks if the named channel supports streaming and returns a Streamer.
-func (m *Manager) GetStreamer(ctx context.Context, channelName, chatID string) (bus.Streamer, bool) {
+func (m *Manager) GetStreamer(ctx context.Context, channelName, chatID, sessionKey string) (bus.Streamer, bool) {
 	m.mu.RLock()
 	ch, exists := m.channels[channelName]
 	m.mu.RUnlock()
@@ -531,51 +579,295 @@ func (m *Manager) GetStreamer(ctx context.Context, channelName, chatID string) (
 	}

 	// Mark streamActive on Finalize so preSend knows to clean up the placeholder
-	key := channelName + ":" + chatID
-	return &finalizeHookStreamer{
-		Streamer: streamer,
-		onFinalize: func(finalizeCtx context.Context) {
-			if m.toolFeedbackSeparateMessagesEnabled() {
-				clearTrackedToolFeedbackMessage(
-					ch,
-					chatID,
-					&bus.InboundContext{
-						Channel: channelName,
-						ChatID:  chatID,
-					},
-				)
-			} else {
-				dismissTrackedToolFeedbackMessage(
-					finalizeCtx,
-					ch,
-					chatID,
-					&bus.InboundContext{
-						Channel: channelName,
-						ChatID:  chatID,
-					},
-				)
+	// and so that late auxiliary messages cannot leak once the stream has delivered its final message.
+	streamKey := streamSuppressionKey(channelName, chatID, sessionKey)
+	placeholderKey := channelName + ":" + chatID
+	clearMarker := func() {
+		m.streamActive.Delete(streamKey)
+	}
+	onFinalize := func(finalizeCtx context.Context, finalContent string) {
+		if m.toolFeedbackSeparateMessagesEnabled() {
+			clearTrackedToolFeedbackMessage(
+				ch,
+				chatID,
+				&bus.InboundContext{
+					Channel: channelName,
+					ChatID:  chatID,
+				},
+			)
+		} else {
+			dismissTrackedToolFeedbackMessage(
+				finalizeCtx,
+				ch,
+				chatID,
+				&bus.InboundContext{
+					Channel: channelName,
+					ChatID:  chatID,
+				},
+			)
+		}
+		if v, loaded := m.placeholders.LoadAndDelete(placeholderKey); loaded {
+			if entry, ok := v.(placeholderEntry); ok && entry.id != "" {
+				if deleter, ok := ch.(MessageDeleter); ok {
+					deleter.DeleteMessage(finalizeCtx, chatID, entry.id) // best effort
+				} else if editor, ok := ch.(MessageEditor); ok {
+					editor.EditMessage(finalizeCtx, chatID, entry.id, finalContent) // best effort fallback
+				}
 			}
-			m.streamActive.Store(key, true)
-		},
+		}
+		m.streamActive.Store(streamKey, true)
+		m.streamAuxiliaryTombstones.Store(streamKey, time.Now())
+	}
+
+	if m.config != nil && m.config.Agents.Defaults.SplitOnMarker {
+		return &splitMarkerStreamer{
+			current:     streamer,
+			reasoning:   reasoningStreamerFrom(streamer),
+			begin:       func(beginCtx context.Context) (bus.Streamer, error) { return sc.BeginStream(beginCtx, chatID) },
+			onFinalize:  onFinalize,
+			clearMarker: clearMarker,
+		}, true
+	}
+
+	return &finalizeHookStreamer{
+		Streamer:    streamer,
+		clearMarker: clearMarker,
+		onFinalize:  onFinalize,
 	}, true
 }

+func reasoningStreamerFrom(streamer bus.Streamer) bus.ReasoningStreamer {
+	if reasoningStreamer, ok := streamer.(bus.ReasoningStreamer); ok {
+		return reasoningStreamer
+	}
+	return nil
+}
+
+// splitMarkerStreamer turns accumulated streaming text containing
+// MessageSplitMarker into separate channel stream messages.
+type splitMarkerStreamer struct {
+	mu             sync.Mutex
+	current        bus.Streamer
+	reasoning      bus.ReasoningStreamer
+	begin          func(context.Context) (bus.Streamer, error)
+	completedParts int
+	finalized      bool
+	onFinalize     func(context.Context, string)
+	clearMarker    func()
+}
+
+func (s *splitMarkerStreamer) Update(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	return s.updateLocked(ctx, content)
+}
+
+func (s *splitMarkerStreamer) Finalize(ctx context.Context, content string) error {
+	return s.FinalizeWithContext(ctx, content, nil)
+}
+
+func (s *splitMarkerStreamer) FinalizeWithContext(ctx context.Context, content string, usage *bus.ContextUsage) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if err := s.finalizeLocked(ctx, content, usage); err != nil {
+		return err
+	}
+	s.runFinalizeHook(ctx, content)
+	return nil
+}
+
+func (s *splitMarkerStreamer) UpdateReasoning(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.reasoning == nil {
+		return nil
+	}
+	return s.reasoning.UpdateReasoning(ctx, content)
+}
+
+func (s *splitMarkerStreamer) FinalizeReasoning(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.reasoning == nil {
+		return nil
+	}
+	return s.reasoning.FinalizeReasoning(ctx, content)
+}
+
+func (s *splitMarkerStreamer) Cancel(ctx context.Context) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.current != nil {
+		s.current.Cancel(ctx)
+	}
+}
+
+func (s *splitMarkerStreamer) ClearFinalizedStreamMarker() {
+	if s.clearMarker != nil {
+		s.clearMarker()
+	}
+}
+
+func (s *splitMarkerStreamer) updateLocked(ctx context.Context, content string) error {
+	parts := strings.Split(content, MessageSplitMarker)
+	completedLimit := len(parts) - 1
+	if err := s.finalizeCompletedPartsLocked(ctx, parts, completedLimit, nil); err != nil {
+		return err
+	}
+	active := strings.TrimSpace(parts[len(parts)-1])
+	if active == "" {
+		return nil
+	}
+	if err := s.ensureCurrentLocked(ctx); err != nil {
+		return err
+	}
+	return s.current.Update(ctx, active)
+}
+
+func (s *splitMarkerStreamer) finalizeLocked(ctx context.Context, content string, usage *bus.ContextUsage) error {
+	parts := strings.Split(content, MessageSplitMarker)
+	return s.finalizeCompletedPartsLocked(ctx, parts, len(parts), usage)
+}
+
+func (s *splitMarkerStreamer) finalizeCompletedPartsLocked(
+	ctx context.Context,
+	parts []string,
+	limit int,
+	usage *bus.ContextUsage,
+) error {
+	for s.completedParts < limit {
+		content := strings.TrimSpace(parts[s.completedParts])
+		isLast := s.completedParts == limit-1
+		if content != "" {
+			if err := s.ensureCurrentLocked(ctx); err != nil {
+				return err
+			}
+			if isLast && usage != nil {
+				if contextStreamer, ok := s.current.(bus.ContextUsageStreamer); ok {
+					if err := contextStreamer.FinalizeWithContext(ctx, content, usage); err != nil {
+						return err
+					}
+				} else if err := s.current.Finalize(ctx, content); err != nil {
+					return err
+				}
+			} else if err := s.current.Finalize(ctx, content); err != nil {
+				return err
+			}
+			s.current = nil
+		}
+		s.completedParts++
+	}
+	return nil
+}
+
+func (s *splitMarkerStreamer) ensureCurrentLocked(ctx context.Context) error {
+	if s.current != nil {
+		return nil
+	}
+	if s.begin == nil {
+		return fmt.Errorf("streamer is not initialized")
+	}
+	streamer, err := s.begin(ctx)
+	if err != nil {
+		return err
+	}
+	s.current = streamer
+	return nil
+}
+
+func (s *splitMarkerStreamer) runFinalizeHook(ctx context.Context, content string) {
+	if s.finalized {
+		return
+	}
+	s.finalized = true
+	if s.onFinalize != nil {
+		s.onFinalize(ctx, content)
+	}
+}
+
+func (m *Manager) streamAuxiliaryTombstoneActive(key string) bool {
+	v, ok := m.streamAuxiliaryTombstones.Load(key)
+	if !ok {
+		return false
+	}
+	createdAt, ok := v.(time.Time)
+	if !ok || time.Since(createdAt) > streamAuxiliaryTombstoneTTL {
+		m.streamAuxiliaryTombstones.Delete(key)
+		return false
+	}
+	return true
+}
+
+func (m *Manager) streamActiveForChat(channel, chatID string) bool {
+	chatKey := streamSuppressionKey(channel, chatID, "")
+	found := false
+	m.streamActive.Range(func(key, _ any) bool {
+		keyString, ok := key.(string)
+		if !ok {
+			return true
+		}
+		if keyString == chatKey || strings.HasPrefix(keyString, chatKey+":") {
+			found = true
+			return false
+		}
+		return true
+	})
+	return found
+}
+
 // finalizeHookStreamer wraps a Streamer to run a hook on Finalize.
 type finalizeHookStreamer struct {
 	Streamer
-	onFinalize func(context.Context)
+	onFinalize  func(context.Context, string)
+	clearMarker func()
 }

 func (s *finalizeHookStreamer) Finalize(ctx context.Context, content string) error {
 	if err := s.Streamer.Finalize(ctx, content); err != nil {
 		return err
 	}
-	if s.onFinalize != nil {
-		s.onFinalize(ctx)
+	s.runFinalizeHook(ctx, content)
+	return nil
+}
+
+func (s *finalizeHookStreamer) FinalizeWithContext(ctx context.Context, content string, usage *bus.ContextUsage) error {
+	if streamer, ok := s.Streamer.(bus.ContextUsageStreamer); ok {
+		if err := streamer.FinalizeWithContext(ctx, content, usage); err != nil {
+			return err
+		}
+	} else if err := s.Streamer.Finalize(ctx, content); err != nil {
+		return err
+	}
+	s.runFinalizeHook(ctx, content)
+	return nil
+}
+
+func (s *finalizeHookStreamer) UpdateReasoning(ctx context.Context, content string) error {
+	if streamer, ok := s.Streamer.(bus.ReasoningStreamer); ok {
+		return streamer.UpdateReasoning(ctx, content)
 	}
 	return nil
 }

+func (s *finalizeHookStreamer) FinalizeReasoning(ctx context.Context, content string) error {
+	if streamer, ok := s.Streamer.(bus.ReasoningStreamer); ok {
+		return streamer.FinalizeReasoning(ctx, content)
+	}
+	return nil
+}
+
+func (s *finalizeHookStreamer) runFinalizeHook(ctx context.Context, content string) {
+	if s.onFinalize != nil {
+		s.onFinalize(ctx, content)
+	}
+}
+
+func (s *finalizeHookStreamer) ClearFinalizedStreamMarker() {
+	if s.clearMarker != nil {
+		s.clearMarker()
+	}
+}
+
 // initChannel is a helper that looks up a factory by type name and creates the channel.
 // typeName is the channel type used for factory lookup (e.g., "telegram").
 // channelName is the config map key used as the channel's runtime name (e.g., "my_telegram").
@@ -1063,7 +1355,11 @@ func (m *Manager) runWorker(ctx context.Context, name string, w *channelWorker)

 			// Step 1: Try marker-based splitting if enabled.
 			// Tool feedback must stay a single message, so it skips marker splitting.
-			if m.config != nil && m.config.Agents.Defaults.SplitOnMarker && !outboundMessageIsToolFeedback(msg) {
+			// Stream-final duplicate responses must also stay intact so preSend can
+			// consume the whole final message before any marker chunk leaks.
+			if m.finalizedStreamActiveForMessage(name, msg) {
+				chunks = []string{msg.Content}
+			} else if m.config != nil && m.config.Agents.Defaults.SplitOnMarker && !outboundMessageIsToolFeedback(msg) {
 				if markerChunks := SplitByMarker(msg.Content); len(markerChunks) > 1 {
 					for _, chunk := range markerChunks {
 						chunkMsg := msg
@@ -1090,6 +1386,18 @@ func (m *Manager) runWorker(ctx context.Context, name string, w *channelWorker)
 	}
 }

+func (m *Manager) finalizedStreamActiveForMessage(channelName string, msg bus.OutboundMessage) bool {
+	if m == nil || !outboundMessageIsFinal(msg) {
+		return false
+	}
+	chatID := outboundMessageChatID(msg)
+	if strings.TrimSpace(channelName) == "" || strings.TrimSpace(chatID) == "" {
+		return false
+	}
+	_, active := m.streamActive.Load(streamSuppressionKey(channelName, chatID, msg.SessionKey))
+	return active
+}
+
 // splitOutboundMessageContent splits regular outbound content by maxLen, but
 // keeps tool feedback in a single message by truncating the explanation body.
 func splitOutboundMessageContent(msg bus.OutboundMessage, maxLen int) []string {
@@ -1389,9 +1697,9 @@ func (m *Manager) sendMediaWithRetry(
 	return nil, lastErr
 }

-// runTTLJanitor periodically scans the typingStops and placeholders maps
-// and evicts entries that have exceeded their TTL. This prevents memory
-// accumulation when outbound paths fail to trigger preSend (e.g. LLM errors).
+// runTTLJanitor periodically scans the typingStops, placeholders, and stream
+// tombstone maps and evicts entries that have exceeded their TTL. This prevents
+// memory accumulation when outbound paths fail to trigger preSend (e.g. LLM errors).
 func (m *Manager) runTTLJanitor(ctx context.Context) {
 	ticker := time.NewTicker(janitorInterval)
 	defer ticker.Stop()
@@ -1429,6 +1737,12 @@ func (m *Manager) runTTLJanitor(ctx context.Context) {
 				}
 				return true
 			})
+			m.streamAuxiliaryTombstones.Range(func(key, value any) bool {
+				if createdAt, ok := value.(time.Time); !ok || now.Sub(createdAt) > streamAuxiliaryTombstoneTTL {
+					m.streamAuxiliaryTombstones.Delete(key)
+				}
+				return true
+			})
 		}
 	}
 }
@@ -104,7 +104,8 @@ func (m *mockDeletingMediaChannel) DismissToolFeedbackMessage(_ context.Context,
 }

 type mockStreamer struct {
-	finalizeFn func(context.Context, string) error
+	finalizeFn            func(context.Context, string) error
+	finalizeWithContextFn func(context.Context, string, *bus.ContextUsage) error
 }

 func (m *mockStreamer) Update(context.Context, string) error { return nil }
@@ -116,15 +117,68 @@ func (m *mockStreamer) Finalize(ctx context.Context, content string) error {
 	return nil
 }

+func (m *mockStreamer) FinalizeWithContext(ctx context.Context, content string, usage *bus.ContextUsage) error {
+	if m.finalizeWithContextFn != nil {
+		return m.finalizeWithContextFn(ctx, content, usage)
+	}
+	return m.Finalize(ctx, content)
+}
+
 func (m *mockStreamer) Cancel(context.Context) {}

+type mockReasoningStreamer struct {
+	mockStreamer
+	reasoningUpdates []string
+	reasoningFinal   string
+}
+
+func (m *mockReasoningStreamer) UpdateReasoning(_ context.Context, content string) error {
+	m.reasoningUpdates = append(m.reasoningUpdates, content)
+	return nil
+}
+
+func (m *mockReasoningStreamer) FinalizeReasoning(_ context.Context, content string) error {
+	m.reasoningFinal = content
+	return nil
+}
+
+type recordingStreamSegment struct {
+	updates       []string
+	finals        []string
+	finalUsage    *bus.ContextUsage
+	canceledCount int
+}
+
+func (s *recordingStreamSegment) Update(_ context.Context, content string) error {
+	s.updates = append(s.updates, content)
+	return nil
+}
+
+func (s *recordingStreamSegment) Finalize(ctx context.Context, content string) error {
+	return s.FinalizeWithContext(ctx, content, nil)
+}
+
+func (s *recordingStreamSegment) FinalizeWithContext(_ context.Context, content string, usage *bus.ContextUsage) error {
+	s.finals = append(s.finals, content)
+	s.finalUsage = usage
+	return nil
+}
+
+func (s *recordingStreamSegment) Cancel(context.Context) {
+	s.canceledCount++
+}
+
 type mockStreamingChannel struct {
 	mockMessageEditor
 	streamer        Streamer
+	beginStreamFn   func(context.Context, string) (Streamer, error)
 	resolveChatIDFn func(chatID string, outboundCtx *bus.InboundContext) string
 }

-func (m *mockStreamingChannel) BeginStream(context.Context, string) (Streamer, error) {
+func (m *mockStreamingChannel) BeginStream(ctx context.Context, chatID string) (Streamer, error) {
+	if m.beginStreamFn != nil {
+		return m.beginStreamFn(ctx, chatID)
+	}
 	if m.streamer == nil {
 		return nil, errors.New("missing streamer")
 	}
@@ -1476,6 +1530,9 @@ func TestPreSend_StaleToolFeedbackDoesNotConsumeStreamActiveMarker(t *testing.T)
 		Context: bus.InboundContext{
 			Channel: "test",
 			ChatID:  "123",
+			Raw: map[string]string{
+				"outbound_kind": "final",
+			},
 		},
 	})

@@ -1494,6 +1551,206 @@ func TestPreSend_StaleToolFeedbackDoesNotConsumeStreamActiveMarker(t *testing.T)
 	}
 }

+func TestPreSend_StaleThoughtDoesNotConsumeStreamActiveMarker(t *testing.T) {
+	m := newTestManager()
+	m.streamActive.Store("test:123", true)
+	m.streamAuxiliaryTombstones.Store("test:123", time.Now())
+	m.RecordPlaceholder("test", "123", "placeholder-1")
+
+	var editedContent string
+	ch := &mockMessageEditor{
+		editFn: func(_ context.Context, chatID, messageID, content string) error {
+			if chatID != "123" || messageID != "placeholder-1" {
+				t.Fatalf("unexpected edit target: %s/%s", chatID, messageID)
+			}
+			editedContent = content
+			return nil
+		},
+	}
+
+	thought := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "late reasoning",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"message_kind": "thought",
+			},
+		},
+	})
+
+	msgIDs, handled := m.preSend(context.Background(), "test", thought, ch)
+	if !handled {
+		t.Fatal("expected stale thought to be dropped after stream finalize")
+	}
+	if len(msgIDs) != 0 {
+		t.Fatalf("expected no delivered message IDs for stale thought, got %v", msgIDs)
+	}
+	if _, ok := m.streamActive.Load("test:123"); !ok {
+		t.Fatal("expected streamActive marker to remain for the final outbound message")
+	}
+	if _, ok := m.placeholders.Load("test:123"); !ok {
+		t.Fatal("expected placeholder cleanup to remain deferred to the final outbound message")
+	}
+	if ch.editedMessages != 0 {
+		t.Fatalf("expected no placeholder edit for stale thought, got %d edits", ch.editedMessages)
+	}
+
+	finalMsg := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "final streamed reply",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"outbound_kind": "final",
+			},
+		},
+	})
+
+	_, handled = m.preSend(context.Background(), "test", finalMsg, ch)
+	if !handled {
+		t.Fatal("expected final outbound message to consume streamActive marker")
+	}
+	if _, ok := m.streamActive.Load("test:123"); ok {
+		t.Fatal("expected streamActive marker to be cleared by final outbound message")
+	}
+	if _, ok := m.placeholders.Load("test:123"); ok {
+		t.Fatal("expected placeholder to be cleaned up by final outbound message")
+	}
+	if editedContent != "final streamed reply" {
+		t.Fatalf("editedContent = %q, want final streamed reply", editedContent)
+	}
+
+	lateThought := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "later reasoning",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"message_kind": "thought",
+			},
+		},
+	})
+	msgIDs, handled = m.preSend(context.Background(), "test", lateThought, ch)
+	if !handled {
+		t.Fatal("expected tombstone to drop late thought after final outbound was suppressed")
+	}
+	if len(msgIDs) != 0 {
+		t.Fatalf("expected no delivered message IDs for late thought, got %v", msgIDs)
+	}
+}
+
+func TestPreSend_StreamActiveDoesNotConsumeEarlierVisibleMessage(t *testing.T) {
+	m := newTestManager()
+	m.streamActive.Store("test:123", true)
+	m.streamAuxiliaryTombstones.Store("test:123", time.Now())
+	m.RecordPlaceholder("test", "123", "placeholder-1")
+
+	editCalls := 0
+	ch := &mockMessageEditor{
+		editFn: func(_ context.Context, chatID, messageID, content string) error {
+			editCalls++
+			if chatID != "123" || messageID != "placeholder-1" || content != "final streamed reply" {
+				t.Fatalf("unexpected placeholder edit for %s/%s: %q", chatID, messageID, content)
+			}
+			return nil
+		},
+	}
+
+	earlierVisible := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "earlier visible message",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+		},
+	})
+	_, handled := m.preSend(context.Background(), "test", earlierVisible, ch)
+	if handled {
+		t.Fatal("expected earlier visible message to be delivered normally")
+	}
+	if editCalls != 0 {
+		t.Fatalf("placeholder edits after earlier visible message = %d, want 0", editCalls)
+	}
+	if _, ok := m.streamActive.Load("test:123"); !ok {
+		t.Fatal("expected streamActive marker to remain for final outbound")
+	}
+	if _, ok := m.streamAuxiliaryTombstones.Load("test:123"); !ok {
+		t.Fatal("expected auxiliary tombstone to remain")
+	}
+	if _, ok := m.placeholders.Load("test:123"); !ok {
+		t.Fatal("expected placeholder cleanup to remain deferred to final outbound")
+	}
+
+	finalMsg := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "final streamed reply",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"outbound_kind": "final",
+			},
+		},
+	})
+	_, handled = m.preSend(context.Background(), "test", finalMsg, ch)
+	if !handled {
+		t.Fatal("expected final outbound message to consume streamActive marker")
+	}
+	if _, ok := m.streamActive.Load("test:123"); ok {
+		t.Fatal("expected streamActive marker to be cleared by final outbound message")
+	}
+	if editCalls != 1 {
+		t.Fatalf("placeholder edits after final outbound = %d, want 1", editCalls)
+	}
+}
+
+func TestPreSend_StreamActiveDoesNotConsumeOtherSessionFinal(t *testing.T) {
+	m := newTestManager()
+	m.streamActive.Store("test:123", true)
+	m.RecordPlaceholder("test", "123", "placeholder-1")
+
+	ch := &mockMessageEditor{
+		editFn: func(_ context.Context, _, _, _ string) error {
+			t.Fatal("placeholder edit should remain deferred for the streaming session")
+			return nil
+		},
+	}
+
+	otherSessionFinal := testOutboundMessage(bus.OutboundMessage{
+		Channel:    "test",
+		ChatID:     "123",
+		SessionKey: "session-other",
+		Content:    "other session final",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"outbound_kind": "final",
+			},
+		},
+	})
+
+	_, handled := m.preSend(context.Background(), "test", otherSessionFinal, ch)
+	if handled {
+		t.Fatal("expected final outbound from a different session to be delivered normally")
+	}
+	if _, ok := m.streamActive.Load("test:123"); !ok {
+		t.Fatal("expected streaming marker to remain for the streaming session")
+	}
+	if _, ok := m.placeholders.Load("test:123"); !ok {
+		t.Fatal("expected placeholder cleanup to remain deferred to the streaming session")
+	}
+}
+
 func TestPreSendMedia_LeavesTrackedMessageForChannelSend(t *testing.T) {
 	m := newTestManager()
 	ch := &mockDeletingMediaChannel{}
@@ -1610,7 +1867,7 @@ func TestGetStreamer_FinalizeDismissesTrackedToolFeedback(t *testing.T) {
 	}
 	m.channels["test"] = ch

-	streamer, ok := m.GetStreamer(context.Background(), "test", "123")
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
 	if !ok {
 		t.Fatal("expected streamer to be available")
 	}
@@ -1625,6 +1882,312 @@ func TestGetStreamer_FinalizeDismissesTrackedToolFeedback(t *testing.T) {
 	}
 }

+func TestGetStreamer_FinalizeCleansPlaceholderImmediately(t *testing.T) {
+	m := newTestManager()
+	m.RecordPlaceholder("test", "123", "placeholder-1")
+	var editedContent string
+	editCalls := 0
+	ch := &mockStreamingChannel{
+		mockMessageEditor: mockMessageEditor{
+			editFn: func(_ context.Context, chatID, messageID, content string) error {
+				if chatID != "123" || messageID != "placeholder-1" {
+					t.Fatalf("unexpected edit target: %s/%s", chatID, messageID)
+				}
+				editCalls++
+				editedContent = content
+				return nil
+			},
+		},
+		streamer: &mockStreamer{},
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	if err := streamer.Finalize(context.Background(), "final reply"); err != nil {
+		t.Fatalf("Finalize() error = %v", err)
+	}
+	if editedContent != "final reply" {
+		t.Fatalf("edited placeholder content = %q, want final reply", editedContent)
+	}
+	if _, placeholderExists := m.placeholders.Load("test:123"); placeholderExists {
+		t.Fatal("expected placeholder to be cleaned up during finalize")
+	}
+	if _, streamActiveExists := m.streamActive.Load("test:123"); !streamActiveExists {
+		t.Fatal("expected streamActive marker to be recorded after finalize")
+	}
+	cleaner, ok := streamer.(interface{ ClearFinalizedStreamMarker() })
+	if !ok {
+		t.Fatal("expected streamer to expose marker cleanup")
+	}
+	cleaner.ClearFinalizedStreamMarker()
+	if _, streamActiveExists := m.streamActive.Load("test:123"); streamActiveExists {
+		t.Fatal("expected streamActive marker to be cleared")
+	}
+	if _, ok := m.streamAuxiliaryTombstones.Load("test:123"); !ok {
+		t.Fatal("expected auxiliary tombstone to remain after final marker cleanup")
+	}
+
+	lateThought := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "late reasoning",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"message_kind": "thought",
+			},
+		},
+	})
+	msgIDs, handled := m.preSend(context.Background(), "test", lateThought, ch)
+	if !handled {
+		t.Fatal("expected auxiliary tombstone to drop late thought")
+	}
+	if len(msgIDs) != 0 {
+		t.Fatalf("expected no delivered message IDs for late thought, got %v", msgIDs)
+	}
+	if editCalls != 1 {
+		t.Fatalf("expected late thought not to edit placeholder, got %d edits", editCalls)
+	}
+
+	finalOutbound := testOutboundMessage(bus.OutboundMessage{
+		Channel: "test",
+		ChatID:  "123",
+		Content: "visible final reply",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+		},
+	})
+	_, handled = m.preSend(context.Background(), "test", finalOutbound, ch)
+	if handled {
+		t.Fatal("expected cleared final marker to let normal outbound send")
+	}
+	if _, ok := m.streamAuxiliaryTombstones.Load("test:123"); ok {
+		t.Fatal("expected normal outbound to clear auxiliary tombstone")
+	}
+}
+
+func TestGetStreamer_FinalizeCleansPlaceholderWithSessionKey(t *testing.T) {
+	m := newTestManager()
+	m.RecordPlaceholder("test", "123", "placeholder-1")
+	ch := &mockStreamingChannel{
+		mockMessageEditor: mockMessageEditor{
+			editFn: func(_ context.Context, chatID, messageID, content string) error {
+				if chatID != "123" || messageID != "placeholder-1" || content != "final reply" {
+					t.Fatalf("unexpected edit for %s/%s: %q", chatID, messageID, content)
+				}
+				return nil
+			},
+		},
+		streamer: &mockStreamer{},
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "session-1")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	if err := streamer.Finalize(context.Background(), "final reply"); err != nil {
+		t.Fatalf("Finalize() error = %v", err)
+	}
+	if _, placeholderExists := m.placeholders.Load("test:123"); placeholderExists {
+		t.Fatal("expected placeholder to be cleaned up during finalize")
+	}
+	if _, streamActiveExists := m.streamActive.Load("test:123:session-1"); !streamActiveExists {
+		t.Fatal("expected session streamActive marker to be recorded after finalize")
+	}
+}
+
+func TestGetStreamer_PreservesContextUsageStreamer(t *testing.T) {
+	m := newTestManager()
+	var gotUsage *bus.ContextUsage
+	ch := &mockStreamingChannel{
+		streamer: &mockStreamer{
+			finalizeWithContextFn: func(_ context.Context, content string, usage *bus.ContextUsage) error {
+				if content != "final reply" {
+					t.Fatalf("unexpected finalize content: %q", content)
+				}
+				gotUsage = usage
+				return nil
+			},
+		},
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	contextStreamer, ok := streamer.(bus.ContextUsageStreamer)
+	if !ok {
+		t.Fatal("manager-wrapped streamer should preserve ContextUsageStreamer")
+	}
+	usage := &bus.ContextUsage{UsedTokens: 10, TotalTokens: 100, CompressAtTokens: 80, UsedPercent: 10}
+	if err := contextStreamer.FinalizeWithContext(context.Background(), "final reply", usage); err != nil {
+		t.Fatalf("FinalizeWithContext() error = %v", err)
+	}
+	if gotUsage != usage {
+		t.Fatalf("context usage = %#v, want original usage", gotUsage)
+	}
+	if _, ok := m.streamActive.Load("test:123"); !ok {
+		t.Fatal("expected streamActive marker to be recorded after finalize with context")
+	}
+}
+
+func TestGetStreamer_PreservesReasoningStreamer(t *testing.T) {
+	m := newTestManager()
+	inner := &mockReasoningStreamer{}
+	ch := &mockStreamingChannel{
+		streamer: inner,
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	reasoningStreamer, ok := streamer.(bus.ReasoningStreamer)
+	if !ok {
+		t.Fatal("manager-wrapped streamer should preserve ReasoningStreamer")
+	}
+	if err := reasoningStreamer.UpdateReasoning(context.Background(), "thinking"); err != nil {
+		t.Fatalf("UpdateReasoning() error = %v", err)
+	}
+	if err := reasoningStreamer.FinalizeReasoning(context.Background(), "final thought"); err != nil {
+		t.Fatalf("FinalizeReasoning() error = %v", err)
+	}
+	if got := inner.reasoningUpdates; len(got) != 1 || got[0] != "thinking" {
+		t.Fatalf("reasoning updates = %v, want [thinking]", got)
+	}
+	if inner.reasoningFinal != "final thought" {
+		t.Fatalf("reasoning final = %q, want final thought", inner.reasoningFinal)
+	}
+}
+
+func TestGetStreamer_SplitOnMarkerStreamsSeparateSegments(t *testing.T) {
+	m := newTestManager()
+	m.config = &config.Config{
+		Agents: config.AgentsConfig{
+			Defaults: config.AgentDefaults{
+				SplitOnMarker: true,
+			},
+		},
+	}
+
+	var segments []*recordingStreamSegment
+	ch := &mockStreamingChannel{
+		beginStreamFn: func(context.Context, string) (Streamer, error) {
+			segment := &recordingStreamSegment{}
+			segments = append(segments, segment)
+			return segment, nil
+		},
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "session-1")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	contextStreamer, ok := streamer.(bus.ContextUsageStreamer)
+	if !ok {
+		t.Fatal("split streamer should preserve ContextUsageStreamer")
+	}
+
+	if err := streamer.Update(context.Background(), "hello"); err != nil {
+		t.Fatalf("Update(first) error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "hello<|[SPLIT]|>world"); err != nil {
+		t.Fatalf("Update(split) error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "hello<|[SPLIT]|>world!"); err != nil {
+		t.Fatalf("Update(second segment) error = %v", err)
+	}
+	usage := &bus.ContextUsage{UsedTokens: 10, TotalTokens: 100}
+	if err := contextStreamer.FinalizeWithContext(
+		context.Background(),
+		"hello<|[SPLIT]|>world!",
+		usage,
+	); err != nil {
+		t.Fatalf("FinalizeWithContext() error = %v", err)
+	}
+
+	if len(segments) != 2 {
+		t.Fatalf("segments = %d, want 2", len(segments))
+	}
+	if got := segments[0].updates; len(got) != 1 || got[0] != "hello" {
+		t.Fatalf("segment 0 updates = %v, want [hello]", got)
+	}
+	if got := segments[0].finals; len(got) != 1 || got[0] != "hello" {
+		t.Fatalf("segment 0 finals = %v, want [hello]", got)
+	}
+	if got := segments[1].updates; len(got) != 2 || got[0] != "world" || got[1] != "world!" {
+		t.Fatalf("segment 1 updates = %v, want [world world!]", got)
+	}
+	if got := segments[1].finals; len(got) != 1 || got[0] != "world!" {
+		t.Fatalf("segment 1 finals = %v, want [world!]", got)
+	}
+	if segments[1].finalUsage != usage {
+		t.Fatalf("final usage = %#v, want original usage", segments[1].finalUsage)
+	}
+	if _, ok := m.streamActive.Load("test:123:session-1"); !ok {
+		t.Fatal("expected streamActive marker to be recorded after split stream finalize")
+	}
+}
+
+func TestGetStreamer_SplitOnMarkerKeepsReasoningOnInitialStreamer(t *testing.T) {
+	m := newTestManager()
+	m.config = &config.Config{
+		Agents: config.AgentsConfig{
+			Defaults: config.AgentDefaults{
+				SplitOnMarker: true,
+			},
+		},
+	}
+
+	initial := &mockReasoningStreamer{}
+	next := &recordingStreamSegment{}
+	callCount := 0
+	ch := &mockStreamingChannel{
+		beginStreamFn: func(context.Context, string) (Streamer, error) {
+			callCount++
+			if callCount == 1 {
+				return initial, nil
+			}
+			return next, nil
+		},
+	}
+	m.channels["test"] = ch
+
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
+	if !ok {
+		t.Fatal("expected streamer to be available")
+	}
+	if err := streamer.Update(context.Background(), "hello<|[SPLIT]|>world"); err != nil {
+		t.Fatalf("Update() error = %v", err)
+	}
+	reasoningStreamer, ok := streamer.(bus.ReasoningStreamer)
+	if !ok {
+		t.Fatal("split streamer should preserve ReasoningStreamer")
+	}
+	if err := reasoningStreamer.UpdateReasoning(context.Background(), "thinking"); err != nil {
+		t.Fatalf("UpdateReasoning() error = %v", err)
+	}
+	if err := reasoningStreamer.FinalizeReasoning(context.Background(), "final thought"); err != nil {
+		t.Fatalf("FinalizeReasoning() error = %v", err)
+	}
+
+	if got := initial.reasoningUpdates; len(got) != 1 || got[0] != "thinking" {
+		t.Fatalf("initial reasoning updates = %v, want [thinking]", got)
+	}
+	if initial.reasoningFinal != "final thought" {
+		t.Fatalf("initial reasoning final = %q, want final thought", initial.reasoningFinal)
+	}
+}
+
 func TestGetStreamer_FinalizeSeparateMessagesClearsTrackedToolFeedback(t *testing.T) {
 	m := newTestManager()
 	m.config = &config.Config{
@@ -1650,7 +2213,7 @@ func TestGetStreamer_FinalizeSeparateMessagesClearsTrackedToolFeedback(t *testin
 	}
 	m.channels["test"] = ch

-	streamer, ok := m.GetStreamer(context.Background(), "test", "123")
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
 	if !ok {
 		t.Fatal("expected streamer to be available")
 	}
@@ -1692,7 +2255,7 @@ func TestGetStreamer_FinalizeDismissesResolvedTrackedToolFeedback(t *testing.T)
 	}
 	m.channels["test"] = ch

-	streamer, ok := m.GetStreamer(context.Background(), "test", "-100123/42")
+	streamer, ok := m.GetStreamer(context.Background(), "test", "-100123/42", "")
 	if !ok {
 		t.Fatal("expected streamer to be available")
 	}
@@ -1761,7 +2324,7 @@ func TestGetStreamer_FinalizeFailureDoesNotDismissTrackedToolFeedback(t *testing
 	}
 	m.channels["test"] = ch

-	streamer, ok := m.GetStreamer(context.Background(), "test", "123")
+	streamer, ok := m.GetStreamer(context.Background(), "test", "123", "")
 	if !ok {
 		t.Fatal("expected streamer to be available")
 	}
@@ -1839,6 +2402,68 @@ func TestRunWorker_ToolFeedbackSkipsMarkerSplitting(t *testing.T) {
 	}
 }

+func TestRunWorker_FinalizedStreamSuppressesMarkerSplitBeforeSending(t *testing.T) {
+	m := newTestManager()
+	m.config = &config.Config{
+		Agents: config.AgentsConfig{
+			Defaults: config.AgentDefaults{
+				SplitOnMarker: true,
+			},
+		},
+	}
+
+	var (
+		mu       sync.Mutex
+		received []string
+	)
+	ch := &mockChannel{
+		sendFn: func(_ context.Context, msg bus.OutboundMessage) error {
+			mu.Lock()
+			received = append(received, msg.Content)
+			mu.Unlock()
+			return nil
+		},
+	}
+
+	w := &channelWorker{
+		ch:      ch,
+		queue:   make(chan bus.OutboundMessage, 1),
+		done:    make(chan struct{}),
+		limiter: rate.NewLimiter(rate.Inf, 1),
+	}
+
+	ctx, cancel := context.WithCancel(context.Background())
+	defer cancel()
+	go m.runWorker(ctx, "test", w)
+
+	streamKey := streamSuppressionKey("test", "123", "session-1")
+	m.streamActive.Store(streamKey, true)
+	w.queue <- testOutboundMessage(bus.OutboundMessage{
+		Channel:    "test",
+		ChatID:     "123",
+		SessionKey: "session-1",
+		Content:    "streamed full reply<|[SPLIT]|>duplicate chunk",
+		Context: bus.InboundContext{
+			Channel: "test",
+			ChatID:  "123",
+			Raw: map[string]string{
+				"outbound_kind": "final",
+			},
+		},
+	})
+
+	time.Sleep(100 * time.Millisecond)
+
+	mu.Lock()
+	defer mu.Unlock()
+	if len(received) != 0 {
+		t.Fatalf("received split duplicate messages = %v, want none", received)
+	}
+	if _, ok := m.streamActive.Load(streamKey); ok {
+		t.Fatal("expected finalized stream marker to be consumed")
+	}
+}
+
 func TestPreSend_PlaceholderEditFails_FallsThrough(t *testing.T) {
 	m := newTestManager()

@@ -464,8 +464,9 @@ func (c *PicoChannel) SendPlaceholder(ctx context.Context, chatID string) (strin

 	msgID := uuid.New().String()
 	outMsg := newMessage(TypeMessageCreate, map[string]any{
-		PayloadKeyContent: text,
-		"message_id":      msgID,
+		PayloadKeyContent:     text,
+		PayloadKeyPlaceholder: true,
+		"message_id":          msgID,
 	})

 	if err := c.broadcastToSession(chatID, outMsg); err != nil {
@@ -475,6 +476,195 @@ func (c *PicoChannel) SendPlaceholder(ctx context.Context, chatID string) (strin
 	return msgID, nil
 }

+// BeginStream implements channels.StreamingCapable for Pico WebUI.
+func (c *PicoChannel) BeginStream(ctx context.Context, chatID string) (channels.Streamer, error) {
+	if c == nil || c.config == nil || !c.config.Streaming.Enabled {
+		return nil, fmt.Errorf("streaming disabled in config")
+	}
+	if !c.IsRunning() {
+		return nil, channels.ErrNotRunning
+	}
+	streamCfg := c.config.Streaming.WithDefaults(0, 1)
+	return &picoStreamer{
+		channel:          c,
+		chatID:           chatID,
+		throttleInterval: time.Duration(streamCfg.ThrottleSeconds) * time.Second,
+		minGrowth:        streamCfg.MinGrowthChars,
+	}, nil
+}
+
+type picoStreamer struct {
+	channel          *PicoChannel
+	chatID           string
+	messageID        string
+	reasoningID      string
+	throttleInterval time.Duration
+	minGrowth        int
+	lastLen          int
+	lastAt           time.Time
+	lastContent      string
+	reasoningLastLen int
+	reasoningLastAt  time.Time
+	reasoningContent string
+	mu               sync.Mutex
+}
+
+func (s *picoStreamer) Update(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	return s.updateLocked(ctx, content, false, nil)
+}
+
+func (s *picoStreamer) Finalize(ctx context.Context, content string) error {
+	return s.FinalizeWithContext(ctx, content, nil)
+}
+
+func (s *picoStreamer) FinalizeWithContext(ctx context.Context, content string, contextUsage *bus.ContextUsage) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	return s.updateLocked(ctx, content, true, contextUsage)
+}
+
+func (s *picoStreamer) UpdateReasoning(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	return s.updateReasoningLocked(ctx, content, false)
+}
+
+func (s *picoStreamer) FinalizeReasoning(ctx context.Context, content string) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	return s.updateReasoningLocked(ctx, content, true)
+}
+
+func (s *picoStreamer) Cancel(ctx context.Context) {
+	if s == nil {
+		return
+	}
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.channel == nil || s.messageID == "" {
+		if s.channel != nil && s.reasoningID != "" {
+			_ = s.channel.DeleteMessage(ctx, s.chatID, s.reasoningID)
+			s.reasoningID = ""
+		}
+		return
+	}
+	_ = s.channel.DeleteMessage(ctx, s.chatID, s.messageID)
+	s.messageID = ""
+	if s.reasoningID != "" {
+		_ = s.channel.DeleteMessage(ctx, s.chatID, s.reasoningID)
+		s.reasoningID = ""
+	}
+}
+
+func (s *picoStreamer) updateLocked(
+	ctx context.Context,
+	content string,
+	force bool,
+	contextUsage *bus.ContextUsage,
+) error {
+	if s == nil || s.channel == nil {
+		return fmt.Errorf("streamer is not initialized")
+	}
+	if strings.TrimSpace(content) == "" && s.messageID == "" {
+		return nil
+	}
+
+	now := time.Now()
+	contentLen := len([]rune(content))
+	if s.messageID != "" && !force {
+		growth := contentLen - s.lastLen
+		if now.Sub(s.lastAt) < s.throttleInterval || growth < s.minGrowth { // send only when both thresholds pass
+			return nil
+		}
+	}
+
+	return s.sendLocked(ctx, content, contextUsage)
+}
+
+func (s *picoStreamer) updateReasoningLocked(ctx context.Context, content string, force bool) error {
+	if s == nil || s.channel == nil {
+		return fmt.Errorf("streamer is not initialized")
+	}
+	if strings.TrimSpace(content) == "" && s.reasoningID == "" {
+		return nil
+	}
+
+	now := time.Now()
+	contentLen := len([]rune(content))
+	if s.reasoningID != "" && !force {
+		growth := contentLen - s.reasoningLastLen
+		if now.Sub(s.reasoningLastAt) < s.throttleInterval || growth < s.minGrowth {
+			return nil
+		}
+	}
+
+	return s.sendReasoningLocked(ctx, content)
+}
+
+func (s *picoStreamer) sendLocked(ctx context.Context, content string, contextUsage *bus.ContextUsage) error {
+	now := time.Now()
+	contentLen := len([]rune(content))
+
+	if s.messageID == "" {
+		s.messageID = uuid.New().String()
+		payload := map[string]any{
+			PayloadKeyContent: content,
+			"message_id":      s.messageID,
+		}
+		setContextUsagePayload(payload, contextUsage)
+		outMsg := newMessage(TypeMessageCreate, payload)
+		if err := s.channel.broadcastToSession(s.chatID, outMsg); err != nil {
+			return err
+		}
+	} else if content != s.lastContent || contextUsage != nil {
+		if err := s.channel.editMessage(ctx, s.chatID, s.messageID, content, contextUsage); err != nil {
+			return err
+		}
+	}
+
+	s.lastContent = content
+	s.lastLen = contentLen
+	s.lastAt = now
+	return nil
+}
+
+func (s *picoStreamer) sendReasoningLocked(ctx context.Context, content string) error {
+	now := time.Now()
+	contentLen := len([]rune(content))
+
+	if s.reasoningID == "" {
+		s.reasoningID = uuid.New().String()
+		payload := map[string]any{
+			PayloadKeyContent: content,
+			"message_id":      s.reasoningID,
+			PayloadKeyKind:    MessageKindThought,
+			PayloadKeyThought: true,
+		}
+		outMsg := newMessage(TypeMessageCreate, payload)
+		if err := s.channel.broadcastToSession(s.chatID, outMsg); err != nil {
+			return err
+		}
+	} else if content != s.reasoningContent {
+		payload := map[string]any{
+			PayloadKeyContent: content,
+			"message_id":      s.reasoningID,
+			PayloadKeyKind:    MessageKindThought,
+			PayloadKeyThought: true,
+		}
+		outMsg := newMessage(TypeMessageUpdate, payload)
+		if err := s.channel.broadcastToSession(s.chatID, outMsg); err != nil {
+			return err
+		}
+	}
+
+	s.reasoningContent = content
+	s.reasoningLastLen = contentLen
+	s.reasoningLastAt = now
+	return nil
+}
+
 // SendMedia implements channels.MediaSender for the Pico web UI.
 // Media is delivered as a normal assistant message carrying structured
 // attachments plus an authenticated same-origin download URL.
@@ -226,6 +226,9 @@ func TestSendPlaceholder_EmitsNormalMessageWithoutKind(t *testing.T) {
 		if got := payload[PayloadKeyContent]; got != "Thinking..." {
 			t.Fatalf("placeholder content = %#v, want %q", got, "Thinking...")
 		}
+		if got := payload[PayloadKeyPlaceholder]; got != true {
+			t.Fatalf("placeholder marker = %#v, want true", got)
+		}
 		if got, ok := payload[PayloadKeyKind]; ok {
 			t.Fatalf("placeholder kind = %#v, want absent", got)
 		}
@@ -234,6 +237,279 @@ func TestSendPlaceholder_EmitsNormalMessageWithoutKind(t *testing.T) {
 	}
 }

+func TestBeginStream_CreatesAndUpdatesSameMessage(t *testing.T) {
+	ch := newTestPicoChannel(t)
+	ch.config.Streaming = config.StreamingConfig{
+		Enabled:         true,
+		ThrottleSeconds: 1,
+		MinGrowthChars:  1,
+	}
+	if err := ch.Start(context.Background()); err != nil {
+		t.Fatalf("Start() error = %v", err)
+	}
+	defer ch.Stop(context.Background())
+
+	clientConn, received, cleanup := newTestPicoWebSocket(t)
+	defer cleanup()
+	ch.addConnForTest(&picoConn{id: "conn-1", conn: clientConn, sessionID: "sess-1"})
+
+	streamer, err := ch.BeginStream(context.Background(), "pico:sess-1")
+	if err != nil {
+		t.Fatalf("BeginStream() error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "hello"); err != nil {
+		t.Fatalf("Update(first) error = %v", err)
+	}
+	first := mustReceivePicoMessage(t, received)
+	if first.Type != TypeMessageCreate {
+		t.Fatalf("first type = %q, want %q", first.Type, TypeMessageCreate)
+	}
+	msgID, _ := first.Payload["message_id"].(string)
+	if msgID == "" {
+		t.Fatalf("first message_id = %#v, want non-empty", first.Payload["message_id"])
+	}
+	if got := first.Payload[PayloadKeyContent]; got != "hello" {
+		t.Fatalf("first content = %#v, want hello", got)
+	}
+
+	rawStreamer := streamer.(*picoStreamer)
+	rawStreamer.mu.Lock()
+	rawStreamer.lastAt = time.Now().Add(-2 * time.Second)
+	rawStreamer.mu.Unlock()
+	secondContent := "hello world with enough growth to pass the configured streaming threshold"
+	if err := streamer.Update(context.Background(), secondContent); err != nil {
+		t.Fatalf("Update(second) error = %v", err)
+	}
+	second := mustReceivePicoMessage(t, received)
+	if second.Type != TypeMessageUpdate {
+		t.Fatalf("second type = %q, want %q", second.Type, TypeMessageUpdate)
+	}
+	if got := second.Payload["message_id"]; got != msgID {
+		t.Fatalf("second message_id = %#v, want %q", got, msgID)
+	}
+	if got := second.Payload[PayloadKeyContent]; got != secondContent {
+		t.Fatalf("second content = %#v, want %q", got, secondContent)
+	}
+}
+
+func TestBeginStream_DefaultStreamingShowsSmallIncrements(t *testing.T) {
+	ch := newTestPicoChannel(t)
+	ch.config.Streaming = config.StreamingConfig{Enabled: true}
+	if err := ch.Start(context.Background()); err != nil {
+		t.Fatalf("Start() error = %v", err)
+	}
+	defer ch.Stop(context.Background())
+
+	clientConn, received, cleanup := newTestPicoWebSocket(t)
+	defer cleanup()
+	ch.addConnForTest(&picoConn{id: "conn-1", conn: clientConn, sessionID: "sess-1"})
+
+	streamer, err := ch.BeginStream(context.Background(), "pico:sess-1")
+	if err != nil {
+		t.Fatalf("BeginStream() error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "h"); err != nil {
+		t.Fatalf("Update(first) error = %v", err)
+	}
+	first := mustReceivePicoMessage(t, received)
+	if first.Type != TypeMessageCreate {
+		t.Fatalf("first type = %q, want %q", first.Type, TypeMessageCreate)
+	}
+	msgID, _ := first.Payload["message_id"].(string)
+	if msgID == "" {
+		t.Fatalf("first message_id = %#v, want non-empty", first.Payload["message_id"])
+	}
+
+	if err := streamer.Update(context.Background(), "he"); err != nil {
+		t.Fatalf("Update(second) error = %v", err)
+	}
+	second := mustReceivePicoMessage(t, received)
+	if second.Type != TypeMessageUpdate {
+		t.Fatalf("second type = %q, want %q", second.Type, TypeMessageUpdate)
+	}
+	if got := second.Payload["message_id"]; got != msgID {
+		t.Fatalf("second message_id = %#v, want %q", got, msgID)
+	}
+	if got := second.Payload[PayloadKeyContent]; got != "he" {
+		t.Fatalf("second content = %#v, want he", got)
+	}
+}
+
+func TestBeginStream_StreamsReasoningAsThoughtUpdates(t *testing.T) {
+	ch := newTestPicoChannel(t)
+	ch.config.Streaming = config.StreamingConfig{Enabled: true}
+	if err := ch.Start(context.Background()); err != nil {
+		t.Fatalf("Start() error = %v", err)
+	}
+	defer ch.Stop(context.Background())
+
+	clientConn, received, cleanup := newTestPicoWebSocket(t)
+	defer cleanup()
+	ch.addConnForTest(&picoConn{id: "conn-1", conn: clientConn, sessionID: "sess-1"})
+
+	streamer, err := ch.BeginStream(context.Background(), "pico:sess-1")
+	if err != nil {
+		t.Fatalf("BeginStream() error = %v", err)
+	}
+	reasoningStreamer, ok := streamer.(bus.ReasoningStreamer)
+	if !ok {
+		t.Fatal("pico stream should support reasoning updates")
+	}
+	if err := reasoningStreamer.UpdateReasoning(context.Background(), "thinking"); err != nil {
+		t.Fatalf("UpdateReasoning(first) error = %v", err)
+	}
+	first := mustReceivePicoMessage(t, received)
+	if first.Type != TypeMessageCreate {
+		t.Fatalf("first type = %q, want %q", first.Type, TypeMessageCreate)
+	}
+	msgID, _ := first.Payload["message_id"].(string)
+	if msgID == "" {
+		t.Fatalf("first message_id = %#v, want non-empty", first.Payload["message_id"])
+	}
+	if got := first.Payload[PayloadKeyKind]; got != MessageKindThought {
+		t.Fatalf("first kind = %#v, want %q", got, MessageKindThought)
+	}
+	if got := first.Payload[PayloadKeyContent]; got != "thinking" {
+		t.Fatalf("first content = %#v, want thinking", got)
+	}
+
+	if err := reasoningStreamer.UpdateReasoning(context.Background(), "thinking more"); err != nil {
+		t.Fatalf("UpdateReasoning(second) error = %v", err)
+	}
+	second := mustReceivePicoMessage(t, received)
+	if second.Type != TypeMessageUpdate {
+		t.Fatalf("second type = %q, want %q", second.Type, TypeMessageUpdate)
+	}
+	if got := second.Payload["message_id"]; got != msgID {
+		t.Fatalf("second message_id = %#v, want %q", got, msgID)
+	}
+	if got := second.Payload[PayloadKeyKind]; got != MessageKindThought {
+		t.Fatalf("second kind = %#v, want %q", got, MessageKindThought)
+	}
+	if got := second.Payload[PayloadKeyContent]; got != "thinking more" {
+		t.Fatalf("second content = %#v, want thinking more", got)
+	}
+}
+
+func TestBeginStream_ThrottlesIntermediateUpdatesAndFinalFlushes(t *testing.T) {
+	ch := newTestPicoChannel(t)
+	ch.config.Streaming = config.StreamingConfig{
+		Enabled:         true,
+		ThrottleSeconds: 60,
+		MinGrowthChars:  100,
+	}
+	if err := ch.Start(context.Background()); err != nil {
+		t.Fatalf("Start() error = %v", err)
+	}
+	defer ch.Stop(context.Background())
+
+	clientConn, received, cleanup := newTestPicoWebSocket(t)
+	defer cleanup()
+	ch.addConnForTest(&picoConn{id: "conn-1", conn: clientConn, sessionID: "sess-1"})
+
+	streamer, err := ch.BeginStream(context.Background(), "pico:sess-1")
+	if err != nil {
+		t.Fatalf("BeginStream() error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "first"); err != nil {
+		t.Fatalf("Update(first) error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "first plus short growth"); err != nil {
+		t.Fatalf("Update(throttled) error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "first"+strings.Repeat("x", 120)); err != nil {
+		t.Fatalf("Update(enough growth too soon) error = %v", err)
+	}
+
+	first := mustReceivePicoMessage(t, received)
+	if first.Type != TypeMessageCreate {
+		t.Fatalf("first type = %q, want %q", first.Type, TypeMessageCreate)
+	}
+	msgID, _ := first.Payload["message_id"].(string)
+	assertNoPicoMessage(t, received)
+
+	rawStreamer := streamer.(*picoStreamer)
+	rawStreamer.mu.Lock()
+	rawStreamer.lastAt = time.Now().Add(-61 * time.Second)
+	rawStreamer.mu.Unlock()
+	if err := streamer.Update(context.Background(), "first plus small growth"); err != nil {
+		t.Fatalf("Update(enough time too little growth) error = %v", err)
+	}
+	assertNoPicoMessage(t, received)
+
+	if err := streamer.Finalize(context.Background(), "first plus final text"); err != nil {
+		t.Fatalf("Finalize() error = %v", err)
+	}
+	final := mustReceivePicoMessage(t, received)
+	if final.Type != TypeMessageUpdate {
+		t.Fatalf("final type = %q, want %q", final.Type, TypeMessageUpdate)
+	}
+	if got := final.Payload["message_id"]; got != msgID {
+		t.Fatalf("final message_id = %#v, want %q", got, msgID)
+	}
+	if got := final.Payload[PayloadKeyContent]; got != "first plus final text" {
+		t.Fatalf("final content = %#v, want %q", got, "first plus final text")
+	}
+	assertNoPicoMessage(t, received)
+}
+
+func TestBeginStream_FinalizeIncludesContextUsage(t *testing.T) {
+	ch := newTestPicoChannel(t)
+	ch.config.Streaming = config.StreamingConfig{
+		Enabled:         true,
+		ThrottleSeconds: 0,
+		MinGrowthChars:  0,
+	}
+	if err := ch.Start(context.Background()); err != nil {
+		t.Fatalf("Start() error = %v", err)
+	}
+	defer ch.Stop(context.Background())
+
+	clientConn, received, cleanup := newTestPicoWebSocket(t)
+	defer cleanup()
+	ch.addConnForTest(&picoConn{id: "conn-1", conn: clientConn, sessionID: "sess-1"})
+
+	streamer, err := ch.BeginStream(context.Background(), "pico:sess-1")
+	if err != nil {
+		t.Fatalf("BeginStream() error = %v", err)
+	}
+	if err := streamer.Update(context.Background(), "partial"); err != nil {
+		t.Fatalf("Update() error = %v", err)
+	}
+	first := mustReceivePicoMessage(t, received)
+	msgID, _ := first.Payload["message_id"].(string)
+
+	contextStreamer, ok := streamer.(interface {
+		FinalizeWithContext(ctx context.Context, content string, usage *bus.ContextUsage) error
+	})
+	if !ok {
+		t.Fatal("streamer should support FinalizeWithContext")
+	}
+	if err := contextStreamer.FinalizeWithContext(context.Background(), "final", &bus.ContextUsage{
+		UsedTokens:       10,
+		TotalTokens:      100,
+		CompressAtTokens: 80,
+		UsedPercent:      10,
+	}); err != nil {
+		t.Fatalf("FinalizeWithContext() error = %v", err)
+	}
+
+	final := mustReceivePicoMessage(t, received)
+	if final.Type != TypeMessageUpdate {
+		t.Fatalf("final type = %q, want %q", final.Type, TypeMessageUpdate)
+	}
+	if got := final.Payload["message_id"]; got != msgID {
+		t.Fatalf("final message_id = %#v, want %q", got, msgID)
+	}
+	rawUsage, ok := final.Payload["context_usage"].(map[string]any)
+	if !ok {
+		t.Fatalf("final context_usage = %#v, want map", final.Payload["context_usage"])
+	}
+	if got := rawUsage["used_tokens"]; got != float64(10) {
+		t.Fatalf("used_tokens = %#v, want 10", got)
+	}
+}
+
 func TestCreateAndAddConnection_RespectsMaxConnectionsConcurrently(t *testing.T) {
 	ch := newTestPicoChannel(t)

@@ -491,6 +767,26 @@ func TestHandleMediaDownload_ServesStoredFile(t *testing.T) {
 	}
 }

+func mustReceivePicoMessage(t *testing.T, received <-chan PicoMessage) PicoMessage {
+	t.Helper()
+	select {
+	case msg := <-received:
+		return msg
+	case <-time.After(time.Second):
+		t.Fatal("expected pico message")
+	}
+	return PicoMessage{}
+}
+
+func assertNoPicoMessage(t *testing.T, received <-chan PicoMessage) {
+	t.Helper()
+	select {
+	case msg := <-received:
+		t.Fatalf("unexpected pico message: %+v", msg)
+	case <-time.After(150 * time.Millisecond):
+	}
+}
+
 func (c *PicoChannel) addConnForTest(pc *picoConn) {
 	c.connsMu.Lock()
 	defer c.connsMu.Unlock()
@@ -22,10 +22,11 @@ const (
 	TypeError         = "error"
 	TypePong          = "pong"

-	PayloadKeyContent   = "content"
-	PayloadKeyThought   = "thought"
-	PayloadKeyKind      = "kind"
-	PayloadKeyToolCalls = "tool_calls"
+	PayloadKeyContent     = "content"
+	PayloadKeyThought     = "thought"
+	PayloadKeyKind        = "kind"
+	PayloadKeyPlaceholder = "placeholder"
+	PayloadKeyToolCalls   = "tool_calls"

 	MessageKindThought   = "thought"
 	MessageKindToolCalls = "tool_calls"
@@ -1471,7 +1471,7 @@ func (c *TelegramChannel) BeginStream(ctx context.Context, chatID string) (chann
 		return nil, err
 	}

-	streamCfg := c.tgCfg.Streaming
+	streamCfg := c.tgCfg.Streaming.WithDefaults(3, 200)
 	return &telegramStreamer{
 		bot:              c.bot,
 		chatID:           cid,
@@ -1483,8 +1483,8 @@ func (c *TelegramChannel) BeginStream(ctx context.Context, chatID string) (chann
 }

 // telegramStreamer streams partial LLM output via Telegram's sendMessageDraft API.
-// On first API error (e.g. bot lacks forum mode), it silently degrades: Update
-// becomes a no-op, while Finalize still delivers the final message.
+// Draft update failures are returned to the agent, which decides whether the
+// stream was already visible enough to keep or should fall back to Chat().
 type telegramStreamer struct {
 	bot              *telego.Bot
 	chatID           int64
@@ -1495,6 +1495,7 @@ type telegramStreamer struct {
 	lastLen          int
 	lastAt           time.Time
 	failed           bool
+	draftTouched     bool
 	mu               sync.Mutex
 }

@@ -1503,7 +1504,7 @@ func (s *telegramStreamer) Update(ctx context.Context, content string) error {
 	defer s.mu.Unlock()

 	if s.failed {
-		return nil
+		return fmt.Errorf("telegram streaming disabled after previous draft failure")
 	}

 	// Throttle: skip if not enough time or content has passed
@@ -1514,6 +1515,7 @@ func (s *telegramStreamer) Update(ctx context.Context, content string) error {
 	}

 	htmlContent := markdownToTelegramHTML(content)
+	s.draftTouched = true

 	err := s.bot.SendMessageDraft(ctx, &telego.SendMessageDraftParams{
 		ChatID:          s.chatID,
@@ -1523,12 +1525,11 @@ func (s *telegramStreamer) Update(ctx context.Context, content string) error {
 		ParseMode:       telego.ModeHTML,
 	})
 	if err != nil {
-		// First error → degrade silently (e.g. no forum mode)
 		logger.WarnCF("telegram", "sendMessageDraft failed, disabling streaming", map[string]any{
 			"error": err.Error(),
 		})
 		s.failed = true
-		return nil // don't propagate — Finalize will still deliver
+		return fmt.Errorf("telegram draft update: %w", err)
 	}

 	s.lastLen = len(content)
@@ -1554,11 +1555,33 @@ func (s *telegramStreamer) Finalize(ctx context.Context, content string) error {
 			return fmt.Errorf("telegram finalize: %w", err)
 		}
 	}
+	s.Cancel(ctx)
 	return nil
 }

 func (s *telegramStreamer) Cancel(ctx context.Context) {
-	// Draft auto-expires on Telegram's side; nothing to clean up.
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.clearDraft(ctx)
+}
+
+func (s *telegramStreamer) clearDraft(ctx context.Context) {
+	if !s.draftTouched {
+		return
+	}
+	if err := s.bot.SendMessageDraft(ctx, &telego.SendMessageDraftParams{
+		ChatID:          s.chatID,
+		MessageThreadID: s.threadID,
+		DraftID:         s.draftID,
+		Text:            " ",
+	}); err != nil {
+		logger.DebugCF("telegram", "failed to clear streaming draft", map[string]any{
+			"chat_id": s.chatID,
+			"error":   err.Error(),
+		})
+	}
+	s.lastLen = 0
+	s.draftTouched = false
 }

 // cryptoRandInt returns a non-zero random int using crypto/rand.
@@ -762,6 +762,120 @@ func TestBeginStream_UpdateUsesForumThreadID(t *testing.T) {
 	assert.Equal(t, "partial", params.Text)
 }

+func TestBeginStream_UsesDefaultThrottleWhenOnlyEnabled(t *testing.T) {
+	caller := &stubCaller{
+		callFn: func(ctx context.Context, url string, data *ta.RequestData) (*ta.Response, error) {
+			return &ta.Response{Ok: true, Result: []byte("true")}, nil
+		},
+	}
+	ch := newTestChannel(t, caller)
+	ch.tgCfg.Streaming = config.StreamingConfig{Enabled: true}
+
+	streamer, err := ch.BeginStream(context.Background(), "12345")
+	require.NoError(t, err)
+	require.NoError(t, streamer.Update(context.Background(), "partial"))
+	require.NoError(t, streamer.Update(context.Background(), "partial plus one"))
+
+	require.Len(t, caller.calls, 1, "second small update should be throttled by defaults")
+}
+
+func TestBeginStream_UpdateReturnsErrorWhenDraftFails(t *testing.T) {
+	callCount := 0
+	caller := &stubCaller{
+		callFn: func(ctx context.Context, url string, data *ta.RequestData) (*ta.Response, error) {
+			callCount++
+			if callCount == 1 {
+				return nil, errors.New("draft unsupported")
+			}
+			return &ta.Response{Ok: true, Result: []byte("true")}, nil
+		},
+	}
+	ch := newTestChannel(t, caller)
+	ch.tgCfg.Streaming = config.StreamingConfig{Enabled: true}
+
+	streamer, err := ch.BeginStream(context.Background(), "12345")
+	require.NoError(t, err)
+
+	err = streamer.Update(context.Background(), "partial")
+	require.Error(t, err)
+	assert.Contains(t, err.Error(), "draft unsupported")
+
+	streamer.Cancel(context.Background())
+	require.Len(t, caller.calls, 2)
+	assert.Contains(t, caller.calls[1].URL, "sendMessageDraft")
+
+	var params struct {
+		ChatID  int64  `json:"chat_id"`
+		DraftID int    `json:"draft_id"`
+		Text    string `json:"text"`
+	}
+	require.NoError(t, json.Unmarshal(caller.calls[1].Data.BodyRaw, &params))
+	assert.Equal(t, int64(12345), params.ChatID)
+	assert.NotZero(t, params.DraftID)
+	assert.Equal(t, " ", params.Text)
+}
+
+func TestBeginStream_CancelClearsExistingDraft(t *testing.T) {
+	caller := &stubCaller{
+		callFn: func(ctx context.Context, url string, data *ta.RequestData) (*ta.Response, error) {
+			return &ta.Response{Ok: true, Result: []byte("true")}, nil
+		},
+	}
+	ch := newTestChannel(t, caller)
+	ch.tgCfg.Streaming = config.StreamingConfig{Enabled: true}
+
+	streamer, err := ch.BeginStream(context.Background(), "12345")
+	require.NoError(t, err)
+	require.NoError(t, streamer.Update(context.Background(), "partial"))
+	streamer.Cancel(context.Background())
+
+	require.Len(t, caller.calls, 2)
+	assert.Contains(t, caller.calls[1].URL, "sendMessageDraft")
+
+	var params struct {
+		ChatID  int64  `json:"chat_id"`
+		DraftID int    `json:"draft_id"`
+		Text    string `json:"text"`
+	}
+	require.NoError(t, json.Unmarshal(caller.calls[1].Data.BodyRaw, &params))
+	assert.Equal(t, int64(12345), params.ChatID)
+	assert.NotZero(t, params.DraftID)
+	assert.Equal(t, " ", params.Text)
+}
+
+func TestBeginStream_FinalizeClearsExistingDraft(t *testing.T) {
+	caller := &stubCaller{
+		callFn: func(ctx context.Context, url string, data *ta.RequestData) (*ta.Response, error) {
+			if strings.Contains(url, "sendMessage") && !strings.Contains(url, "sendMessageDraft") {
+				return successResponse(t), nil
+			}
+			return &ta.Response{Ok: true, Result: []byte("true")}, nil
+		},
+	}
+	ch := newTestChannel(t, caller)
+	ch.tgCfg.Streaming = config.StreamingConfig{Enabled: true}
+
+	streamer, err := ch.BeginStream(context.Background(), "12345")
+	require.NoError(t, err)
+	require.NoError(t, streamer.Update(context.Background(), "partial"))
+	require.NoError(t, streamer.Finalize(context.Background(), "final"))
+
+	require.Len(t, caller.calls, 3)
+	assert.Contains(t, caller.calls[0].URL, "sendMessageDraft")
+	assert.Contains(t, caller.calls[1].URL, "sendMessage")
+	assert.Contains(t, caller.calls[2].URL, "sendMessageDraft")
+
+	var params struct {
+		ChatID  int64  `json:"chat_id"`
+		DraftID int    `json:"draft_id"`
+		Text    string `json:"text"`
+	}
+	require.NoError(t, json.Unmarshal(caller.calls[2].Data.BodyRaw, &params))
+	assert.Equal(t, int64(12345), params.ChatID)
+	assert.NotZero(t, params.DraftID)
+	assert.Equal(t, " ", params.Text)
+}
+
 func TestBeginStream_FinalizeUsesForumThreadID(t *testing.T) {
 	caller := &stubCaller{
 		callFn: func(ctx context.Context, url string, data *ta.RequestData) (*ta.Response, error) {
@@ -164,6 +164,9 @@ func (c *WeComChannel) Stop(_ context.Context) error {
 }

 func (c *WeComChannel) BeginStream(_ context.Context, chatID string) (channels.Streamer, error) {
+	if c == nil || c.config == nil || !c.config.Streaming.Enabled {
+		return nil, fmt.Errorf("streaming disabled in config")
+	}
 	if !c.IsRunning() {
 		return nil, channels.ErrNotRunning
 	}
@@ -100,6 +100,7 @@ func TestBeginStream_UpdateAndFinalize(t *testing.T) {
 	t.Parallel()

 	ch := newTestWeComChannel(t, bus.NewMessageBus())
+	ch.config.Streaming.Enabled = true
 	ch.SetRunning(true)
 	ch.queueTurn("chat-1", wecomTurn{
 		ReqID:     "req-1",
@@ -158,6 +159,24 @@ func TestBeginStream_UpdateAndFinalize(t *testing.T) {
 	}
 }

+func TestBeginStream_RequiresStreamingEnabled(t *testing.T) {
+	t.Parallel()
+
+	ch := newTestWeComChannel(t, bus.NewMessageBus())
+	ch.SetRunning(true)
+	ch.queueTurn("chat-1", wecomTurn{
+		ReqID:     "req-1",
+		ChatID:    "chat-1",
+		ChatType:  1,
+		StreamID:  "stream-1",
+		CreatedAt: time.Now(),
+	})
+
+	if _, err := ch.BeginStream(context.Background(), "chat-1"); err == nil {
+		t.Fatal("BeginStream() error = nil, want disabled streaming error")
+	}
+}
+
 func TestSend_StreamFailureFallsBackToActualChatID(t *testing.T) {
 	t.Parallel()

@@ -468,9 +468,25 @@ func (p *PlaceholderConfig) GetRandomText() string {
 }

 type StreamingConfig struct {
-	Enabled         bool `json:"enabled,omitempty"          env:"PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED"`
-	ThrottleSeconds int  `json:"throttle_seconds,omitempty" env:"PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS"`
-	MinGrowthChars  int  `json:"min_growth_chars,omitempty" env:"PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS"`
+	Enabled         bool `json:"enabled,omitempty"`
+	ThrottleSeconds int  `json:"throttle_seconds,omitempty"`
+	MinGrowthChars  int  `json:"min_growth_chars,omitempty"`
+}
+
+func (c StreamingConfig) IsZero() bool {
+	return !c.Enabled && c.ThrottleSeconds == 0 && c.MinGrowthChars == 0
+}
+
+func (c StreamingConfig) WithDefaults(throttleSeconds, minGrowthChars int) StreamingConfig {
+	if c.Enabled {
+		if c.ThrottleSeconds == 0 {
+			c.ThrottleSeconds = throttleSeconds
+		}
+		if c.MinGrowthChars == 0 {
+			c.MinGrowthChars = minGrowthChars
+		}
+	}
+	return c
 }

 type WhatsAppSettings struct {
@@ -483,7 +499,7 @@ type TelegramSettings struct {
 	Token             SecureString    `json:"token,omitzero"       yaml:"token,omitempty" env:"PICOCLAW_CHANNELS_TELEGRAM_TOKEN"`
 	BaseURL           string          `json:"base_url"             yaml:"-"               env:"PICOCLAW_CHANNELS_TELEGRAM_BASE_URL"`
 	Proxy             string          `json:"proxy"                yaml:"-"               env:"PICOCLAW_CHANNELS_TELEGRAM_PROXY"`
-	Streaming         StreamingConfig `json:"streaming,omitempty"  yaml:"-"`
+	Streaming         StreamingConfig `json:"streaming,omitzero"   yaml:"-"`
 	UseMarkdownV2     bool            `json:"use_markdown_v2"      yaml:"-"               env:"PICOCLAW_CHANNELS_TELEGRAM_USE_MARKDOWN_V2"`
 	MediaGroupDelayMS int             `json:"media_group_delay_ms" yaml:"-"               env:"PICOCLAW_CHANNELS_TELEGRAM_MEDIA_GROUP_DELAY_MS"`
 }
@@ -557,10 +573,11 @@ type WeComGroupConfig struct {
 }

 type WeComSettings struct {
-	BotID               string       `json:"bot_id"                  yaml:"-"                env:"BOT_ID"`
-	Secret              SecureString `json:"secret,omitzero"         yaml:"secret,omitempty" env:"SECRET"`
-	WebSocketURL        string       `json:"websocket_url,omitempty" yaml:"-"                env:"WEBSOCKET_URL"`
-	SendThinkingMessage bool         `json:"send_thinking_message"   yaml:"-"                env:"SEND_THINKING_MESSAGE"`
+	BotID               string          `json:"bot_id"                  yaml:"-"                env:"BOT_ID"`
+	Secret              SecureString    `json:"secret,omitzero"         yaml:"secret,omitempty" env:"SECRET"`
+	WebSocketURL        string          `json:"websocket_url,omitempty" yaml:"-"                env:"WEBSOCKET_URL"`
+	SendThinkingMessage bool            `json:"send_thinking_message"   yaml:"-"                env:"SEND_THINKING_MESSAGE"`
+	Streaming           StreamingConfig `json:"streaming,omitzero"      yaml:"-"`
 }

 func (c *WeComSettings) SetSecret(secret string) {
@@ -581,13 +598,14 @@ func (c *WeixinSettings) SetToken(token string) {
 }

 type PicoSettings struct {
-	Token           SecureString `json:"token,omitzero"              yaml:"token,omitempty" env:"PICOCLAW_CHANNELS_PICO_TOKEN"`
-	AllowTokenQuery bool         `json:"allow_token_query,omitempty" yaml:"-"`
-	AllowOrigins    []string     `json:"allow_origins,omitempty"     yaml:"-"`
-	PingInterval    int          `json:"ping_interval,omitempty"     yaml:"-"`
-	ReadTimeout     int          `json:"read_timeout,omitempty"      yaml:"-"`
-	WriteTimeout    int          `json:"write_timeout,omitempty"     yaml:"-"`
-	MaxConnections  int          `json:"max_connections,omitempty"   yaml:"-"`
+	Token           SecureString    `json:"token,omitzero"              yaml:"token,omitempty" env:"PICOCLAW_CHANNELS_PICO_TOKEN"`
+	AllowTokenQuery bool            `json:"allow_token_query,omitempty" yaml:"-"`
+	AllowOrigins    []string        `json:"allow_origins,omitempty"     yaml:"-"`
+	Streaming       StreamingConfig `json:"streaming,omitzero"          yaml:"-"`
+	PingInterval    int             `json:"ping_interval,omitempty"     yaml:"-"`
+	ReadTimeout     int             `json:"read_timeout,omitempty"      yaml:"-"`
+	WriteTimeout    int             `json:"write_timeout,omitempty"     yaml:"-"`
+	MaxConnections  int             `json:"max_connections,omitempty"   yaml:"-"`
 }

 // SetToken sets the Pico token and marks it as dirty for security saving
@@ -678,6 +696,14 @@ type VoiceConfig struct {
 	ElevenLabsAPIKey  string `json:"elevenlabs_api_key,omitempty" env:"PICOCLAW_VOICE_ELEVENLABS_API_KEY"`
 }

+type ModelStreamingConfig struct {
+	Enabled bool `json:"enabled,omitempty"`
+}
+
+func (c ModelStreamingConfig) IsZero() bool {
+	return !c.Enabled
+}
+
 // ModelConfig represents a model-centric provider configuration.
 // It allows adding new providers (especially OpenAI-compatible ones) via configuration only.
 // The Model field may be either a plain model identifier or a provider-prefixed
@@ -702,13 +728,14 @@ type ModelConfig struct {
 	Workspace   string `json:"workspace,omitempty"`    // Workspace path for CLI-based providers

 	// Optional optimizations
-	RPM                 int               `json:"rpm,omitempty"`              // Requests per minute limit
-	MaxTokensField      string            `json:"max_tokens_field,omitempty"` // Field name for max tokens (e.g., "max_completion_tokens")
-	RequestTimeout      int               `json:"request_timeout,omitempty"`
-	ThinkingLevel       string            `json:"thinking_level,omitempty"`        // Extended thinking: off|low|medium|high|xhigh|adaptive
-	ToolSchemaTransform string            `json:"tool_schema_transform,omitempty"` // Optional tool schema compatibility transform (e.g. "simple")
-	ExtraBody           map[string]any    `json:"extra_body,omitempty"`            // Additional fields to inject into request body
-	CustomHeaders       map[string]string `json:"custom_headers,omitempty"`        // Additional headers to inject into every HTTP request
+	RPM                 int                  `json:"rpm,omitempty"`              // Requests per minute limit
+	MaxTokensField      string               `json:"max_tokens_field,omitempty"` // Field name for max tokens (e.g., "max_completion_tokens")
+	RequestTimeout      int                  `json:"request_timeout,omitempty"`
+	ThinkingLevel       string               `json:"thinking_level,omitempty"`        // Extended thinking: off|low|medium|high|xhigh|adaptive
+	ToolSchemaTransform string               `json:"tool_schema_transform,omitempty"` // Optional tool schema compatibility transform (e.g. "simple")
+	Streaming           ModelStreamingConfig `json:"streaming,omitzero"`              // Opt-in for provider streaming on this model entry
+	ExtraBody           map[string]any       `json:"extra_body,omitempty"`            // Additional fields to inject into request body
+	CustomHeaders       map[string]string    `json:"custom_headers,omitempty"`        // Additional headers to inject into every HTTP request

 	APIKeys SecureStrings `json:"api_keys,omitzero" yaml:"api_keys,omitempty"` // API authentication keys (multiple keys for failover)

@@ -1661,6 +1688,7 @@ func expandMultiKeyModels(models []*ModelConfig) []*ModelConfig {
 				RequestTimeout:      m.RequestTimeout,
 				ThinkingLevel:       m.ThinkingLevel,
 				ToolSchemaTransform: m.ToolSchemaTransform,
+				Streaming:           m.Streaming,
 				ExtraBody:           m.ExtraBody,
 				CustomHeaders:       m.CustomHeaders,
 				UserAgent:           m.UserAgent,
@@ -1685,6 +1713,7 @@ func expandMultiKeyModels(models []*ModelConfig) []*ModelConfig {
 			RequestTimeout:      m.RequestTimeout,
 			ThinkingLevel:       m.ThinkingLevel,
 			ToolSchemaTransform: m.ToolSchemaTransform,
+			Streaming:           m.Streaming,
 			ExtraBody:           m.ExtraBody,
 			CustomHeaders:       m.CustomHeaders,
 			UserAgent:           m.UserAgent,
@@ -3,7 +3,9 @@ package config
 import (
 	"encoding/json"
 	"fmt"
+	"os"
 	"reflect"
+	"strconv"
 	"strings"

 	"github.com/caarlos0/env/v11"
@@ -239,6 +241,7 @@ func (b Channel) MarshalJSON() ([]byte, error) {
 		if err != nil {
 			return nil, err
 		}
+		raw = preserveExplicitDisabledStreaming(raw, b.Settings)
 		settings = raw
 	} else {
 		settings = b.Settings
@@ -252,6 +255,36 @@ func (b Channel) MarshalJSON() ([]byte, error) {
 	return json.Marshal((*Alias)(&out))
 }

+func preserveExplicitDisabledStreaming(settings, original RawNode) RawNode {
+	if len(original) == 0 || len(settings) == 0 {
+		return settings
+	}
+
+	var originalMap map[string]any
+	if err := json.Unmarshal(original, &originalMap); err != nil {
+		return settings
+	}
+	originalStreaming, ok := originalMap["streaming"].(map[string]any)
+	if !ok || originalStreaming["enabled"] != false {
+		return settings
+	}
+
+	var settingsMap map[string]any
+	if err := json.Unmarshal(settings, &settingsMap); err != nil {
+		return settings
+	}
+	if _, exists := settingsMap["streaming"]; exists {
+		return settings
+	}
+	settingsMap["streaming"] = map[string]any{"enabled": false}
+
+	data, err := json.Marshal(settingsMap)
+	if err != nil {
+		return settings
+	}
+	return data
+}
+
 // MarshalYAML implements yaml.ValueMarshaler for Channel.
 // Outputs only secure fields in the Settings YAML (for .security.yml).
 // If Decode was called, it serializes from the stored extend (reflecting any
@@ -696,6 +729,10 @@ func InitChannelList(channels ChannelsConfig) error {
 			if err := env.Parse(target); err != nil {
 				// Non-fatal: some env vars may not apply
 			}
+			applyTelegramStreamingEnvCompat(target)
+			if err := validateChannelStreamingConfig(name, target); err != nil {
+				return err
+			}
 		}
 	}

@@ -706,3 +743,48 @@ func InitChannelList(channels ChannelsConfig) error {

 	return nil
 }
+
+func applyTelegramStreamingEnvCompat(target any) {
+	settings, ok := target.(*TelegramSettings)
+	if !ok || settings == nil {
+		return
+	}
+
+	if raw, ok := os.LookupEnv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED"); ok {
+		if value, err := strconv.ParseBool(raw); err == nil {
+			settings.Streaming.Enabled = value
+		}
+	}
+	if raw, ok := os.LookupEnv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS"); ok {
+		if value, err := strconv.Atoi(raw); err == nil {
+			settings.Streaming.ThrottleSeconds = value
+		}
+	}
+	if raw, ok := os.LookupEnv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS"); ok {
+		if value, err := strconv.Atoi(raw); err == nil {
+			settings.Streaming.MinGrowthChars = value
+		}
+	}
+}
+
+func validateChannelStreamingConfig(channelName string, target any) error {
+	var streaming StreamingConfig
+	switch settings := target.(type) {
+	case *PicoSettings:
+		streaming = settings.Streaming
+	case *TelegramSettings:
+		streaming = settings.Streaming
+	case *WeComSettings:
+		streaming = settings.Streaming
+	default:
+		return nil
+	}
+
+	if streaming.ThrottleSeconds < 0 {
+		return fmt.Errorf("channel %q streaming.throttle_seconds must be >= 0", channelName)
+	}
+	if streaming.MinGrowthChars < 0 {
+		return fmt.Errorf("channel %q streaming.min_growth_chars must be >= 0", channelName)
+	}
+	return nil
+}
@@ -2,6 +2,7 @@ package config

 import (
 	"encoding/json"
+	"reflect"
 	"strings"
 	"testing"

@@ -133,6 +134,179 @@ func TestChannel_JSON_Unmarshal(t *testing.T) {
 	assert.Equal(t, "", cfg.Token.String())
 }

+func TestStreamingConfig_IsChannelGeneric(t *testing.T) {
+	typ := reflect.TypeOf(StreamingConfig{})
+	for i := 0; i < typ.NumField(); i++ {
+		field := typ.Field(i)
+		if got := field.Tag.Get("env"); got != "" {
+			t.Fatalf("StreamingConfig.%s env tag = %q, want no channel-specific env tag", field.Name, got)
+		}
+	}
+}
+
+func TestPicoSettings_StreamingConfig(t *testing.T) {
+	raw := RawNode(`{
+		"token": "test-token",
+		"streaming": {
+			"enabled": true,
+			"throttle_seconds": 2,
+			"min_growth_chars": 80
+		}
+	}`)
+	ch := &Channel{
+		Type:     ChannelPico,
+		Enabled:  true,
+		Settings: raw,
+	}
+	ch.SetName("pico")
+	var picoCfg PicoSettings
+	if err := ch.Decode(&picoCfg); err != nil {
+		t.Fatalf("Decode() error = %v", err)
+	}
+	assert.True(t, picoCfg.Streaming.Enabled)
+	assert.Equal(t, 2, picoCfg.Streaming.ThrottleSeconds)
+	assert.Equal(t, 80, picoCfg.Streaming.MinGrowthChars)
+}
+
+func TestWeComSettings_StreamingConfig(t *testing.T) {
+	raw := RawNode(`{
+		"bot_id": "bot-1",
+		"streaming": {
+			"enabled": true,
+			"throttle_seconds": 4,
+			"min_growth_chars": 160
+		}
+	}`)
+	ch := &Channel{
+		Type:     ChannelWeCom,
+		Enabled:  true,
+		Settings: raw,
+	}
+	ch.SetName("wecom")
+	var wecomCfg WeComSettings
+	if err := ch.Decode(&wecomCfg); err != nil {
+		t.Fatalf("Decode() error = %v", err)
+	}
+	assert.True(t, wecomCfg.Streaming.Enabled)
+	assert.Equal(t, 4, wecomCfg.Streaming.ThrottleSeconds)
+	assert.Equal(t, 160, wecomCfg.Streaming.MinGrowthChars)
+}
+
+func TestPicoStreamingConfig_Defaults(t *testing.T) {
+	cfg := StreamingConfig{Enabled: true}
+	got := cfg.WithDefaults(1, 40)
+	assert.Equal(t, 1, got.ThrottleSeconds)
+	assert.Equal(t, 40, got.MinGrowthChars)
+
+	cfg = StreamingConfig{Enabled: true, ThrottleSeconds: 5, MinGrowthChars: 200}
+	got = cfg.WithDefaults(1, 40)
+	assert.Equal(t, 5, got.ThrottleSeconds)
+	assert.Equal(t, 200, got.MinGrowthChars)
+
+	cfg = StreamingConfig{}
+	got = cfg.WithDefaults(1, 40)
+	assert.Equal(t, 0, got.ThrottleSeconds)
+	assert.Equal(t, 0, got.MinGrowthChars)
+}
+
+func TestInitChannelList_TelegramStreamingEnvCompatibility(t *testing.T) {
+	t.Setenv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_ENABLED", "true")
+	t.Setenv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_THROTTLE_SECONDS", "3")
+	t.Setenv("PICOCLAW_CHANNELS_TELEGRAM_STREAMING_MIN_GROWTH_CHARS", "120")
+
+	channels := ChannelsConfig{
+		"telegram": {
+			Type:     ChannelTelegram,
+			Enabled:  true,
+			Settings: RawNode(`{"token":"telegram-token"}`),
+		},
+		"pico": {
+			Type:     ChannelPico,
+			Enabled:  true,
+			Settings: RawNode(`{"token":"pico-token"}`),
+		},
+	}
+	if err := InitChannelList(channels); err != nil {
+		t.Fatalf("InitChannelList() error = %v", err)
+	}
+
+	tgDecoded, err := channels["telegram"].GetDecoded()
+	if err != nil {
+		t.Fatalf("telegram GetDecoded() error = %v", err)
+	}
+	tgCfg := tgDecoded.(*TelegramSettings)
+	assert.True(t, tgCfg.Streaming.Enabled)
+	assert.Equal(t, 3, tgCfg.Streaming.ThrottleSeconds)
+	assert.Equal(t, 120, tgCfg.Streaming.MinGrowthChars)
+
+	picoDecoded, err := channels["pico"].GetDecoded()
+	if err != nil {
+		t.Fatalf("pico GetDecoded() error = %v", err)
+	}
+	picoCfg := picoDecoded.(*PicoSettings)
+	assert.False(t, picoCfg.Streaming.Enabled)
+	assert.Equal(t, 0, picoCfg.Streaming.ThrottleSeconds)
+	assert.Equal(t, 0, picoCfg.Streaming.MinGrowthChars)
+}
+
+func TestInitChannelList_RejectsNegativeStreamingDeliveryValues(t *testing.T) {
+	tests := []struct {
+		name        string
+		channelType string
+		settings    string
+	}{
+		{
+			name:        "pico throttle",
+			channelType: ChannelPico,
+			settings:    `{"token":"pico-token","streaming":{"enabled":true,"throttle_seconds":-1}}`,
+		},
+		{
+			name:        "pico growth",
+			channelType: ChannelPico,
+			settings:    `{"token":"pico-token","streaming":{"enabled":true,"min_growth_chars":-1}}`,
+		},
+		{
+			name:        "telegram throttle",
+			channelType: ChannelTelegram,
+			settings:    `{"token":"telegram-token","streaming":{"enabled":true,"throttle_seconds":-1}}`,
+		},
+		{
+			name:        "telegram growth",
+			channelType: ChannelTelegram,
+			settings:    `{"token":"telegram-token","streaming":{"enabled":true,"min_growth_chars":-1}}`,
+		},
+		{
+			name:        "wecom throttle",
+			channelType: ChannelWeCom,
+			settings:    `{"bot_id":"bot-1","streaming":{"enabled":true,"throttle_seconds":-1}}`,
+		},
+		{
+			name:        "wecom growth",
+			channelType: ChannelWeCom,
+			settings:    `{"bot_id":"bot-1","streaming":{"enabled":true,"min_growth_chars":-1}}`,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			channels := ChannelsConfig{
+				tt.channelType: {
+					Type:     tt.channelType,
+					Enabled:  true,
+					Settings: RawNode(tt.settings),
+				},
+			}
+			err := InitChannelList(channels)
+			if err == nil {
+				t.Fatal("InitChannelList() error = nil, want validation error")
+			}
+			if !strings.Contains(err.Error(), "streaming.") {
+				t.Fatalf("InitChannelList() error = %v, want streaming field error", err)
+			}
+		})
+	}
+}
+
 // ═══════════════════════════════════════════════════
 //  JSON marshal: secure fields masked as [NOT_HERE]
 // ═══════════════════════════════════════════════════
@@ -161,6 +335,22 @@ func TestChannel_JSON_Marshal_SecureMasked(t *testing.T) {
 	assert.Contains(t, string(data), "proxy")
 }

+func TestChannel_JSON_Marshal_OmitsUnconfiguredStreaming(t *testing.T) {
+	ch := Channel{
+		Enabled:  true,
+		Type:     ChannelPico,
+		name:     "pico",
+		Settings: mustParseRawNode(`{"ping_interval":30}`),
+	}
+	var cfg PicoSettings
+	require.NoError(t, ch.Decode(&cfg))
+
+	data, err := json.MarshalIndent(ch, "", "  ")
+	require.NoError(t, err)
+
+	assert.NotContains(t, string(data), `"streaming"`)
+}
+
 // ═══════════════════════════════════════════════════
 //  YAML unmarshal: security.yml — only secure data
 // ═══════════════════════════════════════════════════
@@ -836,6 +836,42 @@ func TestDefaultConfig_Channels(t *testing.T) {
 	}
 }

+func TestDefaultConfig_ChannelStreamingDisabled(t *testing.T) {
+	cfg := DefaultConfig()
+
+	telegram := cfg.Channels.Get(ChannelTelegram)
+	if telegram == nil {
+		t.Fatal("DefaultConfig() missing telegram channel")
+	}
+	decoded, err := telegram.GetDecoded()
+	if err != nil {
+		t.Fatalf("telegram GetDecoded() error = %v", err)
+	}
+	settings, ok := decoded.(*TelegramSettings)
+	if !ok {
+		t.Fatalf("telegram settings type = %T, want *TelegramSettings", decoded)
+	}
+	if settings.Streaming.Enabled {
+		t.Fatal("DefaultConfig().telegram.settings.streaming.enabled should be false")
+	}
+
+	pico := cfg.Channels.Get(ChannelPico)
+	if pico == nil {
+		t.Fatal("DefaultConfig() missing pico channel")
+	}
+	decoded, err = pico.GetDecoded()
+	if err != nil {
+		t.Fatalf("pico GetDecoded() error = %v", err)
+	}
+	picoSettings, ok := decoded.(*PicoSettings)
+	if !ok {
+		t.Fatalf("pico settings type = %T, want *PicoSettings", decoded)
+	}
+	if !picoSettings.Streaming.Enabled {
+		t.Fatal("DefaultConfig().pico.settings.streaming.enabled should be true")
+	}
+}
+
 func TestValidateSingletonChannels_RejectsMultipleInstances(t *testing.T) {
 	channels := ChannelsConfig{
 		"pico1": &Channel{Enabled: true, Type: ChannelPico},
@@ -975,6 +1011,49 @@ func TestSaveConfig_PreservesDisabledTelegramPlaceholder(t *testing.T) {
 	}
 }

+func TestSaveConfig_PreservesExplicitDisabledPicoStreaming(t *testing.T) {
+	tmpDir := t.TempDir()
+	path := filepath.Join(tmpDir, "config.json")
+
+	cfg := DefaultConfig()
+	pico := cfg.Channels.Get(ChannelPico)
+	if pico == nil {
+		t.Fatal("DefaultConfig() missing pico channel")
+	}
+	pico.Settings = RawNode(`{"streaming":{"enabled":false}}`)
+
+	if err := SaveConfig(path, cfg); err != nil {
+		t.Fatalf("SaveConfig failed: %v", err)
+	}
+	data, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("ReadFile failed: %v", err)
+	}
+	if !strings.Contains(string(data), `"streaming"`) || !strings.Contains(string(data), `"enabled": false`) {
+		t.Fatalf("saved config should preserve explicit disabled pico streaming, got:\n%s", string(data))
+	}
+
+	loaded, err := LoadConfig(path)
+	if err != nil {
+		t.Fatalf("LoadConfig failed: %v", err)
+	}
+	loadedPico := loaded.Channels.Get(ChannelPico)
+	if loadedPico == nil {
+		t.Fatal("loaded config missing pico channel")
+	}
+	decoded, err := loadedPico.GetDecoded()
+	if err != nil {
+		t.Fatalf("pico GetDecoded() error = %v", err)
+	}
+	settings, ok := decoded.(*PicoSettings)
+	if !ok {
+		t.Fatalf("pico settings type = %T, want *PicoSettings", decoded)
+	}
+	if settings.Streaming.Enabled {
+		t.Fatal("explicit disabled pico streaming should remain disabled after SaveConfig/LoadConfig round-trip")
+	}
+}
+
 // TestSaveConfig_FiltersVirtualModels verifies that SaveConfig does not write
 // virtual models (generated by expandMultiKeyModels) to the config file.
 func TestSaveConfig_FiltersVirtualModels(t *testing.T) {
@@ -511,7 +511,6 @@ func defaultChannels() ChannelsConfig {
 			"typing":      map[string]any{"enabled": true},
 			"placeholder": map[string]any{"enabled": true, "text": []string{"Thinking... 💭"}},
 			"settings": map[string]any{
-				"streaming":            map[string]any{"enabled": true, "throttle_seconds": 3, "min_growth_chars": 200},
 				"use_markdown_v2":      false,
 				"media_group_delay_ms": 500,
 			},
@@ -566,6 +565,7 @@ func defaultChannels() ChannelsConfig {
 				"read_timeout":    60,
 				"write_timeout":   10,
 				"max_connections": 100,
+				"streaming":       map[string]any{"enabled": true},
 			},
 		},
 		"irc": map[string]any{
@@ -7,6 +7,7 @@ package config

 import (
 	"encoding/json"
+	"reflect"
 	"strings"
 	"sync"
 	"testing"
@@ -144,6 +145,47 @@ func TestGetModelConfig_Concurrent(t *testing.T) {
 	}
 }

+func TestModelConfig_StreamingConfig(t *testing.T) {
+	t.Run("loads streaming enabled", func(t *testing.T) {
+		var cfg ModelConfig
+		err := json.Unmarshal([]byte(`{
+			"model_name": "stream-model",
+			"model": "openai/gpt-5.4",
+			"streaming": {"enabled": true}
+		}`), &cfg)
+		if err != nil {
+			t.Fatalf("Unmarshal() error = %v", err)
+		}
+		if !cfg.Streaming.Enabled {
+			t.Fatal("Streaming.Enabled = false, want true")
+		}
+	})
+
+	t.Run("defaults disabled", func(t *testing.T) {
+		var cfg ModelConfig
+		err := json.Unmarshal([]byte(`{
+			"model_name": "plain-model",
+			"model": "openai/gpt-5.4"
+		}`), &cfg)
+		if err != nil {
+			t.Fatalf("Unmarshal() error = %v", err)
+		}
+		if cfg.Streaming.Enabled {
+			t.Fatal("Streaming.Enabled = true, want false by default")
+		}
+	})
+
+	t.Run("model streaming only has enabled", func(t *testing.T) {
+		typ := reflect.TypeOf(ModelStreamingConfig{})
+		if typ.NumField() != 1 {
+			t.Fatalf("ModelStreamingConfig field count = %d, want 1", typ.NumField())
+		}
+		if _, ok := typ.FieldByName("Enabled"); !ok {
+			t.Fatal("ModelStreamingConfig missing Enabled field")
+		}
+	})
+}
+
 func TestModelConfig_Validate(t *testing.T) {
 	tests := []struct {
 		name    string
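Review note: the "defaults disabled" subtest above relies on Go's zero-value decoding, where an omitted `streaming` block leaves `Enabled` false. A minimal sketch of that behaviour follows; `streamingConfig`, `modelEntry`, and `streamingEnabled` are hypothetical stand-ins, not the real `config` types, and only the JSON field names are taken from the tests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamingConfig and modelEntry are hypothetical stand-ins for the real
// config types; only the JSON keys mirror the tests in this diff.
type streamingConfig struct {
	Enabled bool `json:"enabled"`
}

type modelEntry struct {
	ModelName string          `json:"model_name"`
	Model     string          `json:"model"`
	Streaming streamingConfig `json:"streaming"`
}

// streamingEnabled decodes one model entry and reports its streaming flag.
// A missing "streaming" object leaves the zero value, which means disabled.
func streamingEnabled(raw string) (bool, error) {
	var entry modelEntry
	if err := json.Unmarshal([]byte(raw), &entry); err != nil {
		return false, err
	}
	return entry.Streaming.Enabled, nil
}

func main() {
	on, _ := streamingEnabled(`{"model_name":"stream-model","streaming":{"enabled":true}}`)
	off, _ := streamingEnabled(`{"model_name":"plain-model"}`)
	fmt.Println(on, off)
}
```

Because the flag is a plain `bool`, "omitted" and "explicitly false" decode to the same value here; keeping an explicit `false` across a save and load round trip, as the Pico test earlier in this diff checks, needs handling beyond this sketch.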
@@ -197,6 +197,7 @@ func TestExpandMultiKeyModels_PreservesOtherFields(t *testing.T) {
 		RequestTimeout:      30,
 		ThinkingLevel:       "high",
 		ToolSchemaTransform: "simple",
+		Streaming:           ModelStreamingConfig{Enabled: true},
 	}
 	modelCfg.APIKeys = SimpleSecureStrings("key0", "key1") // Use internal field for multi-key testing
 	models := []*ModelConfig{modelCfg}
@@ -229,6 +230,9 @@ func TestExpandMultiKeyModels_PreservesOtherFields(t *testing.T) {
 	if primary.ToolSchemaTransform != "simple" {
 		t.Errorf("expected tool_schema_transform preserved, got %q", primary.ToolSchemaTransform)
 	}
+	if !primary.Streaming.Enabled {
+		t.Error("expected streaming config preserved on primary")
+	}

 	// Check additional entry also preserves fields
 	additional := result[0]
@@ -244,6 +248,9 @@ func TestExpandMultiKeyModels_PreservesOtherFields(t *testing.T) {
 	if additional.ToolSchemaTransform != "simple" {
 		t.Errorf("expected additional tool_schema_transform preserved, got %q", additional.ToolSchemaTransform)
 	}
+	if !additional.Streaming.Enabled {
+		t.Error("expected streaming config preserved on additional")
+	}
 }

 func TestExpandMultiKeyModels_IsVirtualFlag(t *testing.T) {
@@ -69,7 +69,11 @@ func TestIntegration_RealConfiguredServer(t *testing.T) {
 		t.Fatal("expected at least one discovered tool from real MCP server")
 	}

-	t.Logf("connected to real MCP server via %s with %d tool(s)", config.EffectiveMCPTransportType(serverCfg), len(tools))
+	t.Logf(
+		"connected to real MCP server via %s with %d tool(s)",
+		config.EffectiveMCPTransportType(serverCfg),
+		len(tools),
+	)
 	for _, tool := range tools {
 		if tool != nil {
 			t.Logf("discovered tool: %s", tool.Name)
@@ -15,8 +15,9 @@ import (
 )

 const (
-	geminiDefaultAPIBase = "https://generativelanguage.googleapis.com/v1beta"
-	geminiDefaultModel   = "gemini-2.0-flash"
+	geminiDefaultAPIBase                = "https://generativelanguage.googleapis.com/v1beta"
+	geminiDefaultModel                  = "gemini-2.0-flash"
+	geminiDefaultStreamingReadIdleLimit = 5 * time.Minute
 )

 type GeminiProvider struct {
@@ -114,6 +115,28 @@ func (p *GeminiProvider) ChatStream(
 	model string,
 	options map[string]any,
 	onChunk func(accumulated string),
+) (*LLMResponse, error) {
+	return p.ChatStreamEvents(
+		ctx,
+		messages,
+		tools,
+		model,
+		options,
+		func(chunk StreamChunk) {
+			if onChunk != nil && strings.TrimSpace(chunk.Content) != "" {
+				onChunk(chunk.Content)
+			}
+		},
+	)
+}
+
+func (p *GeminiProvider) ChatStreamEvents(
+	ctx context.Context,
+	messages []Message,
+	tools []ToolDefinition,
+	model string,
+	options map[string]any,
+	onChunk func(StreamChunk),
 ) (*LLMResponse, error) {
 	if p.apiBase == "" {
 		return nil, fmt.Errorf("API base not configured")
@@ -147,7 +170,43 @@ func (p *GeminiProvider) ChatStream(
 		return nil, common.HandleErrorResponse(resp, p.apiBase)
 	}

-	return parseGeminiStreamResponse(ctx, resp.Body, onChunk)
+	return parseGeminiStreamResponse(ctx,
+		withGeminiStreamingReadIdleTimeout(resp.Body,
+			geminiDefaultStreamingReadIdleLimit),
+		onChunk)
+}
+
+func withGeminiStreamingReadIdleTimeout(body io.ReadCloser, timeout time.Duration) io.ReadCloser {
+	if body == nil || timeout <= 0 {
+		return body
+	}
+	return &geminiStreamingReadIdleTimeoutBody{
+		body:    body,
+		timeout: timeout,
+	}
+}
+
+type geminiStreamingReadIdleTimeoutBody struct {
+	body    io.ReadCloser
+	timeout time.Duration
+}
+
+func (b *geminiStreamingReadIdleTimeoutBody) Read(p []byte) (int, error) {
+	timedOut := make(chan struct{})
+	timer := time.AfterFunc(b.timeout, func() {
+		close(timedOut)
+		_ = b.body.Close()
+	})
+	n, err := b.body.Read(p)
+	if !timer.Stop() {
+		<-timedOut
+		return n, fmt.Errorf("gemini stream idle timeout after %s", b.timeout)
+	}
+	return n, err
+}
+
+func (b *geminiStreamingReadIdleTimeoutBody) Close() error {
+	return b.body.Close()
 }

 func (p *GeminiProvider) applyHeaders(req *http.Request) {
@@ -458,7 +517,7 @@ func parseGeminiResponse(resp *geminiGenerateContentResponse) *LLMResponse {
 func parseGeminiStreamResponse(
 	ctx context.Context,
 	reader io.Reader,
-	onChunk func(accumulated string),
+	onChunk func(StreamChunk),
 ) (*LLMResponse, error) {
 	var contentBuilder strings.Builder
 	var reasoningBuilder strings.Builder
@@ -498,10 +557,13 @@ func parseGeminiStreamResponse(
 				if part.Text != "" {
 					if part.Thought {
 						reasoningBuilder.WriteString(part.Text)
+						if onChunk != nil {
+							onChunk(StreamChunk{ReasoningContent: reasoningBuilder.String()})
+						}
 					} else {
 						contentBuilder.WriteString(part.Text)
 						if onChunk != nil {
-							onChunk(contentBuilder.String())
+							onChunk(StreamChunk{Content: contentBuilder.String()})
 						}
 					}
 				}
@@ -3,10 +3,13 @@ package httpapi
 import (
 	"encoding/json"
 	"fmt"
+	"io"
 	"net/http"
 	"net/http/httptest"
 	"strings"
+	"sync"
 	"testing"
+	"time"
 )

 func TestGeminiProvider_ChatSeparatesThoughtAndToolCall(t *testing.T) {
@@ -212,6 +215,114 @@ func TestGeminiProvider_ChatStreamParsesThoughtTextAndToolCalls(t *testing.T) {
 	}
 }

+func TestGeminiProvider_ChatStreamEventsStreamsThoughtBeforeContent(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		// The handler runs on the server's goroutine, so failures are reported
+		// with t.Error plus return; t.Fatal is only valid on the test goroutine.
+		if !strings.Contains(r.URL.Path, ":streamGenerateContent") {
+			t.Errorf("path = %s, expected streamGenerateContent endpoint", r.URL.Path)
+			return
+		}
+		w.Header().Set("Content-Type", "text/event-stream")
+		flusher, ok := w.(http.Flusher)
+		if !ok {
+			t.Error("response writer is not flushable")
+			return
+		}
+
+		chunk := map[string]any{
+			"candidates": []any{map[string]any{
+				"content": map[string]any{
+					"parts": []any{
+						map[string]any{"text": "think", "thought": true},
+						map[string]any{"text": "answer"},
+					},
+				},
+				"finishReason": "STOP",
+			}},
+		}
+		raw, err := json.Marshal(chunk)
+		if err != nil {
+			t.Errorf("marshal chunk: %v", err)
+			return
+		}
+		_, _ = fmt.Fprintf(w, "data: %s\n\n", raw)
+		_, _ = fmt.Fprint(w, "data: [DONE]\n\n")
+		flusher.Flush()
+	}))
+	defer server.Close()
+
+	provider := NewGeminiProvider("test-key", server.URL, "", "", 0, nil, nil)
+	events := make([]string, 0)
+	resp, err := provider.ChatStreamEvents(
+		t.Context(),
+		[]Message{{Role: "user", Content: "hello"}},
+		nil,
+		"gemini-2.5-flash",
+		nil,
+		func(chunk StreamChunk) {
+			if chunk.ReasoningContent != "" {
+				events = append(events, "reasoning:"+chunk.ReasoningContent)
+			}
+			if chunk.Content != "" {
+				events = append(events, "content:"+chunk.Content)
+			}
+		},
+	)
+	if err != nil {
+		t.Fatalf("ChatStreamEvents() error = %v", err)
+	}
+	if resp.Content != "answer" {
+		t.Fatalf("Content = %q, want %q", resp.Content, "answer")
+	}
+	if resp.ReasoningContent != "think" {
+		t.Fatalf("ReasoningContent = %q, want %q", resp.ReasoningContent, "think")
+	}
+	want := []string{"reasoning:think", "content:answer"}
+	if len(events) != len(want) {
+		t.Fatalf("events = %#v, want %#v", events, want)
+	}
+	for i := range want {
+		if events[i] != want[i] {
+			t.Fatalf("events = %#v, want %#v", events, want)
+		}
+	}
+}
+
+type geminiBlockingReadCloser struct {
+	closeOnce sync.Once
+	closed    chan struct{}
+}
+
+func newGeminiBlockingReadCloser() *geminiBlockingReadCloser {
+	return &geminiBlockingReadCloser{closed: make(chan struct{})}
+}
+
+func (r *geminiBlockingReadCloser) Read([]byte) (int, error) {
+	<-r.closed
+	return 0, io.ErrClosedPipe
+}
+
+func (r *geminiBlockingReadCloser) Close() error {
+	r.closeOnce.Do(func() {
+		close(r.closed)
+	})
+	return nil
+}
+
+func TestGeminiStreamingReadIdleTimeoutClosesStalledBody(t *testing.T) {
+	body := newGeminiBlockingReadCloser()
+	wrapped := withGeminiStreamingReadIdleTimeout(body, 10*time.Millisecond)
+
+	_, err := wrapped.Read(make([]byte, 1))
+	if err == nil {
+		t.Fatal("expected stalled stream read to return an error")
+	}
+	if !strings.Contains(err.Error(), "gemini stream idle timeout") {
+		t.Fatalf("error = %v, want gemini stream idle timeout", err)
+	}
+}
+
 func TestGeminiProvider_ChatStreamSkipsEmptyDataFrames(t *testing.T) {
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		w.Header().Set("Content-Type", "text/event-stream")
@@ -70,6 +70,17 @@ func (p *HTTPProvider) ChatStream(
 	return p.delegate.ChatStream(ctx, messages, tools, model, options, onChunk)
 }

+func (p *HTTPProvider) ChatStreamEvents(
+	ctx context.Context,
+	messages []Message,
+	tools []ToolDefinition,
+	model string,
+	options map[string]any,
+	onChunk func(StreamChunk),
+) (*LLMResponse, error) {
+	return p.delegate.ChatStreamEvents(ctx, messages, tools, model, options, onChunk)
+}
+
 func (p *HTTPProvider) GetDefaultModel() string {
 	return ""
 }
@@ -10,6 +10,7 @@ type (
 	ToolCall               = protocoltypes.ToolCall
 	FunctionCall           = protocoltypes.FunctionCall
 	LLMResponse            = protocoltypes.LLMResponse
+	StreamChunk            = protocoltypes.StreamChunk
 	UsageInfo              = protocoltypes.UsageInfo
 	Message                = protocoltypes.Message
 	ToolDefinition         = protocoltypes.ToolDefinition
@@ -41,3 +42,14 @@ type StreamingProvider interface {
 		onChunk func(accumulated string),
 	) (*LLMResponse, error)
 }
+
+type StreamingEventProvider interface {
+	ChatStreamEvents(
+		ctx context.Context,
+		messages []Message,
+		tools []ToolDefinition,
+		model string,
+		options map[string]any,
+		onChunk func(StreamChunk),
+	) (*LLMResponse, error)
+}
@@ -23,6 +23,7 @@ type (
 	ToolCall               = protocoltypes.ToolCall
 	FunctionCall           = protocoltypes.FunctionCall
 	LLMResponse            = protocoltypes.LLMResponse
+	StreamChunk            = protocoltypes.StreamChunk
 	UsageInfo              = protocoltypes.UsageInfo
 	Message                = protocoltypes.Message
 	ToolDefinition         = protocoltypes.ToolDefinition
@@ -45,7 +46,10 @@ type Provider struct {

 type Option func(*Provider)

-const defaultRequestTimeout = common.DefaultRequestTimeout
+const (
+	defaultRequestTimeout           = common.DefaultRequestTimeout
+	defaultStreamingReadIdleTimeout = 5 * time.Minute
+)

 var stripModelPrefixProviders = map[string]struct{}{
 	"litellm":     {},
@@ -380,6 +384,28 @@ func (p *Provider) ChatStream(
 	model string,
 	options map[string]any,
 	onChunk func(accumulated string),
+) (*LLMResponse, error) {
+	return p.ChatStreamEvents(
+		ctx,
+		messages,
+		tools,
+		model,
+		options,
+		func(chunk StreamChunk) {
+			if onChunk != nil && strings.TrimSpace(chunk.Content) != "" {
+				onChunk(chunk.Content)
+			}
+		},
+	)
+}
+
+func (p *Provider) ChatStreamEvents(
+	ctx context.Context,
+	messages []Message,
+	tools []ToolDefinition,
+	model string,
+	options map[string]any,
+	onChunk func(StreamChunk),
 ) (*LLMResponse, error) {
 	if p.apiBase == "" {
 		return nil, fmt.Errorf("API base not configured")
@@ -422,14 +448,47 @@ func (p *Provider) ChatStream(
 		return nil, common.HandleErrorResponse(resp, p.apiBase)
 	}

-	return parseStreamResponse(ctx, resp.Body, onChunk)
+	return parseStreamResponse(ctx, withStreamingReadIdleTimeout(resp.Body, defaultStreamingReadIdleTimeout), onChunk)
+}
+
+func withStreamingReadIdleTimeout(body io.ReadCloser, timeout time.Duration) io.ReadCloser {
+	if body == nil || timeout <= 0 {
+		return body
+	}
+	return &streamingReadIdleTimeoutBody{
+		body:    body,
+		timeout: timeout,
+	}
+}
+
+type streamingReadIdleTimeoutBody struct {
+	body    io.ReadCloser
+	timeout time.Duration
+}
+
+func (b *streamingReadIdleTimeoutBody) Read(p []byte) (int, error) {
+	timedOut := make(chan struct{})
+	timer := time.AfterFunc(b.timeout, func() {
+		close(timedOut)
+		_ = b.body.Close()
+	})
+	n, err := b.body.Read(p)
+	if !timer.Stop() {
+		<-timedOut
+		return n, fmt.Errorf("stream idle timeout after %s", b.timeout)
+	}
+	return n, err
+}
+
+func (b *streamingReadIdleTimeoutBody) Close() error {
+	return b.body.Close()
 }

 // parseStreamResponse parses an OpenAI-compatible SSE stream.
 func parseStreamResponse(
 	ctx context.Context,
 	reader io.Reader,
-	onChunk func(accumulated string),
+	onChunk func(StreamChunk),
 ) (*LLMResponse, error) {
 	var textContent strings.Builder
 	var reasoningContent strings.Builder
@@ -489,22 +548,29 @@ func parseStreamResponse(

 		choice := chunk.Choices[0]

-		// Accumulate text content
-		if choice.Delta.Content != "" {
-			textContent.WriteString(choice.Delta.Content)
-			if onChunk != nil {
-				onChunk(textContent.String())
-			}
-		}
 		if choice.Delta.ReasoningContent != "" {
 			reasoningContent.WriteString(choice.Delta.ReasoningContent)
+			if onChunk != nil {
+				onChunk(StreamChunk{ReasoningContent: reasoningContent.String()})
+			}
 		}
 		if choice.Delta.Reasoning != "" {
 			reasoning.WriteString(choice.Delta.Reasoning)
+			if onChunk != nil {
+				onChunk(StreamChunk{ReasoningContent: reasoning.String()})
+			}
 		}
 		if len(choice.Delta.ReasoningDetails) > 0 {
 			reasoningDetails = append(reasoningDetails, choice.Delta.ReasoningDetails...)
 		}
+		// Accumulate text content after reasoning so UIs can show thought first
+		// when a provider sends both fields in the same event.
+		if choice.Delta.Content != "" {
+			textContent.WriteString(choice.Delta.Content)
+			if onChunk != nil {
+				onChunk(StreamChunk{Content: textContent.String()})
+			}
+		}

 		// Accumulate tool call deltas
 		for _, tc := range choice.Delta.ToolCalls {
@@ -9,6 +9,7 @@ import (
 	"net/http/httptest"
 	"net/url"
 	"strings"
+	"sync"
 	"testing"
 	"time"

@@ -1318,6 +1319,53 @@ func TestProviderChatStream_ParsesReasoningContent(t *testing.T) {
 	}
 }

+func TestProviderChatStreamEvents_EmitsReasoningBeforeContentFromSameEvent(t *testing.T) {
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "text/event-stream")
+		_, _ = w.Write([]byte(
+			"data: {\"choices\":[{\"delta\":{\"reasoning_content\":\"think\",\"content\":\"answer\"},\"finish_reason\":\"stop\"}]}\n\n",
+		))
+		_, _ = w.Write([]byte("data: [DONE]\n\n"))
+	}))
+	defer server.Close()
+
+	p := NewProvider("key", server.URL, "")
+	events := make([]string, 0)
+	out, err := p.ChatStreamEvents(
+		t.Context(),
+		[]Message{{Role: "user", Content: "hi"}},
+		nil,
+		"deepseek-v4-flash",
+		nil,
+		func(chunk StreamChunk) {
+			if chunk.ReasoningContent != "" {
+				events = append(events, "reasoning:"+chunk.ReasoningContent)
+			}
+			if chunk.Content != "" {
+				events = append(events, "content:"+chunk.Content)
+			}
+		},
+	)
+	if err != nil {
+		t.Fatalf("ChatStreamEvents() error = %v", err)
+	}
+	if out.Content != "answer" {
+		t.Fatalf("Content = %q, want %q", out.Content, "answer")
+	}
+	if out.ReasoningContent != "think" {
+		t.Fatalf("ReasoningContent = %q, want %q", out.ReasoningContent, "think")
+	}
+	want := []string{"reasoning:think", "content:answer"}
+	if len(events) != len(want) {
+		t.Fatalf("events = %#v, want %#v", events, want)
+	}
+	for i := range want {
+		if events[i] != want[i] {
+			t.Fatalf("events = %#v, want %#v", events, want)
+		}
+	}
+}
+
 func TestProviderChatStream_ParsesMultilineSSEEvent(t *testing.T) {
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		w.Header().Set("Content-Type", "text/event-stream")
@@ -1463,6 +1511,40 @@ func (r *errAfterDataReadCloser) Close() error {
 	return nil
 }

+type blockingReadCloser struct {
+	closeOnce sync.Once
+	closed    chan struct{}
+}
+
+func newBlockingReadCloser() *blockingReadCloser {
+	return &blockingReadCloser{closed: make(chan struct{})}
+}
+
+func (r *blockingReadCloser) Read([]byte) (int, error) {
+	<-r.closed
+	return 0, io.ErrClosedPipe
+}
+
+func (r *blockingReadCloser) Close() error {
+	r.closeOnce.Do(func() {
+		close(r.closed)
+	})
+	return nil
+}
+
+func TestStreamingReadIdleTimeoutClosesStalledBody(t *testing.T) {
+	body := newBlockingReadCloser()
+	wrapped := withStreamingReadIdleTimeout(body, 10*time.Millisecond)
+
+	_, err := wrapped.Read(make([]byte, 1))
+	if err == nil {
+		t.Fatal("expected stalled stream read to return an error")
+	}
+	if !strings.Contains(err.Error(), "stream idle timeout") {
+		t.Fatalf("error = %v, want stream idle timeout", err)
+	}
+}
+
 func TestProvider_FunctionalOptionMaxTokensField(t *testing.T) {
 	p := NewProvider("key", "https://example.com/v1", "", WithMaxTokensField("max_completion_tokens"))
 	if p.maxTokensField != "max_completion_tokens" {
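@@ -35,6 +35,14 @@ type LLMResponse struct {
 	ReasoningDetails []ReasoningDetail `json:"reasoning_details"`
 }

+// StreamChunk is one streaming update passed to a ChatStreamEvents callback.
+// The OpenAI-compatible and Gemini parsers fill each field with the text
+// accumulated so far, not a delta, and set only one field per callback.
+type StreamChunk struct {
+	Content          string
+	ReasoningContent string
+}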
+
 type ReasoningDetail struct {
 	Format string `json:"format"`
 	Index  int    `json:"index"`
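Review note: both parsers in this diff pass the full accumulated text in each `StreamChunk`, so a consumer that wants increments has to derive them. A minimal sketch of one way to do that follows; `deltaTracker` and `suffixAfter` are hypothetical helpers, and the local `StreamChunk` only mirrors the two fields added above.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamChunk mirrors the two fields added in this diff.
type StreamChunk struct {
	Content          string
	ReasoningContent string
}

// deltaTracker (hypothetical) turns accumulated snapshots into increments.
type deltaTracker struct {
	content   string
	reasoning string
}

// suffixAfter returns the part of next that follows prev. If next does not
// extend prev, the whole snapshot is returned as a fallback.
func suffixAfter(prev, next string) string {
	if strings.HasPrefix(next, prev) {
		return next[len(prev):]
	}
	return next
}

// Next records the chunk and returns only the newly added text per field.
func (d *deltaTracker) Next(chunk StreamChunk) (reasoning, content string) {
	if chunk.ReasoningContent != "" {
		reasoning = suffixAfter(d.reasoning, chunk.ReasoningContent)
		d.reasoning = chunk.ReasoningContent
	}
	if chunk.Content != "" {
		content = suffixAfter(d.content, chunk.Content)
		d.content = chunk.Content
	}
	return reasoning, content
}

func main() {
	var d deltaTracker
	chunks := []StreamChunk{
		{ReasoningContent: "th"},
		{ReasoningContent: "think"},
		{Content: "ans"},
		{Content: "answer"},
	}
	for _, c := range chunks {
		r, t := d.Next(c)
		fmt.Printf("reasoning=%q content=%q\n", r, t)
	}
}
```

Channels that replace the whole message on every update can skip this step and render the snapshot directly.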
@@ -67,6 +67,29 @@ func (p *toolSchemaStreamingProvider) ChatStream(
 	return streaming.ChatStream(ctx, messages, transformed, model, options, onChunk)
 }

+func (p *toolSchemaStreamingProvider) ChatStreamEvents(
+	ctx context.Context,
+	messages []Message,
+	tools []ToolDefinition,
+	model string,
+	options map[string]any,
+	onChunk func(StreamChunk),
+) (*LLMResponse, error) {
+	streaming, ok := p.delegate.(StreamingEventProvider)
+	if !ok {
+		return p.ChatStream(ctx, messages, tools, model, options, func(accumulated string) {
+			if onChunk != nil {
+				onChunk(StreamChunk{Content: accumulated})
+			}
+		})
+	}
+	transformed, err := common.TransformToolDefinitions(tools, p.transform)
+	if err != nil {
+		return nil, err
+	}
+	return streaming.ChatStreamEvents(ctx, messages, transformed, model, options, onChunk)
+}
+
 func (p *toolSchemaTransformProvider) SupportsThinking() bool {
 	tc, ok := p.delegate.(ThinkingCapable)
 	return ok && tc.SupportsThinking()
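Review note: `ChatStreamEvents` above falls back to `ChatStream` when the delegate does not implement the event interface, wrapping each accumulated string in a `StreamChunk`. The sketch below shows the same optional-interface pattern with deliberately reduced signatures; `textStreamer`, `eventStreamer`, `textOnly`, and `collect` are hypothetical and much simpler than the real provider interfaces.

```go
package main

import "fmt"

// StreamChunk mirrors the two fields added in this diff.
type StreamChunk struct {
	Content          string
	ReasoningContent string
}

// textStreamer and eventStreamer are simplified stand-ins for
// StreamingProvider and StreamingEventProvider (signatures reduced).
type textStreamer interface {
	ChatStream(prompt string, onChunk func(accumulated string)) error
}

type eventStreamer interface {
	ChatStreamEvents(prompt string, onChunk func(StreamChunk)) error
}

// streamEvents prefers the event API and otherwise adapts the text callback.
func streamEvents(p textStreamer, prompt string, onChunk func(StreamChunk)) error {
	if es, ok := p.(eventStreamer); ok {
		return es.ChatStreamEvents(prompt, onChunk)
	}
	return p.ChatStream(prompt, func(accumulated string) {
		if onChunk != nil {
			onChunk(StreamChunk{Content: accumulated})
		}
	})
}

// textOnly implements only the older text-based interface.
type textOnly struct{}

func (textOnly) ChatStream(prompt string, onChunk func(accumulated string)) error {
	onChunk("echo: " + prompt)
	return nil
}

// collect gathers every chunk produced for a prompt.
func collect(p textStreamer, prompt string) []StreamChunk {
	var chunks []StreamChunk
	_ = streamEvents(p, prompt, func(c StreamChunk) { chunks = append(chunks, c) })
	return chunks
}

func main() {
	fmt.Printf("%+v\n", collect(textOnly{}, "hi"))
}
```

With this fallback, reasoning text is simply absent for text-only delegates: only `Content` is ever populated on that path.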
@@ -11,6 +11,7 @@ type (
 	ToolCall               = protocoltypes.ToolCall
 	FunctionCall           = protocoltypes.FunctionCall
 	LLMResponse            = protocoltypes.LLMResponse
+	StreamChunk            = protocoltypes.StreamChunk
 	UsageInfo              = protocoltypes.UsageInfo
 	Message                = protocoltypes.Message
 	ToolDefinition         = protocoltypes.ToolDefinition
@@ -52,6 +53,17 @@ type StreamingProvider interface {
 	) (*LLMResponse, error)
 }

+type StreamingEventProvider interface {
+	ChatStreamEvents(
+		ctx context.Context,
+		messages []Message,
+		tools []ToolDefinition,
+		model string,
+		options map[string]any,
+		onChunk func(StreamChunk),
+	) (*LLMResponse, error)
+}
+
 // ThinkingCapable is an optional interface for providers that support
 // extended thinking (e.g. Anthropic). Used by the agent loop to warn
 // when thinking_level is configured but the active provider cannot use it.
@@ -3,6 +3,7 @@ package api
 import (
 	"encoding/json"
 	"net/http"
+	"reflect"

 	"github.com/sipeed/picoclaw/pkg/config"
 )
@@ -170,6 +171,42 @@ func addChannelCommonConfig(settings map[string]any, bc *config.Channel) {
 	if bc.Placeholder.Enabled || len(bc.Placeholder.Text) > 0 {
 		settings["placeholder"] = bc.Placeholder
 	}
+	if _, exists := settings["streaming"]; !exists {
+		if streaming, ok := channelStreamingConfig(bc); ok {
+			if !streaming.IsZero() {
+				settings["streaming"] = streaming
+			}
+		}
+	}
+}
+
+func channelStreamingConfig(bc *config.Channel) (config.StreamingConfig, bool) {
+	if bc == nil {
+		return config.StreamingConfig{}, false
+	}
+
+	decoded, err := bc.GetDecoded()
+	if err != nil || decoded == nil {
+		return config.StreamingConfig{}, false
+	}
+
+	value := reflect.ValueOf(decoded)
+	if value.Kind() == reflect.Pointer {
+		if value.IsNil() {
+			return config.StreamingConfig{}, false
+		}
+		value = value.Elem()
+	}
+	if value.Kind() != reflect.Struct {
+		return config.StreamingConfig{}, false
+	}
+
+	field := value.FieldByName("Streaming")
+	if !field.IsValid() || !field.CanInterface() {
+		return config.StreamingConfig{}, false
+	}
+	streaming, ok := field.Interface().(config.StreamingConfig)
+	return streaming, ok
 }

 func detectConfiguredSecrets(settings config.RawNode, channelName string) []string {
@@ -147,6 +147,96 @@ func TestHandleGetChannelConfig_ReturnsCommonFieldsWhenSettingsEmpty(t *testing.
 	}
 }

+func TestHandleGetChannelConfig_OmitsUnconfiguredStreaming(t *testing.T) {
+	configPath, cleanup := setupOAuthTestEnv(t)
+	defer cleanup()
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	req := httptest.NewRequest(http.MethodGet, "/api/channels/telegram/config", nil)
+	rec := httptest.NewRecorder()
+	mux.ServeHTTP(rec, req)
+
+	if rec.Code != http.StatusOK {
+		t.Fatalf(
+			"GET /api/channels/telegram/config status = %d, want %d, body=%s",
+			rec.Code,
+			http.StatusOK,
+			rec.Body.String(),
+		)
+	}
+
+	var resp struct {
+		Config map[string]any `json:"config"`
+	}
+	if err := json.Unmarshal(rec.Body.Bytes(), &resp); err != nil {
+		t.Fatalf("json.Unmarshal() error = %v", err)
+	}
+	if _, ok := resp.Config["streaming"]; ok {
+		t.Fatalf("config.streaming = %#v, want omitted when not configured", resp.Config["streaming"])
+	}
+}
+
+func TestHandleGetChannelConfig_ReturnsConfiguredStreaming(t *testing.T) {
+	configPath, cleanup := setupOAuthTestEnv(t)
+	defer cleanup()
+
+	cfg, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() error = %v", err)
+	}
+	pico := cfg.Channels.Get(config.ChannelPico)
+	if pico == nil {
+		t.Fatal("missing pico channel")
+	}
+	pico.Settings = config.RawNode(`{"streaming":{"enabled":true,"throttle_seconds":2,"min_growth_chars":80}}`)
+	if err := config.InitChannelList(cfg.Channels); err != nil {
+		t.Fatalf("InitChannelList() error = %v", err)
+	}
+	if err := config.SaveConfig(configPath, cfg); err != nil {
+		t.Fatalf("SaveConfig() error = %v", err)
+	}
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	req := httptest.NewRequest(http.MethodGet, "/api/channels/pico/config", nil)
+	rec := httptest.NewRecorder()
+	mux.ServeHTTP(rec, req)
+
+	if rec.Code != http.StatusOK {
+		t.Fatalf(
+			"GET /api/channels/pico/config status = %d, want %d, body=%s",
+			rec.Code,
+			http.StatusOK,
+			rec.Body.String(),
+		)
+	}
+
+	var resp struct {
+		Config map[string]any `json:"config"`
+	}
+	if err := json.Unmarshal(rec.Body.Bytes(), &resp); err != nil {
+		t.Fatalf("json.Unmarshal() error = %v", err)
+	}
+	streaming, ok := resp.Config["streaming"].(map[string]any)
+	if !ok {
+		t.Fatalf("config.streaming = %#v, want object", resp.Config["streaming"])
+	}
+	if got := streaming["enabled"]; got != true {
+		t.Fatalf("config.streaming.enabled = %#v, want true", got)
+	}
+	if got := streaming["throttle_seconds"]; got != float64(2) {
+		t.Fatalf("config.streaming.throttle_seconds = %#v, want 2", got)
+	}
+	if got := streaming["min_growth_chars"]; got != float64(80) {
+		t.Fatalf("config.streaming.min_growth_chars = %#v, want 80", got)
+	}
+}
+
 func TestHandleGetChannelConfig_ReturnsDefaultShapeForMissingChannel(t *testing.T) {
 	configPath, cleanup := setupOAuthTestEnv(t)
 	defer cleanup()
@@ -290,6 +290,19 @@ func validateConfig(cfg *config.Config) []string {
 		errs = append(errs, fmt.Sprintf("gateway.port %d is out of valid range (1-65535)", cfg.Gateway.Port))
 	}

+	for name, bc := range cfg.Channels {
+		streaming, ok := channelStreamingConfig(bc)
+		if !ok {
+			continue
+		}
+		if streaming.ThrottleSeconds < 0 {
+			errs = append(errs, fmt.Sprintf("channel %q streaming.throttle_seconds must be >= 0", name))
+		}
+		if streaming.MinGrowthChars < 0 {
+			errs = append(errs, fmt.Sprintf("channel %q streaming.min_growth_chars must be >= 0", name))
+		}
+	}
+
 	// Pico channel: token required when enabled
 	{
 		bc := cfg.Channels.GetByType(config.ChannelPico)
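Review note: the new loop in `validateConfig` rejects negative `throttle_seconds` and `min_growth_chars` for any channel that exposes a streaming block. A stripped-down sketch of that check follows, assuming a hypothetical `streamingSettings` struct in place of `config.StreamingConfig`; the error strings follow the ones in the diff.

```go
package main

import "fmt"

// streamingSettings is a hypothetical stand-in for config.StreamingConfig.
type streamingSettings struct {
	Enabled         bool
	ThrottleSeconds int
	MinGrowthChars  int
}

// validateStreaming returns one message per invalid delivery value.
func validateStreaming(channel string, s streamingSettings) []string {
	var errs []string
	if s.ThrottleSeconds < 0 {
		errs = append(errs, fmt.Sprintf("channel %q streaming.throttle_seconds must be >= 0", channel))
	}
	if s.MinGrowthChars < 0 {
		errs = append(errs, fmt.Sprintf("channel %q streaming.min_growth_chars must be >= 0", channel))
	}
	return errs
}

func main() {
	bad := streamingSettings{Enabled: true, ThrottleSeconds: -1}
	for _, msg := range validateStreaming("pico", bad) {
		fmt.Println(msg)
	}
}
```

Zero is accepted for both fields, which matches the `>= 0` wording in the messages.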
@@ -446,6 +446,38 @@ func TestHandlePatchConfig_RejectsInvalidChannelArrayFields(t *testing.T) {
 	}
 }

+func TestHandlePatchConfig_RejectsNegativeStreamingDeliveryValues(t *testing.T) {
+	configPath, cleanup := setupOAuthTestEnv(t)
+	defer cleanup()
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	req := httptest.NewRequest(http.MethodPatch, "/api/config", bytes.NewBufferString(`{
+		"channel_list": {
+			"pico": {
+				"settings": {
+					"streaming": {
+						"enabled": true,
+						"throttle_seconds": -1
+					}
+				}
+			}
+		}
+	}`))
+	req.Header.Set("Content-Type", "application/json")
+
+	rec := httptest.NewRecorder()
+	mux.ServeHTTP(rec, req)
+	if rec.Code != http.StatusBadRequest {
+		t.Fatalf("PATCH /api/config status = %d, want %d, body=%s", rec.Code, http.StatusBadRequest, rec.Body.String())
+	}
+	if !strings.Contains(rec.Body.String(), "streaming.throttle_seconds") {
+		t.Fatalf("response body = %q, want streaming.throttle_seconds validation error", rec.Body.String())
+	}
+}
+
 func TestHandlePatchConfig_ClearingAllowFromDoesNotLeaveEmptyStringItem(t *testing.T) {
 	configPath, cleanup := setupOAuthTestEnv(t)
 	defer cleanup()
@@ -25,6 +25,7 @@ import (
 	"github.com/sipeed/picoclaw/pkg/logger"
 	"github.com/sipeed/picoclaw/pkg/netbind"
 	ppid "github.com/sipeed/picoclaw/pkg/pid"
+	"github.com/sipeed/picoclaw/pkg/providers"
 	"github.com/sipeed/picoclaw/web/backend/utils"
 )

@@ -413,6 +414,10 @@ func computeConfigSignature(cfg *config.Config) string {
 	if defaultModel != "" {
 		parts = append(parts, "model:"+defaultModel)
 	}
+	modelStreamingSignatures := computeModelStreamingSignatures(cfg)
+	if len(modelStreamingSignatures) > 0 {
+		parts = append(parts, "model_streaming:"+strings.Join(modelStreamingSignatures, ","))
+	}
 	toolSignatures := []string{}
 	if cfg.Tools.ReadFile.Enabled {
 		toolSignatures = append(toolSignatures, "read_file")
@@ -491,6 +496,152 @@ func computeConfigSignature(cfg *config.Config) string {
 	return strings.Join(parts, ";")
 }

+func computeModelStreamingSignatures(cfg *config.Config) []string {
+	if cfg == nil {
+		return nil
+	}
+	defaultProvider := strings.TrimSpace(cfg.Agents.Defaults.Provider)
+	if defaultProvider == "" {
+		defaultProvider = "openai"
+	}
+	names := []string{strings.TrimSpace(cfg.Agents.Defaults.GetModelName())}
+	names = append(names, cfg.Agents.Defaults.ModelFallbacks...)
+	if cfg.Agents.Defaults.Routing != nil {
+		names = append(names, cfg.Agents.Defaults.Routing.LightModel)
+	}
+	for _, agent := range cfg.Agents.List {
+		if agent.Model == nil {
+			continue
+		}
+		names = append(names, agent.Model.Primary)
+		names = append(names, agent.Model.Fallbacks...)
+	}
+
+	seenNames := make(map[string]bool)
+	seenEntries := make(map[string]bool)
+	signatures := make([]string, 0, len(names))
+	for _, name := range names {
+		name = strings.TrimSpace(name)
+		if name == "" || seenNames[name] {
+			continue
+		}
+		seenNames[name] = true
+		for _, match := range modelConfigsMatchingSignatureRef(cfg.ModelList, name, defaultProvider) {
+			mc := match.model
+			entry := strings.Join([]string{
+				name,
+				strconv.Itoa(match.index),
+				strings.TrimSpace(mc.Provider),
+				strings.TrimSpace(mc.Model),
+				strconv.FormatBool(mc.Streaming.Enabled),
+			}, ":")
+			if seenEntries[entry] {
+				continue
+			}
+			seenEntries[entry] = true
+			signatures = append(signatures, entry)
+		}
+	}
+	sort.Strings(signatures)
+	return signatures
+}
+
+type signatureModelConfigMatch struct {
+	index int
+	model *config.ModelConfig
+}
+
+func modelConfigsMatchingSignatureRef(
+	modelList []*config.ModelConfig,
+	raw string,
+	defaultProvider string,
+) []signatureModelConfigMatch {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return nil
+	}
+	matches := make([]signatureModelConfigMatch, 0, 1)
+	for i, mc := range modelList {
+		if mc == nil || strings.TrimSpace(mc.ModelName) != raw {
+			continue
+		}
+		matches = append(matches, signatureModelConfigMatch{index: i, model: mc})
+	}
+	if len(matches) > 0 {
+		return matches
+	}
+	for i, mc := range modelList {
+		if mc == nil || strings.TrimSpace(mc.Model) != raw {
+			continue
+		}
+		return []signatureModelConfigMatch{{index: i, model: mc}}
+	}
+	for i, mc := range modelList {
+		if modelConfigMatchesBareRef(mc, raw, defaultProvider) {
+			return []signatureModelConfigMatch{{index: i, model: mc}}
+		}
+	}
+
+	rawRef := providers.ParseModelRef(raw, "")
+	rawHasProvider := rawRef != nil && hasUnambiguousProviderPrefix(raw) &&
+		strings.TrimSpace(rawRef.Provider) != "" && strings.TrimSpace(rawRef.Model) != ""
+	if rawHasProvider {
+		for i, mc := range modelList {
+			if modelConfigMatchesProviderRef(mc, raw) {
+				return []signatureModelConfigMatch{{index: i, model: mc}}
+			}
+		}
+	}
+	return nil
+}
+
+func hasUnambiguousProviderPrefix(raw string) bool {
+	provider, _, found := strings.Cut(strings.TrimSpace(raw), "/")
+	if !found {
+		return false
+	}
+	provider = strings.ToLower(strings.TrimSpace(provider))
+	if provider == "" {
+		return false
+	}
+	normalizedProvider := providers.NormalizeProvider(provider)
+	if !providers.IsSupportedModelProvider(normalizedProvider) {
+		return false
+	}
+	return true
+}
+
+func modelConfigMatchesProviderRef(mc *config.ModelConfig, raw string) bool {
+	if mc == nil {
+		return false
+	}
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return false
+	}
+	rawRef := providers.ParseModelRef(raw, "")
+	if rawRef == nil || strings.TrimSpace(rawRef.Provider) == "" || strings.TrimSpace(rawRef.Model) == "" {
+		return false
+	}
+	protocol, modelID := providers.ExtractProtocol(mc)
+	return providers.ModelKey(protocol, modelID) == providers.ModelKey(rawRef.Provider, rawRef.Model)
+}
+
+func modelConfigMatchesBareRef(mc *config.ModelConfig, raw string, defaultProvider string) bool {
+	if mc == nil {
+		return false
+	}
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return false
+	}
+	protocol, modelID := providers.ExtractProtocol(mc)
+	if strings.TrimSpace(modelID) != raw {
+		return false
+	}
+	return providers.NormalizeProvider(protocol) == providers.NormalizeProvider(defaultProvider)
+}
+
 func computeChannelSignatures(channels config.ChannelsConfig) []string {
 	if len(channels) == 0 {
 		return nil
@@ -1267,6 +1267,528 @@ func TestGatewayStatusRequiresRestartAfterChannelChange(t *testing.T) {
 	}
 }

+func TestGatewayStatusRequiresRestartAfterDefaultModelStreamingChange(t *testing.T) {
+	resetGatewayTestState(t)
+
+	configPath := filepath.Join(t.TempDir(), "config.json")
+	cfg := config.DefaultConfig()
+	cfg.Agents.Defaults.ModelName = cfg.ModelList[0].ModelName
+	cfg.ModelList[0].SetAPIKey("test-key")
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: false}
+	if err := config.SaveConfig(configPath, cfg); err != nil {
+		t.Fatalf("SaveConfig() error = %v", err)
+	}
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	process, err := os.FindProcess(os.Getpid())
+	if err != nil {
+		t.Fatalf("FindProcess() error = %v", err)
+	}
+
+	bootSignature := computeConfigSignature(cfg)
+	gateway.mu.Lock()
+	gateway.cmd = &exec.Cmd{Process: process}
+	gateway.bootDefaultModel = cfg.ModelList[0].ModelName
+	gateway.bootConfigSignature = bootSignature
+	setGatewayRuntimeStatusLocked("running")
+	gateway.mu.Unlock()
+
+	updatedCfg, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() error = %v", err)
+	}
+	updatedCfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	if err := config.SaveConfig(configPath, updatedCfg); err != nil {
+		t.Fatalf("SaveConfig() error = %v", err)
+	}
+
+	gatewayHealthGet = func(string, time.Duration) (*http.Response, error) {
+		return mockGatewayHealthResponse(http.StatusOK, os.Getpid()), nil
+	}
+
+	rec := httptest.NewRecorder()
+	req := httptest.NewRequest(http.MethodGet, "/api/gateway/status", nil)
+	mux.ServeHTTP(rec, req)
+
+	if rec.Code != http.StatusOK {
+		t.Fatalf("status = %d, want %d", rec.Code, http.StatusOK)
+	}
+
+	var body map[string]any
+	if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil {
+		t.Fatalf("unmarshal response: %v", err)
+	}
+
+	if got := body["gateway_status"]; got != "running" {
+		t.Fatalf("gateway_status = %#v, want %q", got, "running")
+	}
+	if got := body["gateway_restart_required"]; got != true {
+		t.Fatalf("gateway_restart_required = %#v, want true", got)
+	}
+}
+
+func TestConfigSignatureIncludesModelStreamingForDefaultModelRef(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList[0].ModelName = "friendly-alias"
+	cfg.ModelList[0].Provider = ""
+	cfg.ModelList[0].Model = "openai/gpt-4o-ref"
+	cfg.Agents.Defaults.ModelName = "openai/gpt-4o-ref"
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: false}
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change when streaming changes for a default model referenced by model ref")
+	}
+}
+
+func TestConfigSignatureIncludesModelStreamingForLoadBalancedAliasEntries(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "lb-alias",
+			Model:     "openai/gpt-4o-primary",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+		{
+			ModelName: "lb-alias",
+			Model:     "openai/gpt-4o-secondary",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.ModelName = "lb-alias"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[1].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change when streaming changes for a load-balanced alias entry")
+	}
+}
+
+func TestConfigSignatureIncludesSlashModelIDForDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "nvidia-model",
+			Provider:  "nvidia",
+			Model:     "z-ai/glm-5.1",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "z-ai/glm-5.1"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for a slash-containing model id on the default provider",
+		)
+	}
+}
+
+func TestConfigSignatureIncludesSupportedPrefixSlashModelIDForDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "openrouter-openai",
+			Provider:  "openrouter",
+			Model:     "openai/gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "openrouter"
+	cfg.Agents.Defaults.ModelName = "openai/gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for a supported-prefix slash model id on the default provider",
+		)
+	}
+}
+
+func TestConfigSignatureIncludesLegacyDefaultProviderPrefixedSlashModelID(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "legacy-openrouter-openai",
+			Model:     "openrouter/openai/gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "openrouter"
+	cfg.Agents.Defaults.ModelName = "openai/gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for a legacy default-provider prefixed slash model id",
+		)
+	}
+}
+
+func TestConfigSignatureIncludesSlashModelIDWithoutProviderFieldForDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "nvidia-model",
+			Model:     "z-ai/glm-5.1",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "z-ai/glm-5.1"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for a default-provider slash model id without provider field",
+		)
+	}
+}
+
+func TestConfigSignatureIncludesUnknownSlashPrefixModelIDWithoutProviderFieldForDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "nvidia-meta",
+			Model:     "meta/llama-3.1-8b",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "meta/llama-3.1-8b"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for unknown-prefix default-provider slash model id",
+		)
+	}
+}
+
+func TestConfigSignatureDashAliasSlashModelIDMatchesProviderAlias(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "zai-model",
+			Provider:  "zai",
+			Model:     "glm-5.1",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "z-ai/glm-5.1"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change when a dash-alias slash ref matches a provider alias")
+	}
+}
+
+func TestConfigSignatureDashAliasSlashModelIDMatchesProviderAliasWithOpenAIDefault(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "zai-model",
+			Provider:  "zai",
+			Model:     "glm-5.1",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "openai"
+	cfg.Agents.Defaults.ModelName = "z-ai/glm-5.1"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when a dash-alias slash ref matches a provider alias with OpenAI default",
+		)
+	}
+}
+
+func TestConfigSignatureProviderAliasRefIgnoresDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "openai-gpt",
+			Provider:  "openai",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "gpt/gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change for a provider alias ref even when default provider differs")
+	}
+}
+
+func TestConfigSignatureExplicitProviderRefIgnoresDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "openai-gpt",
+			Provider:  "openai",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "openai/gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change for an explicit provider ref even when default provider differs")
+	}
+}
+
+func TestConfigSignatureExactModelNameTakesPrecedenceOverResolvedRefs(t *testing.T) {
+	tests := []struct {
+		name                  string
+		defaultProvider       string
+		defaultModelName      string
+		models                []*config.ModelConfig
+		shadowedEntryIndex    int
+		exactModelNameIndex   int
+		shadowedChangeMessage string
+		exactChangeMessage    string
+	}{
+		{
+			name:             "slash model name shadows explicit provider ref",
+			defaultProvider:  "nvidia",
+			defaultModelName: "openai/gpt-4o",
+			models: []*config.ModelConfig{
+				{
+					ModelName: "openai/gpt-4o",
+					Provider:  "nvidia",
+					Model:     "openai/gpt-4o",
+					Streaming: config.ModelStreamingConfig{Enabled: false},
+				},
+				{
+					ModelName: "openai-gpt",
+					Provider:  "openai",
+					Model:     "gpt-4o",
+					Streaming: config.ModelStreamingConfig{Enabled: false},
+				},
+			},
+			shadowedEntryIndex:    1,
+			exactModelNameIndex:   0,
+			shadowedChangeMessage: "config signature should not change when an exact slash model_name shadows an explicit provider ref",
+			exactChangeMessage:    "config signature should change when the exact slash model_name entry changes",
+		},
+		{
+			name:             "bare model name shadows default provider model id",
+			defaultProvider:  "openai",
+			defaultModelName: "gpt-4o",
+			models: []*config.ModelConfig{
+				{
+					ModelName: "gpt-4o",
+					Provider:  "anthropic",
+					Model:     "claude-sonnet",
+					Streaming: config.ModelStreamingConfig{Enabled: false},
+				},
+				{
+					ModelName: "openai-gpt",
+					Provider:  "openai",
+					Model:     "gpt-4o",
+					Streaming: config.ModelStreamingConfig{Enabled: false},
+				},
+			},
+			shadowedEntryIndex:    1,
+			exactModelNameIndex:   0,
+			shadowedChangeMessage: "config signature should not change when an exact bare model_name shadows a default-provider model id",
+			exactChangeMessage:    "config signature should change when the exact bare model_name entry changes",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			cfg := config.DefaultConfig()
+			cfg.ModelList = tt.models
+			cfg.Agents.Defaults.Provider = tt.defaultProvider
+			cfg.Agents.Defaults.ModelName = tt.defaultModelName
+
+			before := computeConfigSignature(cfg)
+
+			cfg.ModelList[tt.shadowedEntryIndex].Streaming = config.ModelStreamingConfig{Enabled: true}
+			afterShadowedChange := computeConfigSignature(cfg)
+
+			if before != afterShadowedChange {
+				t.Fatal(tt.shadowedChangeMessage)
+			}
+
+			cfg.ModelList[tt.exactModelNameIndex].Streaming = config.ModelStreamingConfig{Enabled: true}
+			afterExactModelNameChange := computeConfigSignature(cfg)
+
+			if before == afterExactModelNameChange {
+				t.Fatal(tt.exactChangeMessage)
+			}
+		})
+	}
+}
+
+func TestConfigSignatureIncludesLoadBalancedDuplicateEntryIndex(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "lb-alias",
+			Provider:  "openai",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+		{
+			ModelName: "lb-alias",
+			Provider:  "openai",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: true},
+		},
+	}
+	cfg.Agents.Defaults.ModelName = "lb-alias"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming.Enabled = true
+	cfg.ModelList[1].Streaming.Enabled = false
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal("config signature should change when duplicate load-balanced entries swap streaming state")
+	}
+}
+
+func TestConfigSignatureProviderDotAliasRefIgnoresDefaultProvider(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "zai-model",
+			Provider:  "zai",
+			Model:     "glm-5.1",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "nvidia"
+	cfg.Agents.Defaults.ModelName = "z.ai/glm-5.1"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change for an explicit dot-alias provider ref even when default provider differs",
+		)
+	}
+}
+
+func TestConfigSignatureIncludesDefaultProviderPrefixedRefWithSplitConfig(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "openai-split",
+			Provider:  "openai",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "openai"
+	cfg.Agents.Defaults.ModelName = "openai/gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	after := computeConfigSignature(cfg)
+
+	if before == after {
+		t.Fatal(
+			"config signature should change when streaming changes for default-provider prefixed ref with split config",
+		)
+	}
+}
+
+func TestConfigSignatureBareModelRefUsesExactModelBeforeDefaultProviderModelID(t *testing.T) {
+	cfg := config.DefaultConfig()
+	cfg.ModelList = []*config.ModelConfig{
+		{
+			ModelName: "azure-alias",
+			Provider:  "azure",
+			Model:     "gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+		{
+			ModelName: "openai-alias",
+			Model:     "openai/gpt-4o",
+			Streaming: config.ModelStreamingConfig{Enabled: false},
+		},
+	}
+	cfg.Agents.Defaults.Provider = "openai"
+	cfg.Agents.Defaults.ModelName = "gpt-4o"
+
+	before := computeConfigSignature(cfg)
+
+	cfg.ModelList[0].Streaming = config.ModelStreamingConfig{Enabled: true}
+	afterExactModelChange := computeConfigSignature(cfg)
+
+	if before == afterExactModelChange {
+		t.Fatal("config signature should change when the exact bare model entry changes streaming")
+	}
+
+	cfg.ModelList[1].Streaming = config.ModelStreamingConfig{Enabled: true}
+	afterDefaultProviderModelChange := computeConfigSignature(cfg)
+
+	if afterExactModelChange != afterDefaultProviderModelChange {
+		t.Fatal("config signature should not change when a shadowed default-provider model id changes streaming")
+	}
+}
+
 func TestGatewayStatusRequiresRestartAfterWebSearchConfigChange(t *testing.T) {
 	resetGatewayTestState(t)

@@ -34,13 +34,13 @@ var fetchableProviders = map[string]bool{
 // registerModelRoutes binds model list management endpoints to the ServeMux.
 func (h *Handler) registerModelRoutes(mux *http.ServeMux) {
 	mux.HandleFunc("GET /api/models", h.handleListModels)
+	mux.HandleFunc("POST /api/models/fetch", h.handleFetchModels)
+	mux.HandleFunc("GET /api/models/catalog", h.handleListCatalogs)
+	mux.HandleFunc("DELETE /api/models/catalog/{id}", h.handleDeleteCatalog)
 	mux.HandleFunc("POST /api/models", h.handleAddModel)
 	mux.HandleFunc("POST /api/models/default", h.handleSetDefaultModel)
 	mux.HandleFunc("PUT /api/models/{index}", h.handleUpdateModel)
 	mux.HandleFunc("DELETE /api/models/{index}", h.handleDeleteModel)
-	mux.HandleFunc("POST /api/models/fetch", h.handleFetchModels)
-	mux.HandleFunc("GET /api/models/catalog", h.handleListCatalogs)
-	mux.HandleFunc("DELETE /api/models/catalog/{id}", h.handleDeleteCatalog)
 	mux.HandleFunc("POST /api/models/{index}/test", h.handleTestModel)
 	mux.HandleFunc("POST /api/models/test-inline", h.handleTestInlineModel)
 }
@@ -57,15 +57,16 @@ type modelResponse struct {
 	Proxy      string `json:"proxy,omitempty"`
 	AuthMethod string `json:"auth_method,omitempty"`
 	// Advanced fields
-	ConnectMode         string            `json:"connect_mode,omitempty"`
-	Workspace           string            `json:"workspace,omitempty"`
-	RPM                 int               `json:"rpm,omitempty"`
-	MaxTokensField      string            `json:"max_tokens_field,omitempty"`
-	RequestTimeout      int               `json:"request_timeout,omitempty"`
-	ThinkingLevel       string            `json:"thinking_level,omitempty"`
-	ToolSchemaTransform string            `json:"tool_schema_transform,omitempty"`
-	ExtraBody           map[string]any    `json:"extra_body,omitempty"`
-	CustomHeaders       map[string]string `json:"custom_headers,omitempty"`
+	ConnectMode         string                      `json:"connect_mode,omitempty"`
+	Workspace           string                      `json:"workspace,omitempty"`
+	RPM                 int                         `json:"rpm,omitempty"`
+	MaxTokensField      string                      `json:"max_tokens_field,omitempty"`
+	RequestTimeout      int                         `json:"request_timeout,omitempty"`
+	ThinkingLevel       string                      `json:"thinking_level,omitempty"`
+	ToolSchemaTransform string                      `json:"tool_schema_transform,omitempty"`
+	Streaming           config.ModelStreamingConfig `json:"streaming,omitempty"`
+	ExtraBody           map[string]any              `json:"extra_body,omitempty"`
+	CustomHeaders       map[string]string           `json:"custom_headers,omitempty"`
 	// Meta
 	Enabled             bool   `json:"enabled"`
 	Available           bool   `json:"available"`
@@ -293,6 +294,7 @@ func (h *Handler) handleListModels(w http.ResponseWriter, r *http.Request) {
 			RequestTimeout:      m.RequestTimeout,
 			ThinkingLevel:       m.ThinkingLevel,
 			ToolSchemaTransform: m.ToolSchemaTransform,
+			Streaming:           m.Streaming,
 			ExtraBody:           m.ExtraBody,
 			CustomHeaders:       m.CustomHeaders,
 			Enabled:             m.Enabled,
@@ -441,6 +443,9 @@ func (h *Handler) handleUpdateModel(w http.ResponseWriter, r *http.Request) {
 	if _, ok := rawFields["tool_schema_transform"]; !ok {
 		mc.ToolSchemaTransform = cfg.ModelList[idx].ToolSchemaTransform
 	}
+	if _, ok := rawFields["streaming"]; !ok {
+		mc.Streaming = cfg.ModelList[idx].Streaming
+	}
 	// Preserve the existing Provider when the caller omits it. This keeps the
 	// update API backward-compatible for clients that haven't started sending
 	// the new field yet, while still allowing explicit clearing via "".
@@ -762,6 +762,51 @@ func TestHandleAddModel_PersistsProvider(t *testing.T) {
 	}
 }

+func TestHandleListModels_ReturnsStreamingConfig(t *testing.T) {
+	configPath, cleanup := setupOAuthTestEnv(t)
+	defer cleanup()
+
+	cfg, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() error = %v", err)
+	}
+	cfg.ModelList = []*config.ModelConfig{{
+		ModelName: "streaming-model",
+		Provider:  "openai",
+		Model:     "gpt-4o-mini",
+		APIKeys:   config.SimpleSecureStrings("sk-existing"),
+		Streaming: config.ModelStreamingConfig{Enabled: true},
+	}}
+	if err = config.SaveConfig(configPath, cfg); err != nil {
+		t.Fatalf("SaveConfig() error = %v", err)
+	}
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	rec := httptest.NewRecorder()
+	req := httptest.NewRequest(http.MethodGet, "/api/models", nil)
+	mux.ServeHTTP(rec, req)
+
+	if rec.Code != http.StatusOK {
+		t.Fatalf("status = %d, want %d, body=%s", rec.Code, http.StatusOK, rec.Body.String())
+	}
+
+	var resp struct {
+		Models []modelResponse `json:"models"`
+	}
+	if err = json.Unmarshal(rec.Body.Bytes(), &resp); err != nil {
+		t.Fatalf("Unmarshal() error = %v", err)
+	}
+	if len(resp.Models) != 1 {
+		t.Fatalf("len(models) = %d, want 1", len(resp.Models))
+	}
+	if !resp.Models[0].Streaming.Enabled {
+		t.Fatal("streaming.enabled = false, want true")
+	}
+}
+
 func TestHandleAddModel_RejectsUnsupportedProvider(t *testing.T) {
 	configPath, cleanup := setupOAuthTestEnv(t)
 	defer cleanup()
@@ -1214,6 +1259,71 @@ func TestHandleUpdateModel_ToolSchemaTransformPreserveAndClear(t *testing.T) {
 	}
 }

+func TestHandleUpdateModel_StreamingPreserveAndChange(t *testing.T) {
+	configPath, cleanup := setupOAuthTestEnv(t)
+	defer cleanup()
+
+	cfg, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() error = %v", err)
+	}
+	cfg.ModelList = []*config.ModelConfig{{
+		ModelName: "editable",
+		Provider:  "openai",
+		Model:     "gpt-4o-mini",
+		APIKeys:   config.SimpleSecureStrings("sk-existing"),
+		Streaming: config.ModelStreamingConfig{Enabled: true},
+	}}
+	if err = config.SaveConfig(configPath, cfg); err != nil {
+		t.Fatalf("SaveConfig() error = %v", err)
+	}
+
+	h := NewHandler(configPath)
+	mux := http.NewServeMux()
+	h.RegisterRoutes(mux)
+
+	recPreserve := httptest.NewRecorder()
+	reqPreserve := httptest.NewRequest(http.MethodPut, "/api/models/0", bytes.NewBufferString(`{
+		"model_name":"editable",
+		"provider":"openai",
+		"model":"gpt-4o-mini"
+	}`))
+	reqPreserve.Header.Set("Content-Type", "application/json")
+	mux.ServeHTTP(recPreserve, reqPreserve)
+	if recPreserve.Code != http.StatusOK {
+		t.Fatalf("preserve status = %d, want %d, body=%s", recPreserve.Code, http.StatusOK, recPreserve.Body.String())
+	}
+
+	afterPreserve, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() after preserve error = %v", err)
+	}
+	if !afterPreserve.ModelList[0].Streaming.Enabled {
+		t.Fatal("preserved streaming.enabled = false, want true")
+	}
+
+	recChange := httptest.NewRecorder()
+	reqChange := httptest.NewRequest(http.MethodPut, "/api/models/0", bytes.NewBufferString(`{
+		"model_name":"editable",
+		"provider":"openai",
+		"model":"gpt-4o-mini",
+		"streaming":{"enabled":false}
+	}`))
+	reqChange.Header.Set("Content-Type", "application/json")
+	mux.ServeHTTP(recChange, reqChange)
+	if recChange.Code != http.StatusOK {
+		t.Fatalf("change status = %d, want %d, body=%s", recChange.Code, http.StatusOK, recChange.Body.String())
+	}
+
+	afterChange, err := config.LoadConfig(configPath)
+	if err != nil {
+		t.Fatalf("LoadConfig() after change error = %v", err)
+	}
+	if afterChange.ModelList[0].Streaming.Enabled {
+		t.Fatal("streaming.enabled = true, want false after explicit update")
+	}
+}
+
 func TestHandleUpdateModel_PersistsProvider(t *testing.T) {
 	configPath, cleanup := setupOAuthTestEnv(t)
 	defer cleanup()
@@ -20,6 +20,9 @@ export interface ModelInfo {
  request_timeout?: number
  thinking_level?: string
  tool_schema_transform?: string
+  streaming?: {
+    enabled?: boolean
+  }
  extra_body?: Record<string, unknown>
  custom_headers?: Record<string, string>
  // Meta
@@ -149,6 +152,7 @@ export async function testModelInline(
 export interface UpstreamModel {
  id: string
  owned_by?: string
+  extra?: Record<string, unknown>
 }

 export interface FetchModelsRequest {
@@ -661,6 +661,7 @@ export function ChannelConfigPage({ channelName }: ChannelConfigPageProps) {
              configuredSecrets={configuredSecrets}
              hiddenKeys={[...hiddenKeys, "bot_id"]}
              requiredKeys={requiredKeys}
+              supportsStreaming
              fieldErrors={fieldErrors}
              registerArrayFieldFlusher={registerArrayFieldFlusher}
              arrayFieldResetVersion={arrayFieldResetVersion}
@@ -675,6 +676,7 @@ export function ChannelConfigPage({ channelName }: ChannelConfigPageProps) {
            configuredSecrets={configuredSecrets}
            hiddenKeys={hiddenKeys}
            requiredKeys={requiredKeys}
+            supportsStreaming={channel?.name === "pico"}
            fieldErrors={fieldErrors}
            registerArrayFieldFlusher={registerArrayFieldFlusher}
            arrayFieldResetVersion={arrayFieldResetVersion}
@@ -17,12 +17,15 @@ import { Field, KeyInput, SwitchCardField } from "@/components/shared-form"
 import { Card, CardContent } from "@/components/ui/card"
 import { Input } from "@/components/ui/input"

+import { StreamingConfigField } from "./streaming-config-field"
+
 interface GenericFormProps {
  config: ChannelConfig
  onChange: (key: string, value: unknown) => void
  configuredSecrets?: string[]
  hiddenKeys?: string[]
  requiredKeys?: string[]
+  supportsStreaming?: boolean
  fieldErrors?: Record<string, string>
  registerArrayFieldFlusher?: (
    fieldPath: string,
@@ -43,6 +46,7 @@ const OBJECT_FIELDS = new Set([
  "allow_from",
  "allow_origins",
  "groups",
+  "streaming",
 ])

 function formatLabel(key: string): string {
@@ -78,6 +82,7 @@ export function GenericForm({
  configuredSecrets = [],
  hiddenKeys = [],
  requiredKeys = [],
+  supportsStreaming = false,
  fieldErrors = {},
  registerArrayFieldFlusher,
  arrayFieldResetVersion,
@@ -89,6 +94,9 @@ export function GenericForm({
  const typingConfig = asRecord(config.typing)
  const placeholderConfig = asRecord(config.placeholder)
  const placeholderEnabled = asBool(placeholderConfig.enabled)
+  const showStreamingConfig =
+    (config.streaming !== undefined || supportsStreaming) &&
+    !hiddenFieldSet.has("streaming")

  const rawFields = Object.keys(config).filter(
    (k) =>
@@ -264,7 +272,8 @@ export function GenericForm({
    (config.group_trigger !== undefined &&
      !hiddenFieldSet.has("group_trigger")) ||
    (config.typing !== undefined && !hiddenFieldSet.has("typing")) ||
-    (config.placeholder !== undefined && !hiddenFieldSet.has("placeholder"))
+    (config.placeholder !== undefined && !hiddenFieldSet.has("placeholder")) ||
+    showStreamingConfig

  return (
    <div className="space-y-6">
@@ -375,6 +384,15 @@ export function GenericForm({
              </div>
            )}

+            {showStreamingConfig && (
+              <div>
+                <StreamingConfigField
+                  value={config.streaming}
+                  onChange={(value) => onChange("streaming", value)}
+                />
+              </div>
+            )}
+
            {config.placeholder !== undefined &&
              !hiddenFieldSet.has("placeholder") && (
                <div>
@@ -0,0 +1,98 @@
+import { useTranslation } from "react-i18next"
+
+import { Field, SwitchCardField } from "@/components/shared-form"
+import { Input } from "@/components/ui/input"
+
+interface StreamingConfigFieldProps {
+  value: unknown
+  onChange: (value: Record<string, unknown>) => void
+}
+
+function asRecord(value: unknown): Record<string, unknown> {
+  if (value && typeof value === "object" && !Array.isArray(value)) {
+    return value as Record<string, unknown>
+  }
+  return {}
+}
+
+function asBool(value: unknown): boolean {
+  return value === true
+}
+
+function numberInputValue(value: unknown): string {
+  return typeof value === "number" && value > 0 ? String(value) : ""
+}
+
+export function StreamingConfigField({
+  value,
+  onChange,
+}: StreamingConfigFieldProps) {
+  const { t } = useTranslation()
+  const streamingConfig = asRecord(value)
+  const streamingEnabled = asBool(streamingConfig.enabled)
+
+  const update = (patch: Record<string, unknown>) => {
+    onChange({ ...streamingConfig, ...patch })
+  }
+
+  const handleEnabledChange = (checked: boolean) => {
+    if (!checked) {
+      onChange({
+        enabled: false,
+        throttle_seconds: null,
+        min_growth_chars: null,
+      })
+      return
+    }
+    update({ enabled: true })
+  }
+
+  return (
+    <SwitchCardField
+      label={t("channels.field.streamingEnabled")}
+      hint={t("channels.form.desc.streamingEnabled")}
+      checked={streamingEnabled}
+      onCheckedChange={handleEnabledChange}
+      ariaLabel={t("channels.field.streamingEnabled")}
+    >
+      {streamingEnabled && (
+        <div className="grid gap-3 sm:grid-cols-2">
+          <Field
+            label={t("channels.field.streamingThrottleSeconds")}
+            hint={t("channels.form.desc.streamingThrottleSeconds")}
+          >
+            <Input
+              type="number"
+              min={0}
+              value={numberInputValue(streamingConfig.throttle_seconds)}
+              onChange={(e) =>
+                update({
+                  throttle_seconds:
+                    e.target.value === "" ? 0 : Number(e.target.value),
+                })
+              }
+              placeholder="0"
+            />
+          </Field>
+          <Field
+            label={t("channels.field.streamingMinGrowthChars")}
+            hint={t("channels.form.desc.streamingMinGrowthChars")}
+          >
+            <Input
+              type="number"
+              min={0}
+              value={numberInputValue(streamingConfig.min_growth_chars)}
+              onChange={(e) =>
+                update({
+                  min_growth_chars:
+                    e.target.value === "" ? 0 : Number(e.target.value),
+                })
+              }
+              placeholder="0"
+            />
+          </Field>
+        </div>
+      )}
+    </SwitchCardField>
+  )
+}
@@ -14,6 +14,8 @@ import { Field, KeyInput, SwitchCardField } from "@/components/shared-form"
 import { Card, CardContent } from "@/components/ui/card"
 import { Input } from "@/components/ui/input"

+import { StreamingConfigField } from "./streaming-config-field"
+
 interface TelegramFormProps {
  config: ChannelConfig
  onChange: (key: string, value: unknown) => void
@@ -125,6 +127,13 @@ export function TelegramForm({
            />
          </div>

+          <div>
+            <StreamingConfigField
+              value={config.streaming}
+              onChange={(value) => onChange("streaming", value)}
+            />
+          </div>
+
          <div>
            <SwitchCardField
              label={t("channels.field.placeholderEnabled")}
@@ -57,6 +57,7 @@ interface AddForm {
  requestTimeout: string
  thinkingLevel: string
  toolSchemaTransform: string
+  streamingEnabled: boolean
  extraBody: string
  customHeaders: string
 }
@@ -76,6 +77,7 @@ const EMPTY_ADD_FORM: AddForm = {
  requestTimeout: "",
  thinkingLevel: "",
  toolSchemaTransform: "",
+  streamingEnabled: false,
  extraBody: "",
  customHeaders: "",
 }
@@ -254,7 +256,9 @@ export function AddModelSheet({
      debouncedValidateModel(form.model, provider)
    }
    // Clear setAsDefault if the new provider doesn't support being default
-    const allowed = providerOptions?.find((o) => o.id === provider)?.default_model_allowed ?? false
+    const allowed =
+      providerOptions?.find((o) => o.id === provider)?.default_model_allowed ??
+      false
    if (!allowed) {
      setSetAsDefault(false)
    }
@@ -289,7 +293,8 @@ export function AddModelSheet({
  const providerDef = PROVIDER_MAP.get(form.provider)
  const commonModels = providerDef?.commonModels || []
  const defaultModelAllowed = form.provider
-    ? (providerOptions?.find((o) => o.id === form.provider)?.default_model_allowed ?? false)
+    ? (providerOptions?.find((o) => o.id === form.provider)
+        ?.default_model_allowed ?? false)
    : false

  const handleSave = async () => {
@@ -345,6 +350,7 @@ export function AddModelSheet({
          : undefined,
        thinking_level: form.thinkingLevel.trim() || undefined,
        tool_schema_transform: form.toolSchemaTransform.trim() || undefined,
+        streaming: form.streamingEnabled ? { enabled: true } : undefined,
        extra_body: extraBody,
        custom_headers: customHeaders,
      })
@@ -383,7 +389,10 @@ export function AddModelSheet({
            </SheetDescription>
          </SheetHeader>

-          <div className="min-h-0 flex-1 overflow-y-auto" ref={scrollContainerRef}>
+          <div
+            className="min-h-0 flex-1 overflow-y-auto"
+            ref={scrollContainerRef}
+          >
            <div className="space-y-5 px-6 py-5">
              <Field
                label={t("models.add.modelName")}
@@ -508,17 +517,18 @@ export function AddModelSheet({
                  </div>
                )}
                <div className="flex items-center gap-2">
-                  {form.provider && FETCHABLE_PROVIDER_KEYS.has(form.provider) && (
-                    <Button
-                      variant="outline"
-                      size="sm"
-                      className="h-7 text-xs"
-                      onClick={() => setFetchOpen(true)}
-                    >
-                      <IconDownload className="size-3" />
-                      {t("models.fetch.title")}
-                    </Button>
-                  )}
+                  {form.provider &&
+                    FETCHABLE_PROVIDER_KEYS.has(form.provider) && (
+                      <Button
+                        variant="outline"
+                        size="sm"
+                        className="h-7 text-xs"
+                        onClick={() => setFetchOpen(true)}
+                      >
+                        <IconDownload className="size-3" />
+                        {t("models.fetch.title")}
+                      </Button>
+                    )}
                  {!form.provider && (
                    <span className="text-muted-foreground text-xs">
                      {t("models.field.selectProviderFirst")}
@@ -671,6 +681,16 @@ export function AddModelSheet({
                  />
                </Field>

+                <SwitchCardField
+                  label={t("models.field.streamingEnabled")}
+                  hint={t("models.field.streamingEnabledHint")}
+                  checked={form.streamingEnabled}
+                  onCheckedChange={(checked) =>
+                    setForm((f) => ({ ...f, streamingEnabled: checked }))
+                  }
+                  ariaLabel={t("models.field.streamingEnabled")}
+                />
+
                <Field
                  label={t("models.field.extraBody")}
                  hint={t("models.field.extraBodyHint")}
@@ -40,7 +40,11 @@ import { FetchModelsDialog } from "./fetch-models-dialog"
 import { type FieldValidation, validateModelField } from "./model-validation"
 import { ProviderCombobox } from "./provider-combobox"
 import { getProviderKey } from "./provider-label"
-import { FETCHABLE_PROVIDER_KEYS, PROVIDER_API_BASES, PROVIDER_MAP } from "./provider-registry"
+import {
+  FETCHABLE_PROVIDER_KEYS,
+  PROVIDER_API_BASES,
+  PROVIDER_MAP,
+} from "./provider-registry"
 import { TestModelDialog } from "./test-model-dialog"

 interface EditForm {
@@ -57,6 +61,7 @@ interface EditForm {
  requestTimeout: string
  thinkingLevel: string
  toolSchemaTransform: string
+  streamingEnabled: boolean
  extraBody: string
  customHeaders: string
 }
@@ -113,7 +118,8 @@ function buildInitialEditForm(model: ModelInfo): EditForm {
    maxTokensField: model.max_tokens_field ?? "",
    requestTimeout: model.request_timeout ? String(model.request_timeout) : "",
    thinkingLevel: model.thinking_level ?? "",
-    toolSchemaTransform: model.tool_schema_transform ?? "", // <-- AGGIUNGI QUESTA RIGA
+    toolSchemaTransform: model.tool_schema_transform ?? "",
+    streamingEnabled: model.streaming?.enabled === true,
    extraBody: model.extra_body
      ? JSON.stringify(model.extra_body, null, 2)
      : "",
@@ -145,6 +151,7 @@ export function EditModelSheet({
    requestTimeout: "",
    thinkingLevel: "",
    toolSchemaTransform: "",
+    streamingEnabled: false,
    extraBody: "",
    customHeaders: "",
  })
@@ -223,7 +230,9 @@ export function EditModelSheet({
    if (form.modelId) {
      debouncedValidateModel(form.modelId, provider)
    }
-    const allowed = providerOptions?.find((o) => o.id === provider)?.default_model_allowed ?? false
+    const allowed =
+      providerOptions?.find((o) => o.id === provider)?.default_model_allowed ??
+      false
    if (!allowed) {
      setSetAsDefault(false)
    }
@@ -252,7 +261,8 @@ export function EditModelSheet({
  const providerDef = PROVIDER_MAP.get(form.provider)
  const commonModels = providerDef?.commonModels || []
  const defaultModelAllowed = form.provider
-    ? (providerOptions?.find((o) => o.id === form.provider)?.default_model_allowed ?? false)
+    ? (providerOptions?.find((o) => o.id === form.provider)
+        ?.default_model_allowed ?? false)
    : false

  const handleSave = async () => {
@@ -295,6 +305,10 @@ export function EditModelSheet({
    try {
      const modelId = form.modelId.trim()
      const provider = form.provider.trim()
+      const streaming =
+        model.streaming?.enabled === true || form.streamingEnabled
+          ? { enabled: form.streamingEnabled }
+          : undefined
      await updateModel(model.index, {
        model_name: model.model_name,
        provider: provider,
@@ -312,6 +326,7 @@ export function EditModelSheet({
          : undefined,
        thinking_level: form.thinkingLevel || undefined,
        tool_schema_transform: form.toolSchemaTransform.trim() || undefined,
+        streaming,
        extra_body: extraBody,
        custom_headers: customHeaders,
      })
@@ -359,7 +374,10 @@ export function EditModelSheet({
            </SheetDescription>
          </SheetHeader>

-          <div className="min-h-0 flex-1 overflow-y-auto" ref={scrollContainerRef}>
+          <div
+            className="min-h-0 flex-1 overflow-y-auto"
+            ref={scrollContainerRef}
+          >
            <div className="space-y-5 px-6 py-5">
              <Field
                label={t("models.field.provider")}
@@ -459,17 +477,18 @@ export function EditModelSheet({
                  </div>
                )}
                <div className="flex items-center gap-2">
-                  {form.provider && FETCHABLE_PROVIDER_KEYS.has(form.provider) && (
-                    <Button
-                      variant="outline"
-                      size="sm"
-                      className="h-7 text-xs"
-                      onClick={() => setFetchOpen(true)}
-                    >
-                      <IconDownload className="size-3" />
-                      {t("models.fetch.title")}
-                    </Button>
-                  )}
+                  {form.provider &&
+                    FETCHABLE_PROVIDER_KEYS.has(form.provider) && (
+                      <Button
+                        variant="outline"
+                        size="sm"
+                        className="h-7 text-xs"
+                        onClick={() => setFetchOpen(true)}
+                      >
+                        <IconDownload className="size-3" />
+                        {t("models.fetch.title")}
+                      </Button>
+                    )}
                </div>
              </Field>

@@ -651,6 +670,16 @@ export function EditModelSheet({
                    placeholder="google"
                  />
                </Field>
+
+                <SwitchCardField
+                  label={t("models.field.streamingEnabled")}
+                  hint={t("models.field.streamingEnabledHint")}
+                  checked={form.streamingEnabled}
+                  onCheckedChange={(checked) =>
+                    setForm((f) => ({ ...f, streamingEnabled: checked }))
+                  }
+                  ariaLabel={t("models.field.streamingEnabled")}
+                />
              </AdvancedSection>

              {error && (
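Review note: the add and edit sheets build the `streaming` payload differently. The add sheet emits the block only when the switch is on. The edit sheet also emits an explicit `{ enabled: false }` when the saved entry previously had streaming enabled, presumably so that switching it off is sent rather than silently omitted. The sketch below restates both expressions as pure functions; the function and interface names are hypothetical and exist only for illustration.

```typescript
// Hypothetical names; the expressions mirror the ones in this diff.
interface StreamingConfig {
  enabled: boolean
}

// Add sheet: omit the block unless the switch is on.
function streamingForAdd(formEnabled: boolean): StreamingConfig | undefined {
  return formEnabled ? { enabled: true } : undefined
}

// Edit sheet: send an explicit value when the saved entry had streaming
// enabled or the switch is on now; otherwise leave the block out.
function streamingForEdit(
  savedEnabled: boolean | undefined,
  formEnabled: boolean,
): StreamingConfig | undefined {
  return savedEnabled === true || formEnabled
    ? { enabled: formEnabled }
    : undefined
}
```

For an entry that never had streaming enabled and whose switch stays off, both functions return `undefined`, which matches the documented rule that an omitted `streaming` block means disabled.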
@@ -101,6 +101,7 @@ export function handlePicoMessage(
        parseAssistantMessageCreateState(payload)
      const attachments = parseAttachments(payload)
      const contextUsage = parseContextUsage(payload)
+      const isPlaceholder = payload.placeholder === true
      const timestamp =
        message.timestamp !== undefined &&
        Number.isFinite(Number(message.timestamp))
@@ -120,7 +121,11 @@ export function handlePicoMessage(
            timestamp,
          },
        ],
-        isTyping: false,
+        isTyping:
+          !isPlaceholder &&
+          (kind === "normal" || message.type === "media.create")
+            ? false
+            : prev.isTyping,
        ...(contextUsage ? { contextUsage } : {}),
      }))
      break
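Review note: the new `isTyping` expression relies on `&&` binding tighter than `?:`, so the typing indicator is cleared only for a non-placeholder message that is either of kind `"normal"` or a `media.create` message; in every other case the previous value is kept. A minimal sketch of the same decision as a standalone function follows. The name `nextIsTyping` and the `"thought"` kind used in the usage comment are assumptions for illustration.

```typescript
// Hypothetical helper that mirrors the ternary in this hunk.
function nextIsTyping(
  prevIsTyping: boolean,
  isPlaceholder: boolean,
  kind: string,
  messageType: string,
): boolean {
  // Only a real (non-placeholder) normal or media message ends typing.
  const endsTyping =
    !isPlaceholder && (kind === "normal" || messageType === "media.create")
  return endsTyping ? false : prevIsTyping
}

// Usage: a placeholder or a non-normal kind (assumed here to be "thought")
// leaves the indicator untouched, so a streamed reply can keep it visible.
const stillTyping = nextIsTyping(true, true, "normal", "message.create")
```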
@@ -313,6 +313,8 @@
      "maxTokensFieldHint": "Override the request field name for max tokens, e.g. max_completion_tokens.",
      "toolSchemaTransform": "Tool Schema Transform",
      "toolSchemaTransformHint": "Optional compatibility transform for tool JSON schemas. Leave blank for native behavior. Supported values: simple.",
+      "streamingEnabled": "Streaming Output",
+      "streamingEnabledHint": "Allow this model entry to try provider streaming requests. The current channel streaming switch must also be enabled.",
      "extraBody": "Extra Body",
      "extraBodyHint": "Additional JSON fields to inject into the request body, e.g. {\"reasoning_split\": true}.",
      "customHeaders": "Custom Headers",
@@ -464,6 +466,9 @@
      "typingEnabled": "Typing Indicator",
      "placeholderEnabled": "Placeholder Message",
      "placeholderText": "Placeholder Text",
+      "streamingEnabled": "Streaming Output",
+      "streamingThrottleSeconds": "Update Interval (s)",
+      "streamingMinGrowthChars": "Minimum Growth Characters",
      "groupTriggerMentionOnly": "Group Mention Only",
      "groupTriggerPrefixes": "Group Trigger Prefixes",
      "groupTriggerPrefixesPlaceholder": "e.g. /, !, ?",
@@ -502,6 +507,9 @@
        "mentionOnly": "Only respond when the bot is explicitly mentioned in group chats.",
        "typingEnabled": "Display typing status while the assistant is generating a response.",
        "placeholderEnabled": "Enable temporary placeholder messages before the final reply is sent.",
+        "streamingEnabled": "Allow this channel to display provider streaming output. The current model entry streaming switch must also be enabled.",
+        "streamingThrottleSeconds": "Minimum interval between intermediate streaming updates. 0 means use the default. Final replies are not throttled.",
+        "streamingMinGrowthChars": "Minimum text growth before sending another intermediate streaming update. 0 means use the default. Final replies are not throttled.",
        "groupTriggerMentionOnly": "In group chats, respond only when the bot is mentioned.",
        "groupTriggerPrefixes": "Custom group-chat trigger prefixes. Add items one by one, or paste multiple values at once.",
        "randomReactionEmoji": "PicoClaw adds emoji reactions to user messages to confirm receipt. Example: \"THUMBSUP\", \"HEART\", \"SMILE\". Leave empty to use the default \"Pin\" emoji.",
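Review note: the model and channel hints above each point at the other switch, which is the double opt-in described in this commit's documentation: streaming is attempted only when the channel switch and the model entry switch are both on and both the provider and the channel support streaming. A minimal sketch of that gate is given below. The interface and function names are hypothetical, and the real check lives in the agent, not in the web UI.

```typescript
// Hypothetical shape; field names are illustrative only.
interface StreamingGate {
  channelEnabled: boolean // channel settings.streaming.enabled
  modelEnabled: boolean // model entry streaming.enabled
  providerSupports: boolean // provider can stream
  channelSupports: boolean // channel can display streamed updates
}

// Streaming is attempted only when every condition holds; otherwise the
// normal non-streaming request path is used.
function shouldAttemptStreaming(gate: StreamingGate): boolean {
  return (
    gate.channelEnabled &&
    gate.modelEnabled &&
    gate.providerSupports &&
    gate.channelSupports
  )
}
```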
@@ -308,6 +308,10 @@
      "thinkingLevelHint": "Orçamento de pensamento estendido: off, low, medium, high, xhigh, adaptive.",
      "maxTokensField": "Campo de Max Tokens",
      "maxTokensFieldHint": "Sobrescreve o nome do campo de max tokens na requisição, ex: max_completion_tokens.",
+      "toolSchemaTransform": "Transformação de Schema de Tool",
+      "toolSchemaTransformHint": "Transformação opcional de compatibilidade para schemas JSON de tools. Deixe em branco para comportamento nativo. Valores suportados: simple.",
+      "streamingEnabled": "Saída Streaming",
+      "streamingEnabledHint": "Permite que esta entrada de modelo tente requisições de provider streaming. O switch de streaming do canal atual também precisa estar habilitado.",
      "extraBody": "Body Extra",
      "extraBodyHint": "Campos JSON adicionais para injetar no body da requisição, ex: {\"reasoning_split\": true}.",
      "customHeaders": "Headers Customizados",
@@ -386,6 +390,9 @@
      "typingEnabled": "Indicador de Digitação",
      "placeholderEnabled": "Mensagem de Placeholder",
      "placeholderText": "Texto do Placeholder",
+      "streamingEnabled": "Saída Streaming",
+      "streamingThrottleSeconds": "Intervalo de Atualização (s)",
+      "streamingMinGrowthChars": "Caracteres Mínimos de Crescimento",
      "groupTriggerMentionOnly": "Apenas Menção em Grupo",
      "groupTriggerPrefixes": "Prefixos de Trigger em Grupo",
      "groupTriggerPrefixesPlaceholder": "ex: /, !, ?",
@@ -424,6 +431,9 @@
        "mentionOnly": "Responder apenas quando o bot for explicitamente mencionado em chats em grupo.",
        "typingEnabled": "Exibir status de digitação enquanto o assistente está gerando uma resposta.",
        "placeholderEnabled": "Habilitar mensagens de placeholder temporárias antes da resposta final ser enviada.",
+        "streamingEnabled": "Permite que este canal exiba output streaming do provider. O switch de streaming da entrada de modelo atual também precisa estar habilitado.",
+        "streamingThrottleSeconds": "Intervalo mínimo entre atualizações intermediárias de streaming. 0 significa usar o padrão. Respostas finais não são limitadas.",
+        "streamingMinGrowthChars": "Crescimento mínimo de texto antes de enviar outra atualização intermediária. 0 significa usar o padrão. Respostas finais não são limitadas.",
        "groupTriggerMentionOnly": "Em chats em grupo, responder apenas quando o bot for mencionado.",
        "groupTriggerPrefixes": "Prefixos customizados de trigger para chats em grupo. Adicione itens um a um ou cole vários valores de uma vez.",
        "randomReactionEmoji": "PicoClaw adiciona reações de emoji às mensagens dos usuários para confirmar recebimento. Exemplo: \"THUMBSUP\", \"HEART\", \"SMILE\". Deixe vazio para usar o emoji \"Pin\" padrão.",
@@ -316,6 +316,8 @@
      "maxTokensFieldHint": "覆盖请求中 max_tokens 的字段名，例如 max_completion_tokens。",
      "toolSchemaTransform": "工具 Schema 转换",
      "toolSchemaTransformHint": "可选的工具 JSON Schema 兼容性转换。留空表示保持原生行为。当前支持值：simple。",
+      "streamingEnabled": "流式输出",
+      "streamingEnabledHint": "允许此模型条目尝试 Provider 流式请求；还需要当前频道的流式开关同时开启。",
      "extraBody": "Extra Body",
      "extraBodyHint": "要注入到请求体中的额外 JSON 字段，例如 {\"reasoning_split\": true}。",
      "customHeaders": "Custom Headers",
@@ -465,6 +467,9 @@
      "typingEnabled": "输入中提示",
      "placeholderEnabled": "占位消息",
      "placeholderText": "占位文案",
+      "streamingEnabled": "流式输出",
+      "streamingThrottleSeconds": "更新间隔（秒）",
+      "streamingMinGrowthChars": "最小增长字符数",
      "groupTriggerMentionOnly": "群聊仅提及时响应",
      "groupTriggerPrefixes": "群聊触发前缀",
      "groupTriggerPrefixesPlaceholder": "例如 /, !, ?",
@@ -503,6 +508,9 @@
        "mentionOnly": "在群聊中仅当明确提及时才响应",
        "typingEnabled": "在生成回复时显示“正在输入”状态",
        "placeholderEnabled": "在最终回复发送前，先发送临时占位消息",
+        "streamingEnabled": "允许该频道展示 Provider 流式输出。还需要当前模型条目的流式开关同时开启。",
+        "streamingThrottleSeconds": "中间流式更新的最小间隔，0 表示使用默认值。最终回复不受限制。",
+        "streamingMinGrowthChars": "中间流式更新相比上次发送至少增长的字符数，0 表示使用默认值。最终回复不受限制。",
        "groupTriggerMentionOnly": "在群聊中仅当提及机器人时才响应",
        "groupTriggerPrefixes": "群聊触发前缀。可逐项添加，也支持一次粘贴多个值。",
        "randomReactionEmoji": "PicoClaw 会对用户消息添加表情回复以确认已收到。例如：\"THUMBSUP\", \"HEART\", \"SMILE\"。留空则使用默认的 \"Pin\" 表情。",