docs(voice): Update docs for audio-transcription

2026-05-25 16:00:35 +00:00 · 2026-03-22 21:04:10 +08:00
parent 8ad4b9b497
commit 92678d1700
5 changed files with 67 additions and 4 deletions
@@ -547,6 +547,7 @@
    "monitor_usb": true
  },
  "voice": {
+    "model_name": "",
    "echo_transcription": false
  },
  "gateway": {
@@ -2,7 +2,7 @@

 # Telegram

-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and built-in command handling.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription ([setup](../../providers.md#voice-transcription)), and built-in command handling.

 ## Configuration

@@ -2,7 +2,7 @@

 # Telegram

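-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and a built-in command handler.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription (for setup, see [Providers and model configuration](../../zh/providers.md#语音转录)), and a built-in command handler.

 ## Configuration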

@@ -5,7 +5,7 @@
 ### Providers

 > [!NOTE]
-> Groq provides free voice transcription via Whisper. If configured, audio messages from any channel will be automatically transcribed at the agent level.
+> Voice transcription can use a configured multimodal model via `voice.model_name`. Groq Whisper remains available as a fallback when no voice model is configured.

 | Provider     | Purpose                                 | Get API Key                                                  |
 | ------------ | --------------------------------------- | ------------------------------------------------------------ |
@@ -101,6 +101,33 @@ This design also enables **multi-agent support** with flexible provider selectio
 }
 ```

+#### Voice Transcription
+
+You can configure a dedicated model for audio transcription with `voice.model_name`. This lets you reuse existing multimodal providers that support audio input instead of relying only on Groq.
+
+If `voice.model_name` is not configured, PicoClaw falls back to Groq transcription, provided a Groq API key is available.
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "voice-gemini",
+      "model": "gemini/gemini-2.5-flash",
+      "api_key": "your-gemini-key"
+    }
+  ],
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
+  "providers": {
+    "groq": {
+      "api_key": "gsk_xxx"
+    }
+  }
+}
+```
+
 #### Vendor-Specific Examples

 **OpenAI**
@@ -344,6 +371,10 @@ picoclaw agent -m "Hello"
      "api_key": "gsk_xxx"
    }
  },
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
  "channels": {
    "telegram": {
      "enabled": true,
@@ -5,7 +5,7 @@
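 ### Providers

 > [!NOTE]
-> Groq provides free voice transcription via Whisper. If Groq is configured, audio messages from any channel are automatically transcribed to text at the agent level.
+> Voice transcription can now be handled by a multimodal model specified via `voice.model_name`; if no voice model is configured, Groq Whisper remains available as a fallback.

 | Provider             | Purpose                      | Get API Key                                                          |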
 | -------------------- | ---------------------------- | -------------------------------------------------------------------- |
@@ -99,6 +99,33 @@
 }
 ```

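+#### Voice Transcription
+
+You can use `voice.model_name` to assign a dedicated model for voice transcription. This lets you reuse an already configured multimodal provider that supports audio input, rather than relying solely on Groq.
+
+If `voice.model_name` is not configured and a Groq API key is present, PicoClaw continues to fall back to Groq transcription.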
+
+```json
+{
+  "model_list": [
+    {
+      "model_name": "voice-gemini",
+      "model": "gemini/gemini-2.5-flash",
+      "api_key": "your-gemini-key"
+    }
+  ],
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
+  "providers": {
+    "groq": {
+      "api_key": "gsk_xxx"
+    }
+  }
+}
+```
+
 #### Vendor-Specific Examples

 **OpenAI**
@@ -342,6 +369,10 @@ picoclaw agent -m "你好"
      "api_key": "gsk_xxx"
    }
  },
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
  "channels": {
    "telegram": {
      "enabled": true,
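
The selection order described in the Voice Transcription section (a configured `voice.model_name` first, Groq as the fallback) can be sketched in Python. This is only an illustration under stated assumptions, not PicoClaw's actual implementation: the function name `resolve_transcriber`, its return shape, and the error raised for an unknown model name are all hypothetical.

```python
from typing import Optional


def resolve_transcriber(config: dict) -> Optional[dict]:
    """Pick a transcription backend from a PicoClaw-style config dict.

    Hypothetical helper for illustration only; the return shape is an assumption.
    """
    voice_model = config.get("voice", {}).get("model_name", "")

    if voice_model:
        # Look up the alias from voice.model_name in model_list.
        for entry in config.get("model_list", []):
            if entry.get("model_name") == voice_model:
                return {"backend": "model", "model": entry["model"]}
        # Assumption: an alias that matches no model_list entry is a config error.
        # The docs do not say how PicoClaw handles this case.
        raise ValueError(f"voice.model_name {voice_model!r} not found in model_list")

    # No voice model configured: fall back to Groq if an API key is present.
    groq_key = config.get("providers", {}).get("groq", {}).get("api_key", "")
    if groq_key:
        return {"backend": "groq"}

    # Neither a voice model nor a Groq key: transcription is unavailable.
    return None


example_config = {
    "model_list": [
        {
            "model_name": "voice-gemini",
            "model": "gemini/gemini-2.5-flash",
            "api_key": "your-gemini-key",
        }
    ],
    "voice": {"model_name": "voice-gemini", "echo_transcription": False},
    "providers": {"groq": {"api_key": "gsk_xxx"}},
}

print(resolve_transcriber(example_config))
```

With the example configuration from the docs, the sketch selects the `voice-gemini` entry; clearing `voice.model_name` to `""` (the default shown in the first hunk) makes it select Groq instead.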