docs(voice): Update docs for audio-transcription

RussellLuo
2026-03-22 21:04:10 +08:00
parent 8ad4b9b497
commit 92678d1700
5 changed files with 67 additions and 4 deletions
+1
@@ -547,6 +547,7 @@
"monitor_usb": true
},
"voice": {
+"model_name": "",
"echo_transcription": false
},
"gateway": {
+1 -1
@@ -2,7 +2,7 @@
# Telegram
-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and built-in command handling.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription ([setup](../../providers.md#voice-transcription)), and built-in command handling.
## Configuration
+1 -1
@@ -2,7 +2,7 @@
# Telegram
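-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and built-in command handlers.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription (for setup, see [Providers and Model Configuration](../../zh/providers.md#语音转录)), and built-in command handlers.
## Configuration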
+32 -1
@@ -5,7 +5,7 @@
### Providers
> [!NOTE]
-> Groq provides free voice transcription via Whisper. If configured, audio messages from any channel will be automatically transcribed at the agent level.
+> Voice transcription can use a configured multimodal model via `voice.model_name`. Groq Whisper remains available as a fallback when no voice model is configured.
| Provider | Purpose | Get API Key |
| ------------ | --------------------------------------- | ------------------------------------------------------------ |
@@ -101,6 +101,33 @@ This design also enables **multi-agent support** with flexible provider selectio
}
```
+#### Voice Transcription
+You can configure a dedicated model for audio transcription with `voice.model_name`. This lets you reuse existing multimodal providers that support audio input instead of relying only on Groq.
+If `voice.model_name` is not configured, PicoClaw will continue to fall back to Groq transcription when a Groq API key is available.
+```json
+{
+  "model_list": [
+    {
+      "model_name": "voice-gemini",
+      "model": "gemini/gemini-2.5-flash",
+      "api_key": "your-gemini-key"
+    }
+  ],
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
+  "providers": {
+    "groq": {
+      "api_key": "gsk_xxx"
+    }
+  }
+}
+```
#### Vendor-Specific Examples
**OpenAI**
@@ -344,6 +371,10 @@ picoclaw agent -m "Hello"
"api_key": "gsk_xxx"
}
},
+"voice": {
+"model_name": "voice-gemini",
+"echo_transcription": false
+},
"channels": {
"telegram": {
"enabled": true,
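The Voice Transcription section in the English providers doc describes a selection order: a dedicated `voice.model_name` first, and Groq Whisper only as a fallback when a Groq API key is available. The following is a minimal Go sketch of that order, not PicoClaw's implementation. `Settings`, `ModelEntry`, and `resolveTranscriber` are hypothetical names. The sketch also assumes that `voice.model_name` must match a `model_name` entry in `model_list`, as it does in the JSON example. Treating an unknown name as an error is a further assumption, since the docs do not say what happens in that case.

```go
package main

import "fmt"

// ModelEntry is a hypothetical mirror of one item in "model_list".
type ModelEntry struct {
	ModelName string // alias referenced elsewhere, e.g. "voice-gemini"
	Model     string // vendor/model string, e.g. "gemini/gemini-2.5-flash"
}

// Settings gathers the inputs the docs mention (hypothetical type).
type Settings struct {
	ModelList      []ModelEntry
	VoiceModelName string // voice.model_name
	GroqAPIKey     string // providers.groq.api_key
}

// resolveTranscriber reports which backend would handle audio:
// the dedicated voice model first, then Groq Whisper if a Groq key
// exists, otherwise "none".
func resolveTranscriber(s Settings) (string, error) {
	if s.VoiceModelName != "" {
		for _, m := range s.ModelList {
			if m.ModelName == s.VoiceModelName {
				return "model:" + m.Model, nil
			}
		}
		// Assumption: an unknown alias is a config error, not a silent fallback.
		return "", fmt.Errorf("voice.model_name %q not found in model_list", s.VoiceModelName)
	}
	if s.GroqAPIKey != "" {
		return "groq-whisper", nil
	}
	return "none", nil
}

func main() {
	s := Settings{
		ModelList:      []ModelEntry{{ModelName: "voice-gemini", Model: "gemini/gemini-2.5-flash"}},
		VoiceModelName: "voice-gemini",
		GroqAPIKey:     "gsk_xxx",
	}
	backend, err := resolveTranscriber(s)
	fmt.Println(backend, err) // model:gemini/gemini-2.5-flash <nil>

	// Without voice.model_name, the Groq key triggers the fallback.
	s.VoiceModelName = ""
	backend, _ = resolveTranscriber(s)
	fmt.Println(backend) // groq-whisper
}
```

Keeping the lookup keyed on the `model_list` alias mirrors the config example, where `voice.model_name` reuses an entry that is already defined instead of declaring its own credentials.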
+32 -1
@@ -5,7 +5,7 @@
### Providers
> [!NOTE]
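-> Groq provides free voice transcription via Whisper. If Groq is configured, audio messages from any channel are automatically transcribed into text at the agent level.
+> Voice transcription can now be performed by the multimodal model specified via `voice.model_name`; if no voice model is configured, Groq Whisper remains available as a fallback.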
| Provider | Purpose | Get API Key |
| -------------------- | ---------------------------- | -------------------------------------------------------------------- |
@@ -99,6 +99,33 @@
}
```
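+#### Voice Transcription
+You can specify a dedicated model for voice transcription with `voice.model_name`. This lets you directly reuse an already-configured multimodal provider that supports audio input, rather than relying only on Groq.
+If `voice.model_name` is not configured and a Groq API key is available, PicoClaw will continue to fall back to Groq transcription.
+```json
+{
+  "model_list": [
+    {
+      "model_name": "voice-gemini",
+      "model": "gemini/gemini-2.5-flash",
+      "api_key": "your-gemini-key"
+    }
+  ],
+  "voice": {
+    "model_name": "voice-gemini",
+    "echo_transcription": false
+  },
+  "providers": {
+    "groq": {
+      "api_key": "gsk_xxx"
+    }
+  }
+}
+```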
#### Vendor-Specific Configuration Examples
**OpenAI**
@@ -342,6 +369,10 @@ picoclaw agent -m "你好"
"api_key": "gsk_xxx"
}
},
+"voice": {
+"model_name": "voice-gemini",
+"echo_transcription": false
+},
"channels": {
"telegram": {
"enabled": true,