Mirror of https://github.com/sipeed/picoclaw.git, synced 2026-05-25 16:00:35 +00:00
docs(voice): Update docs for audio-transcription
@@ -547,6 +547,7 @@
    "monitor_usb": true
  },
  "voice": {
    "model_name": "",
    "echo_transcription": false
  },
  "gateway": {

@@ -2,7 +2,7 @@

# Telegram

-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and built-in command handling.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription ([setup](../../providers.md#voice-transcription)), and built-in command handling.

## Configuration

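@@ -2,7 +2,7 @@

# Telegram

-The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription via Groq Whisper, and a built-in command handler.
+The Telegram channel uses long polling via the Telegram Bot API for bot-based communication. It supports text messages, media attachments (photos, voice, audio, documents), voice transcription (for configuration, see [Providers and Model Configuration](../../zh/providers.md#语音转录)), and a built-in command handler.

## Configuration
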
+32
-1
@@ -5,7 +5,7 @@
### Providers

> [!NOTE]
-> Groq provides free voice transcription via Whisper. If configured, audio messages from any channel will be automatically transcribed at the agent level.
+> Voice transcription can use a configured multimodal model via `voice.model_name`. Groq Whisper remains available as a fallback when no voice model is configured.

| Provider | Purpose | Get API Key |
| ------------ | --------------------------------------- | ------------------------------------------------------------ |
@@ -101,6 +101,33 @@ This design also enables **multi-agent support** with flexible provider selectio
}
```

#### Voice Transcription

You can configure a dedicated model for audio transcription with `voice.model_name`. This lets you reuse existing multimodal providers that support audio input instead of relying only on Groq.

If `voice.model_name` is not configured, PicoClaw will continue to fall back to Groq transcription when a Groq API key is available.

```json
{
  "model_list": [
    {
      "model_name": "voice-gemini",
      "model": "gemini/gemini-2.5-flash",
      "api_key": "your-gemini-key"
    }
  ],
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "providers": {
    "groq": {
      "api_key": "gsk_xxx"
    }
  }
}
```
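
The fallback rule described in this section can be sketched in Go as follows. This is a minimal illustration and not PicoClaw's actual code: the `VoiceConfig` type, the `pickTranscriber` helper, and the returned labels are all hypothetical names chosen for the example, and the "no transcription" outcome when neither setting is present is an assumption.

```go
package main

import "fmt"

// VoiceConfig mirrors the "voice" block from the JSON example.
// The Go type itself is hypothetical, not taken from the PicoClaw source.
type VoiceConfig struct {
	ModelName         string
	EchoTranscription bool
}

// pickTranscriber is a hypothetical helper that applies the documented rule:
// a configured voice.model_name takes precedence, otherwise Groq is used
// when a Groq API key is available.
func pickTranscriber(voice VoiceConfig, groqAPIKey string) string {
	switch {
	case voice.ModelName != "":
		return "model:" + voice.ModelName
	case groqAPIKey != "":
		return "groq-whisper"
	default:
		// Assumption: with neither setting present, nothing is transcribed.
		return "none"
	}
}

func main() {
	fmt.Println(pickTranscriber(VoiceConfig{ModelName: "voice-gemini"}, "gsk_xxx")) // model:voice-gemini
	fmt.Println(pickTranscriber(VoiceConfig{}, "gsk_xxx"))                          // groq-whisper
	fmt.Println(pickTranscriber(VoiceConfig{}, ""))                                 // none
}
```

With the example configuration, the first call corresponds to the documented setup: `voice-gemini` is chosen even though a Groq key is also present.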

#### Vendor-Specific Examples

**OpenAI**
@@ -344,6 +371,10 @@ picoclaw agent -m "Hello"
      "api_key": "gsk_xxx"
    }
  },
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "channels": {
    "telegram": {
      "enabled": true,

+32
-1
@@ -5,7 +5,7 @@
### Providers

> [!NOTE]
-> Groq provides free voice transcription via Whisper. If Groq is configured, audio messages from any channel are automatically transcribed to text at the agent level.
+> Voice transcription can now be handled by the multimodal model specified in `voice.model_name`; if no voice model is configured, Groq Whisper remains available as a fallback.

| Provider | Purpose | Get API Key |
| -------------------- | ---------------------------- | -------------------------------------------------------------------- |
@@ -99,6 +99,33 @@
}
```

#### Voice Transcription

You can use `voice.model_name` to designate a dedicated model for voice transcription. This lets you directly reuse an already configured multimodal provider that supports audio input, rather than depending solely on Groq.

If `voice.model_name` is not configured and a Groq API key is present, PicoClaw continues to fall back to Groq transcription.

```json
{
  "model_list": [
    {
      "model_name": "voice-gemini",
      "model": "gemini/gemini-2.5-flash",
      "api_key": "your-gemini-key"
    }
  ],
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "providers": {
    "groq": {
      "api_key": "gsk_xxx"
    }
  }
}
```
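
In the configuration shown here, `voice.model_name` carries the same value as the `model_name` of an entry in `model_list`, which suggests that the voice setting refers to that entry by name. Assuming that reading is correct, a config check could look like the sketch below. The `resolveVoiceModel` function and the struct types are hypothetical and written only for this illustration; they are not part of PicoClaw.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelEntry and config cover only the fields used in this sketch;
// other keys in the JSON are ignored by json.Unmarshal.
type modelEntry struct {
	ModelName string `json:"model_name"`
	Model     string `json:"model"`
}

type config struct {
	ModelList []modelEntry `json:"model_list"`
	Voice     struct {
		ModelName string `json:"model_name"`
	} `json:"voice"`
}

// resolveVoiceModel is a hypothetical helper. It returns the "model" string
// of the model_list entry named by voice.model_name. An empty
// voice.model_name yields ("", nil), since there is nothing to resolve.
func resolveVoiceModel(raw []byte) (string, error) {
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("parse config: %w", err)
	}
	if cfg.Voice.ModelName == "" {
		return "", nil
	}
	for _, m := range cfg.ModelList {
		if m.ModelName == cfg.Voice.ModelName {
			return m.Model, nil
		}
	}
	return "", fmt.Errorf("voice.model_name %q not found in model_list", cfg.Voice.ModelName)
}

func main() {
	raw := []byte(`{
	  "model_list": [{"model_name": "voice-gemini", "model": "gemini/gemini-2.5-flash"}],
	  "voice": {"model_name": "voice-gemini"}
	}`)
	model, err := resolveVoiceModel(raw)
	fmt.Println(model, err) // gemini/gemini-2.5-flash <nil>
}
```

A check of this kind would surface a misspelled `voice.model_name` when the config is loaded rather than when the first audio message arrives.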

#### Vendor-Specific Configuration Examples

**OpenAI**
@@ -342,6 +369,10 @@ picoclaw agent -m "Hello"
      "api_key": "gsk_xxx"
    }
  },
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "channels": {
    "telegram": {
      "enabled": true,
