ASR (Automatic Speech Recognition)
This package handles speech-to-text for PicoClaw voice input.
If you are new to ASR setup, the simplest mental model is:
- Add one or more ASR-capable entries to model_list.
- Point voice.model_name at the one you want to use.
- Put the API key in .security.yml.
Quick Recommendation
For most new users, start with one of these:
| Provider | Example model | Why start here |
|---|---|---|
| Groq | groq/whisper-large-v3-turbo | Fast Whisper-style transcription and a straightforward OpenAI-compatible API. Groq currently advertises a free tier of 2000 requests/day. |
| ElevenLabs | elevenlabs/scribe_v1 | Easy setup and strong speech-to-text quality. ElevenLabs currently advertises a free plan that includes speech-to-text usage. |
Pricing and free-plan limits can change, so check the linked pricing pages before depending on them in production.
How ASR Configuration Works
PicoClaw does not keep ASR API keys inside the voice section.
Instead:
- voice.model_name chooses a named entry from model_list.
- The matching model_list entry describes the actual provider and model.
- .security.yml stores the API key for that named model entry.
This is the recommended pattern because it is explicit, reusable, and consistent with the rest of PicoClaw's model configuration.
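The lookup chain described above can be sketched in a few lines of Python. This is only an illustration of the indirection, not PicoClaw's actual code: the function name `resolve_asr_model` and the in-memory dicts standing in for config.json and .security.yml are assumptions made for the example.

```python
def resolve_asr_model(config: dict, security: dict) -> dict:
    """Follow voice.model_name -> model_list entry -> API keys (hypothetical helper)."""
    name = config.get("voice", {}).get("model_name")
    if not name:
        raise ValueError("voice.model_name is not set")

    # Find the model_list entry whose model_name matches voice.model_name.
    entry = next(
        (m for m in config.get("model_list", []) if m.get("model_name") == name),
        None,
    )
    if entry is None:
        raise ValueError(f"no model_list entry named {name!r}")

    # API keys are looked up under the same name in the security data,
    # never inside the voice section.
    keys = security.get("model_list", {}).get(name, {}).get("api_keys", [])
    if not keys:
        raise ValueError(f"no api_keys for {name!r} in .security.yml")

    return {
        "model": entry["model"],
        "provider": entry.get("provider"),
        "api_key": keys[0],
    }


config = {
    "voice": {"model_name": "groq-asr"},
    "model_list": [
        {"model_name": "groq-asr", "model": "groq/whisper-large-v3-turbo"}
    ],
}
security = {"model_list": {"groq-asr": {"api_keys": ["gsk_your_groq_key"]}}}

resolved = resolve_asr_model(config, security)
print(resolved["model"])  # prints "groq/whisper-large-v3-turbo"
```

Because the key is found by name, the same model_list entry can be reused elsewhere in the configuration without copying credentials.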
Recommended Setup
Option A: Groq Whisper
config.json
{
  "voice": {
    "model_name": "groq-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "groq-asr",
      "model": "groq/whisper-large-v3-turbo"
    }
  ]
}
.security.yml
model_list:
  groq-asr:
    api_keys:
      - "gsk_your_groq_key"
Notes:
- You can omit api_base and PicoClaw will use Groq's default API base automatically.
- If you set api_base manually for Groq Whisper, both of these forms work:
  - https://api.groq.com/openai/v1
  - https://api.groq.com/openai/v1/audio/transcriptions
- Any OpenAI-compatible Whisper model name containing whisper can use the Whisper transcription path, not only whisper-large-v3-turbo.
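To make the api_base and model-name notes concrete, here is a small Python sketch of how both accepted URL forms could be reduced to the same transcription endpoint, and how a name-based Whisper check might look. The helper names (`transcription_url`, `looks_like_whisper`), the default-base constant, and the case-insensitive match are all assumptions for illustration; PicoClaw's real normalization may differ.

```python
# Assumed default, taken from the first URL form listed in the notes.
GROQ_DEFAULT_BASE = "https://api.groq.com/openai/v1"
TRANSCRIPTIONS_PATH = "/audio/transcriptions"


def transcription_url(api_base: str = "") -> str:
    """Return the full transcription URL for either accepted api_base form."""
    base = (api_base or GROQ_DEFAULT_BASE).rstrip("/")
    if base.endswith(TRANSCRIPTIONS_PATH):
        return base  # already points at the transcription endpoint
    return base + TRANSCRIPTIONS_PATH


def looks_like_whisper(model: str) -> bool:
    """True if the model name (after any 'provider/' prefix) contains 'whisper'."""
    name = model.split("/", 1)[-1]
    return "whisper" in name.lower()


print(transcription_url())
print(transcription_url("https://api.groq.com/openai/v1/audio/transcriptions"))
print(looks_like_whisper("groq/whisper-large-v3"))
```

Both `print` calls for `transcription_url` produce the same URL, which is the point of accepting either form.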
Option B: ElevenLabs
config.json
{
  "voice": {
    "model_name": "elevenlabs-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "elevenlabs-asr",
      "provider": "elevenlabs",
      "model": "scribe_v1"
    }
  ]
}
.security.yml
model_list:
  elevenlabs-asr:
    api_keys:
      - "sk-elevenlabs-your-key"
Option C: OpenAI Whisper
config.json
{
  "voice": {
    "model_name": "openai-asr"
  },
  "model_list": [
    {
      "model_name": "openai-asr",
      "model": "openai/whisper-1"
    }
  ]
}
.security.yml
model_list:
  openai-asr:
    api_keys:
      - "sk-openai-your-key"
Other ASR-Capable Model Types
PicoClaw currently supports three main ASR routes:
| Route | Example models | Behavior |
|---|---|---|
| ElevenLabs ASR | provider: elevenlabs, model: scribe_v1 | Uses the ElevenLabs transcription API. |
| Whisper endpoint models | openai/whisper-1, groq/whisper-large-v3 | Uses an OpenAI-compatible /audio/transcriptions endpoint. |
| Audio-capable chat models (Under construction) | openai/gpt-4o-audio-preview, gemini/gemini-2.5-flash | Sends audio to a multimodal chat model and asks it to transcribe. |
If you are unsure which one to pick, choose Groq Whisper or ElevenLabs first.
How PicoClaw Chooses a Transcriber
DetectTranscriber resolves ASR in this order:
- Preferred path: resolve voice.model_name against model_list.
- If that resolved model is:
  - an elevenlabs provider model, PicoClaw uses the ElevenLabs transcriber.
  - an OpenAI-compatible Whisper model, PicoClaw uses the Whisper transcriber.
  - an audio-capable chat model, PicoClaw uses AudioModelTranscriber.
- Fallback path: if voice.model_name is not set, PicoClaw performs a compatibility scan through model_list for legacy auto-detected ASR entries.
Fallback scanning exists for backward compatibility. New configurations should set voice.model_name explicitly.
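The resolution order can be sketched as follows. This Python outline is not the real DetectTranscriber; it only mirrors the order described here. The names `classify` and `detect_transcriber`, the route labels, the audio-chat allow-list, and the rule that the fallback scan picks the first entry matching any route are all assumptions.

```python
# Hypothetical allow-list built from the example models named in this README.
AUDIO_CHAT_MODELS = {"gpt-4o-audio-preview", "gemini-2.5-flash"}


def classify(entry: dict):
    """Return an ASR route label for one model_list entry, or None."""
    model = entry.get("model", "")
    # "groq/whisper-large-v3" -> prefix "groq", name "whisper-large-v3";
    # "scribe_v1" (no slash) -> prefix "", name "scribe_v1".
    prefix, _, name = model.rpartition("/")
    provider = entry.get("provider") or prefix
    if provider == "elevenlabs":
        return "elevenlabs"
    if "whisper" in name.lower():
        return "whisper"
    if name in AUDIO_CHAT_MODELS:
        return "audio-chat"
    return None


def detect_transcriber(config: dict):
    """Preferred path first, then the legacy fallback scan."""
    models = config.get("model_list", [])
    name = config.get("voice", {}).get("model_name")
    if name:
        # Preferred path: only the explicitly named entry is considered.
        for entry in models:
            if entry.get("model_name") == name:
                return classify(entry)
        return None
    # Fallback path (assumed behavior): first entry that looks ASR-capable.
    for entry in models:
        route = classify(entry)
        if route is not None:
            return route
    return None


cfg = {
    "voice": {"model_name": "elevenlabs-asr"},
    "model_list": [
        {"model_name": "groq-asr", "model": "groq/whisper-large-v3-turbo"},
        {"model_name": "elevenlabs-asr", "provider": "elevenlabs", "model": "scribe_v1"},
    ],
}
print(detect_transcriber(cfg))  # prints "elevenlabs"
```

Note that with voice.model_name set, the earlier Groq entry is ignored even though it is ASR-capable. That predictability is the practical reason to prefer the explicit setting over the fallback scan.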
Common Mistakes
- Defining an ASR model in model_list but forgetting to set voice.model_name.
- Putting the API key in voice instead of .security.yml.
- Using a non-ASR model and expecting Whisper-style transcription behavior.
- Setting a custom api_base that points to the wrong provider endpoint.
Minimal Checklist
Before testing voice input, make sure:
- voice.model_name matches a model_list[].model_name.
- The matching .security.yml entry contains a valid API key.
- The selected model is actually ASR-capable.
- Voice input is enabled for the channel you are using.
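Part of this checklist can be automated before you start testing. The Python sketch below is a hypothetical pre-flight check, not a PicoClaw tool. It assumes the two files are already loaded into dicts, and it can only confirm that a key is present, not that the provider will accept it. ASR capability and per-channel voice settings are not checked. The `api_key` and `api_keys` field names probed inside voice are guesses at what a misplaced key might look like.

```python
def check_asr_config(config: dict, security: dict) -> list:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    voice = config.get("voice", {})

    # Common mistake: a key placed in voice (field names are assumptions).
    if "api_key" in voice or "api_keys" in voice:
        problems.append("API key found in voice; move it to .security.yml")

    name = voice.get("model_name")
    if not name:
        problems.append("voice.model_name is not set")
        return problems

    names = [m.get("model_name") for m in config.get("model_list", [])]
    if name not in names:
        problems.append(f"voice.model_name {name!r} has no model_list entry")

    keys = security.get("model_list", {}).get(name, {}).get("api_keys") or []
    if not any(isinstance(k, str) and k.strip() for k in keys):
        problems.append(f"no API key for {name!r} in .security.yml")

    return problems


good_config = {
    "voice": {"model_name": "openai-asr"},
    "model_list": [{"model_name": "openai-asr", "model": "openai/whisper-1"}],
}
good_security = {"model_list": {"openai-asr": {"api_keys": ["sk-openai-your-key"]}}}

print(check_asr_config(good_config, good_security))  # prints []
print(check_asr_config(good_config, {}))
```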