# TTS (Text-to-Speech)
This package handles speech synthesis for PicoClaw.
If you are new to TTS setup, the simplest workflow is:

1. Add a TTS-capable entry to `model_list`.
2. Point `voice.tts_model_name` at that entry.
3. Put the API key in `.security.yml`.
## Quick Recommendation
For most users, these are the best starting points:
| Provider | Why start here |
|---|---|
| OpenAI | Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible `/audio/speech` API shape, and OpenAI is the safest default. |
| Xiaomi MiMo | A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack. |
## How TTS Configuration Works

PicoClaw does not keep TTS API keys inside `voice`.

Instead:

- `voice.tts_model_name` selects a named entry from `model_list`.
- That `model_list` entry provides the provider, model ID, API base, and proxy settings.
- `.security.yml` stores the API key for the same named model entry.
This is the recommended and supported configuration pattern.
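To make the name-based pairing concrete, here is a minimal Python sketch that joins a `model_list` entry from `config.json` with its key from `.security.yml`. It assumes both files have already been parsed into dicts shaped like the examples in Recommended Setup. `resolve_named_entry` is a hypothetical helper for illustration, not PicoClaw's API.

```python
# Hypothetical sketch: pair a named model_list entry (config.json) with the
# API keys stored under the same name in .security.yml. Dict shapes mirror the
# README examples; the function name is illustrative, not PicoClaw's API.
from typing import Optional


def resolve_named_entry(config: dict, security: dict, name: str) -> Optional[dict]:
    """Return the model_list entry called `name` with its api_keys attached, or None."""
    for entry in config.get("model_list", []):
        if entry.get("model_name") == name:
            # Keys are looked up by the same model_name in .security.yml.
            keys = security.get("model_list", {}).get(name, {}).get("api_keys", [])
            return {**entry, "api_keys": list(keys)}
    return None  # the name does not exist in model_list


config = {
    "voice": {"tts_model_name": "openai-tts"},
    "model_list": [{"model_name": "openai-tts", "model": "openai/tts-1"}],
}
security = {"model_list": {"openai-tts": {"api_keys": ["sk-openai-your-key"]}}}

entry = resolve_named_entry(config, security, config["voice"]["tts_model_name"])
print(entry["api_keys"])  # ['sk-openai-your-key']
```

Keeping keys out of `voice` means the same lookup by `model_name` serves every model in the list, TTS or not.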
## Recommended Setup

### Option A: OpenAI

`config.json`:

```json
{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}
```

`.security.yml`:

```yaml
model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"
```
### Option B: Xiaomi MiMo

`config.json`:

```json
{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}
```

`.security.yml`:

```yaml
model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"
```
If you use a custom MiMo endpoint, you can also set `api_base` explicitly on the `mimo-tts` entry in `model_list`. Otherwise, PicoClaw uses the provider default.
## What PicoClaw Sends Today

The current TTS runtime uses an OpenAI-compatible speech request with these defaults:

- Endpoint: `/audio/speech`
- Response format: `opus`
- Voice: `alloy`
- Model: taken from the selected `model_list` entry

That means:

- `openai/tts-1` works naturally.
- Other OpenAI-compatible providers can work if they accept the same request format.
- PicoClaw currently does not expose a user-facing config field for changing the TTS voice from `alloy`.
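The request shape above can be sketched with the Python standard library. This builds, but does not send, a POST to an `/audio/speech` endpoint using the fixed `alloy` voice and `opus` format. Two details are assumptions not confirmed by this README: that the `provider/` prefix (e.g. `openai/`) is stripped before the model ID is sent, and that the text goes in an `input` field as in OpenAI's public speech API. `build_speech_request` is a hypothetical name.

```python
# Sketch of an OpenAI-compatible /audio/speech request (nothing is sent).
# Assumptions: the "provider/" prefix is stripped from the model string, and
# the text travels in an "input" field as in OpenAI's speech API.
import json
import urllib.request


def build_speech_request(endpoint: str, model: str, api_key: str, text: str) -> urllib.request.Request:
    # Assumption: "openai/tts-1" is sent as "tts-1".
    model_id = model.split("/", 1)[1] if "/" in model else model
    body = {
        "model": model_id,
        "input": text,
        "voice": "alloy",           # fixed default, not user-configurable today
        "response_format": "opus",  # fixed default
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "https://api.openai.com/v1/audio/speech", "openai/tts-1", "sk-openai-your-key", "Hello from PicoClaw"
)
print(req.get_method(), req.full_url)  # POST https://api.openai.com/v1/audio/speech
print(json.loads(req.data)["model"])   # tts-1
```

A provider that expects different field names or voices will not accept this body, which is why only OpenAI-compatible endpoints are expected to work.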
## How PicoClaw Chooses a TTS Provider

`DetectTTS` resolves TTS in this order:

1. Preferred path: resolve `voice.tts_model_name` against `model_list`. If a matching model entry exists and has an API key, PicoClaw creates an OpenAI-compatible TTS provider using that model's settings.
2. Fallback path: if `voice.tts_model_name` is not set or cannot be resolved, PicoClaw scans `model_list` for the first entry whose model string contains `tts` and has an API key.
Fallback scanning exists for compatibility. New configs should set `voice.tts_model_name` explicitly.
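The resolution order can be illustrated with a short Python sketch. `detect_tts` here is a hypothetical stand-in for `DetectTTS`, not its real signature. It assumes each entry already carries its `api_keys` from `.security.yml`, and it treats a named entry without a key as unresolved, so it falls through to the scan (an assumption about the real behavior).

```python
# Illustrative sketch of the DetectTTS order: named entry first, then the first
# model_list entry whose model string contains "tts". Entries are assumed to
# already carry api_keys; detect_tts is hypothetical, not PicoClaw's API.
from typing import Optional


def detect_tts(tts_model_name: str, model_list: list) -> Optional[dict]:
    def has_key(entry: dict) -> bool:
        return any(entry.get("api_keys", []))  # at least one non-empty key

    # 1. Preferred path: the entry named by voice.tts_model_name.
    if tts_model_name:
        for entry in model_list:
            if entry.get("model_name") == tts_model_name and has_key(entry):
                return entry

    # 2. Fallback: first entry whose model string contains "tts" and has a key.
    for entry in model_list:
        if "tts" in entry.get("model", "") and has_key(entry):
            return entry
    return None


models = [
    {"model_name": "chat", "model": "openai/gpt-4o", "api_keys": ["sk-1"]},
    {"model_name": "mimo-tts", "model": "mimo/mimo-v2-tts", "api_keys": ["mimo-key"]},
    {"model_name": "openai-tts", "model": "openai/tts-1", "api_keys": ["sk-2"]},
]
print(detect_tts("openai-tts", models)["model_name"])  # openai-tts
print(detect_tts("typo", models)["model_name"])        # mimo-tts
```

The second call shows why the fallback can surprise you: a misspelled name silently selects whichever TTS entry comes first in `model_list`.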
## Notes About API Base Handling

PicoClaw normalizes the configured base URL for TTS:

- For OpenAI, a base like `https://api.openai.com` or `https://api.openai.com/v1` becomes `https://api.openai.com/v1/audio/speech`.
- For other OpenAI-compatible providers, PicoClaw preserves the configured base path and ensures it ends with `/audio/speech`.
- If `api_base` is omitted, PicoClaw uses the provider default base when the model prefix is known.
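A minimal Python sketch of these normalization rules follows. `speech_endpoint` and `DEFAULT_BASES` are hypothetical names. Only OpenAI's public base is included in the default table, because this README does not list other providers' defaults, so the sketch returns `None` where PicoClaw would use a known default for another provider. `https://example.com/custom/v1` is a placeholder.

```python
# Sketch of the base-URL rules above. Names are hypothetical; only OpenAI's
# public default base is listed because other defaults are not documented here.
from typing import Optional

DEFAULT_BASES = {"openai": "https://api.openai.com/v1"}
SUFFIX = "/audio/speech"


def speech_endpoint(provider: str, api_base: Optional[str]) -> Optional[str]:
    base = (api_base or DEFAULT_BASES.get(provider) or "").rstrip("/")
    if not base:
        return None  # no configured base and no default known to this sketch
    if base.endswith(SUFFIX):
        return base  # already a full speech endpoint
    if provider == "openai" and not base.endswith("/v1"):
        base += "/v1"  # https://api.openai.com -> https://api.openai.com/v1
    return base + SUFFIX  # other providers keep their configured path


print(speech_endpoint("openai", "https://api.openai.com"))          # https://api.openai.com/v1/audio/speech
print(speech_endpoint("mimo", "https://example.com/custom/v1"))     # https://example.com/custom/v1/audio/speech
```

Only the OpenAI branch adds `/v1`; for every other provider the configured path is trusted as-is, so a custom `api_base` must already include any version segment the provider needs.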
## Common Mistakes

- Setting `voice.tts_model_name` to a name that does not exist in `model_list`.
- Adding a TTS model but forgetting to put its API key in `.security.yml`.
- Assuming PicoClaw will automatically use provider-specific custom voices.
- Using a provider endpoint that is not compatible with the OpenAI `/audio/speech` request format.
## Minimal Checklist

Before testing `send_tts`, make sure:

- `voice.tts_model_name` matches a `model_list[].model_name`.
- The matching `.security.yml` entry contains a valid API key.
- The chosen provider supports an OpenAI-compatible speech synthesis endpoint.
- Your selected model is actually a TTS-capable model.
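Most of this checklist can be automated as a quick pre-flight check. The Python helper below is hypothetical, not something PicoClaw ships. It assumes `config.json` and `.security.yml` are already parsed into dicts and uses the `tts` substring only as a rough heuristic for "TTS-capable". It cannot confirm that a key is valid or that an endpoint is OpenAI-compatible; both still need a real request.

```python
# Hypothetical pre-flight check mirroring the checklist. The "tts" substring
# test is only a heuristic; key validity and endpoint compatibility are not
# checked because that requires a live request.
def check_tts_config(config: dict, security: dict) -> list:
    name = config.get("voice", {}).get("tts_model_name")
    if not name:
        return ["voice.tts_model_name is not set"]
    entry = next((e for e in config.get("model_list", []) if e.get("model_name") == name), None)
    if entry is None:
        return [f"voice.tts_model_name {name!r} does not match any model_list[].model_name"]

    problems = []
    keys = security.get("model_list", {}).get(name, {}).get("api_keys", [])
    if not any(keys):
        problems.append(f".security.yml has no API key for {name!r}")
    if "tts" not in entry.get("model", ""):
        problems.append(f"model {entry.get('model')!r} may not be TTS-capable")
    return problems


config = {
    "voice": {"tts_model_name": "openai-tts"},
    "model_list": [{"model_name": "openai-tts", "model": "openai/tts-1"}],
}
security = {"model_list": {"openai-tts": {"api_keys": ["sk-openai-your-key"]}}}
print(check_tts_config(config, security))  # []
```

An empty list means the configuration passes the offline checks; any strings returned point at the checklist item to fix.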