mirror of https://github.com/sipeed/picoclaw.git synced 2026-06-12 18:08:54 +00:00


ASR (Automatic Speech Recognition)

This package handles speech-to-text for PicoClaw voice input.

If you are new to ASR setup, the simplest mental model is:

1. Add one or more ASR-capable entries to model_list.
2. Point voice.model_name at the one you want to use.
3. Put the API key in .security.yml.

Quick Recommendation

For most new users, start with one of these:

| Provider | Example model | Why start here |
| --- | --- | --- |
| Groq | `groq/whisper-large-v3-turbo` | Fast Whisper-style transcription over a straightforward OpenAI-compatible API. Groq currently advertises a free tier of 2,000 requests per day. |
| ElevenLabs | `elevenlabs/scribe_v1` | Easy setup and strong speech-to-text quality. ElevenLabs currently advertises a free plan that includes speech-to-text usage. |

Pricing and free-plan limits can change, so check each provider's current pricing page before depending on them in production.

How ASR Configuration Works

PicoClaw does not keep ASR API keys inside the voice section.

Instead:

- voice.model_name chooses a named entry from model_list.
- The matching model_list entry describes the actual provider and model.
- .security.yml stores the API key for that named model entry.

This is the recommended pattern because it is explicit, reusable, and consistent with the rest of PicoClaw's model configuration.
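The lookup described above can be sketched in Go as follows. This is a minimal illustration, not PicoClaw's actual code: the `Config`, `ModelEntry`, and `Security` types and the `ResolveASR` function are hypothetical names that only mirror the fields shown in this README.

```go
package main

import (
	"errors"
	"fmt"
)

// ModelEntry mirrors one model_list item from config.json (hypothetical type).
type ModelEntry struct {
	ModelName string // "model_name"
	Provider  string // optional "provider"
	Model     string // "model"
}

// Config holds only the fields relevant to ASR lookup (hypothetical type).
type Config struct {
	VoiceModelName string // "voice.model_name"
	ModelList      []ModelEntry
}

// Security maps a model_name to its api_keys, as laid out in .security.yml
// (hypothetical type).
type Security map[string][]string

// ErrNoVoiceModel is returned when voice.model_name is empty.
var ErrNoVoiceModel = errors.New("voice.model_name is not set")

// ResolveASR finds the model_list entry named by voice.model_name and
// returns it together with the first API key stored for that entry.
func ResolveASR(cfg Config, sec Security) (ModelEntry, string, error) {
	if cfg.VoiceModelName == "" {
		return ModelEntry{}, "", ErrNoVoiceModel
	}
	for _, e := range cfg.ModelList {
		if e.ModelName != cfg.VoiceModelName {
			continue
		}
		keys := sec[e.ModelName]
		if len(keys) == 0 || keys[0] == "" {
			return ModelEntry{}, "", fmt.Errorf("no API key for %q in .security.yml", e.ModelName)
		}
		return e, keys[0], nil
	}
	return ModelEntry{}, "", fmt.Errorf("voice.model_name %q not found in model_list", cfg.VoiceModelName)
}
```

Note that the key is found through the entry's name, never through the voice section, which is why the same named entry can be reused elsewhere in the configuration.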

Recommended Setup

Option A: Groq Whisper

config.json

{
  "voice": {
    "model_name": "groq-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "groq-asr",
      "model": "groq/whisper-large-v3-turbo"
    }
  ]
}

.security.yml

model_list:
  groq-asr:
    api_keys:
      - "gsk_your_groq_key"

Notes:

- You can omit api_base; PicoClaw then uses Groq's default API base automatically.
- If you set api_base manually for Groq Whisper, both of these forms work:
  - https://api.groq.com/openai/v1
  - https://api.groq.com/openai/v1/audio/transcriptions
- Any OpenAI-compatible model whose name contains whisper can use the Whisper transcription path, not only whisper-large-v3-turbo.
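One way to accept both api_base forms and to match Whisper model names is sketched below. The function names `GroqTranscriptionURL` and `IsWhisperModel` are hypothetical, and trimming trailing slashes and matching case-insensitively are assumptions of this sketch rather than documented PicoClaw behavior.

```go
package main

import "strings"

// defaultGroqBase is the Groq base URL quoted in the notes above.
const defaultGroqBase = "https://api.groq.com/openai/v1"

// transcriptionsPath is the OpenAI-compatible transcription route.
const transcriptionsPath = "/audio/transcriptions"

// GroqTranscriptionURL returns the full transcription endpoint for an
// api_base that may be empty, a base URL, or already the full endpoint.
func GroqTranscriptionURL(apiBase string) string {
	// Assumption: surrounding whitespace and trailing slashes are ignored.
	base := strings.TrimRight(strings.TrimSpace(apiBase), "/")
	if base == "" {
		base = defaultGroqBase
	}
	if strings.HasSuffix(base, transcriptionsPath) {
		return base
	}
	return base + transcriptionsPath
}

// IsWhisperModel reports whether a model name contains "whisper".
// Assumption: the comparison ignores letter case.
func IsWhisperModel(model string) bool {
	return strings.Contains(strings.ToLower(model), "whisper")
}
```

With this approach an empty api_base, the base URL, and the full endpoint all resolve to the same request URL.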

Option B: ElevenLabs

config.json

{
  "voice": {
    "model_name": "elevenlabs-asr",
    "echo_transcription": true
  },
  "model_list": [
    {
      "model_name": "elevenlabs-asr",
      "provider": "elevenlabs",
      "model": "scribe_v1"
    }
  ]
}

.security.yml

model_list:
  elevenlabs-asr:
    api_keys:
      - "sk-elevenlabs-your-key"

Option C: OpenAI Whisper

config.json

{
  "voice": {
    "model_name": "openai-asr"
  },
  "model_list": [
    {
      "model_name": "openai-asr",
      "model": "openai/whisper-1"
    }
  ]
}

.security.yml

model_list:
  openai-asr:
    api_keys:
      - "sk-openai-your-key"

Other ASR-Capable Model Types

PicoClaw currently supports three main ASR routes:

| Route | Example models | Behavior |
| --- | --- | --- |
| ElevenLabs ASR | `provider: elevenlabs`, `model: scribe_v1` | Uses the ElevenLabs transcription API. |
| Whisper endpoint models | `openai/whisper-1`, `groq/whisper-large-v3` | Uses an OpenAI-compatible `/audio/transcriptions` endpoint. |
| Audio-capable chat models (Under construction) | `openai/gpt-4o-audio-preview`, `gemini/gemini-2.5-flash` | Sends audio to a multimodal chat model and asks it to transcribe. |

If you are unsure which one to pick, choose Groq Whisper or ElevenLabs first.

How PicoClaw Chooses a Transcriber

DetectTranscriber resolves ASR in this order:

1. Preferred path: resolve voice.model_name against model_list.
2. If that resolved model is:
   - an elevenlabs provider model, PicoClaw uses the ElevenLabs transcriber.
   - an OpenAI-compatible Whisper model, PicoClaw uses the Whisper transcriber.
   - an audio-capable chat model, PicoClaw uses AudioModelTranscriber.
3. Fallback path: if voice.model_name is not set, PicoClaw performs a compatibility scan through model_list for legacy auto-detected ASR entries.

Fallback scanning exists for backward compatibility. New configurations should set voice.model_name explicitly.
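The resolution order above can be illustrated with a small Go sketch. It is not the real DetectTranscriber: `PickRoute`, `Route`, and the `audioCapable` predicate are hypothetical names. The sketch also assumes that the fallback scan only picks ElevenLabs or Whisper entries and that an unknown voice.model_name yields no route, neither of which this README specifies.

```go
package main

import "strings"

// ModelEntry mirrors one model_list item (hypothetical type).
type ModelEntry struct {
	ModelName string
	Provider  string // optional explicit provider, e.g. "elevenlabs"
	Model     string // "provider/model" or a bare model name
}

// Route names the transcriber a model entry maps to (hypothetical type).
type Route string

const (
	RouteNone       Route = ""
	RouteElevenLabs Route = "elevenlabs"
	RouteWhisper    Route = "whisper"
	RouteAudioModel Route = "audio-model"
)

// providerOf returns the explicit provider if set, otherwise the prefix
// before the first "/" in the model string.
func providerOf(e ModelEntry) string {
	if e.Provider != "" {
		return strings.ToLower(e.Provider)
	}
	if i := strings.Index(e.Model, "/"); i > 0 {
		return strings.ToLower(e.Model[:i])
	}
	return ""
}

// routeFor classifies one entry in the documented priority order.
// audioCapable is a caller-supplied predicate and may be nil.
func routeFor(e ModelEntry, audioCapable func(ModelEntry) bool) Route {
	switch {
	case providerOf(e) == "elevenlabs":
		return RouteElevenLabs
	case strings.Contains(strings.ToLower(e.Model), "whisper"):
		return RouteWhisper
	case audioCapable != nil && audioCapable(e):
		return RouteAudioModel
	}
	return RouteNone
}

// PickRoute resolves voice.model_name first and scans model_list only
// when no name is configured.
func PickRoute(voiceModelName string, list []ModelEntry, audioCapable func(ModelEntry) bool) (ModelEntry, Route) {
	if voiceModelName != "" {
		for _, e := range list {
			if e.ModelName == voiceModelName {
				return e, routeFor(e, audioCapable)
			}
		}
		// Assumption: a configured but unknown name does not fall back.
		return ModelEntry{}, RouteNone
	}
	// Fallback scan. Assumption: only ElevenLabs and Whisper entries are
	// auto-detected, so the audio-capable predicate is not consulted.
	for _, e := range list {
		if r := routeFor(e, nil); r != RouteNone {
			return e, r
		}
	}
	return ModelEntry{}, RouteNone
}
```

In this sketch the fallback result depends on the order of model_list, which is one reason an explicit voice.model_name is easier to reason about.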

Common Mistakes

- Defining an ASR model in model_list but forgetting to set voice.model_name.
- Putting the API key in voice instead of .security.yml.
- Using a non-ASR model and expecting Whisper-style transcription behavior.
- Setting a custom api_base that points to the wrong provider endpoint.

Minimal Checklist

Before testing voice input, make sure:

- voice.model_name matches a model_list[].model_name.
- The matching .security.yml entry contains a valid API key.
- The selected model is actually ASR-capable.
- Voice input is enabled for the channel you are using.