Commit Graph

15 Commits

Author SHA1 Message Date
Cytown b646d3b8fe refactor config and security to simplified the structure (#2068) 2026-03-28 00:03:34 +08:00
Orkun Manap dd9adf8a04 feat: add ElevenLabs Scribe STT transcriber and Telegram SendVoice support (#1905)
* feat: add ElevenLabs Scribe STT transcriber and Telegram SendVoice support

Add ElevenLabsTranscriber as an alternative speech-to-text provider using
the ElevenLabs Scribe API (scribe_v1). This enables voice message
transcription for users who already have an ElevenLabs API key, without
requiring a separate Groq account.

Changes:
- Add ElevenLabsTranscriber implementing the Transcriber interface
- Update DetectTranscriber to check providers.elevenlabs.api_key first,
  falling back to Groq for backward compatibility
- Add ElevenLabs to ProvidersConfig
- Add "voice" media type for OGG files with "voice" in filename
- Add SendVoice support in Telegram channel for voice bubble messages
- Add comprehensive tests for ElevenLabs transcriber

Configuration:
  "providers": {
    "elevenlabs": {
      "api_key": "sk_your_key_here"
    }
  }

Closes #1503 (partial)

* fix: move voice-bubble detection into Telegram channel to avoid regression in other channels

Address review feedback: keep inferMediaType returning "audio" for all
OGG files. Voice-bubble detection (SendVoice vs SendAudio) is now done
inside the Telegram channel based on filename, so other channels that
map "audio" explicitly are unaffected.

* fix: align VoiceConfig struct tags to pass golines formatter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(agent): use ModelName in loop test added by upstream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 22:11:10 +01:00
Cytown 7bf4831059 Merge branch 'main' into version 2026-03-23 10:54:08 +08:00
RussellLuo 4d2b244522 refactor(voice): share audio format support and restrict transcriber selection 2026-03-22 23:40:13 +08:00
RussellLuo 8ad4b9b497 feat(voice): add audio-model transcription support
- Add `AudioModelTranscriber` for model-based audio transcription via LLM providers
- Support selecting a transcription model with `voice.model_name` in config
- Keep Groq transcription as a fallback and move it into dedicated files with focused tests
- Serialize `data:audio/...` media as input_audio for OpenAI-compatible providers
- Improve transcription logging by rendering error fields as strings
- Add coverage for transcriber detection, audio-model behavior, provider audio serialization, and Groq transcription

Fixes #1890.
2026-03-22 20:07:22 +08:00
Cytown e455eb5e67 refactor: seperate security.yml for store keys 2026-03-22 01:55:00 +08:00
Cytown 1c123e0162 refactor Config to add Version and migratable 2026-03-12 13:52:55 +08:00
Dimitrij Denissenko 494953fb78 Fix lint 2026-03-04 10:21:59 +00:00
Dimitrij Denissenko b74f92ed28 A more neutral and elegant voice.Transcriber interface 2026-03-01 21:02:16 +00:00
Dimitrij Denissenko b1386ad71f Fix voice transcription 2026-03-01 08:39:05 +00:00
Artem Yadelskyi 02b4d9fbe2 feat(linter): Fix govet linter 2026-02-20 22:35:16 +02:00
Artem Yadelskyi 9e120f90ea feat(fmt): Run formatters 2026-02-18 21:48:23 +02:00
Together f12c337965 Remove duplicate truncate functions, reuse utils.Truncate
Multiple packages had their own private truncate implementations:
- channels/telegram.go: truncateString (byte-based, no "...")
- channels/dingtalk.go: truncateStringDingTalk (byte-based, no "...")
- voice/transcriber.go: truncateText (byte-based, with "...")

All three are functionally equivalent to the existing utils.Truncate,
which already handles rune-safe truncation and appends "..." correctly.

Replace all private copies with utils.Truncate and delete the dead code.
2026-02-12 00:46:48 +08:00
lxowalle 9936dbce52 * Discord & Telegram support ASR through groq 2026-02-10 17:11:42 +08:00
lxowalle e17693b17c * First commit 2026-02-09 19:20:19 +08:00