
🔌 Providers & Model Configuration


Providers

Note

Voice transcription can use a configured multimodal model via voice.model_name. Groq Whisper remains available as a fallback when no voice model is configured.

| Provider | Purpose | Get API Key |
| --- | --- | --- |
| gemini | LLM (Gemini direct) | aistudio.google.com |
| zhipu | LLM (Zhipu direct) | bigmodel.cn |
| zai-coding | LLM (Z.AI Coding Plan) | z.ai |
| volcengine | LLM (Volcengine direct) | volcengine.com |
| openrouter | LLM (recommended, access to all models) | openrouter.ai |
| anthropic | LLM (Claude direct) | console.anthropic.com |
| openai | LLM (GPT direct) | platform.openai.com |
| venice | LLM (Venice AI direct) | venice.ai |
| deepseek | LLM (DeepSeek direct) | platform.deepseek.com |
| qwen | LLM (Qwen direct) | dashscope.console.aliyun.com |
| groq | LLM + Voice transcription (Whisper) | console.groq.com |
| cerebras | LLM (Cerebras direct) | cerebras.ai |
| vivgrid | LLM (Vivgrid direct) | vivgrid.com |
| nvidia | LLM (NVIDIA NIM) | build.nvidia.com |
| moonshot | LLM (Kimi/Moonshot direct) | platform.moonshot.cn |
| minimax | LLM (Minimax direct) | platform.minimaxi.com |
| avian | LLM (Avian direct) | avian.io |
| mistral | LLM (Mistral direct) | console.mistral.ai |
| longcat | LLM (Longcat direct) | longcat.ai |
| modelscope | LLM (ModelScope direct) | modelscope.cn |
| mimo | LLM (Xiaomi MiMo direct) | platform.xiaomimimo.com |

Model Configuration (model_list)

What's New? PicoClaw now prefers explicit provider + native model configuration (for example "provider": "zhipu", "model": "glm-4.7"). The legacy single-field provider/model form remains supported for compatibility when provider is omitted.

For agent dispatch and light-model routing examples, see the Routing Guide.

This design also enables multi-agent support with flexible provider selection:

  • Different agents, different providers: Each agent can use its own LLM provider
  • Model fallbacks: Configure primary and fallback models for resilience
  • Load balancing: Distribute requests across multiple endpoints
  • Centralized configuration: Manage all providers in one place

📋 All Supported Vendors

| Vendor | provider Value | Default API Base | Protocol | API Key |
| --- | --- | --- | --- | --- |
| OpenAI | openai | https://api.openai.com/v1 | OpenAI | Get Key |
| Venice AI | venice | https://api.venice.ai/api/v1 | OpenAI | Get Key |
| Anthropic | anthropic | https://api.anthropic.com/v1 | Anthropic | Get Key |
| 智谱 AI (GLM) | zhipu | https://open.bigmodel.cn/api/paas/v4 | OpenAI | Get Key |
| Z.AI Coding Plan | openai | https://api.z.ai/api/coding/paas/v4 | OpenAI | Get Key |
| DeepSeek | deepseek | https://api.deepseek.com/v1 | OpenAI | Get Key |
| Google Gemini | gemini | https://generativelanguage.googleapis.com/v1beta | Gemini | Get Key |
| Groq | groq | https://api.groq.com/openai/v1 | OpenAI | Get Key |
| Moonshot | moonshot | https://api.moonshot.cn/v1 | OpenAI | Get Key |
| 通义千问 (Qwen) | qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 | OpenAI | Get Key |
| NVIDIA | nvidia | https://integrate.api.nvidia.com/v1 | OpenAI | Get Key |
| Ollama | ollama | http://localhost:11434/v1 | OpenAI | Local (no key needed) |
| LM Studio | lmstudio | http://localhost:1234/v1 | OpenAI | Optional (local default: no key) |
| OpenRouter | openrouter | https://openrouter.ai/api/v1 | OpenAI | Get Key |
| LiteLLM Proxy | litellm | http://localhost:4000/v1 | OpenAI | Your LiteLLM proxy key |
| VLLM | vllm | http://localhost:8000/v1 | OpenAI | Local |
| Cerebras | cerebras | https://api.cerebras.ai/v1 | OpenAI | Get Key |
| VolcEngine (Doubao) | volcengine | https://ark.cn-beijing.volces.com/api/v3 | OpenAI | Get Key |
| 神算云 | shengsuanyun | https://router.shengsuanyun.com/api/v1 | OpenAI | - |
| BytePlus | byteplus | https://ark.ap-southeast.bytepluses.com/api/v3 | OpenAI | Get Key |
| Vivgrid | vivgrid | https://api.vivgrid.com/v1 | OpenAI | Get Key |
| LongCat | longcat | https://api.longcat.chat/openai | OpenAI | Get Key |
| ModelScope (魔搭) | modelscope | https://api-inference.modelscope.cn/v1 | OpenAI | Get Token |
| Xiaomi MiMo | mimo | https://api.xiaomimimo.com/v1 | OpenAI | Get Key |
| Azure OpenAI | azure | https://{resource}.openai.azure.com | Azure | Get Key |
| Antigravity | antigravity | Google Cloud | Custom | OAuth only |
| GitHub Copilot | github-copilot | localhost:4321 | gRPC | - |

Basic Configuration

{
  "model_list": [
    {
      "model_name": "ark-code-latest",
      "provider": "volcengine",
      "model": "ark-code-latest",
      "api_keys": ["sk-your-api-key"]
    },
    {
      "model_name": "gpt-5.4",
      "provider": "openai",
      "model": "gpt-5.4",
      "api_keys": ["sk-your-openai-key"]
    },
    {
      "model_name": "claude-sonnet-4.6",
      "provider": "anthropic",
      "model": "claude-sonnet-4.6",
      "api_keys": ["sk-ant-your-key"]
    },
    {
      "model_name": "glm-4.7",
      "provider": "zhipu",
      "model": "glm-4.7",
      "api_keys": ["your-zhipu-key"]
    }
  ],
  "agents": {
    "defaults": {
      "model_name": "gpt-5.4"
    }
  }
}
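
To make the link between agents.defaults.model_name and model_list concrete, here is a minimal Python sketch that looks up the entries an agent would use. The helper name entries_for is hypothetical and not part of PicoClaw; it only mirrors the lookup-by-model_name relationship described in this section.

```python
import json

# A trimmed copy of the configuration above.
CONFIG = json.loads("""
{
  "model_list": [
    {"model_name": "gpt-5.4", "provider": "openai", "model": "gpt-5.4",
     "api_keys": ["sk-your-openai-key"]},
    {"model_name": "glm-4.7", "provider": "zhipu", "model": "glm-4.7",
     "api_keys": ["your-zhipu-key"]}
  ],
  "agents": {"defaults": {"model_name": "gpt-5.4"}}
}
""")


def entries_for(config, model_name):
    """Return every enabled model_list entry registered under model_name."""
    return [
        entry for entry in config.get("model_list", [])
        if entry["model_name"] == model_name and entry.get("enabled", True)
    ]


default_name = CONFIG["agents"]["defaults"]["model_name"]
default_entries = entries_for(CONFIG, default_name)
print(default_entries[0]["provider"])
```

The helper returns a list rather than a single entry because the same model_name may appear more than once when load balancing is configured.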

model_list Entry Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model_name | string | Yes | Unique name used to reference this model in agent config |
| provider | string | No | Preferred provider identifier. When present, PicoClaw sends model unchanged to that provider |
| model | string | Yes | Native model ID when provider is set. If provider is omitted, the legacy provider/model form is still supported |
| api_keys | string[] | Yes* | API key(s) for authentication. Multiple keys enable per-request rotation. Not required for local providers (Ollama, LM Studio, VLLM) |
| api_base | string | No | Override the default API endpoint URL |
| proxy | string | No | HTTP proxy URL for this model entry |
| user_agent | string | No | Custom User-Agent header sent with API requests (supported by OpenAI-compatible, Gemini, Anthropic, and Azure providers) |
| request_timeout | int | No | Request timeout in seconds (default varies by provider) |
| max_tokens_field | string | No | Override the max tokens field name in request body (e.g., max_completion_tokens for o1 models) |
| thinking_level | string | No | Extended thinking level: off, low, medium, high, xhigh, or adaptive |
| tool_schema_transform | string | No | Optional compatibility transform for tool parameter schemas. Default: disabled. Supported values: simple. |
| extra_body | object | No | Additional fields to inject into every request body |
| custom_headers | object | No | Additional HTTP headers to inject into every request (e.g., {"X-Source":"coding-plan"}). If a key matches a built-in header, the custom value overrides the built-in one (e.g., Authorization, User-Agent, Content-Type, Accept). |
| streaming.enabled | bool | No | Opt-in for provider streaming on this model entry. Defaults to false and also requires the active channel's settings.streaming.enabled to be true. |
| rpm | int | No | Per-minute request rate limit |
| fallbacks | string[] | No | Fallback model names for automatic failover |
| enabled | bool | No | Whether this model entry is active (default: true) |

When streaming is disabled, omit the streaming block. Writing "streaming": {"enabled": false} has the same effect as leaving it out, so neither generated nor hand-written configs need it.
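
The two-level opt-in for streaming can be sketched as a small predicate. This is an illustration only: the function name streaming_active is hypothetical, and the channel dictionary layout is assumed from the settings.streaming.enabled path mentioned in the field description.

```python
def streaming_active(model_entry, channel):
    """Return True only when both the model entry and the channel opt in."""
    # A missing streaming block counts as disabled, matching the default.
    model_on = model_entry.get("streaming", {}).get("enabled", False)
    channel_on = (
        channel.get("settings", {}).get("streaming", {}).get("enabled", False)
    )
    return bool(model_on and channel_on)


streaming_model = {"model_name": "gpt-5.4", "streaming": {"enabled": True}}
plain_model = {"model_name": "glm-4.7"}  # no streaming block at all
streaming_channel = {"settings": {"streaming": {"enabled": True}}}
plain_channel = {"settings": {}}

print(streaming_active(streaming_model, streaming_channel))
print(streaming_active(plain_model, streaming_channel))
```

Only the first combination streams; a model entry without a streaming block never streams, whatever the channel says.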

Tool Schema Compatibility

By default, PicoClaw now forwards tool JSON Schemas unchanged.

Some providers reject advanced JSON Schema features such as $ref, $defs, anyOf, oneOf, allOf, pattern, or numeric/string constraints inside tool declarations. For those models, you can opt into a compatibility transform per model entry with tool_schema_transform.

Use simple when the upstream provider accepts only a conservative subset of JSON Schema in function declarations:

{
  "model_name": "gemini-2.5-flash-safe-tools",
  "provider": "gemini",
  "model": "gemini-2.5-flash",
  "api_keys": ["your-gemini-key"],
  "tool_schema_transform": "simple"
}

Notes:

  • The transform is disabled by default. If you omit tool_schema_transform, PicoClaw sends the original tool schema.
  • The setting is per model entry, so you can enable it only for the providers that need it.
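
To give a feel for what a compatibility transform of this kind might do, the sketch below drops the keywords named above from a tool schema. It is not PicoClaw's implementation: the function name simplify, the exact keyword set, and the choice to drop rather than rewrite are all assumptions, and the real simple transform may behave differently (for example by resolving $ref instead of removing it).

```python
# Keywords this section lists as commonly rejected. The numeric/string
# constraint names chosen here are an assumption.
UNSUPPORTED = {
    "$ref", "$defs", "anyOf", "oneOf", "allOf", "pattern",
    "minimum", "maximum", "minLength", "maxLength",
}


def simplify(schema):
    """Recursively drop unsupported keywords from a JSON Schema value."""
    if isinstance(schema, list):
        return [simplify(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    out = {}
    for key, value in schema.items():
        if key in UNSUPPORTED:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys here are property names, not keywords, so keep them all.
            out[key] = {name: simplify(sub) for name, sub in value.items()}
        else:
            out[key] = simplify(value)
    return out


tool_schema = {
    "type": "object",
    "properties": {
        "pattern": {"type": "string", "pattern": "^[a-z]+$", "maxLength": 64},
        "limit": {"type": "integer", "minimum": 1},
    },
    "required": ["pattern"],
}
simplified = simplify(tool_schema)
print(simplified)
```

Note that the property named pattern survives while the pattern constraint inside it is removed, which is why properties needs special handling.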

Provider / Model Resolution

PicoClaw resolves provider and the runtime model ID using these rules:

  • If provider is set, model is used as-is.
  • If provider is omitted, PicoClaw treats the first / segment in model as the provider and everything after that first / as the runtime model ID.

Examples:

| Config | Resolved Provider | Model Sent Upstream |
| --- | --- | --- |
| "provider": "openai", "model": "gpt-5.4" | openai | gpt-5.4 |
| "model": "openai/gpt-5.4" | openai | gpt-5.4 |
| "provider": "openrouter", "model": "openai/gpt-5.4" | openrouter | openai/gpt-5.4 |
| "model": "openrouter/openai/gpt-5.4" | openrouter | openai/gpt-5.4 |

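The two resolution rules can be expressed in a few lines of Python. This is a sketch of the rules as documented, not PicoClaw's source; the function name resolve is hypothetical, and raising ValueError when no provider can be inferred is an assumption, since the text does not say how that case is handled.

```python
def resolve(entry):
    """Return (provider, upstream_model) for a model_list entry."""
    provider = entry.get("provider")
    model = entry["model"]
    if provider:
        # Rule 1: an explicit provider means model is sent unchanged.
        return provider, model
    # Rule 2: split on the first "/" only; the remainder may contain more.
    prefix, sep, rest = model.partition("/")
    if not sep or not prefix or not rest:
        # Assumed behavior: the source does not define this case.
        raise ValueError(f"cannot infer provider from model {model!r}")
    return prefix, rest


print(resolve({"provider": "openai", "model": "gpt-5.4"}))
print(resolve({"model": "openrouter/openai/gpt-5.4"}))
```

Splitting on the first / only is what lets the legacy form openrouter/openai/gpt-5.4 keep openai/gpt-5.4 intact as the upstream model ID.
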
Voice Transcription

You can configure a dedicated model for audio transcription with voice.model_name. This lets you reuse existing multimodal providers that support audio input instead of relying only on Groq.

If voice.model_name is not configured, PicoClaw will continue to fall back to Groq transcription when a Groq API key is available.

{
  "model_list": [
    {
      "model_name": "voice-gemini",
      "provider": "gemini",
      "model": "gemini-2.5-flash",
      "api_keys": ["your-gemini-key"]
    }
  ],
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "providers": {
    "groq": {
      "api_key": "gsk_xxx"
    }
  }
}
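
The selection order described above (configured voice model first, Groq as fallback) can be sketched as follows. The function name pick_transcriber and the returned tuples are hypothetical, and reading the Groq key from providers.groq.api_key is an assumption based on the example config.

```python
def pick_transcriber(config):
    """Choose a transcription backend following the documented order."""
    voice_model = config.get("voice", {}).get("model_name")
    if voice_model:
        # A dedicated voice model always wins when configured.
        return ("model", voice_model)
    groq_key = config.get("providers", {}).get("groq", {}).get("api_key")
    if groq_key:
        # Fall back to Groq transcription when a key is available.
        return ("groq", None)
    return None  # no transcription backend available


with_voice = {
    "voice": {"model_name": "voice-gemini"},
    "providers": {"groq": {"api_key": "gsk_xxx"}},
}
groq_only = {"providers": {"groq": {"api_key": "gsk_xxx"}}}

print(pick_transcriber(with_voice))
print(pick_transcriber(groq_only))
```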

Vendor-Specific Examples

OpenAI

{
  "model_name": "gpt-5.4",
  "provider": "openai",
  "model": "gpt-5.4",
  "api_keys": ["sk-..."]
}

VolcEngine (Doubao)

{
  "model_name": "ark-code-latest",
  "provider": "volcengine",
  "model": "ark-code-latest",
  "api_keys": ["sk-..."]
}

智谱 AI (GLM)

{
  "model_name": "glm-4.7",
  "provider": "zhipu",
  "model": "glm-4.7",
  "api_keys": ["your-key"]
}

Z.AI Coding Plan (GLM)

Z.AI and 智谱 AI are two brands of the same provider. For the Z.AI Coding Plan, set provider to openai and override api_base as shown here, rather than using the zhipu provider:

{
  "model_name": "glm-4.7",
  "provider": "openai",
  "model": "glm-4.7",
  "api_keys": ["your-z.ai-key"],
  "api_base": "https://api.z.ai/api/coding/paas/v4"
}

DeepSeek

{
  "model_name": "deepseek-chat",
  "provider": "deepseek",
  "model": "deepseek-chat",
  "api_keys": ["sk-..."]
}

Anthropic (with API key)

{
  "model_name": "claude-sonnet-4.6",
  "provider": "anthropic",
  "model": "claude-sonnet-4.6",
  "api_keys": ["sk-ant-your-key"]
}

Run picoclaw auth login --provider anthropic to paste your API token.

Anthropic Messages API (native format)

For direct Anthropic API access or custom endpoints that only support Anthropic's native message format:

{
  "model_name": "claude-opus-4-6",
  "provider": "anthropic-messages",
  "model": "claude-opus-4-6",
  "api_keys": ["sk-ant-your-key"],
  "api_base": "https://api.anthropic.com"
}

Use the anthropic-messages protocol when:

  • Using third-party proxies that only support Anthropic's native /v1/messages endpoint (not OpenAI-compatible /v1/chat/completions)
  • Connecting to services such as MiniMax or Synthetic that require Anthropic's native message format
  • The existing anthropic protocol returns 404 errors (indicating the endpoint doesn't support OpenAI-compatible format)

Note: The anthropic protocol uses OpenAI-compatible format (/v1/chat/completions), while anthropic-messages uses Anthropic's native format (/v1/messages). Choose based on your endpoint's supported format.

Ollama (local)

{
  "model_name": "llama3",
  "provider": "ollama",
  "model": "llama3"
}

LM Studio (local)

{
  "model_name": "lmstudio-local",
  "provider": "lmstudio",
  "model": "openai/gpt-oss-20b"
}

api_base defaults to http://localhost:1234/v1. An API key is optional unless your LM Studio server enables authentication.
With explicit provider, PicoClaw sends openai/gpt-oss-20b unchanged to the LM Studio server. The legacy compatibility form "model": "lmstudio/openai/gpt-oss-20b" still resolves to the same upstream model ID when provider is omitted.

Custom Proxy/API

{
  "model_name": "my-custom-model",
  "provider": "openai",
  "model": "custom-model",
  "api_base": "https://my-proxy.com/v1",
  "api_keys": ["sk-..."],
  "user_agent": "MyApp/1.0",
  "request_timeout": 300
}

LiteLLM Proxy

{
  "model_name": "lite-gpt4",
  "provider": "litellm",
  "model": "lite-gpt4",
  "api_base": "http://localhost:4000/v1",
  "api_keys": ["sk-..."]
}

With explicit provider, PicoClaw sends model unchanged. That means "provider": "litellm", "model": "lite-gpt4" sends lite-gpt4, while "provider": "litellm", "model": "openai/gpt-4o" sends openai/gpt-4o. The legacy compatibility forms litellm/lite-gpt4 and litellm/openai/gpt-4o still resolve the same way when provider is omitted.

Z.AI Coding Plan

If the standard Zhipu endpoint (https://open.bigmodel.cn/api/paas/v4) returns 429 (code 1113: insufficient balance), try using the Z.AI Coding Plan endpoint instead:

{
  "model_name": "glm-4.7",
  "provider": "openai",
  "model": "glm-4.7",
  "api_keys": ["your-zhipu-api-key"],
  "api_base": "https://api.z.ai/api/coding/paas/v4"
}

Note: The Z.AI Coding Plan endpoint and standard Zhipu endpoint use the same API key format but have separate billing. If you encounter 429 errors with the standard Zhipu endpoint, the Z.AI Coding Plan endpoint may have available balance.

Load Balancing

Configure multiple endpoints under the same model name, and PicoClaw automatically round-robins between them:

{
  "model_list": [
    {
      "model_name": "gpt-5.4",
      "provider": "openai",
      "model": "gpt-5.4",
      "api_base": "https://api1.example.com/v1",
      "api_keys": ["sk-key1"]
    },
    {
      "model_name": "gpt-5.4",
      "provider": "openai",
      "model": "gpt-5.4",
      "api_base": "https://api2.example.com/v1",
      "api_keys": ["sk-key2"]
    }
  ]
}
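
Round-robin selection over entries that share a model_name can be sketched like this. The class name RoundRobin is hypothetical, and PicoClaw's actual selection logic may differ (for instance in how it handles disabled entries or rate limits).

```python
import itertools


class RoundRobin:
    """Cycle through model_list entries that share the same model_name."""

    def __init__(self, model_list):
        groups = {}
        for entry in model_list:
            groups.setdefault(entry["model_name"], []).append(entry)
        # One independent cycle per model name.
        self._cycles = {
            name: itertools.cycle(entries) for name, entries in groups.items()
        }

    def next_entry(self, model_name):
        return next(self._cycles[model_name])


MODEL_LIST = [
    {"model_name": "gpt-5.4", "provider": "openai", "model": "gpt-5.4",
     "api_base": "https://api1.example.com/v1", "api_keys": ["sk-key1"]},
    {"model_name": "gpt-5.4", "provider": "openai", "model": "gpt-5.4",
     "api_base": "https://api2.example.com/v1", "api_keys": ["sk-key2"]},
]

balancer = RoundRobin(MODEL_LIST)
picks = [balancer.next_entry("gpt-5.4")["api_base"] for _ in range(3)]
print(picks)
```

Three consecutive requests alternate api1, api2, api1, so each endpoint receives roughly half of the traffic.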

Automatic Model Failover (Cascade)

PicoClaw supports automatic failover when an agent's model settings name a primary model plus fallbacks. The runtime fallback chain moves on to the next candidate on retriable failures such as HTTP 429, quota or rate-limit errors, and timeouts. It also tracks a cooldown per candidate so that a recently failed target is not retried immediately.

{
  "model_list": [
    {
      "model_name": "qwen-main",
      "provider": "openai",
      "model": "qwen3.5:cloud",
      "api_base": "https://api.example.com/v1",
      "api_keys": ["sk-main"]
    },
    {
      "model_name": "deepseek-backup",
      "provider": "deepseek",
      "model": "deepseek-chat",
      "api_keys": ["sk-backup-1"]
    },
    {
      "model_name": "gemini-backup",
      "provider": "gemini",
      "model": "gemini-2.5-flash",
      "api_keys": ["sk-backup-2"]
    }
  ],
  "agents": {
    "defaults": {
      "model_name": "qwen-main",
      "model_fallbacks": ["deepseek-backup", "gemini-backup"]
    }
  }
}

If you use key-level failover for the same model, PicoClaw can chain through additional key-backed candidates before moving to cross-model backups.
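
A fallback chain with per-candidate cooldown can be sketched as below. Everything here is illustrative: FallbackChain, RetriableError, the send callback, and the 60-second cooldown are assumptions, not PicoClaw's actual types or timings.

```python
import time


class RetriableError(Exception):
    """Stands in for HTTP 429, quota/rate-limit, and timeout failures."""


class FallbackChain:
    def __init__(self, candidates, cooldown_seconds=60.0, clock=time.monotonic):
        self.candidates = list(candidates)
        self.cooldown_seconds = cooldown_seconds  # assumed value
        self.clock = clock
        self._cooling_until = {}

    def call(self, send):
        """Try candidates in order; send(name) performs the request."""
        last_error = None
        for name in self.candidates:
            until = self._cooling_until.get(name)
            if until is not None and until > self.clock():
                continue  # failed recently, skip while cooling down
            try:
                return send(name)
            except RetriableError as err:
                self._cooling_until[name] = self.clock() + self.cooldown_seconds
                last_error = err
        raise RuntimeError("all candidates failed or are cooling down") from last_error


calls = []


def fake_send(name):
    """Simulated request: the primary always hits a rate limit."""
    calls.append(name)
    if name == "qwen-main":
        raise RetriableError("HTTP 429")
    return f"reply from {name}"


chain = FallbackChain(["qwen-main", "deepseek-backup", "gemini-backup"])
first = chain.call(fake_send)
second = chain.call(fake_send)
print(first, calls)
```

The second call skips qwen-main entirely because it is still cooling down, which is the behavior the cooldown tracking is meant to provide.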

Migration from Legacy providers Config

The old providers configuration is deprecated and has been removed in V2. Existing V0/V1 configs are auto-migrated.

Old Config (deprecated):

{
  "providers": {
    "zhipu": {
      "api_key": "your-key",
      "api_base": "https://open.bigmodel.cn/api/paas/v4"
    }
  },
  "agents": {
    "defaults": {
      "provider": "zhipu",
      "model": "glm-4.7"
    }
  }
}

New Config (recommended):

{
  "version": 3,
  "model_list": [
    {
      "model_name": "glm-4.7",
      "provider": "zhipu",
      "model": "glm-4.7",
      "api_keys": ["your-key"]
    }
  ],
  "agents": {
    "defaults": {
      "model_name": "glm-4.7"
    }
  }
}

For a detailed migration guide, see migration/model-list-migration.md.
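
For readers migrating by hand, the mapping between the two shapes above can be sketched in Python. The function name migrate_legacy is hypothetical and covers only the single-provider case shown here; PicoClaw's built-in auto-migration may handle more cases and make different choices. Unlike the example above, this sketch carries api_base over as an explicit override.

```python
def migrate_legacy(old):
    """Convert a legacy providers-style config into the model_list form."""
    defaults = old["agents"]["defaults"]
    provider = defaults["provider"]
    model = defaults["model"]
    legacy = old.get("providers", {}).get(provider, {})

    entry = {"model_name": model, "provider": provider, "model": model}
    if legacy.get("api_key"):
        # The single legacy key becomes a one-element api_keys list.
        entry["api_keys"] = [legacy["api_key"]]
    if legacy.get("api_base"):
        entry["api_base"] = legacy["api_base"]

    return {
        "version": 3,
        "model_list": [entry],
        "agents": {"defaults": {"model_name": model}},
    }


old_config = {
    "providers": {
        "zhipu": {
            "api_key": "your-key",
            "api_base": "https://open.bigmodel.cn/api/paas/v4",
        }
    },
    "agents": {"defaults": {"provider": "zhipu", "model": "glm-4.7"}},
}
new_config = migrate_legacy(old_config)
print(new_config["model_list"][0]["model_name"])
```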

Provider Architecture

PicoClaw routes providers by protocol family:

  • OpenAI-compatible protocol: OpenRouter, OpenAI-compatible gateways, Groq, Zhipu, and vLLM-style endpoints.
  • Gemini native protocol: Google Gemini via the native models/*:generateContent and models/*:streamGenerateContent endpoints.
  • Anthropic protocol: Claude-native API behavior.
  • Codex/OAuth path: OpenAI OAuth/token authentication route.

This keeps the runtime lightweight while making new OpenAI-compatible backends mostly a config operation (api_base + api_keys).

Zhipu

1. Get API key and base URL

2. Configure

{
  "agents": {
    "defaults": {
      "workspace": "~/.picoclaw/workspace",
      "model_name": "glm-4.7",
      "max_tokens": 8192,
      "temperature": 0.7,
      "max_tool_iterations": 20
    }
  },
  "providers": {
    "zhipu": {
      "api_key": "Your API Key",
      "api_base": "https://open.bigmodel.cn/api/paas/v4"
    }
  }
}

3. Run

picoclaw agent -m "Hello"

Full config example

{
  "agents": {
    "defaults": {
      "model_name": "claude-opus-4-5"
    }
  },
  "session": {
    "dm_scope": "per-channel-peer"
  },
  "providers": {
    "openrouter": {
      "api_key": "sk-or-v1-xxx"
    },
    "groq": {
      "api_key": "gsk_xxx"
    }
  },
  "voice": {
    "model_name": "voice-gemini",
    "echo_transcription": false
  },
  "channel_list": {
    "telegram": {
      "enabled": true,
      "type": "telegram",
      "token": "123456:ABC...",
      "allow_from": ["123456789"]
    },
    "discord": {
      "enabled": true,
      "type": "discord",
      "token": "",
      "allow_from": [""]
    },
    "whatsapp": {
      "enabled": false,
      "type": "whatsapp",
      "bridge_url": "ws://localhost:3001",
      "use_native": false,
      "session_store_path": "",
      "allow_from": []
    },
    "feishu": {
      "enabled": false,
      "type": "feishu",
      "app_id": "cli_xxx",
      "app_secret": "xxx",
      "encrypt_key": "",
      "verification_token": "",
      "allow_from": []
    },
    "qq": {
      "enabled": false,
      "type": "qq",
      "app_id": "",
      "app_secret": "",
      "allow_from": []
    }
  },
  "tools": {
    "web": {
      "brave": {
        "enabled": false,
        "api_key": "BSA...",
        "max_results": 5
      },
      "duckduckgo": {
        "enabled": true,
        "max_results": 5
      },
      "perplexity": {
        "enabled": false,
        "api_key": "",
        "max_results": 5
      },
      "searxng": {
        "enabled": false,
        "base_url": "http://localhost:8888",
        "max_results": 5
      }
    },
    "cron": {
      "exec_timeout_minutes": 5
    }
  },
  "heartbeat": {
    "enabled": true,
    "interval": 30
  }
}

📝 API Key Comparison

| Service | Pricing | Use Case |
| --- | --- | --- |
| OpenRouter | Free: 200K tokens/month | Multiple models (Claude, GPT-4, etc.) |
| Volcengine CodingPlan | ¥9.9/first month | Best for Chinese users, multiple SOTA models (Doubao, DeepSeek, etc.) |
| Zhipu | Free: 200K tokens/month | Suitable for Chinese users |
| Brave Search | $5/1000 queries | Web search functionality |
| SearXNG | Free (self-hosted) | Privacy-focused metasearch (70+ engines) |
| Groq | Free tier available | Fast inference (Llama, Mixtral) |
| Cerebras | Free tier available | Fast inference (Llama, Qwen, etc.) |
| LongCat | Free: up to 5M tokens/day | Fast inference |
| ModelScope | Free: 2000 requests/day | Inference (Qwen, GLM, DeepSeek, etc.) |
