mirror of https://github.com/sipeed/picoclaw.git synced 2026-05-25 16:00:35 +00:00

Anton Bogdanovich 794eb04f32 feat(providers): add gemini web search provider (#2763)

* add gemini web search provider

* fix(web): prefer free providers before Gemini in auto mode

* fix(web): expose gemini api key and model settings

* fix(web): prefer configured providers before Gemini in auto mode

* fix(web): satisfy gemini lint checks

* fix(web): address gemini provider review feedback

* test(web): align auto-provider expectations

* fix(web): let gemini ignore search range

2026-05-14 09:50:47 +08:00


Tools Configuration

PicoClaw's tools configuration is located in the tools field of config.json.

Directory Structure

{
  "tools": {
    "web": {
      ...
    },
    "mcp": {
      ...
    },
    "exec": {
      ...
    },
    "cron": {
      ...
    },
    "skills": {
      ...
    }
  }
}

Sensitive Data Filtering

Before tool results are sent to the LLM, PicoClaw can filter sensitive values (API keys, tokens, secrets) from the output. This prevents the LLM from seeing its own credentials.

See Sensitive Data Filtering for full documentation.

Config	Type	Default	Description
`filter_sensitive_data`	bool	`true`	Enable/disable filtering
`filter_min_length`	int	`8`	Minimum content length to trigger filtering
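
The filtering step can be pictured roughly as follows. This is a minimal sketch, not PicoClaw's implementation: the function name `redact_secrets`, the `[REDACTED]` placeholder, and the reading of `filter_min_length` as a threshold on the length of the tool output are all assumptions made for illustration.

```python
def redact_secrets(output: str, secrets: list[str], enabled: bool = True,
                   min_length: int = 8, placeholder: str = "[REDACTED]") -> str:
    """Replace known secret values in a tool result before it reaches the LLM."""
    # Skip filtering when it is disabled or the content is below the threshold
    # (assumed meaning of filter_min_length).
    if not enabled or len(output) < min_length:
        return output
    # Replace longer secrets first so a secret containing another is fully removed.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        output = output.replace(secret, placeholder)
    return output


filtered = redact_secrets("token=sk-abc123 ok", ["sk-abc123"])
```

Here `filtered` no longer contains the key, while a result shorter than `min_length` would be passed through unchanged.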

Web Tools

Web tools are used for web search and fetching.

Web Fetcher

General settings for fetching and processing webpage content.

Config	Type	Default	Description
`enabled`	bool	true	Enable the webpage fetching capability.
`fetch_limit_bytes`	int	10485760	Maximum size of the webpage payload to fetch, in bytes (the default is 10 MiB).
`format`	string	"plaintext"	Output format of the fetched content. Options: `plaintext` or `markdown` (recommended).

Brave

Config	Type	Default	Description
`enabled`	bool	false	Enable Brave search
`api_key`	string	-	Brave Search API key
`api_keys`	string[]	-	Multiple API keys for rotation (takes priority over `api_key`)
`max_results`	int	5	Maximum number of results
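
The relationship between `api_key` and `api_keys` can be sketched as below. The table only says that keys are rotated; the round-robin order and the helper name `build_key_rotation` are assumptions for illustration, not PicoClaw's actual behavior.

```python
from itertools import cycle


def build_key_rotation(config: dict):
    """Return an iterator that yields the API key to use for each request."""
    # api_keys takes priority over api_key when both are present.
    keys = config.get("api_keys") or ([config["api_key"]] if config.get("api_key") else [])
    if not keys:
        raise ValueError("no API key configured")
    # Assumption: keys are used in round-robin order.
    return cycle(keys)


rotation = build_key_rotation({"api_key": "single", "api_keys": ["key-a", "key-b"]})
first_three = [next(rotation) for _ in range(3)]
```

With both fields set, `single` is never used because `api_keys` wins.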

DuckDuckGo

Config	Type	Default	Description
`enabled`	bool	true	Enable DuckDuckGo search
`max_results`	int	5	Maximum number of results

Gemini Google Search

Gemini search uses Gemini with Google Search grounding. It returns an AI-synthesized answer with citations from Google Search.

Config	Type	Default	Description
`enabled`	bool	false	Enable Gemini Google Search
`api_key`	string	-	Google Gemini API key
`model`	string	`gemini-2.5-flash`	Gemini model used for search
`max_results`	int	5	Maximum number of citations

{
  "tools": {
    "web": {
      "gemini": {
        "enabled": true,
        "api_key": "YOUR_GEMINI_API_KEY",
        "model": "gemini-2.5-flash",
        "max_results": 5
      }
    }
  }
}

Baidu Search

Baidu Search uses the Qianfan AI Search API, which is AI-powered and optimized for Chinese-language queries.

Config	Type	Default	Description
`enabled`	bool	false	Enable Baidu Search
`api_key`	string	-	Qianfan API key
`base_url`	string	`https://qianfan.baidubce.com/v2/ai_search/web_search`	Baidu Search API URL
`max_results`	int	5	Maximum number of results

{
  "tools": {
    "web": {
      "baidu_search": {
        "enabled": true,
        "api_key": "YOUR_BAIDU_QIANFAN_API_KEY",
        "max_results": 10
      }
    }
  }
}

Perplexity

Config	Type	Default	Description
`enabled`	bool	false	Enable Perplexity search
`api_key`	string	-	Perplexity API key
`api_keys`	string[]	-	Multiple API keys for rotation (takes priority over `api_key`)
`max_results`	int	5	Maximum number of results

Tavily

Config	Type	Default	Description
`enabled`	bool	false	Enable Tavily search
`api_key`	string	-	Tavily API key
`base_url`	string	-	Custom Tavily API base URL
`max_results`	int	5	Maximum number of results

SearXNG

Config	Type	Default	Description
`enabled`	bool	false	Enable SearXNG search
`base_url`	string	`http://localhost:8888`	SearXNG instance URL
`max_results`	int	5	Maximum number of results

GLM Search

Config	Type	Default	Description
`enabled`	bool	false	Enable GLM Search
`api_key`	string	-	GLM API key
`base_url`	string	`https://open.bigmodel.cn/api/paas/v4/web_search`	GLM Search API URL
`search_engine`	string	`search_std`	Search engine type
`max_results`	int	5	Maximum number of results

Additional Web Settings

Config	Type	Default	Description
`prefer_native`	bool	true	Prefer provider's native search over configured search engines
`private_host_whitelist`	string[]	`[]`	Private/internal hosts allowed for web fetching

`web_search` Tool Parameters

At runtime, the web_search tool accepts the following parameters:

Field	Type	Required	Description
`query`	string	yes	Search query string
`count`	integer	no	Number of results to return. Default: `10`, max: `10`
`range`	string	no	Optional time filter: `d` (day), `w` (week), `m` (month), `y` (year)

If range is omitted, PicoClaw performs an unrestricted search.

Example `web_search` Call

{
  "query": "ai agent news",
  "count": 10,
  "range": "w"
}
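
The parameter rules above can be expressed as a small validation step. This is a sketch under stated assumptions: the function name is hypothetical, and clamping an out-of-range `count` (rather than rejecting it) is a guess at reasonable behavior, not something the documentation specifies.

```python
VALID_RANGES = {"d", "w", "m", "y"}  # day, week, month, year
MAX_COUNT = 10                       # documented default and maximum


def normalize_web_search_args(args: dict) -> dict:
    """Validate web_search arguments and fill in documented defaults."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    # Assumption: out-of-range counts are clamped into 1..MAX_COUNT.
    count = max(1, min(int(args.get("count", MAX_COUNT)), MAX_COUNT))
    time_range = args.get("range")
    if time_range is not None and time_range not in VALID_RANGES:
        raise ValueError(f"range must be one of {sorted(VALID_RANGES)}")
    # range=None stands for an unrestricted search.
    return {"query": query, "count": count, "range": time_range}
```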

Exec Tool

The exec tool is used to execute shell commands.

Config	Type	Default	Description
`enabled`	bool	true	Enable the exec tool
`enable_deny_patterns`	bool	true	Enable default dangerous command blocking
`custom_deny_patterns`	array	[]	Custom deny patterns (regular expressions)

Disabling the Exec Tool

To completely disable the exec tool, set enabled to false:

Via config file:

{
  "tools": {
    "exec": {
      "enabled": false
    }
  }
}

Via environment variable:

PICOCLAW_TOOLS_EXEC_ENABLED=false

Note: When disabled, the agent will not be able to execute shell commands. This also affects the Cron tool's ability to run scheduled shell commands.

Functionality

enable_deny_patterns: Set to false to completely disable the default dangerous command blocking patterns
custom_deny_patterns: Add custom deny regex patterns; commands matching these will be blocked

Default Blocked Command Patterns

By default, PicoClaw blocks the following dangerous commands:

Delete commands: rm -rf, del /f/q, rmdir /s
Disk operations: format, mkfs, diskpart, dd if=, writing to /dev/sd*
System operations: shutdown, reboot, poweroff
Command substitution and expansion: $(), ${}, backticks
Pipe to shell: | sh, | bash
Privilege escalation: sudo, chmod, chown
Process control: pkill, killall, kill -9
Remote operations: curl | sh, wget | sh, ssh
Package management: apt, yum, dnf, npm install -g, pip install --user
Containers: docker run, docker exec
Git: git push, git force
Other: eval, source *.sh

Known Architectural Limitation

The exec guard only validates the top-level command sent to PicoClaw. It does not recursively inspect child processes spawned by build tools or scripts after that command starts running.

Examples of workflows that can bypass the direct command guard once the initial command is allowed:

make run
go run ./cmd/...
cargo run
npm run build

This means the guard is useful for blocking obviously dangerous direct commands, but it is not a full sandbox for unreviewed build pipelines. If your threat model includes untrusted code in the workspace, use stronger isolation such as containers, VMs, or an approval flow around build-and-run commands.

Configuration Example

{
  "tools": {
    "exec": {
      "enable_deny_patterns": true,
      "custom_deny_patterns": [
        "\\brm\\s+-r\\b",
        "\\bkillall\\s+python"
      ]
    }
  }
}
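
To see what the two custom patterns in this example would match, they can be tried against sample commands. The sketch below uses Python's `re` module purely for illustration; PicoClaw is written in Go, whose regular expression syntax differs from Python's in some features, although these two patterns use only constructs common to both. The function name `is_denied` is hypothetical. Note that backslashes are doubled in JSON but not in Python raw strings.

```python
import re

# The same patterns as in the JSON example, written as Python raw strings.
CUSTOM_DENY_PATTERNS = [r"\brm\s+-r\b", r"\bkillall\s+python"]


def is_denied(command: str, patterns: list[str] = CUSTOM_DENY_PATTERNS) -> bool:
    """Return True if any deny pattern matches anywhere in the command."""
    return any(re.search(pattern, command) for pattern in patterns)


results = {cmd: is_denied(cmd) for cmd in
           ["rm -r build", "killall python3", "ls -la", "rm -rf build"]}
```

The first custom pattern does not match `rm -rf build`, because `\b` requires a word boundary directly after `-r`. That command is instead listed among the default blocked patterns, which is one reason to keep `enable_deny_patterns` on when adding custom ones.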

Cron Tool

The cron tool is used for scheduling periodic tasks.

Config	Type	Default	Description
`enabled`	bool	true	Register the agent-facing cron tool
`allow_command`	bool	true	Allow command jobs without extra confirmation
`exec_timeout_minutes`	int	5	Execution timeout in minutes, 0 means no limit

For schedule types, execution modes (deliver, agent turn, and command jobs), persistence, and the current command-security gates, see Scheduled Tasks and Cron Jobs.

MCP Tool

The MCP tool enables integration with external Model Context Protocol servers.

If you prefer not to edit JSON manually, PicoClaw also provides an MCP configuration manager CLI:

picoclaw mcp add — add or update a server (supports --deferred / --no-deferred)
picoclaw mcp list — list all configured servers with status and deferred state
picoclaw mcp show <name> — show full details and the tool list for one server
picoclaw mcp test <name> — connectivity check for one server
picoclaw mcp remove <name> — remove a server entry
picoclaw mcp edit — open config.json in $EDITOR for advanced edits

These commands manage the same tools.mcp.servers section documented below. See MCP Server CLI for command syntax, examples, and behavior details.

Tool Discovery (Lazy Loading)

When connecting to multiple MCP servers, exposing hundreds of tools simultaneously can exhaust the LLM's context window and increase API costs. The Discovery feature solves this by keeping MCP tools hidden by default.

Instead of loading all tools, the LLM is given a lightweight search tool (BM25 keyword matching or regex). When the LLM needs a specific capability, it searches the hidden library. Matching tools are then temporarily "unlocked" and injected into the context for a configured number of turns (ttl).
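
The unlock-and-expire cycle can be illustrated with a toy registry. This is only a sketch: the class and method names are hypothetical, the scoring is a plain term-frequency count standing in for real BM25, and the exact point at which PicoClaw decrements the turn counter is an assumption.

```python
class DiscoveryRegistry:
    """Toy model of hidden tools that are unlocked by search for `ttl` turns."""

    def __init__(self, tools: dict[str, str], ttl: int = 5, max_search_results: int = 5):
        self.tools = tools                    # tool name -> description (hidden by default)
        self.ttl = ttl
        self.max_search_results = max_search_results
        self.unlocked: dict[str, int] = {}    # tool name -> remaining turns

    def search(self, query: str) -> list[str]:
        terms = query.lower().split()
        scored = []
        for name, description in self.tools.items():
            text = f"{name} {description}".lower()
            score = sum(text.count(term) for term in terms)  # stand-in for BM25
            if score:
                scored.append((score, name))
        scored.sort(key=lambda item: (-item[0], item[1]))
        hits = [name for _, name in scored[: self.max_search_results]]
        for name in hits:
            self.unlocked[name] = self.ttl    # (re)start the countdown
        return hits

    def end_turn(self) -> None:
        # Drop tools whose countdown has run out, decrement the rest.
        self.unlocked = {n: t - 1 for n, t in self.unlocked.items() if t > 1}

    def visible_tools(self) -> list[str]:
        return sorted(self.unlocked)


registry = DiscoveryRegistry(
    {"github_create_issue": "Create an issue in a GitHub repository",
     "postgres_query": "Run a SQL query"},
    ttl=2,
)
hits = registry.search("github issue")
```

After the search only `github_create_issue` is visible, and it disappears again once two turns have ended without another matching search.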

Global Config

Config	Type	Default	Description
`enabled`	bool	false	Enable MCP integration globally
`discovery`	object	`{}`	Configuration for Tool Discovery (see below)
`servers`	object	`{}`	Map of server name to server config

Discovery Config (`discovery`)

Config	Type	Default	Description
`enabled`	bool	false	Global default: if `true`, all MCP tools are hidden and loaded on-demand via search; if `false`, all tools are loaded into context. Individual servers can override this with the per-server `deferred` field.
`ttl`	int	5	Number of conversational turns a discovered tool remains unlocked
`max_search_results`	int	5	Maximum number of tools returned per search query
`use_bm25`	bool	true	Enable the natural language/keyword search tool (`tool_search_tool_bm25`). Warning: consumes more resources than regex search
`use_regex`	bool	false	Enable the regex pattern search tool (`tool_search_tool_regex`)

Note: If discovery.enabled is true, you MUST enable at least one search engine (use_bm25 or use_regex), otherwise the application will fail to start.
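
The startup rule in the note above amounts to a check like the following. The function name and error message are hypothetical; the defaults are taken from the Discovery Config table.

```python
def validate_discovery(discovery: dict) -> None:
    """Raise if discovery is enabled without any search tool."""
    enabled = discovery.get("enabled", False)
    use_bm25 = discovery.get("use_bm25", True)     # documented default
    use_regex = discovery.get("use_regex", False)  # documented default
    if enabled and not (use_bm25 or use_regex):
        raise ValueError("discovery.enabled requires use_bm25 or use_regex")
```

Because `use_bm25` defaults to `true`, the check only fails when BM25 is explicitly turned off and regex search is not turned on.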

Per-Server Config

Config	Type	Required	Description
`enabled`	bool	yes	Enable this MCP server
`deferred`	bool	no	Override deferred mode for this server only. `true` = tools are hidden and discoverable via search; `false` = tools are always visible in context. When omitted, the global `discovery.enabled` value applies.
`type`	string	no	Transport type: `stdio`, `sse`, `http`
`command`	string	yes (stdio)	Executable command for stdio transport
`args`	array	no	Command arguments for stdio transport
`env`	object	no	Environment variables for stdio process
`env_file`	string	no	Path to environment file for stdio process
`url`	string	yes (sse/http)	Endpoint URL for `sse`/`http` transport
`headers`	object	no	HTTP headers for `sse`/`http` transport

Transport Behavior

If type is omitted, the transport is auto-detected:
- url is set → sse
- command is set → stdio

http and sse both use url + optional headers.
env and env_file are only applied to stdio servers.
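
The auto-detection rules can be written as a short function. This is a sketch with a hypothetical name; checking `url` before `command` when both are set follows the order of the list above and is otherwise an assumption, as is raising an error when neither is present.

```python
VALID_TRANSPORTS = {"stdio", "sse", "http"}


def resolve_transport(server: dict) -> str:
    """Pick the transport for one MCP server entry."""
    explicit = server.get("type")
    if explicit:
        if explicit not in VALID_TRANSPORTS:
            raise ValueError(f"unknown transport type: {explicit}")
        return explicit
    if server.get("url"):        # url is set -> sse
        return "sse"
    if server.get("command"):    # command is set -> stdio
        return "stdio"
    raise ValueError("server needs a type, a url, or a command")
```

An HTTP server therefore has to set `"type": "http"` explicitly, since a bare `url` resolves to `sse`.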

Configuration Examples

1) Stdio MCP server

{
  "tools": {
    "mcp": {
      "enabled": true,
      "servers": {
        "filesystem": {
          "enabled": true,
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/tmp"
          ]
        }
      }
    }
  }
}

2) Remote SSE/HTTP MCP server

{
  "tools": {
    "mcp": {
      "enabled": true,
      "servers": {
        "remote-mcp": {
          "enabled": true,
          "type": "sse",
          "url": "https://example.com/mcp",
          "headers": {
            "Authorization": "Bearer YOUR_TOKEN"
          }
        }
      }
    }
  }
}

3) Massive MCP setup with Tool Discovery enabled

In this example, the LLM initially sees only tool_search_tool_bm25. It searches for and unlocks GitHub, Postgres, or Slack tools dynamically, only when a request needs them.

{
  "tools": {
    "mcp": {
      "enabled": true,
      "discovery": {
        "enabled": true,
        "ttl": 5,
        "max_search_results": 5,
        "use_bm25": true,
        "use_regex": false
      },
      "servers": {
        "github": {
          "enabled": true,
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-github"
          ],
          "env": {
            "GITHUB_PERSONAL_ACCESS_TOKEN": "YOUR_GITHUB_TOKEN"
          }
        },
        "postgres": {
          "enabled": true,
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-postgres",
            "postgresql://user:password@localhost/dbname"
          ]
        },
        "slack": {
          "enabled": true,
          "type": "stdio",
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-slack"
          ],
          "env": {
            "SLACK_BOT_TOKEN": "YOUR_SLACK_BOT_TOKEN",
            "SLACK_TEAM_ID": "YOUR_SLACK_TEAM_ID"
          }
        }
      }
    }
  }
}

4) Mixed setup: per-server deferred override

Discovery is enabled globally, but filesystem is pinned as always-visible while context7 follows the global default (deferred). aws explicitly opts in to deferred mode even though it is the same as the global default.

{
  "tools": {
    "mcp": {
      "enabled": true,
      "discovery": {
        "enabled": true,
        "ttl": 5,
        "max_search_results": 5,
        "use_bm25": true
      },
      "servers": {
        "filesystem": {
          "enabled": true,
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"],
          "deferred": false
        },
        "context7": {
          "enabled": true,
          "command": "npx",
          "args": ["-y", "@upstash/context7-mcp"]
        },
        "aws": {
          "enabled": true,
          "command": "npx",
          "args": ["-y", "aws-mcp-server"],
          "deferred": true
        }
      }
    }
  }
}

Tip: deferred on a per-server basis is independent of discovery.enabled. You can keep discovery.enabled: false globally (all tools visible by default) and still mark individual high-volume servers as "deferred": true to avoid polluting the context with their tools.
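
The precedence between the per-server `deferred` field and the global `discovery.enabled` value can be summarized in a few lines. The function name is hypothetical; the logic follows the Per-Server Config table.

```python
def is_deferred(server: dict, discovery: dict) -> bool:
    """Return True if this server's tools are hidden until discovered."""
    if "deferred" in server:
        return bool(server["deferred"])            # per-server override wins
    return bool(discovery.get("enabled", False))   # otherwise the global default


discovery = {"enabled": True}
modes = {
    "filesystem": is_deferred({"deferred": False}, discovery),
    "context7": is_deferred({}, discovery),
    "aws": is_deferred({"deferred": True}, discovery),
}
```

For the mixed setup above this yields an always-visible `filesystem` server and deferred `context7` and `aws` servers.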

Skills Tool

The skills tool configures skill discovery and installation via registries like ClawHub and GitHub.

Registries

Config	Type	Default	Description
`registries.clawhub.enabled`	bool	true	Enable ClawHub registry
`registries.clawhub.base_url`	string	`https://clawhub.ai`	ClawHub base URL
`registries.clawhub.auth_token`	string	`""`	Optional Bearer token for higher rate limits
`registries.clawhub.search_path`	string	`""`	Search API path
`registries.clawhub.skills_path`	string	`""`	Skills API path
`registries.clawhub.download_path`	string	`""`	Download API path
`registries.clawhub.timeout`	int	0	Request timeout in seconds (0 = default)
`registries.clawhub.max_zip_size`	int	0	Max skill zip size in bytes (0 = default)
`registries.clawhub.max_response_size`	int	0	Max API response size in bytes (0 = default)
`registries.github.enabled`	bool	true	Enable GitHub installs via registry config
`registries.github.base_url`	string	`https://github.com`	GitHub or GitHub Enterprise base URL
`registries.github.auth_token`	string	`""`	GitHub personal access token
`registries.github.proxy`	string	`""`	HTTP proxy for GitHub API requests

Legacy GitHub Config

github.* is deprecated. Use registries.github.* instead. The legacy fields are still supported for compatibility and will be removed later.

Config	Type	Default	Description
`github.base_url`	string	`https://github.com`	Deprecated GitHub base URL
`github.proxy`	string	`""`	Deprecated GitHub proxy
`github.token`	string	`""`	Deprecated GitHub token

Search Settings

Config	Type	Default	Description
`max_concurrent_searches`	int	2	Max concurrent skill search requests
`search_cache.max_size`	int	50	Max cached search results
`search_cache.ttl_seconds`	int	300	Cache TTL in seconds
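
How `max_size` and `ttl_seconds` interact can be shown with a small cache. This is an illustrative sketch only: the class name is hypothetical, and evicting the oldest entry when the cache is full is an assumption, since the eviction policy is not documented here.

```python
import time
from collections import OrderedDict


class SearchCache:
    """Bounded cache whose entries expire after ttl_seconds."""

    def __init__(self, max_size: int = 50, ttl_seconds: int = 300, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: OrderedDict = OrderedDict()  # query -> (stored_at, results)

    def get(self, query: str):
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._entries[query]   # expired
            return None
        return results

    def put(self, query: str, results) -> None:
        self._entries.pop(query, None)
        self._entries[query] = (self._clock(), results)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # assumption: evict the oldest entry
```

The injectable `clock` argument exists only so the expiry behavior can be exercised without waiting.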

Configuration Example

{
  "tools": {
    "skills": {
      "registries": {
        "clawhub": {
          "enabled": true,
          "base_url": "https://clawhub.ai",
          "auth_token": "",
          "search_path": "",
          "skills_path": "",
          "download_path": "",
          "timeout": 0,
          "max_zip_size": 0,
          "max_response_size": 0
        },
        "github": {
          "enabled": true,
          "base_url": "https://github.com",
          "auth_token": "",
          "proxy": ""
        }
      },
      "github": {
        "base_url": "https://github.com",
        "proxy": "",
        "token": ""
      },
      "max_concurrent_searches": 2,
      "search_cache": {
        "max_size": 50,
        "ttl_seconds": 300
      }
    }
  }
}

Environment Variables

Configuration options can be overridden via environment variables of the form PICOCLAW_TOOLS_<SECTION>_<KEY>. For example:

PICOCLAW_TOOLS_WEB_BRAVE_ENABLED=true
PICOCLAW_TOOLS_EXEC_ENABLED=false
PICOCLAW_TOOLS_EXEC_ENABLE_DENY_PATTERNS=false
PICOCLAW_TOOLS_CRON_EXEC_TIMEOUT_MINUTES=10
PICOCLAW_TOOLS_MCP_ENABLED=true
PICOCLAW_TOOLS_MCP_MAX_INLINE_TEXT_CHARS=16384

Note: Nested map-style config (for example tools.mcp.servers.<name>.*) is configured in config.json rather than environment variables.
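
The naming pattern in the examples above can be reproduced mechanically from a dotted config path. The helper below is a hypothetical illustration inferred from those examples, not a PicoClaw API, and it does not apply to map-style entries such as individual MCP servers.

```python
def env_var_name(config_path: str) -> str:
    """Map a dotted config path such as 'tools.exec.enabled' to its variable name."""
    return "PICOCLAW_" + config_path.replace(".", "_").upper()


names = [env_var_name(path) for path in (
    "tools.web.brave.enabled",
    "tools.exec.enable_deny_patterns",
    "tools.cron.exec_timeout_minutes",
)]
```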

For MCP tools, tools.mcp.max_inline_text_chars controls how much text result is kept inline in model context. The threshold is counted in Unicode characters (Go runes), not bytes. For example, 16384 means up to 16,384 characters inline, which may occupy more than 16 KB for multibyte text such as CJK. Above this threshold, PicoClaw saves the MCP text result as a local artifact in the agent workspace and gives the model a short note plus a structured [file:...] artifact path instead of injecting the full payload into context.
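
The difference between counting characters and counting bytes is easy to check. In the sketch below, Python's `len` on a string counts Unicode code points, which corresponds to counting Go runes for valid UTF-8 text. The function name `should_inline` is hypothetical, and 16384 is just the example value used above.

```python
def should_inline(text: str, max_inline_text_chars: int) -> bool:
    """True if an MCP text result is small enough to stay in model context."""
    # The threshold is compared against characters, not encoded bytes.
    return len(text) <= max_inline_text_chars


cjk = "中" * 16384
char_count = len(cjk)                  # number of characters
byte_count = len(cjk.encode("utf-8"))  # number of UTF-8 bytes
```

This CJK string is exactly at the 16,384-character limit and would stay inline, even though it occupies 49,152 bytes in UTF-8. One more character would push it over the threshold and into a workspace artifact.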
