feat(mcp): tool search tools (#1243)

* feat(mcp): tool search tools

* removed unused call_discovered_tool

* improvements and optimizations

* fix gate mcp enabled

* fix TOCTOU race BM25 cache version check

* fix encapsulation bypass on registry internals

* safety comment on TickTTL

* added more unit tests

* enhanced logs
This commit is contained in:
Mauro
2026-03-09 18:21:49 +01:00
committed by GitHub
parent c45c5073c0
commit b89f6445d1
12 changed files with 1481 additions and 48 deletions
+124 -27
View File
@@ -7,11 +7,21 @@ PicoClaw's tools configuration is located in the `tools` field of `config.json`.
```json
{
"tools": {
"web": { ... },
"mcp": { ... },
"exec": { ... },
"cron": { ... },
"skills": { ... }
"web": {
...
},
"mcp": {
...
},
"exec": {
...
},
"cron": {
...
},
"skills": {
...
}
}
}
```
@@ -23,7 +33,7 @@ Web tools are used for web search and fetching.
### Brave
| Config | Type | Default | Description |
| ------------- | ------ | ------- | ------------------------- |
|---------------|--------|---------|---------------------------|
| `enabled` | bool | false | Enable Brave search |
| `api_key` | string | - | Brave Search API key |
| `max_results` | int | 5 | Maximum number of results |
@@ -31,14 +41,14 @@ Web tools are used for web search and fetching.
### DuckDuckGo
| Config | Type | Default | Description |
| ------------- | ---- | ------- | ------------------------- |
|---------------|------|---------|---------------------------|
| `enabled` | bool | true | Enable DuckDuckGo search |
| `max_results` | int | 5 | Maximum number of results |
### Perplexity
| Config | Type | Default | Description |
| ------------- | ------ | ------- | ------------------------- |
|---------------|--------|---------|---------------------------|
| `enabled` | bool | false | Enable Perplexity search |
| `api_key` | string | - | Perplexity API key |
| `max_results` | int | 5 | Maximum number of results |
@@ -48,7 +58,7 @@ Web tools are used for web search and fetching.
The exec tool is used to execute shell commands.
| Config | Type | Default | Description |
| ---------------------- | ----- | ------- | ------------------------------------------ |
|------------------------|-------|---------|--------------------------------------------|
| `enable_deny_patterns` | bool | true | Enable default dangerous command blocking |
| `custom_deny_patterns` | array | [] | Custom deny patterns (regular expressions) |
@@ -81,7 +91,10 @@ By default, PicoClaw blocks the following dangerous commands:
"tools": {
"exec": {
"enable_deny_patterns": true,
"custom_deny_patterns": ["\\brm\\s+-r\\b", "\\bkillall\\s+python"]
"custom_deny_patterns": [
"\\brm\\s+-r\\b",
"\\bkillall\\s+python"
]
}
}
}
@@ -92,24 +105,47 @@ By default, PicoClaw blocks the following dangerous commands:
The cron tool is used for scheduling periodic tasks.
| Config | Type | Default | Description |
| ---------------------- | ---- | ------- | ---------------------------------------------- |
|------------------------|------|---------|------------------------------------------------|
| `exec_timeout_minutes` | int | 5 | Execution timeout in minutes, 0 means no limit |
## MCP Tool
The MCP tool enables integration with external Model Context Protocol servers.
### Tool Discovery (Lazy Loading)
When connecting to multiple MCP servers, exposing hundreds of tools simultaneously can exhaust the LLM's context window
and increase API costs. The **Discovery** feature solves this by keeping MCP tools *hidden* by default.
Instead of loading all tools, the LLM is provided with a lightweight search tool (using BM25 keyword matching or Regex).
When the LLM needs a specific capability, it searches the hidden library. Matching tools are then temporarily "unlocked"
and injected into the context for a configured number of turns (`ttl`).
### Global Config
| Config | Type | Default | Description |
| --------- | ------ | ------- | ----------------------------------- |
| `enabled` | bool | false | Enable MCP integration globally |
| `servers` | object | `{}` | Map of server name to server config |
| Config | Type | Default | Description |
|-------------|--------|---------|----------------------------------------------|
| `enabled` | bool | false | Enable MCP integration globally |
| `discovery` | object | `{}` | Configuration for Tool Discovery (see below) |
| `servers` | object | `{}` | Map of server name to server config |
### Discovery Config (`discovery`)
| Config | Type | Default | Description |
|----------------------|------|---------|-----------------------------------------------------------------------------------------------------------------------------------|
| `enabled` | bool | false | If true, MCP tools are hidden and loaded on-demand via search. If false, all tools are loaded |
| `ttl` | int | 5 | Number of conversational turns a discovered tool remains unlocked |
| `max_search_results` | int | 5 | Maximum number of tools returned per search query |
| `use_bm25` | bool | true | Enable the natural language/keyword search tool (`tool_search_tool_bm25`). **Warning**: consumes more resources than regex search |
| `use_regex` | bool | false | Enable the regex pattern search tool (`tool_search_tool_regex`) |
> **Note:** If `discovery.enabled` is `true`, you MUST enable at least one search engine (`use_bm25` or `use_regex`),
> otherwise the application will fail to start.
### Per-Server Config
| Config | Type | Required | Description |
| ---------- | ------ | -------- | ------------------------------------------ |
|------------|--------|----------|--------------------------------------------|
| `enabled` | bool | yes | Enable this MCP server |
| `type` | string | no | Transport type: `stdio`, `sse`, `http` |
| `command` | string | stdio | Executable command for stdio transport |
@@ -122,8 +158,8 @@ The MCP tool enables integration with external Model Context Protocol servers.
### Transport Behavior
- If `type` is omitted, transport is auto-detected:
- `url` is set → `sse`
- `command` is set → `stdio`
- `url` is set → `sse`
- `command` is set → `stdio`
- `http` and `sse` both use `url` + optional `headers`.
- `env` and `env_file` are only applied to `stdio` servers.
@@ -140,7 +176,11 @@ The MCP tool enables integration with external Model Context Protocol servers.
"filesystem": {
"enabled": true,
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/tmp"
]
}
}
}
@@ -170,20 +210,76 @@ The MCP tool enables integration with external Model Context Protocol servers.
}
```
#### 3) Massive MCP setup with Tool Discovery enabled
*In this example, the LLM will only see the `tool_search_tool_bm25`. It will search and unlock Github or Postgres tools
dynamically only when requested by the user.*
```json
{
"tools": {
"mcp": {
"enabled": true,
"discovery": {
"enabled": true,
"ttl": 5,
"max_search_results": 5,
"use_bm25": true,
"use_regex": false
},
"servers": {
"github": {
"enabled": true,
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-github"
],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "YOUR_GITHUB_TOKEN"
}
},
"postgres": {
"enabled": true,
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-postgres",
"postgresql://user:password@localhost/dbname"
]
},
"slack": {
"enabled": true,
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-slack"
],
"env": {
"SLACK_BOT_TOKEN": "YOUR_SLACK_BOT_TOKEN",
"SLACK_TEAM_ID": "YOUR_SLACK_TEAM_ID"
}
}
}
}
}
}
```
## Skills Tool
The skills tool configures skill discovery and installation via registries like ClawHub.
### Registries
| Config | Type | Default | Description |
| ---------------------------------- | ------ | -------------------- | ----------------------- |
| `registries.clawhub.enabled` | bool | true | Enable ClawHub registry |
| `registries.clawhub.base_url` | string | `https://clawhub.ai` | ClawHub base URL |
| Config | Type | Default | Description |
|------------------------------------|--------|----------------------|----------------------------------------------|
| `registries.clawhub.enabled` | bool | true | Enable ClawHub registry |
| `registries.clawhub.base_url` | string | `https://clawhub.ai` | ClawHub base URL |
| `registries.clawhub.auth_token` | string | `""` | Optional Bearer token for higher rate limits |
| `registries.clawhub.search_path` | string | `/api/v1/search` | Search API path |
| `registries.clawhub.skills_path` | string | `/api/v1/skills` | Skills API path |
| `registries.clawhub.download_path` | string | `/api/v1/download` | Download API path |
| `registries.clawhub.search_path` | string | `/api/v1/search` | Search API path |
| `registries.clawhub.skills_path` | string | `/api/v1/skills` | Skills API path |
| `registries.clawhub.download_path` | string | `/api/v1/download` | Download API path |
### Configuration Example
@@ -217,4 +313,5 @@ For example:
- `PICOCLAW_TOOLS_CRON_EXEC_TIMEOUT_MINUTES=10`
- `PICOCLAW_TOOLS_MCP_ENABLED=true`
Note: Nested map-style config (for example `tools.mcp.servers.<name>.*`) is configured in `config.json` rather than environment variables.
Note: Nested map-style config (for example `tools.mcp.servers.<name>.*`) is configured in `config.json` rather than
environment variables.