diff --git a/.gitignore b/.gitignore index 6b660e6e7..e798fb31c 100644 --- a/.gitignore +++ b/.gitignore @@ -60,4 +60,9 @@ cmd/telegram/ web/backend/dist/* !web/backend/dist/.gitkeep -.claude/ \ No newline at end of file +<<<<<<< HEAD +.claude/ +======= + +docker/data +>>>>>>> upstream-main diff --git a/README.md b/README.md index 652792d83..4b0852ccd 100644 --- a/README.md +++ b/README.md @@ -747,797 +747,5 @@ User Groups: discord: -PicoClaw center"> - PicoClaw - -

PicoClaw: Ultra-Efficient AI Assistant in Go

- -

$10 Hardware · <10MB RAM · <1s Boot · 皮皮虾,我们走!

-

- Go - Hardware - License -
- Website - Docs - Wiki -
- Twitter - - Discord -

- -[中文](README.zh.md) | [日本語](README.ja.md) | [Português](README.pt-br.md) | [Tiếng Việt](README.vi.md) | [Français](README.fr.md) | [Italiano](README.it.md) | [Bahasa Indonesia](README.id.md) | **English** - - - ---- - -> **PicoClaw** is an independent open-source project initiated by [Sipeed](https://sipeed.com). It is written entirely in **Go** — not a fork of OpenClaw, NanoBot, or any other project. - -🦐 PicoClaw is an ultra-lightweight personal AI Assistant inspired by [NanoBot](https://github.com/HKUDS/nanobot), refactored from the ground up in Go through a self-bootstrapping process, where the AI agent itself drove the entire architectural migration and code optimization. - -⚡️ Runs on $10 hardware with <10MB RAM: That's 99% less memory than OpenClaw and 98% cheaper than a Mac mini! - - - - - - -
-

- -

-
-

- -

-
- -> [!CAUTION] -> **🚨 SECURITY & OFFICIAL CHANNELS / 安全声明** -> -> * **NO CRYPTO:** PicoClaw has **NO** official token/coin. All claims on `pump.fun` or other trading platforms are **SCAMS**. -> -> * **OFFICIAL DOMAIN:** The **ONLY** official website is **[picoclaw.io](https://picoclaw.io)**, and company website is **[sipeed.com](https://sipeed.com)** -> * **Warning:** Many `.ai/.org/.com/.net/...` domains are registered by third parties. -> * **Warning:** picoclaw is in early development now and may have unresolved network security issues. Do not deploy to production environments before the v1.0 release. -> * **Note:** picoclaw has recently merged a lot of PRs, which may result in a larger memory footprint (10–20MB) in the latest versions. We plan to prioritize resource optimization as soon as the current feature set reaches a stable state. - -## 📢 News - -2026-03-17 🚀 **v0.2.3 Released!** System tray UI (Windows & Linux), sub-agent status tracking (`spawn_status`), experimental gateway hot-reload, cron security gates, and 2 security fixes. PicoClaw now at **25K ⭐**! - -2026-03-09 🎉 **v0.2.1 — Biggest update yet!** MCP protocol support, 4 new channels (Matrix/IRC/WeCom/Discord Proxy), 3 new providers (Kimi/Minimax/Avian), vision pipeline, JSONL memory store, and model routing. - -2026-02-28 📦 **v0.2.0** released with Docker Compose support and Web UI launcher. - -2026-02-26 🎉 PicoClaw hit **20K stars** in just 17 days! Channel auto-orchestration and capability interfaces landed. - -
-Older news... - -2026-02-16 🎉 PicoClaw hit 12K stars in one week! Community maintainer roles and [roadmap](ROADMAP.md) officially posted. - -2026-02-13 🎉 PicoClaw hit 5000 stars in 4 days! Project Roadmap and Developer Group setup underway. - -2026-02-09 🎉 **PicoClaw Launched!** Built in 1 day to bring AI Agents to $10 hardware with <10MB RAM. 🦐 PicoClaw,Let's Go! - -
- -## ✨ Features - -🪶 **Ultra-Lightweight**: <10MB Memory footprint — 99% smaller than OpenClaw core functionality.* - -💰 **Minimal Cost**: Efficient enough to run on $10 Hardware — 98% cheaper than a Mac mini. - -⚡️ **Lightning Fast**: 400X Faster startup time, boot in <1 second even on 0.6GHz single core. - -🌍 **True Portability**: Single self-contained binary across RISC-V, ARM, MIPS, and x86, One-click to Go! - -🤖 **AI-Bootstrapped**: Autonomous Go-native implementation — 95% Agent-generated core with human-in-the-loop refinement. - -🔌 **MCP Support**: Native [Model Context Protocol](https://modelcontextprotocol.io/) integration — connect any MCP server to extend agent capabilities. - -👁️ **Vision Pipeline**: Send images and files directly to the agent — automatic base64 encoding for multimodal LLMs. - -🧠 **Smart Routing**: Rule-based model routing — simple queries go to lightweight models, saving API costs. - -_*Recent versions may use 10–20MB due to rapid feature merges. Resource optimization is planned. Startup comparison based on 0.8GHz single-core benchmarks (see table below)._ - -| | OpenClaw | NanoBot | **PicoClaw** | -| ----------------------------- | ------------- | ------------------------ | ----------------------------------------- | -| **Language** | TypeScript | Python | **Go** | -| **RAM** | >1GB | >100MB | **< 10MB*** | -| **Startup**
(0.8GHz core) | >500s | >30s | **<1s** | -| **Cost** | Mac Mini $599 | Most Linux SBC
~$50 | **Any Linux Board**
**As low as $10** | - -PicoClaw - -> 📋 **[Hardware Compatibility List](docs/hardware-compatibility.md)** — See all tested boards, from $5 RISC-V to Raspberry Pi to Android phones. Your board not listed? Submit a PR! - -## 🦾 Demonstration - -### 🛠️ Standard Assistant Workflows - - - - - - - - - - - - - - - - - -

🧩 Full-Stack Engineer

🗂️ Logging & Planning Management

🔎 Web Search & Learning

Develop • Deploy • ScaleSchedule • Automate • MemoryDiscovery • Insights • Trends
- -### 📱 Run on old Android Phones - -Give your decade-old phone a second life! Turn it into a smart AI Assistant with PicoClaw. Quick Start: - -1. **Install [Termux](https://github.com/termux/termux-app)** (Download from [GitHub Releases](https://github.com/termux/termux-app/releases), or search in F-Droid / Google Play). -2. **Execute cmds** - -```bash -# Download the latest release from https://github.com/sipeed/picoclaw/releases -wget https://github.com/sipeed/picoclaw/releases/latest/download/picoclaw_Linux_arm64.tar.gz -tar xzf picoclaw_Linux_arm64.tar.gz -pkg install proot -termux-chroot ./picoclaw onboard -``` - -And then follow the instructions in the "Quick Start" section to complete the configuration! - -PicoClaw - -### 🐜 Innovative Low-Footprint Deploy - -PicoClaw can be deployed on almost any Linux device! - -- $9.9 [LicheeRV-Nano](https://www.aliexpress.com/item/1005006519668532.html) E(Ethernet) or W(WiFi6) version, for Minimal Home Assistant -- $30~50 [NanoKVM](https://www.aliexpress.com/item/1005007369816019.html), or $100 [NanoKVM-Pro](https://www.aliexpress.com/item/1005010048471263.html) for Automated Server Maintenance -- $50 [MaixCAM](https://www.aliexpress.com/item/1005008053333693.html) or $100 [MaixCAM2](https://www.kickstarter.com/projects/zepan/maixcam2-build-your-next-gen-4k-ai-camera) for Smart Monitoring - - - -🌟 More Deployment Cases Await! - -## 📦 Install - -### Install with precompiled binary - -Download the binary for your platform from the [Releases](https://github.com/sipeed/picoclaw/releases) page. 
- -### Install from source (latest features, recommended for development) - -```bash -git clone https://github.com/sipeed/picoclaw.git - -cd picoclaw -make deps - -# Build, no need to install -make build - -# Build for multiple platforms -make build-all - -# Build for Raspberry Pi Zero 2 W (32-bit: make build-linux-arm; 64-bit: make build-linux-arm64) -make build-pi-zero - -# Build And Install -make install -``` - -**Raspberry Pi Zero 2 W:** Use the binary that matches your OS: 32-bit Raspberry Pi OS → `make build-linux-arm`; 64-bit → `make build-linux-arm64`. Or run `make build-pi-zero` to build both. - -## 📚 Documentation - -For detailed guides, see the docs below. The README covers quick start only. - -```bash -# 1. Clone this repo -git clone https://github.com/sipeed/picoclaw.git -cd picoclaw - -# 2. First run — auto-generates docker/data/config.json then exits -docker compose -f docker/docker-compose.yml --profile gateway up -# The container prints "First-run setup complete." and stops. - -# 3. Set your API keys -vim docker/data/config.json # Set provider API keys, bot tokens, etc. - -# 4. Start -docker compose -f docker/docker-compose.yml --profile gateway up -d -``` - -> [!TIP] -> **Docker Users**: By default, the Gateway listens on `127.0.0.1` which is not accessible from the host. If you need to access the health endpoints or expose ports, set `PICOCLAW_GATEWAY_HOST=0.0.0.0` in your environment or update `config.json`. - -```bash -# 5. Check logs -docker compose -f docker/docker-compose.yml logs -f picoclaw-gateway - -# 6. Stop -docker compose -f docker/docker-compose.yml --profile gateway down -``` - -### Launcher Mode (Web Console) - -The `launcher` image includes all three binaries (`picoclaw`, `picoclaw-launcher`, `picoclaw-launcher-tui`) and starts the web console by default, which provides a browser-based UI for configuration and chat. 
- -```bash -docker compose -f docker/docker-compose.yml --profile launcher up -d -``` - -Open http://localhost:18800 in your browser. The launcher manages the gateway process automatically. - -> [!WARNING] -> The web console does not yet support authentication. Avoid exposing it to the public internet. - -### Agent Mode (One-shot) - -```bash -# Ask a question -docker compose -f docker/docker-compose.yml run --rm picoclaw-agent -m "What is 2+2?" - -# Interactive mode -docker compose -f docker/docker-compose.yml run --rm picoclaw-agent -``` - -### Update - -```bash -docker compose -f docker/docker-compose.yml pull -docker compose -f docker/docker-compose.yml --profile gateway up -d -``` - -### 🚀 Quick Start - -> [!TIP] -> Set your API Key in `~/.picoclaw/config.json`. Get API Keys: [Volcengine (CodingPlan)](https://console.volcengine.com) (LLM) · [OpenRouter](https://openrouter.ai/keys) (LLM) · [Zhipu](https://open.bigmodel.cn/usercenter/proj-mgmt/apikeys) (LLM). Web search is optional — get a free [Tavily API](https://tavily.com) (1000 free queries/month) or [Brave Search API](https://brave.com/search/api) (2000 free queries/month). - -**1. Initialize** - -```bash -picoclaw onboard -``` - -**2. 
Configure** (`~/.picoclaw/config.json`) - -```json -{ - "agents": { - "defaults": { - "workspace": "~/.picoclaw/workspace", - "model_name": "gpt-5.4", - "max_tokens": 8192, - "temperature": 0.7, - "max_tool_iterations": 20 - } - }, - "model_list": [ - { - "model_name": "ark-code-latest", - "model": "volcengine/ark-code-latest", - "api_key": "sk-your-api-key" - }, - { - "model_name": "gpt-5.4", - "model": "openai/gpt-5.4", - "api_key": "your-api-key", - "request_timeout": 300 - }, - { - "model_name": "claude-sonnet-4.6", - "model": "anthropic/claude-sonnet-4.6", - "api_key": "your-anthropic-key" - } - ], - "tools": { - "web": { - "brave": { - "enabled": false, - "api_key": "YOUR_BRAVE_API_KEY", - "max_results": 5 - }, - "tavily": { - "enabled": false, - "api_key": "YOUR_TAVILY_API_KEY", - "max_results": 5 - }, - "duckduckgo": { - "enabled": true, - "max_results": 5 - }, - "perplexity": { - "enabled": false, - "api_key": "YOUR_PERPLEXITY_API_KEY", - "max_results": 5 - }, - "searxng": { - "enabled": false, - "base_url": "http://your-searxng-instance:8888", - "max_results": 5 - } - } - } -} -``` - -> **New**: The `model_list` configuration format allows zero-code provider addition. See [Model Configuration](#model-configuration-model_list) for details. -> `request_timeout` is optional and uses seconds. If omitted or set to `<= 0`, PicoClaw uses the default timeout (120s). - -**3. 
Get API Keys** - -* **LLM Provider**: [OpenRouter](https://openrouter.ai/keys) · [Zhipu](https://open.bigmodel.cn/usercenter/proj-mgmt/apikeys) · [Anthropic](https://console.anthropic.com) · [OpenAI](https://platform.openai.com) · [Gemini](https://aistudio.google.com/api-keys) -* **Web Search** (optional): - * [Brave Search](https://brave.com/search/api) - Paid ($5/1000 queries, ~$5-6/month) - * [Perplexity](https://www.perplexity.ai) - AI-powered search with chat interface - * [SearXNG](https://github.com/searxng/searxng) - Self-hosted metasearch engine (free, no API key needed) - * [Tavily](https://tavily.com) - Optimized for AI Agents (1000 requests/month) - * DuckDuckGo - Built-in fallback (no API key required) - -> **Note**: See `config.example.json` for a complete configuration template. - -**4. Chat** - -```bash -picoclaw agent -m "What is 2+2?" -``` - -That's it! You have a working AI assistant in 2 minutes. - ---- - -## 💬 Chat Apps - -Talk to your picoclaw through Telegram, Discord, WhatsApp, Matrix, QQ, DingTalk, LINE, or WeCom - -> **Note**: All webhook-based channels (LINE, WeCom, etc.) are served on a single shared Gateway HTTP server (`gateway.host`:`gateway.port`, default `127.0.0.1:18790`). There are no per-channel ports to configure. Note: Feishu uses WebSocket/SDK mode and does not use the shared HTTP webhook server. - -| Channel | Setup | -| ------------ | ---------------------------------- | -| **Telegram** | Easy (just a token) | -| **Discord** | Easy (bot token + intents) | -| **WhatsApp** | Easy (native: QR scan; or bridge URL) | -| **Matrix** | Medium (homeserver + bot access token) | -| **QQ** | Easy (AppID + AppSecret) | -| **DingTalk** | Medium (app credentials) | -| **LINE** | Medium (credentials + webhook URL) | -| **WeCom AI Bot** | Medium (Token + AES key) | - -
-Telegram (Recommended) - -**1. Create a bot** - -* Open Telegram, search `@BotFather` -* Send `/newbot`, follow prompts -* Copy the token - -**2. Configure** - -```json -{ - "channels": { - "telegram": { - "enabled": true, - "token": "YOUR_BOT_TOKEN", - "allow_from": ["YOUR_USER_ID"] - } - } -} -``` - -> Get your user ID from `@userinfobot` on Telegram. - -**3. Run** - -```bash -picoclaw gateway -``` - -**4. Telegram command menu (auto-registered at startup)** - -PicoClaw now keeps command definitions in one shared registry. On startup, Telegram will automatically register supported bot commands (for example `/start`, `/help`, `/show`, `/list`) so command menu and runtime behavior stay in sync. -Telegram command menu registration remains channel-local discovery UX; generic command execution is handled centrally in the agent loop via the commands executor. - -If command registration fails (network/API transient errors), the channel still starts and PicoClaw retries registration in the background. - -
- -
-Discord - -**1. Create a bot** - -* Go to -* Create an application → Bot → Add Bot -* Copy the bot token - -**2. Enable intents** - -* In the Bot settings, enable **MESSAGE CONTENT INTENT** -* (Optional) Enable **SERVER MEMBERS INTENT** if you plan to use allow lists based on member data - -**3. Get your User ID** -* Discord Settings → Advanced → enable **Developer Mode** -* Right-click your avatar → **Copy User ID** - -**4. Configure** - -```json -{ - "channels": { - "discord": { - "enabled": true, - "token": "YOUR_BOT_TOKEN", - "allow_from": ["YOUR_USER_ID"] - } - } -} -``` - -**5. Invite the bot** - -* OAuth2 → URL Generator -* Scopes: `bot` -* Bot Permissions: `Send Messages`, `Read Message History` -* Open the generated invite URL and add the bot to your server - -**Optional: Group trigger mode** - -By default the bot responds to all messages in a server channel. To restrict responses to @-mentions only, add: - -```json -{ - "channels": { - "discord": { - "group_trigger": { "mention_only": true } - } - } -} -``` - -You can also trigger by keyword prefixes (e.g. `!bot`): - -```json -{ - "channels": { - "discord": { - "group_trigger": { "prefixes": ["!bot"] } - } - } -} -``` - -**6. Run** - -```bash -picoclaw gateway -``` - -
- -
-WhatsApp (native via whatsmeow) - -PicoClaw can connect to WhatsApp in two ways: - -- **Native (recommended):** In-process using [whatsmeow](https://github.com/tulir/whatsmeow). No separate bridge. Set `"use_native": true` and leave `bridge_url` empty. On first run, scan the QR code with WhatsApp (Linked Devices). Session is stored under your workspace (e.g. `workspace/whatsapp/`). The native channel is **optional** to keep the default binary small; build with `-tags whatsapp_native` (e.g. `make build-whatsapp-native` or `go build -tags whatsapp_native ./cmd/...`). -- **Bridge:** Connect to an external WebSocket bridge. Set `bridge_url` (e.g. `ws://localhost:3001`) and keep `use_native` false. - -**Configure (native)** - -```json -{ - "channels": { - "whatsapp": { - "enabled": true, - "use_native": true, - "session_store_path": "", - "allow_from": [] - } - } -} -``` - -If `session_store_path` is empty, the session is stored in `<workspace>/whatsapp/`. Run `picoclaw gateway`; on first run, scan the QR code printed in the terminal with WhatsApp → Linked Devices. - -
- -
-QQ - -**1. Create a bot** - -- Go to [QQ Open Platform](https://q.qq.com/#) -- Create an application → Get **AppID** and **AppSecret** - -**2. Configure** - -```json -{ - "channels": { - "qq": { - "enabled": true, - "app_id": "YOUR_APP_ID", - "app_secret": "YOUR_APP_SECRET", - "allow_from": [] - } - } -} -``` - -> Set `allow_from` to empty to allow all users, or specify QQ numbers to restrict access. - -**3. Run** - -```bash -picoclaw gateway -``` - -
- -
-DingTalk - -**1. Create a bot** - -* Go to [Open Platform](https://open.dingtalk.com/) -* Create an internal app -* Copy Client ID and Client Secret - -**2. Configure** - -```json -{ - "channels": { - "dingtalk": { - "enabled": true, - "client_id": "YOUR_CLIENT_ID", - "client_secret": "YOUR_CLIENT_SECRET", - "allow_from": [] - } - } -} -``` - -> Set `allow_from` to empty to allow all users, or specify DingTalk user IDs to restrict access. - -**3. Run** - -```bash -picoclaw gateway -``` -
- -
-Matrix - -**1. Prepare bot account** - -* Use your preferred homeserver (e.g. `https://matrix.org` or self-hosted) -* Create a bot user and obtain its access token - -**2. Configure** - -```json -{ - "channels": { - "matrix": { - "enabled": true, - "homeserver": "https://matrix.org", - "user_id": "@your-bot:matrix.org", - "access_token": "YOUR_MATRIX_ACCESS_TOKEN", - "allow_from": [] - } - } -} -``` - -**3. Run** - -```bash -picoclaw gateway -``` - -For full options (`device_id`, `join_on_invite`, `group_trigger`, `placeholder`, `reasoning_channel_id`), see [Matrix Channel Configuration Guide](docs/channels/matrix/README.md). - -
- -
-LINE - -**1. Create a LINE Official Account** - -- Go to [LINE Developers Console](https://developers.line.biz/) -- Create a provider → Create a Messaging API channel -- Copy **Channel Secret** and **Channel Access Token** - -**2. Configure** - -```json -{ - "channels": { - "line": { - "enabled": true, - "channel_secret": "YOUR_CHANNEL_SECRET", - "channel_access_token": "YOUR_CHANNEL_ACCESS_TOKEN", - "webhook_path": "/webhook/line", - "allow_from": [] - } - } -} -``` - -> LINE webhook is served on the shared Gateway server (`gateway.host`:`gateway.port`, default `127.0.0.1:18790`). - -**3. Set up Webhook URL** - -LINE requires HTTPS for webhooks. Use a reverse proxy or tunnel: - -```bash -# Example with ngrok (gateway default port is 18790) -ngrok http 18790 -``` - -Then set the Webhook URL in LINE Developers Console to `https://your-domain/webhook/line` and enable **Use webhook**. - -**4. Run** - -```bash -picoclaw gateway -``` - -> In group chats, the bot responds only when @mentioned. Replies quote the original message. - -
- -
-WeCom (企业微信) - -PicoClaw supports three types of WeCom integration: - -**Option 1: WeCom Bot (Bot)** - Easier setup, supports group chats -**Option 2: WeCom App (Custom App)** - More features, proactive messaging, private chat only -**Option 3: WeCom AI Bot (AI Bot)** - Official AI Bot, streaming replies, supports group & private chat - -See [WeCom AI Bot Configuration Guide](docs/channels/wecom/wecom_aibot/README.zh.md) for detailed setup instructions. - -**Quick Setup - WeCom AI Bot:** - -**1. Create an AI Bot** - -* Go to WeCom Admin Console → AI Bot -* Create a new AI Bot → Set name, avatar, etc. -* Copy **Bot ID** and **Secret** - -**2. Configure** - -```json -{ - "channels": { - "wecom_aibot": { - "enabled": true, - "bot_id": "YOUR_BOT_ID", - "secret": "YOUR_SECRET", - "allow_from": [], - "welcome_message": "Hello! How can I help you?" - } - } -} -``` - -**3. Run** - -```bash -picoclaw gateway -``` - -> **Note**: WeCom AI Bot uses streaming pull protocol — no reply timeout concerns. Long tasks (>30 seconds) automatically switch to `response_url` push delivery. - -
- -## ClawdChat Join the Agent Social Network - -Connect Picoclaw to the Agent Social Network simply by sending a single message via the CLI or any integrated Chat App. - -**Read `https://clawdchat.ai/skill.md` and follow the instructions to join [ClawdChat.ai](https://clawdchat.ai)** - -## 🖥️ CLI Reference - -| Command | Description | -| ------------------------- | ----------------------------- | -| `picoclaw onboard` | Initialize config & workspace | -| `picoclaw agent -m "..."` | Chat with the agent | -| `picoclaw agent` | Interactive chat mode | -| `picoclaw gateway` | Start the gateway | -| `picoclaw status` | Show status | -| `picoclaw version` | Show version info | -| `picoclaw cron list` | List all scheduled jobs | -| `picoclaw cron add ...` | Add a scheduled job | -| `picoclaw cron disable` | Disable a scheduled job | -| `picoclaw cron remove` | Remove a scheduled job | -| `picoclaw skills list` | List installed skills | -| `picoclaw skills install` | Install a skill | -| `picoclaw migrate` | Migrate data from older versions | -| `picoclaw auth login` | Authenticate with providers | - -### Scheduled Tasks / Reminders - -PicoClaw supports scheduled reminders and recurring tasks through the `cron` tool: - -* **One-time reminders**: "Remind me in 10 minutes" → triggers once after 10min -* **Recurring tasks**: "Remind me every 2 hours" → triggers every 2 hours -* **Cron expressions**: "Remind me at 9am daily" → uses cron expression - -## 🤝 Contribute & Roadmap - -PRs welcome! The codebase is intentionally small and readable. 🤗 - -See our full [Community Roadmap](https://github.com/sipeed/picoclaw/blob/main/ROADMAP.md). - -Developer group building, join after your first merged PR! - -User Groups: - -discord: - -PicoClaw - -## ClawdChat Join the Agent Social Network - -Connect Picoclaw to the Agent Social Network simply by sending a single message via the CLI or any integrated Chat App. 
- -**Read `https://clawdchat.ai/skill.md` and follow the instructions to join [ClawdChat.ai](https://clawdchat.ai)** - -## 🖥️ CLI Reference - -| Command | Description | -| ------------------------- | ----------------------------- | -| `picoclaw onboard` | Initialize config & workspace | -| `picoclaw agent -m "..."` | Chat with the agent | -| `picoclaw agent` | Interactive chat mode | -| `picoclaw gateway` | Start the gateway | -| `picoclaw status` | Show status | -| `picoclaw version` | Show version info | -| `picoclaw cron list` | List all scheduled jobs | -| `picoclaw cron add ...` | Add a scheduled job | -| `picoclaw cron disable` | Disable a scheduled job | -| `picoclaw cron remove` | Remove a scheduled job | -| `picoclaw skills list` | List installed skills | -| `picoclaw skills install` | Install a skill | -| `picoclaw migrate` | Migrate data from older versions | -| `picoclaw auth login` | Authenticate with providers | -| `picoclaw model` | View or switch the default model | - -### Scheduled Tasks / Reminders - -PicoClaw supports scheduled reminders and recurring tasks through the `cron` tool: - -* **One-time reminders**: "Remind me in 10 minutes" → triggers once after 10min -* **Recurring tasks**: "Remind me every 2 hours" → triggers every 2 hours -* **Cron expressions**: "Remind me at 9am daily" → uses cron expression - -## 🤝 Contribute & Roadmap - -PRs welcome! The codebase is intentionally small and readable. 🤗 - -See our full [Community Roadmap](https://github.com/sipeed/picoclaw/blob/main/ROADMAP.md). - -Developer group building, join after your first merged PR! 
-
-User Groups:
-
-discord:
-
-PicoClaw
+WeChat:
+WeChat group QR code
diff --git a/assets/wechat.png b/assets/wechat.png
index 6512421ed..effb4dab9 100644
Binary files a/assets/wechat.png and b/assets/wechat.png differ
diff --git a/cmd/picoclaw/internal/agent/helpers.go b/cmd/picoclaw/internal/agent/helpers.go
index c3ddbb77f..0af743bb5 100644
--- a/cmd/picoclaw/internal/agent/helpers.go
+++ b/cmd/picoclaw/internal/agent/helpers.go
@@ -23,16 +23,16 @@ func agentCmd(message, sessionKey, model string, debug bool) error {
 		sessionKey = "cli:default"
 	}
 
-	if debug {
-		logger.SetLevel(logger.DEBUG)
-		fmt.Println("🔍 Debug mode enabled")
-	}
-
 	cfg, err := internal.LoadConfig()
 	if err != nil {
 		return fmt.Errorf("error loading config: %w", err)
 	}
 
+	if debug {
+		logger.SetLevel(logger.DEBUG)
+		fmt.Println("🔍 Debug mode enabled")
+	}
+
 	if model != "" {
 		cfg.Agents.Defaults.ModelName = model
 	}
diff --git a/cmd/picoclaw/internal/helpers.go b/cmd/picoclaw/internal/helpers.go
index 6b2d65c91..ae1d58c29 100644
--- a/cmd/picoclaw/internal/helpers.go
+++ b/cmd/picoclaw/internal/helpers.go
@@ -5,6 +5,7 @@ import (
 	"path/filepath"
 
 	"github.com/sipeed/picoclaw/pkg/config"
+	"github.com/sipeed/picoclaw/pkg/logger"
 )
 
 const Logo = "🦞"
@@ -27,7 +28,12 @@ func GetConfigPath() string {
 }
 
 func LoadConfig() (*config.Config, error) {
-	return config.LoadConfig(GetConfigPath())
+	cfg, err := config.LoadConfig(GetConfigPath())
+	if err != nil {
+		return nil, err
+	}
+	logger.SetLevelFromString(cfg.Agents.Defaults.LogLevel)
+	return cfg, nil
 }
 
 // FormatVersion returns the version string with optional git commit
diff --git a/config/config.example.json b/config/config.example.json
index 81c9014ec..69e8feeae 100644
--- a/config/config.example.json
+++ b/config/config.example.json
@@ -1,6 +1,7 @@
 {
   "agents": {
     "defaults": {
+      "log_level": "fatal",
       "workspace": "~/.picoclaw/workspace",
       "restrict_to_workspace": true,
       "model_name": "gpt-5.4",
diff --git a/docker/Dockerfile.heavy b/docker/Dockerfile.heavy
new file mode 100644
index 000000000..cbc243e39
--- /dev/null
+++ b/docker/Dockerfile.heavy
@@ -0,0 +1,67 @@
+# ============================================================
+# Stage 1: Build the picoclaw binary
+# ============================================================
+FROM golang:1.26.0-alpine AS builder
+
+RUN apk add --no-cache git make
+
+WORKDIR /src
+
+# Cache dependencies
+COPY go.mod go.sum ./
+RUN go mod download
+
+# Copy source and build
+COPY . .
+RUN make build
+
+# ============================================================
+# Stage 2: Node.js runtime with Python + MCP support
+# ============================================================
+FROM node:24-alpine3.23
+
+RUN apk add --no-cache \
+    ca-certificates \
+    curl \
+    git \
+    python3 \
+    py3-pip \
+    chromium \
+    jq
+
+# Install Playwright browsers for agent-browser
+ENV PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers
+RUN npm install -g agent-browser && \
+    npx playwright install chromium && \
+    chmod -R o+rx $PLAYWRIGHT_BROWSERS_PATH
+
+# Install uv
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
+    ln -s /root/.local/bin/uv /usr/local/bin/uv && \
+    ln -s /root/.local/bin/uvx /usr/local/bin/uvx && \
+    uv --version
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD wget -q --spider http://localhost:18790/health || exit 1
+
+# Copy binary
+COPY --from=builder /src/build/picoclaw /usr/local/bin/picoclaw
+
+# Reuse existing node user (UID/GID 1000) — rename to picoclaw
+RUN deluser node 2>/dev/null; delgroup node 2>/dev/null; \
+    addgroup -g 1000 picoclaw 2>/dev/null; \
+    adduser -D -u 1000 -G picoclaw -h /home/picoclaw picoclaw 2>/dev/null || true
+
+USER picoclaw
+
+# Run onboard to create initial directories and config
+RUN /usr/local/bin/picoclaw onboard
+
+# Copy default workspace
+COPY --chown=picoclaw:picoclaw workspace/ /home/picoclaw/.picoclaw/workspace/
+
+VOLUME /home/picoclaw/.picoclaw/workspace
+
+ENTRYPOINT ["picoclaw"]
+CMD ["gateway"]
diff --git a/go.mod b/go.mod
index 744e05e17..cfc930d37 100644
--- a/go.mod
+++ b/go.mod
@@ -3,13 +3,12 @@ module github.com/sipeed/picoclaw
 go 1.25.8
 
 require (
-	fyne.io/systray v1.12.0
 	github.com/BurntSushi/toml v1.6.0
+	fyne.io/systray v1.12.0
 	github.com/adhocore/gronx v1.19.6
 	github.com/anthropics/anthropic-sdk-go v1.26.0
 	github.com/bwmarrin/discordgo v0.29.0
 	github.com/caarlos0/env/v11 v11.4.0
-	github.com/creack/pty v1.1.9
 	github.com/ergochat/irc-go v0.6.0
 	github.com/ergochat/readline v0.1.3
 	github.com/gdamore/tcell/v2 v2.13.8
diff --git a/go.sum b/go.sum
index dc82d46ef..f24b997d4 100644
--- a/go.sum
+++ b/go.sum
@@ -37,7 +37,6 @@ github.com/coder/websocket v1.8.14 h1:9L0p0iKiNOibykf283eHkKUHHrpG7f65OE3BhhO7v9
 github.com/coder/websocket v1.8.14/go.mod h1:NX3SzP+inril6yawo5CQXx8+fk145lPDC6pumgx0mVg=
 github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
 github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
-github.com/creack/pty v1.1.9 h1:uDmaGzcdjhF4i/plgjmEsriH11Y0o7RKapEf/LDaM3w=
 github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
diff --git a/pkg/agent/instance_test.go b/pkg/agent/instance_test.go
index 84cfa81df..b3318ad1f 100644
--- a/pkg/agent/instance_test.go
+++ b/pkg/agent/instance_test.go
@@ -236,9 +236,8 @@ func TestNewAgentInstance_AllowsMediaTempDirForReadListAndExec(t *testing.T) {
 		t.Fatal("exec tool not registered")
 	}
 	execResult := execTool.Execute(context.Background(), map[string]any{
-		"action": "run",
-		"command": "cat " + filepath.Base(mediaPath),
-		"cwd": mediaDir,
+		"command": "cat " + filepath.Base(mediaPath),
+		"working_dir": mediaDir,
 	})
 	if execResult.IsError {
 		t.Fatalf("exec should allow media temp dir, got: %s", execResult.ForLLM)
diff --git a/pkg/config/config.go b/pkg/config/config.go
index 70a52d86a..0bc914f95 100644
--- a/pkg/config/config.go
+++ b/pkg/config/config.go
@@ -253,6 +253,7 @@ type AgentDefaults struct {
 	SteeringMode string `json:"steering_mode,omitempty" env:"PICOCLAW_AGENTS_DEFAULTS_STEERING_MODE"` // "one-at-a-time" (default) or "all"
 	SubTurn SubTurnConfig `json:"subturn" envPrefix:"PICOCLAW_AGENTS_DEFAULTS_SUBTURN_"`
 	ToolFeedback ToolFeedbackConfig `json:"tool_feedback,omitempty"`
+	LogLevel string `json:"log_level,omitempty" env:"PICOCLAW_LOG_LEVEL"`
 }
 
 const (
diff --git a/pkg/config/config_test.go b/pkg/config/config_test.go
index 588c04645..45906ee70 100644
--- a/pkg/config/config_test.go
+++ b/pkg/config/config_test.go
@@ -470,6 +470,13 @@ func TestDefaultConfig_CronAllowCommandEnabled(t *testing.T) {
 	}
 }
 
+func TestDefaultConfig_LogLevel(t *testing.T) {
+	cfg := DefaultConfig()
+	if cfg.Agents.Defaults.LogLevel != "fatal" {
+		t.Errorf("LogLevel = %q, want \"fatal\"", cfg.Agents.Defaults.LogLevel)
+	}
+}
+
 func TestLoadConfig_OpenAIWebSearchDefaultsTrueWhenUnset(t *testing.T) {
 	dir := t.TempDir()
 	configPath := filepath.Join(dir, "config.json")
@@ -1057,3 +1064,38 @@ func TestLoadConfig_UsesPassphraseProvider(t *testing.T) {
 		t.Errorf("api_key = %q, want %q", cfg.ModelList[0].APIKey, plainKey)
 	}
 }
+
+func TestConfigParsesLogLevel(t *testing.T) {
+	dir := t.TempDir()
+	cfgPath := filepath.Join(dir, "config.json")
+	data := `{"agents":{"defaults":{"log_level":"debug"}}}`
+	if err := os.WriteFile(cfgPath, []byte(data), 0o600); err != nil {
+		t.Fatalf("setup: %v", err)
+	}
+
+	cfg, err := LoadConfig(cfgPath)
+	if err != nil {
+		t.Fatalf("LoadConfig: %v", err)
+	}
+	if cfg.Agents.Defaults.LogLevel != "debug" {
+		t.Errorf("LogLevel = %q, want \"debug\"", cfg.Agents.Defaults.LogLevel)
+	}
+}
+
+func TestConfigLogLevelEmpty(t *testing.T) {
+	dir := t.TempDir()
+	cfgPath := filepath.Join(dir, "config.json")
+	data := `{}`
+	if err := os.WriteFile(cfgPath, []byte(data), 0o600); err != nil {
+		t.Fatalf("setup: %v", err)
+	}
+
+	cfg, err := LoadConfig(cfgPath)
+	if err != nil {
+		t.Fatalf("LoadConfig: %v", err)
+	}
+	// When config omits log_level, the DefaultConfig value ("fatal") is preserved.
+	if cfg.Agents.Defaults.LogLevel != "fatal" {
+		t.Errorf("LogLevel = %q, want \"fatal\"", cfg.Agents.Defaults.LogLevel)
+	}
+}
diff --git a/pkg/config/defaults.go b/pkg/config/defaults.go
index 791a546e3..8665370f5 100644
--- a/pkg/config/defaults.go
+++ b/pkg/config/defaults.go
@@ -26,6 +26,7 @@ func DefaultConfig() *Config {
 	return &Config{
 		Agents: AgentsConfig{
 			Defaults: AgentDefaults{
+				LogLevel: "fatal",
 				Workspace: workspacePath,
 				RestrictToWorkspace: true,
 				Provider: "",
diff --git a/pkg/gateway/gateway.go b/pkg/gateway/gateway.go
index 9a2706b3b..4ad4e950e 100644
--- a/pkg/gateway/gateway.go
+++ b/pkg/gateway/gateway.go
@@ -79,16 +79,18 @@ func (p *startupBlockedProvider) GetDefaultModel() string {
 
 // Run starts the gateway runtime using the configuration loaded from configPath.
 func Run(debug bool, configPath string, allowEmptyStartup bool) error {
-	if debug {
-		logger.SetLevel(logger.DEBUG)
-		fmt.Println("🔍 Debug mode enabled")
-	}
-
 	cfg, err := config.LoadConfig(configPath)
 	if err != nil {
 		return fmt.Errorf("error loading config: %w", err)
 	}
 
+	logger.SetLevelFromString(cfg.Agents.Defaults.LogLevel)
+
+	if debug {
+		logger.SetLevel(logger.DEBUG)
+		fmt.Println("🔍 Debug mode enabled")
+	}
+
 	provider, modelID, err := createStartupProvider(cfg, allowEmptyStartup)
 	if err != nil {
 		return fmt.Errorf("error creating provider: %w", err)
diff --git a/pkg/heartbeat/service.go b/pkg/heartbeat/service.go
index 09c93fc6b..5dda78ea9 100644
--- a/pkg/heartbeat/service.go
+++ b/pkg/heartbeat/service.go
@@ -26,6 +26,7 @@ import (
 const (
 	minIntervalMinutes = 5
 	defaultIntervalMinutes = 30
+	userTasksMarker = "Add your heartbeat tasks below this line:"
 )
 
 // HeartbeatHandler is the function type for handling heartbeat.
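Reviewer note on the log-level change in `pkg/gateway/gateway.go` above: the reordering means the configured `log_level` is applied first and the debug flag then overrides it. A minimal standalone sketch of that precedence follows. `Level`, `parseLevel`, and `effectiveLevel` are hypothetical names invented for illustration and are not part of the `logger` package; in particular, the fallback to `Fatal` for unknown strings is an assumption, since the diff does not show how `SetLevelFromString` treats invalid input.

```go
package main

import (
	"fmt"
	"strings"
)

// Level is a hypothetical stand-in for the logger package's level type.
type Level int

const (
	Debug Level = iota
	Info
	Warn
	Error
	Fatal
)

// parseLevel is an assumed mapping from a config string to a Level.
// Falling back to Fatal on unknown input is a guess that mirrors the
// "fatal" default; the real SetLevelFromString may behave differently.
func parseLevel(s string) Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return Debug
	case "info":
		return Info
	case "warn":
		return Warn
	case "error":
		return Error
	default:
		return Fatal
	}
}

// effectiveLevel applies the configured level first and then lets the
// debug flag override it, matching the order of calls in Run.
func effectiveLevel(configLevel string, debug bool) Level {
	level := parseLevel(configLevel)
	if debug {
		level = Debug
	}
	return level
}

func main() {
	fmt.Println(effectiveLevel("fatal", false) == Fatal)
	fmt.Println(effectiveLevel("fatal", true) == Debug)
	fmt.Println(effectiveLevel("info", false) == Info)
}
```

Under these assumptions the debug flag always wins, which appears to be why the `if debug` block was moved below the config load in both `agentCmd` and `Run`.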
@@ -232,7 +233,7 @@ func (hs *HeartbeatService) buildPrompt() string { } content := string(data) - if len(content) == 0 { + if !heartbeatHasUserTasks(content) { return "" } @@ -284,6 +285,32 @@ Add your heartbeat tasks below this line: } } +func heartbeatHasUserTasks(content string) bool { + trimmed := strings.TrimSpace(content) + if trimmed == "" { + return false + } + + markerIdx := strings.Index(content, userTasksMarker) + if markerIdx < 0 { + return true + } + + tasksSection := content[markerIdx+len(userTasksMarker):] + for _, line := range strings.Split(tasksSection, "\n") { + trimmedLine := strings.TrimSpace(line) + if trimmedLine == "" { + continue + } + if strings.HasPrefix(trimmedLine, "#") { + continue + } + return true + } + + return false +} + // sendResponse sends the heartbeat response to the last channel func (hs *HeartbeatService) sendResponse(response string) { hs.mu.RLock() diff --git a/pkg/heartbeat/service_test.go b/pkg/heartbeat/service_test.go index 3b7eeeefb..309b4378f 100644 --- a/pkg/heartbeat/service_test.go +++ b/pkg/heartbeat/service_test.go @@ -3,6 +3,7 @@ package heartbeat import ( "os" "path/filepath" + "strings" "testing" "time" @@ -203,3 +204,47 @@ func TestHeartbeatFilePath(t *testing.T) { t.Errorf("Expected HEARTBEAT.md at %s, but it doesn't exist", expectedPath) } } + +func TestBuildPrompt_DefaultTemplateStaysIdle(t *testing.T) { + tmpDir, err := os.MkdirTemp("", "heartbeat-test-*") + if err != nil { + t.Fatalf("Failed to create temp dir: %v", err) + } + defer os.RemoveAll(tmpDir) + + hs := NewHeartbeatService(tmpDir, 30, true) + hs.createDefaultHeartbeatTemplate() + + if prompt := hs.buildPrompt(); prompt != "" { + t.Fatalf("buildPrompt() = %q, want empty prompt for untouched default template", prompt) + } +} + +func TestBuildPrompt_UserTasksAfterMarkerProducePrompt(t *testing.T) { + tmpDir, err := os.MkdirTemp("", "heartbeat-test-*") + if err != nil { + t.Fatalf("Failed to create temp dir: %v", err) + } + defer 
os.RemoveAll(tmpDir) + + hs := NewHeartbeatService(tmpDir, 30, true) + hs.createDefaultHeartbeatTemplate() + + path := filepath.Join(tmpDir, "HEARTBEAT.md") + data, err := os.ReadFile(path) + if err != nil { + t.Fatalf("Failed to read HEARTBEAT.md: %v", err) + } + updated := string(data) + "\n- Check unread Feishu messages\n" + if err := os.WriteFile(path, []byte(updated), 0o644); err != nil { + t.Fatalf("Failed to update HEARTBEAT.md: %v", err) + } + + prompt := hs.buildPrompt() + if prompt == "" { + t.Fatal("buildPrompt() = empty, want non-empty prompt when user tasks are present") + } + if !strings.Contains(prompt, "Check unread Feishu messages") { + t.Fatalf("prompt = %q, want user task content", prompt) + } +} diff --git a/pkg/identity/identity.go b/pkg/identity/identity.go index 372bbe38b..045725a8d 100644 --- a/pkg/identity/identity.go +++ b/pkg/identity/identity.go @@ -94,13 +94,18 @@ func MatchAllowed(sender bus.SenderInfo, allowed string) bool { return false } -// isNumeric returns true if s consists entirely of digits. +// isNumeric returns true if s consists entirely of digits, allowing for an optional leading minus sign +// (required for Telegram group/channel IDs like -1001234567890). 
func isNumeric(s string) bool { if s == "" { return false } - for _, r := range s { - if r < '0' || r > '9' { + start := 0 + if s[0] == '-' && len(s) > 1 { + start = 1 + } + for i := start; i < len(s); i++ { + if s[i] < '0' || s[i] > '9' { return false } } diff --git a/pkg/identity/identity_test.go b/pkg/identity/identity_test.go index a588f1484..c60402d19 100644 --- a/pkg/identity/identity_test.go +++ b/pkg/identity/identity_test.go @@ -97,6 +97,15 @@ func TestMatchAllowed(t *testing.T) { allowed: "654321", want: false, }, + { + name: "negative numeric ID matches PlatformID", + sender: bus.SenderInfo{ + Platform: "telegram", + PlatformID: "-1001234567890", + }, + allowed: "-1001234567890", + want: true, + }, // Username matching { name: "@username matches Username", @@ -238,6 +247,9 @@ func TestIsNumeric(t *testing.T) { {"abc", false}, {"12a34", false}, {"telegram", false}, + {"-1001234567890", true}, + {"-", false}, + {"-12a34", false}, } for _, tt := range tests { diff --git a/pkg/logger/logger.go b/pkg/logger/logger.go index c5a1f895a..179804607 100644 --- a/pkg/logger/logger.go +++ b/pkg/logger/logger.go @@ -106,6 +106,36 @@ func GetLevel() LogLevel { return currentLevel } +// ParseLevel converts a case-insensitive level name to a LogLevel. +// Returns the level and true if valid, or (INFO, false) if unrecognized. +func ParseLevel(s string) (LogLevel, bool) { + switch strings.ToLower(strings.TrimSpace(s)) { + case "debug": + return DEBUG, true + case "info": + return INFO, true + case "warn", "warning": + return WARN, true + case "error": + return ERROR, true + case "fatal": + return FATAL, true + default: + return INFO, false + } +} + +// SetLevelFromString sets the log level from a string value. +// If the string is empty or not a recognized level name, the current level is kept. 
+func SetLevelFromString(s string) { + if s == "" { + return + } + if level, ok := ParseLevel(s); ok { + SetLevel(level) + } +} + func EnableFileLogging(filePath string) error { mu.Lock() defer mu.Unlock() diff --git a/pkg/logger/logger_test.go b/pkg/logger/logger_test.go index 31b40484c..e551db58e 100644 --- a/pkg/logger/logger_test.go +++ b/pkg/logger/logger_test.go @@ -252,3 +252,88 @@ func TestFormatFieldValue(t *testing.T) { }) } } + +func TestDefaultLevelIsInfo(t *testing.T) { + // The package-level default (before any SetLevel call) should be INFO. + // Because earlier tests may have changed it, we just verify the constant is wired correctly. + if logLevelNames[INFO] != "INFO" { + t.Errorf("INFO constant mapped to %q, want \"INFO\"", logLevelNames[INFO]) + } +} + +func TestParseLevelValid(t *testing.T) { + tests := []struct { + input string + want LogLevel + }{ + {"debug", DEBUG}, + {"DEBUG", DEBUG}, + {"Debug", DEBUG}, + {"info", INFO}, + {"INFO", INFO}, + {"warn", WARN}, + {"WARN", WARN}, + {"warning", WARN}, + {"WARNING", WARN}, + {"error", ERROR}, + {"ERROR", ERROR}, + {"fatal", FATAL}, + {"FATAL", FATAL}, + {" info ", INFO}, + } + + for _, tt := range tests { + t.Run(tt.input, func(t *testing.T) { + got, ok := ParseLevel(tt.input) + if !ok { + t.Fatalf("ParseLevel(%q) returned ok=false, want true", tt.input) + } + if got != tt.want { + t.Errorf("ParseLevel(%q) = %v, want %v", tt.input, got, tt.want) + } + }) + } +} + +func TestParseLevelInvalid(t *testing.T) { + tests := []string{"", "garbage", "verbose", "trace", "critical"} + + for _, input := range tests { + t.Run(input, func(t *testing.T) { + _, ok := ParseLevel(input) + if ok { + t.Errorf("ParseLevel(%q) returned ok=true, want false", input) + } + }) + } +} + +func TestSetLevelFromString(t *testing.T) { + initialLevel := GetLevel() + defer SetLevel(initialLevel) + + // Valid string changes the level + SetLevel(INFO) + SetLevelFromString("error") + if got := GetLevel(); got != ERROR { + 
t.Errorf("after SetLevelFromString(\"error\"): GetLevel() = %v, want ERROR", got) + } + + // Empty string is a no-op + SetLevelFromString("") + if got := GetLevel(); got != ERROR { + t.Errorf("after SetLevelFromString(\"\"): GetLevel() = %v, want ERROR (unchanged)", got) + } + + // Invalid string is a no-op + SetLevelFromString("garbage") + if got := GetLevel(); got != ERROR { + t.Errorf("after SetLevelFromString(\"garbage\"): GetLevel() = %v, want ERROR (unchanged)", got) + } + + // Case-insensitive + SetLevelFromString("FATAL") + if got := GetLevel(); got != FATAL { + t.Errorf("after SetLevelFromString(\"FATAL\"): GetLevel() = %v, want FATAL", got) + } +} diff --git a/pkg/tools/session.go b/pkg/tools/session.go deleted file mode 100644 index e32bc3ddf..000000000 --- a/pkg/tools/session.go +++ /dev/null @@ -1,252 +0,0 @@ -package tools - -import ( - "bytes" - "errors" - "io" - "os" - "sync" - "time" - - "github.com/google/uuid" -) - -const maxOutputBufferSize = 100 * 1024 * 1024 // 100MB - -const outputTruncateMarker = "\n... [output truncated, exceeded 100MB]\n" - -// PtyKeyMode represents arrow key encoding mode for PTY sessions. -// Programs send smkx/rmkx sequences to switch between CSI and SS3 modes. 
-type PtyKeyMode uint8 - -const ( - PtyKeyModeCSI PtyKeyMode = iota // triggered by rmkx (\x1b[?1l) - PtyKeyModeSS3 // triggered by smkx (\x1b[?1h) -) - -const PtyKeyModeNotFound PtyKeyMode = 255 - -var ( - ErrSessionNotFound = errors.New("session not found") - ErrSessionDone = errors.New("session already completed") - ErrPTYNotSupported = errors.New("PTY is not supported on this platform") - ErrNoStdin = errors.New("no stdin available") -) - -type ProcessSession struct { - mu sync.Mutex - ID string - PID int - Command string - PTY bool - Background bool - StartTime int64 - ExitCode int - Status string - stdinWriter io.Writer - stdoutPipe io.Reader - outputBuffer *bytes.Buffer - outputTruncated bool - ptyMaster *os.File - - // ptyKeyMode tracks arrow key encoding mode (CSI vs SS3) - ptyKeyMode PtyKeyMode -} - -func (s *ProcessSession) IsDone() bool { - s.mu.Lock() - defer s.mu.Unlock() - return s.Status == "done" || s.Status == "exited" -} - -func (s *ProcessSession) GetPtyKeyMode() PtyKeyMode { - s.mu.Lock() - defer s.mu.Unlock() - return s.ptyKeyMode -} - -func (s *ProcessSession) SetPtyKeyMode(mode PtyKeyMode) { - s.mu.Lock() - defer s.mu.Unlock() - s.ptyKeyMode = mode -} - -func (s *ProcessSession) GetStatus() string { - s.mu.Lock() - defer s.mu.Unlock() - return s.Status -} - -func (s *ProcessSession) SetStatus(status string) { - s.mu.Lock() - defer s.mu.Unlock() - s.Status = status -} - -func (s *ProcessSession) GetExitCode() int { - s.mu.Lock() - defer s.mu.Unlock() - return s.ExitCode -} - -func (s *ProcessSession) SetExitCode(code int) { - s.mu.Lock() - defer s.mu.Unlock() - s.ExitCode = code -} - -func (s *ProcessSession) killProcess() error { - s.mu.Lock() - defer s.mu.Unlock() - - if s.Status != "running" { - return ErrSessionDone - } - - pid := s.PID - if pid <= 0 { - return ErrSessionNotFound - } - - if err := killProcessGroup(pid); err != nil { - return err - } - - s.Status = "done" - s.ExitCode = -1 - return nil -} - -func (s *ProcessSession) Kill() 
error { - return s.killProcess() -} - -func (s *ProcessSession) Write(data string) error { - s.mu.Lock() - defer s.mu.Unlock() - - if s.Status != "running" { - return ErrSessionDone - } - - var writer io.Writer - if s.PTY && s.ptyMaster != nil { - writer = s.ptyMaster - } else if s.stdinWriter != nil { - writer = s.stdinWriter - } else { - return ErrNoStdin - } - - _, err := writer.Write([]byte(data)) - return err -} - -func (s *ProcessSession) Read() string { - s.mu.Lock() - defer s.mu.Unlock() - - if s.outputBuffer.Len() == 0 { - return "" - } - - data := s.outputBuffer.String() - s.outputBuffer.Reset() - return data -} - -func (s *ProcessSession) ToSessionInfo() SessionInfo { - s.mu.Lock() - defer s.mu.Unlock() - - return SessionInfo{ - ID: s.ID, - Command: s.Command, - Status: s.Status, - PID: s.PID, - StartedAt: s.StartTime, - } -} - -type SessionManager struct { - mu sync.RWMutex - sessions map[string]*ProcessSession -} - -func NewSessionManager() *SessionManager { - sm := &SessionManager{ - sessions: make(map[string]*ProcessSession), - } - - // Start cleaner goroutine - runs every 5 minutes, cleans up sessions done for >30 minutes - go func() { - ticker := time.NewTicker(5 * time.Minute) - defer ticker.Stop() - for range ticker.C { - sm.cleanupOldSessions() - } - }() - - return sm -} - -// cleanupOldSessions removes sessions that are done and older than 30 minutes -func (sm *SessionManager) cleanupOldSessions() { - sm.mu.Lock() - defer sm.mu.Unlock() - - cutoff := time.Now().Add(-30 * time.Minute) - for id, session := range sm.sessions { - if session.IsDone() && session.StartTime < cutoff.Unix() { - delete(sm.sessions, id) - } - } -} - -func (sm *SessionManager) Add(session *ProcessSession) { - sm.mu.Lock() - defer sm.mu.Unlock() - sm.sessions[session.ID] = session -} - -func (sm *SessionManager) Get(sessionID string) (*ProcessSession, error) { - sm.mu.RLock() - defer sm.mu.RUnlock() - - session, ok := sm.sessions[sessionID] - if !ok { - return nil, 
ErrSessionNotFound - } - - return session, nil -} - -func (sm *SessionManager) Remove(sessionID string) { - sm.mu.Lock() - defer sm.mu.Unlock() - delete(sm.sessions, sessionID) -} - -func (sm *SessionManager) List() []SessionInfo { - sm.mu.RLock() - defer sm.mu.RUnlock() - - result := make([]SessionInfo, 0, len(sm.sessions)) - for _, session := range sm.sessions { - result = append(result, session.ToSessionInfo()) - } - - return result -} - -func generateSessionID() string { - return uuid.New().String()[:8] -} - -type SessionInfo struct { - ID string `json:"id"` - Command string `json:"command"` - Status string `json:"status"` - PID int `json:"pid"` - StartedAt int64 `json:"startedAt"` -} diff --git a/pkg/tools/session_process_unix.go b/pkg/tools/session_process_unix.go deleted file mode 100644 index 2fe30166e..000000000 --- a/pkg/tools/session_process_unix.go +++ /dev/null @@ -1,14 +0,0 @@ -//go:build !windows - -package tools - -import ( - "syscall" -) - -func killProcessGroup(pid int) error { - if err := syscall.Kill(-pid, syscall.SIGKILL); err != nil { - _ = syscall.Kill(pid, syscall.SIGKILL) - } - return nil -} diff --git a/pkg/tools/session_process_windows.go b/pkg/tools/session_process_windows.go deleted file mode 100644 index 7cf558954..000000000 --- a/pkg/tools/session_process_windows.go +++ /dev/null @@ -1,13 +0,0 @@ -//go:build windows - -package tools - -import ( - "os/exec" - "strconv" -) - -func killProcessGroup(pid int) error { - _ = exec.Command("taskkill", "/T", "/F", "/PID", strconv.Itoa(pid)).Run() - return nil -} diff --git a/pkg/tools/session_test.go b/pkg/tools/session_test.go deleted file mode 100644 index 6cfe72a10..000000000 --- a/pkg/tools/session_test.go +++ /dev/null @@ -1,99 +0,0 @@ -package tools - -import ( - "testing" - - "github.com/stretchr/testify/require" -) - -func TestSessionManager_AddGet(t *testing.T) { - sm := NewSessionManager() - session := &ProcessSession{ - ID: "test-1", - Command: "echo hello", - Status: "running", - 
StartTime: 1000, - } - - sm.Add(session) - - got, err := sm.Get("test-1") - require.NoError(t, err) - require.Equal(t, "test-1", got.ID) -} - -func TestSessionManager_Remove(t *testing.T) { - sm := NewSessionManager() - session := &ProcessSession{ - ID: "test-1", - Command: "echo hello", - Status: "running", - StartTime: 1000, - } - sm.Add(session) - sm.Remove("test-1") - - _, err := sm.Get("test-1") - require.ErrorIs(t, err, ErrSessionNotFound) -} - -func TestSessionManager_List(t *testing.T) { - sm := NewSessionManager() - sm.Add(&ProcessSession{ - ID: "test-1", - Command: "echo hello", - Status: "running", - StartTime: 1000, - }) - sm.Add(&ProcessSession{ - ID: "test-2", - Command: "echo world", - Status: "running", - StartTime: 1001, - }) - sm.Add(&ProcessSession{ - ID: "test-3", - Command: "echo done", - Status: "done", - StartTime: 1002, - }) - - sessions := sm.List() - require.Len(t, sessions, 3) - - ids := make(map[string]bool) - for _, s := range sessions { - ids[s.ID] = true - } - require.True(t, ids["test-1"]) - require.True(t, ids["test-2"]) - require.True(t, ids["test-3"]) -} - -func TestProcessSession_IsDone(t *testing.T) { - session := &ProcessSession{Status: "running"} - require.False(t, session.IsDone()) - - session.Status = "done" - require.True(t, session.IsDone()) - - session.Status = "exited" - require.True(t, session.IsDone()) -} - -func TestProcessSession_ToSessionInfo(t *testing.T) { - session := &ProcessSession{ - ID: "test-1", - PID: 12345, - Command: "echo hello", - Status: "running", - StartTime: 1000, - } - - info := session.ToSessionInfo() - require.Equal(t, "test-1", info.ID) - require.Equal(t, "echo hello", info.Command) - require.Equal(t, "running", info.Status) - require.Equal(t, 12345, info.PID) - require.Equal(t, int64(1000), info.StartedAt) -} diff --git a/pkg/tools/shell.go b/pkg/tools/shell.go index f3869cc1c..78ad2b26d 100644 --- a/pkg/tools/shell.go +++ b/pkg/tools/shell.go @@ -3,37 +3,20 @@ package tools import ( "bytes" 
"context" - "encoding/json" "errors" "fmt" - "io" "os" "os/exec" "path/filepath" "regexp" "runtime" "strings" - "sync" - "syscall" "time" - "github.com/creack/pty" - "github.com/sipeed/picoclaw/pkg/config" "github.com/sipeed/picoclaw/pkg/constants" ) -var ( - globalSessionManager = NewSessionManager() - sessionManagerMu sync.RWMutex -) - -func getSessionManager() *SessionManager { - sessionManagerMu.RLock() - defer sessionManagerMu.RUnlock() - return globalSessionManager -} - type ExecTool struct { workingDir string timeout time.Duration @@ -43,7 +26,6 @@ type ExecTool struct { allowedPathPatterns []*regexp.Regexp restrictToWorkspace bool allowRemote bool - sessionManager *SessionManager } var ( @@ -163,7 +145,7 @@ func NewExecToolWithConfig( denyPatterns = append(denyPatterns, defaultDenyPatterns...) } - var timeout time.Duration + timeout := 60 * time.Second if config != nil && config.Tools.Exec.TimeoutSeconds > 0 { timeout = time.Duration(config.Tools.Exec.TimeoutSeconds) * time.Second } @@ -177,7 +159,6 @@ func NewExecToolWithConfig( allowedPathPatterns: allowedPathPatterns, restrictToWorkspace: restrict, allowRemote: allowRemote, - sessionManager: getSessionManager(), }, nil } @@ -186,146 +167,27 @@ func (t *ExecTool) Name() string { } func (t *ExecTool) Description() string { - return `Execute shell commands. Use background=true for long-running commands (returns sessionId). Use pty=true for interactive commands (can combine with background=true). Use poll/read/write/send-keys/kill with sessionId to manage background sessions. Sessions auto-cleanup 30 minutes after process exits; use kill to terminate early. Output buffer limit: 100MB.` + return "Execute a shell command and return its output. Use with caution." 
} func (t *ExecTool) Parameters() map[string]any { return map[string]any{ - "oneOf": []map[string]any{ - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{"const": "run", "description": "Execute a shell command"}, - "command": map[string]any{"type": "string", "description": "Shell command to execute"}, - "background": map[string]any{ - "type": "string", - "description": "Run in background immediately", - }, - "pty": map[string]any{ - "type": "string", - "description": "Run in a pseudo-terminal (PTY) when available", - }, - "cwd": map[string]any{ - "type": "string", - "description": "Working directory for the command", - }, - "timeout": map[string]any{ - "type": "integer", - "description": "Timeout in seconds (default: 0 = no timeout, kills process on expiry)", - }, - }, - "required": []string{"action", "command"}, + "type": "object", + "properties": map[string]any{ + "command": map[string]any{ + "type": "string", + "description": "The shell command to execute", }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{"const": "list", "description": "List all active sessions"}, - }, - "required": []string{"action"}, - }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{ - "const": "poll", - "description": "Check session status. Returns: {sessionId, status: running|done, exitCode}. exitCode only meaningful when status=done", - }, - "sessionId": map[string]any{ - "type": "string", - "description": "Session ID returned from background command", - }, - }, - "required": []string{"action", "sessionId"}, - }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{ - "const": "read", - "description": "Read output from session. 
Returns: {sessionId, output, status: running|done}", - }, - "sessionId": map[string]any{ - "type": "string", - "description": "Session ID returned from background command", - }, - }, - "required": []string{"action", "sessionId"}, - }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{ - "const": "write", - "description": "Send input to session stdin (only when status=running)", - }, - "sessionId": map[string]any{ - "type": "string", - "description": "Session ID returned from background command", - }, - "data": map[string]any{"type": "string", "description": "Data to write to session stdin."}, - }, - "required": []string{"action", "sessionId", "data"}, - }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{"const": "kill", "description": "Terminate session"}, - "sessionId": map[string]any{ - "type": "string", - "description": "Session ID returned from background command", - }, - }, - "required": []string{"action", "sessionId"}, - }, - { - "type": "object", - "properties": map[string]any{ - "action": map[string]any{ - "const": "send-keys", - "description": "Send special keys to PTY session. Keys: down/up/left/right/enter/escape/tab/backspace/ctrl-c/ctrl-d/ctrl-z. Multiple keys separated by comma", - }, - "sessionId": map[string]any{ - "type": "string", - "description": "Session ID returned from background command", - }, - "keys": map[string]any{ - "type": "string", - "description": "Comma-separated key names (optional spaces around comma). 
Valid keys: up, down, left, right, enter, tab, escape, backspace, ctrl-c, ctrl-d, home, end, pageup, pagedown, f1-f12.", - }, - }, - "required": []string{"action", "sessionId", "keys"}, + "working_dir": map[string]any{ + "type": "string", + "description": "Optional working directory for the command", }, }, + "required": []string{"command"}, } } func (t *ExecTool) Execute(ctx context.Context, args map[string]any) *ToolResult { - action, _ := args["action"].(string) - if action == "" { - return ErrorResult("action is required") - } - - switch action { - case "run": - return t.executeRun(ctx, args) - case "list": - return t.executeList() - case "poll": - return t.executePoll(args) - case "read": - return t.executeRead(args) - case "write": - return t.executeWrite(args) - case "kill": - return t.executeKill(args) - case "send-keys": - return t.executeSendKeys(args) - default: - return ErrorResult(fmt.Sprintf("unknown action: %s", action)) - } -} - -func (t *ExecTool) executeRun(ctx context.Context, args map[string]any) *ToolResult { command, ok := args["command"].(string) if !ok { return ErrorResult("command is required") @@ -344,26 +206,8 @@ func (t *ExecTool) executeRun(ctx context.Context, args map[string]any) *ToolRes } } - getBoolArg := func(key string) bool { - switch v := args[key].(type) { - case bool: - return v - case string: - return v == "true" - } - return false - } - isPty := getBoolArg("pty") - isBackground := getBoolArg("background") - - if isPty { - if runtime.GOOS == "windows" { - return ErrorResult("PTY is not supported on Windows. 
Use background=true without pty.") - } - } - cwd := t.workingDir - if wd, ok := args["cwd"].(string); ok && wd != "" { + if wd, ok := args["working_dir"].(string); ok && wd != "" { if t.restrictToWorkspace && t.workingDir != "" { resolvedWD, err := validatePathWithAllowPaths(wd, t.workingDir, true, t.allowedPathPatterns) if err != nil { @@ -409,14 +253,6 @@ func (t *ExecTool) executeRun(ctx context.Context, args map[string]any) *ToolRes } } - if isBackground { - return t.runBackground(ctx, command, cwd, isPty) - } - - return t.runSync(ctx, command, cwd) -} - -func (t *ExecTool) runSync(ctx context.Context, command, cwd string) *ToolResult { // timeout == 0 means no timeout var cmdCtx context.Context var cancel context.CancelFunc @@ -525,560 +361,6 @@ func (t *ExecTool) runSync(ctx context.Context, command, cwd string) *ToolResult } } -func (t *ExecTool) runBackground(ctx context.Context, command, cwd string, ptyEnabled bool) *ToolResult { - sessionID := generateSessionID() - session := &ProcessSession{ - ID: sessionID, - Command: command, - PTY: ptyEnabled, - Background: true, - StartTime: time.Now().Unix(), - Status: "running", - ptyKeyMode: PtyKeyModeCSI, - } - - var cmd *exec.Cmd - if runtime.GOOS == "windows" { - cmd = exec.Command("powershell", "-NoProfile", "-NonInteractive", "-Command", command) - } else { - cmd = exec.Command("sh", "-c", command) - } - if cwd != "" { - cmd.Dir = cwd - } - - prepareCommandForTermination(cmd) - - var stdoutReader io.ReadCloser - var stderrReader io.ReadCloser - var stdinWriter io.WriteCloser - - if ptyEnabled { - ptmx, tty, err := pty.Open() - if err != nil { - return ErrorResult(fmt.Sprintf("failed to create PTY: %v", err)) - } - - cmd.Stdin = tty - cmd.Stdout = tty - cmd.Stderr = tty - - // For PTY, we need Setsid to create a new session. - // Note: Setsid and Setpgid conflict, so we must replace SysProcAttr entirely. 
- cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true} - - session.ptyMaster = ptmx - } else { - var err error - stdoutReader, err = cmd.StdoutPipe() - if err != nil { - return ErrorResult(fmt.Sprintf("failed to create stdout pipe: %v", err)) - } - stderrReader, err = cmd.StderrPipe() - if err != nil { - return ErrorResult(fmt.Sprintf("failed to create stderr pipe: %v", err)) - } - stdinWriter, err = cmd.StdinPipe() - if err != nil { - return ErrorResult(fmt.Sprintf("failed to create stdin pipe: %v", err)) - } - session.stdoutPipe = io.MultiReader(stdoutReader, stderrReader) - session.stdinWriter = stdinWriter - } - - if err := cmd.Start(); err != nil { - if session.ptyMaster != nil { - session.ptyMaster.Close() - } - return ErrorResult(fmt.Sprintf("failed to start command: %v", err)) - } - - session.PID = cmd.Process.Pid - t.sessionManager.Add(session) - - session.outputBuffer = &bytes.Buffer{} - - // PTY mode: read from ptyMaster and wait for process - // Note: On Linux, closing ptyMaster doesn't interrupt blocking Read() calls, - // so we need cmd.Wait() in a separate goroutine to detect process exit. 
- if session.PTY && session.ptyMaster != nil { - go func() { - cmd.Wait() // Wait for process to exit - session.mu.Lock() - if cmd.ProcessState != nil { - session.ExitCode = cmd.ProcessState.ExitCode() - } - session.Status = "done" - session.mu.Unlock() - }() - - go func() { - buf := make([]byte, 4096) - for { - n, err := session.ptyMaster.Read(buf) - if n > 0 { - raw := string(buf[:n]) - if mode := detectPtyKeyMode(raw); mode != PtyKeyModeNotFound && mode != session.GetPtyKeyMode() { - session.SetPtyKeyMode(mode) - } - - session.mu.Lock() - if session.outputBuffer.Len() >= maxOutputBufferSize { - if !session.outputTruncated { - session.outputBuffer.WriteString(outputTruncateMarker) - session.outputTruncated = true - } - } else { - session.outputBuffer.Write(buf[:n]) - } - session.mu.Unlock() - } - if err != nil { - break - } - } - }() - } else { - // Non-PTY mode: single goroutine reads pipes. - // When Read() returns EOF (pipe closed), we break. - // When process exits, OS closes pipe write end → Read() returns EOF → we exit. 
- go func() { - buf := make([]byte, 4096) - - // Read stdout - for { - n, err := stdoutReader.Read(buf) - if n > 0 { - session.mu.Lock() - if session.outputBuffer.Len() >= maxOutputBufferSize { - if !session.outputTruncated { - session.outputBuffer.WriteString(outputTruncateMarker) - session.outputTruncated = true - } - } else { - session.outputBuffer.Write(buf[:n]) - } - session.mu.Unlock() - } - if err != nil { - break - } - } - - // Read stderr - for { - n, err := stderrReader.Read(buf) - if n > 0 { - session.mu.Lock() - if session.outputBuffer.Len() >= maxOutputBufferSize { - if !session.outputTruncated { - session.outputBuffer.WriteString(outputTruncateMarker) - session.outputTruncated = true - } - } else { - session.outputBuffer.Write(buf[:n]) - } - session.mu.Unlock() - } - if err != nil { - break - } - } - - // All pipes closed, get exit status - if stdinWriter != nil { - stdinWriter.Close() - } - cmd.Wait() - - session.mu.Lock() - if cmd.ProcessState != nil { - session.ExitCode = cmd.ProcessState.ExitCode() - } - session.Status = "done" - session.mu.Unlock() - }() - } - - resp := ExecResponse{ - SessionID: sessionID, - Status: "running", - } - data, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(data), - ForUser: fmt.Sprintf("Session %s started", sessionID), - IsError: false, - } -} - -func (t *ExecTool) executeList() *ToolResult { - sessions := t.sessionManager.List() - resp := ExecResponse{ - Sessions: sessions, - } - data, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(data), - ForUser: fmt.Sprintf("%d active sessions", len(sessions)), - IsError: false, - } -} - -func (t *ExecTool) executePoll(args map[string]any) *ToolResult { - sessionID, ok := args["sessionId"].(string) - if !ok { - return ErrorResult("sessionId is required") - } - - session, err := t.sessionManager.Get(sessionID) - if err != nil { - if errors.Is(err, ErrSessionNotFound) { - return ErrorResult(fmt.Sprintf("session not found: %s", sessionID)) - } - 
return ErrorResult(err.Error()) - } - - resp := ExecResponse{ - SessionID: sessionID, - Status: session.GetStatus(), - ExitCode: session.GetExitCode(), - } - data, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(data), - IsError: false, - } -} - -func (t *ExecTool) executeRead(args map[string]any) *ToolResult { - sessionID, ok := args["sessionId"].(string) - if !ok { - return ErrorResult("sessionId is required") - } - - session, err := t.sessionManager.Get(sessionID) - if err != nil { - if errors.Is(err, ErrSessionNotFound) { - return ErrorResult(fmt.Sprintf("session not found: %s", sessionID)) - } - return ErrorResult(err.Error()) - } - - output := session.Read() - - resp := ExecResponse{ - SessionID: sessionID, - Output: output, - Status: session.GetStatus(), - } - data, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(data), - IsError: false, - } -} - -func (t *ExecTool) executeWrite(args map[string]any) *ToolResult { - sessionID, ok := args["sessionId"].(string) - if !ok { - return ErrorResult("sessionId is required") - } - - data, ok := args["data"].(string) - if !ok { - return ErrorResult("data is required") - } - - session, err := t.sessionManager.Get(sessionID) - if err != nil { - if errors.Is(err, ErrSessionNotFound) { - return ErrorResult(fmt.Sprintf("session not found: %s", sessionID)) - } - return ErrorResult(err.Error()) - } - - if session.IsDone() { - return ErrorResult(fmt.Sprintf("process already exited with code %d", session.GetExitCode())) - } - - if err := session.Write(data); err != nil { - if errors.Is(err, ErrSessionDone) { - return ErrorResult(fmt.Sprintf("process already exited with code %d", session.GetExitCode())) - } - return ErrorResult(fmt.Sprintf("failed to write to session: %v", err)) - } - - resp := ExecResponse{ - SessionID: sessionID, - Status: session.GetStatus(), - } - respData, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(respData), - IsError: false, - } -} - -func (t *ExecTool) 
executeKill(args map[string]any) *ToolResult { - sessionID, ok := args["sessionId"].(string) - if !ok { - return ErrorResult("sessionId is required") - } - - session, err := t.sessionManager.Get(sessionID) - if err != nil { - if errors.Is(err, ErrSessionNotFound) { - return ErrorResult(fmt.Sprintf("session not found: %s", sessionID)) - } - return ErrorResult(err.Error()) - } - - if session.IsDone() { - return ErrorResult(fmt.Sprintf("process already exited with code %d", session.GetExitCode())) - } - - if err := session.Kill(); err != nil { - return ErrorResult(fmt.Sprintf("failed to kill session: %v", err)) - } - - t.sessionManager.Remove(sessionID) - - resp := ExecResponse{ - SessionID: sessionID, - Status: "done", - } - data, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(data), - ForUser: fmt.Sprintf("Session %s killed", sessionID), - IsError: false, - } -} - -// keyMap maps key names to their escape sequences. -var keyMap = map[string]string{ - "enter": "\r", - "return": "\r", - "tab": "\t", - "escape": "\x1b", - "esc": "\x1b", - "space": " ", - "backspace": "\x7f", - "bspace": "\x7f", - "up": "\x1b[A", - "down": "\x1b[B", - "right": "\x1b[C", - "left": "\x1b[D", - "home": "\x1b[1~", - "end": "\x1b[4~", - "pageup": "\x1b[5~", - "pagedown": "\x1b[6~", - "pgup": "\x1b[5~", - "pgdn": "\x1b[6~", - "insert": "\x1b[2~", - "ic": "\x1b[2~", - "delete": "\x1b[3~", - "del": "\x1b[3~", - "dc": "\x1b[3~", - "btab": "\x1b[Z", - "f1": "\x1bOP", - "f2": "\x1bOQ", - "f3": "\x1bOR", - "f4": "\x1bOS", - "f5": "\x1b[15~", - "f6": "\x1b[17~", - "f7": "\x1b[18~", - "f8": "\x1b[19~", - "f9": "\x1b[20~", - "f10": "\x1b[21~", - "f11": "\x1b[23~", - "f12": "\x1b[24~", -} - -// ss3KeysMap maps key names to SS3 escape sequences -var ss3KeysMap = map[string]string{ - "up": "\x1bOA", - "down": "\x1bOB", - "right": "\x1bOC", - "left": "\x1bOD", - "home": "\x1bOH", - "end": "\x1bOF", -} - -func detectPtyKeyMode(raw string) PtyKeyMode { - const SMKX = "\x1b[?1h" - const RMKX 
= "\x1b[?1l" - - lastSmkx := strings.LastIndex(raw, SMKX) - lastRmkx := strings.LastIndex(raw, RMKX) - - if lastSmkx == -1 && lastRmkx == -1 { - return PtyKeyModeNotFound - } - - if lastSmkx > lastRmkx { - return PtyKeyModeSS3 - } - return PtyKeyModeCSI -} - -// encodeKeyToken encodes a single key token into its escape sequence. -// Supports: -// - Named keys: "enter", "tab", "up", "ctrl-c", "alt-x", etc. -// - Ctrl modifier: "ctrl-c" or "c-c" (sends Ctrl+char) -// - Alt modifier: "alt-x" or "m-x" (sends ESC+char) -func encodeKeyToken(token string, ptyKeyMode PtyKeyMode) (string, error) { - token = strings.ToLower(strings.TrimSpace(token)) - if token == "" { - return "", nil - } - - // Handle ctrl-X format (c-x) - if strings.HasPrefix(token, "c-") { - char := token[2] - if char >= 'a' && char <= 'z' { - return string(rune(char) & 0x1f), nil // ctrl-a through ctrl-z - } - return "", fmt.Errorf("invalid ctrl key: %s", token) - } - - // Handle ctrl-X format (ctrl-x) - if strings.HasPrefix(token, "ctrl-") { - char := token[5] - if char >= 'a' && char <= 'z' { - return string(rune(char) & 0x1f), nil - } - return "", fmt.Errorf("invalid ctrl key: %s", token) - } - - // Handle alt-X format (m-x or alt-x) - if strings.HasPrefix(token, "m-") || strings.HasPrefix(token, "alt-") { - var char string - if strings.HasPrefix(token, "m-") { - char = token[2:] - } else { - char = token[4:] - } - if len(char) == 1 { - return "\x1b" + char, nil - } - return "", fmt.Errorf("invalid alt key: %s", token) - } - - // Handle shift modifier for special keys (shift-up, shift-down, etc.) 
- if strings.HasPrefix(token, "s-") || strings.HasPrefix(token, "shift-") { - var key string - if strings.HasPrefix(token, "s-") { - key = token[2:] - } else { - key = token[6:] - } - // Apply shift modifier: for single-char keys, return uppercase - if seq, ok := keyMap[key]; ok { - // For escape sequences, we can't easily add shift - // For single-char keys (letters), return uppercase - if len(seq) == 1 { - return strings.ToUpper(seq), nil - } - return seq, nil - } - return "", fmt.Errorf("unknown key with shift: %s", key) - } - - if ptyKeyMode == PtyKeyModeSS3 { - if seq, ok := ss3KeysMap[token]; ok { - return seq, nil - } - } - - if seq, ok := keyMap[token]; ok { - return seq, nil - } - - return "", fmt.Errorf("unknown key: %s (use write action for text input)", token) -} - -// encodeKeySequence encodes a slice of key tokens into a single string. -func encodeKeySequence(tokens []string, ptyKeyMode PtyKeyMode) (string, error) { - var result string - for _, token := range tokens { - seq, err := encodeKeyToken(token, ptyKeyMode) - if err != nil { - return "", err - } - result += seq - } - return result, nil -} - -func (t *ExecTool) executeSendKeys(args map[string]any) *ToolResult { - sessionID, ok := args["sessionId"].(string) - if !ok { - return ErrorResult("sessionId is required") - } - - keysStr, ok := args["keys"].(string) - if !ok { - return ErrorResult("keys must be a string") - } - - if keysStr == "" { - return ErrorResult("keys cannot be empty") - } - - // Parse comma-separated key names - keyNames := strings.Split(keysStr, ",") - var keys []string - for _, k := range keyNames { - k = strings.TrimSpace(k) - if k != "" { - keys = append(keys, k) - } - } - - if len(keys) == 0 { - return ErrorResult("keys cannot be empty") - } - - session, err := t.sessionManager.Get(sessionID) - if err != nil { - if errors.Is(err, ErrSessionNotFound) { - return ErrorResult(fmt.Sprintf("session not found: %s", sessionID)) - } - return ErrorResult(err.Error()) - } - - 
ptyKeyMode := session.GetPtyKeyMode() - - data, err := encodeKeySequence(keys, ptyKeyMode) - if err != nil { - return ErrorResult(fmt.Sprintf("invalid key: %v", err)) - } - - if session.IsDone() { - return ErrorResult(fmt.Sprintf("process already exited with code %d", session.GetExitCode())) - } - - if err := session.Write(data); err != nil { - if errors.Is(err, ErrSessionDone) { - return ErrorResult(fmt.Sprintf("process already exited with code %d", session.GetExitCode())) - } - return ErrorResult(fmt.Sprintf("failed to send keys: %v", err)) - } - - resp := ExecResponse{ - SessionID: sessionID, - Status: "running", - Output: fmt.Sprintf("Sent keys: %v", keys), - } - respData, _ := json.Marshal(resp) - return &ToolResult{ - ForLLM: string(respData), - IsError: false, - } -} - func (t *ExecTool) guardCommand(command, cwd string) string { cmd := strings.TrimSpace(command) lower := strings.ToLower(cmd) diff --git a/pkg/tools/shell_test.go b/pkg/tools/shell_test.go index a8de2f4c9..f8f83ea74 100644 --- a/pkg/tools/shell_test.go +++ b/pkg/tools/shell_test.go @@ -2,16 +2,12 @@ package tools import ( "context" - "encoding/json" "os" "path/filepath" - "runtime" "strings" "testing" "time" - "github.com/stretchr/testify/require" - "github.com/sipeed/picoclaw/pkg/config" ) @@ -24,7 +20,6 @@ func TestShellTool_Success(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "echo 'hello world'", } @@ -55,7 +50,6 @@ func TestShellTool_Failure(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "ls /nonexistent_directory_12345", } @@ -88,7 +82,6 @@ func TestShellTool_Timeout(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "sleep 10", } @@ -119,9 +112,8 @@ func TestShellTool_WorkingDir(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", - "command": "cat test.txt", - "cwd": tmpDir, + "command": "cat test.txt", 
+ "working_dir": tmpDir, } result := tool.Execute(ctx, args) @@ -144,7 +136,6 @@ func TestShellTool_DangerousCommand(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "rm -rf /", } @@ -168,7 +159,6 @@ func TestShellTool_DangerousCommand_KillBlocked(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "kill 12345", } @@ -208,7 +198,6 @@ func TestShellTool_StderrCapture(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "sh -c 'echo stdout; echo stderr >&2'", } @@ -233,7 +222,6 @@ func TestShellTool_OutputTruncation(t *testing.T) { ctx := context.Background() // Generate long output (>10000 chars) args := map[string]any{ - "action": "run", "command": "python3 -c \"print('x' * 20000)\" || echo " + strings.Repeat("x", 20000), } @@ -263,9 +251,8 @@ func TestShellTool_WorkingDir_OutsideWorkspace(t *testing.T) { } result := tool.Execute(context.Background(), map[string]any{ - "action": "run", - "command": "pwd", - "cwd": outsideDir, + "command": "pwd", + "working_dir": outsideDir, }) if !result.IsError { @@ -302,9 +289,8 @@ func TestShellTool_WorkingDir_SymlinkEscape(t *testing.T) { } result := tool.Execute(context.Background(), map[string]any{ - "action": "run", - "command": "cat secret.txt", - "cwd": link, + "command": "cat secret.txt", + "working_dir": link, }) if !result.IsError { @@ -326,7 +312,7 @@ func TestShellTool_RemoteChannelBlockedByDefault(t *testing.T) { t.Fatalf("NewExecToolWithConfig() error: %v", err) } ctx := WithToolContext(context.Background(), "telegram", "chat-1") - result := tool.Execute(ctx, map[string]any{"action": "run", "command": "echo hi"}) + result := tool.Execute(ctx, map[string]any{"command": "echo hi"}) if !result.IsError { t.Fatal("expected remote-channel exec to be blocked") @@ -347,7 +333,7 @@ func TestShellTool_InternalChannelAllowed(t *testing.T) { t.Fatalf("NewExecToolWithConfig() error: %v", 
err) } ctx := WithToolContext(context.Background(), "cli", "direct") - result := tool.Execute(ctx, map[string]any{"action": "run", "command": "echo hi"}) + result := tool.Execute(ctx, map[string]any{"command": "echo hi"}) if result.IsError { t.Fatalf("expected internal channel exec to succeed, got: %s", result.ForLLM) @@ -387,7 +373,7 @@ func TestShellTool_AllowRemoteBypassesChannelCheck(t *testing.T) { t.Fatalf("NewExecToolWithConfig() error: %v", err) } ctx := WithToolContext(context.Background(), "telegram", "chat-1") - result := tool.Execute(ctx, map[string]any{"action": "run", "command": "echo hi"}) + result := tool.Execute(ctx, map[string]any{"command": "echo hi"}) if result.IsError { t.Fatalf("expected allowRemote=true to permit remote channel, got: %s", result.ForLLM) @@ -406,7 +392,6 @@ func TestShellTool_RestrictToWorkspace(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "cat ../../etc/passwd", } @@ -444,7 +429,7 @@ func TestShellTool_DevNullAllowed(t *testing.T) { } for _, cmd := range commands { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if result.IsError && strings.Contains(result.ForLLM, "blocked") { t.Errorf("command should not be blocked: %s\n error: %s", cmd, result.ForLLM) } @@ -473,7 +458,7 @@ func TestShellTool_BlockDevices(t *testing.T) { } for _, cmd := range blocked { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if !result.IsError { t.Errorf("expected block device write to be blocked: %s", cmd) } @@ -497,7 +482,7 @@ func TestShellTool_SafePathsInWorkspaceRestriction(t *testing.T) { } for _, cmd := range commands { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := 
tool.Execute(context.Background(), map[string]any{"command": cmd}) if result.IsError && strings.Contains(result.ForLLM, "path outside working dir") { t.Errorf("safe path should not be blocked by workspace check: %s\n error: %s", cmd, result.ForLLM) } @@ -513,7 +498,6 @@ func TestShellTool_ExitCodeDetails(t *testing.T) { ctx := context.Background() args := map[string]any{ - "action": "run", "command": "sh -c 'exit 42'", } @@ -550,7 +534,6 @@ func TestShellTool_TimeoutWithPartialOutput(t *testing.T) { ctx := context.Background() // Use a command that outputs immediately then sleeps args := map[string]any{ - "action": "run", "command": "echo 'partial output before timeout' && sleep 30", } @@ -625,9 +608,7 @@ func TestShellTool_URLsNotBlocked(t *testing.T) { } for _, cmd := range commands { - ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second) - result := tool.Execute(ctx, map[string]any{"action": "run", "command": cmd}) - cancel() + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if result.IsError && strings.Contains(result.ForLLM, "path outside working dir") { t.Errorf("command with URL should not be blocked by workspace check: %s\n error: %s", cmd, result.ForLLM) } @@ -652,7 +633,7 @@ func TestShellTool_FileURISandboxing(t *testing.T) { } for _, cmd := range blockedCommands { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if !result.IsError || !strings.Contains(result.ForLLM, "path outside working dir") { t.Errorf("file:// URI outside workspace should be blocked: %s", cmd) } @@ -670,7 +651,7 @@ func TestShellTool_FileURISandboxing(t *testing.T) { } for _, cmd := range allowedCommands { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if result.IsError && 
strings.Contains(result.ForLLM, "path outside working dir") { t.Errorf("file:// URI inside workspace should be allowed: %s\n error: %s", cmd, result.ForLLM) } @@ -696,920 +677,9 @@ func TestShellTool_URLBypassPrevented(t *testing.T) { } for _, cmd := range blockedCommands { - result := tool.Execute(context.Background(), map[string]any{"action": "run", "command": cmd}) + result := tool.Execute(context.Background(), map[string]any{"command": cmd}) if !result.IsError || !strings.Contains(result.ForLLM, "path outside working dir") { t.Errorf("bypass attempt should be blocked: %q\n got: %s", cmd, result.ForLLM) } } } - -func TestShellTool_Background_ReturnsImmediately(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - ctx := context.Background() - args := map[string]any{ - "action": "run", - "command": "sleep 5", - "background": "true", - } - - start := time.Now() - result := tool.Execute(ctx, args) - elapsed := time.Since(start) - - require.False(t, result.IsError, "background run should not error: %s", result.ForLLM) - require.Less(t, elapsed, time.Second, "background run should return immediately") - require.Contains(t, result.ForLLM, "sessionId") -} - -func TestShellTool_List_Empty(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := context.Background() - args := map[string]any{"action": "list"} - - result := tool.Execute(ctx, args) - require.False(t, result.IsError) - require.Contains(t, result.ForUser, "0 active sessions") -} - -func TestShellTool_RunBackground_List(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 10", - "background": "true", - }) - require.False(t, runResult.IsError, "run should 
succeed: %s", runResult.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &resp) - require.NoError(t, err) - require.NotEmpty(t, resp.SessionID) - - time.Sleep(100 * time.Millisecond) - - listResult := tool.Execute(ctx, map[string]any{"action": "list"}) - require.False(t, listResult.IsError) - - var listResp ExecResponse - err = json.Unmarshal([]byte(listResult.ForLLM), &listResp) - require.NoError(t, err) - require.Len(t, listResp.Sessions, 1) - require.Equal(t, resp.SessionID, listResp.Sessions[0].ID) - - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) -} - -func TestShellTool_Read_Output(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "echo hello", - "background": "true", - }) - require.False(t, runResult.IsError) - - var resp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &resp) - require.NoError(t, err) - - time.Sleep(200 * time.Millisecond) - - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": resp.SessionID, - }) - - if !readResult.IsError { - var readResp ExecResponse - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - } -} - -func TestShellTool_Kill(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 100", - "background": "true", - }) - require.False(t, runResult.IsError) - - var resp ExecResponse - err = 
json.Unmarshal([]byte(runResult.ForLLM), &resp) - require.NoError(t, err) - - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) - - time.Sleep(100 * time.Millisecond) - - listResult := tool.Execute(ctx, map[string]any{"action": "list"}) - var listResp ExecResponse - err = json.Unmarshal([]byte(listResult.ForLLM), &listResp) - require.NoError(t, err) - require.Len(t, listResp.Sessions, 0) -} - -func TestShellTool_PTY_AllowedCommands(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Test that PTY is allowed for non-interpreter commands - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "cat", - "pty": "true", - "background": "true", - }) - require.False(t, result.IsError, "PTY with cat should succeed: %s", result.ForLLM) - require.Contains(t, result.ForLLM, "sessionId") - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - require.NotEmpty(t, resp.SessionID) - - // Clean up - tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) -} - -func TestShellTool_PTY_WriteRead(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a PTY session with a command that waits for input - // Using 'cat' which will wait for stdin - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "cat", - "pty": "true", - "background": "true", - }) - 
require.False(t, result.IsError, "PTY run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Write some input to cat - writeResult := tool.Execute(ctx, map[string]any{ - "action": "write", - "sessionId": resp.SessionID, - "data": "hello\n", - }) - require.False(t, writeResult.IsError, "write should succeed: %s", writeResult.ForLLM) - - // Give cat time to process and output - time.Sleep(200 * time.Millisecond) - - // Read the output - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": resp.SessionID, - }) - - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - - var readResp ExecResponse - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - // PTY output should contain "hello" - require.Contains(t, readResp.Output, "hello") - - // Clean up - tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) -} - -func TestShellTool_PTY_Poll(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a PTY session with a long-running command - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 2", - "pty": "true", - "background": "true", - }) - require.False(t, result.IsError, "PTY run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Poll should show running - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.False(t, pollResult.IsError, "poll should succeed: %s", pollResult.ForLLM) - - var pollResp ExecResponse - err = 
json.Unmarshal([]byte(pollResult.ForLLM), &pollResp) - require.NoError(t, err) - require.Equal(t, "running", pollResp.Status) - - // Wait for sleep to complete - time.Sleep(2500 * time.Millisecond) - - // Poll should show done - pollResult = tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.False(t, pollResult.IsError) - - err = json.Unmarshal([]byte(pollResult.ForLLM), &pollResp) - require.NoError(t, err) - require.Equal(t, "done", pollResp.Status) -} - -func TestShellTool_PTY_Kill(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a PTY session with a long-running command - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 10", - "pty": "true", - "background": "true", - }) - require.False(t, result.IsError, "PTY run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Kill the session - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) - - // Verify kill response shows done status - var killResp ExecResponse - err = json.Unmarshal([]byte(killResult.ForLLM), &killResp) - require.NoError(t, err) - require.Equal(t, "done", killResp.Status) - - // Poll should return error since session is removed after kill - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - // Session is removed after kill, so poll returns error with "session not found" - require.True(t, pollResult.IsError, "poll should error after kill (session removed)") - require.Contains(t, 
pollResult.ForLLM, "session not found") -} - -func TestShellTool_Write_Read_NonPTY(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a background process that reads from stdin and outputs it - // Using 'cat' which echoes stdin to stdout - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "cat", - "pty": false, - "background": "true", - }) - require.False(t, result.IsError, "run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Write some input to cat - writeResult := tool.Execute(ctx, map[string]any{ - "action": "write", - "sessionId": resp.SessionID, - "data": "hello world\n", - }) - require.False(t, writeResult.IsError, "write should succeed: %s", writeResult.ForLLM) - - // Give cat time to process and output - time.Sleep(200 * time.Millisecond) - - // Read the output - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": resp.SessionID, - }) - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - - var readResp ExecResponse - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - require.Contains(t, readResp.Output, "hello world") - - // Clean up - tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) -} - -func TestShellTool_Read_NonPTY_Running(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a long-running process that produces output over time - // Using sh -c with sleep at the end so process doesn't exit immediately - result := tool.Execute(ctx, map[string]any{ - "action": "run", - 
"command": "sh -c 'echo line1; sleep 0.5; echo line2; sleep 0.5; echo line3; sleep 10'", - "pty": false, - "background": "true", - }) - require.False(t, result.IsError, "run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Give time for first outputs to be produced - time.Sleep(300 * time.Millisecond) - - // Read output while process is running - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": resp.SessionID, - }) - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - - var readResp ExecResponse - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - // Should have at least line1 - require.Contains(t, readResp.Output, "line1") - - // Wait for line3 to be produced (line1=0s, line2=0.5s, line3=1s, then sleep 10) - time.Sleep(1200 * time.Millisecond) - - // Read again - should have line3 as well - readResult = tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": resp.SessionID, - }) - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - require.Contains(t, readResp.Output, "line3") - - // Clean up - tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) -} - -func TestShellTool_ProcessGroupKill(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("Process group kill not supported on Windows") - } - - // Note: Testing process group kill with PTY is tricky because the command - // must be run through an interpreter (sh, bash) which is blocked for PTY. - // Instead, we test with non-PTY mode which also uses Setsid for background processes. 
- - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a shell that spawns child processes (non-PTY mode) - // The sh -c command creates child sleep processes - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sh -c 'sleep 30 & sleep 30 & wait'", - "pty": false, - "background": "true", - }) - require.False(t, result.IsError, "run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Give time for child processes to spawn - time.Sleep(500 * time.Millisecond) - - // Kill the session - should kill the entire process group - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) - - // Verify kill response shows done status - var killResp ExecResponse - err = json.Unmarshal([]byte(killResult.ForLLM), &killResp) - require.NoError(t, err) - require.Equal(t, "done", killResp.Status) - - // Poll should return error since session is removed after kill - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.True(t, pollResult.IsError, "poll should error after kill (session removed)") - require.Contains(t, pollResult.ForLLM, "session not found") -} - -func TestShellTool_PTY_ProcessGroupKill(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY process group kill not supported on Windows") - } - - // This test binary creates 4 child sleep processes and waits for signals. - // It's not an interpreter, so it's allowed with PTY mode. - // The binary is created in /tmp/test_pgroup.c and compiled as part of test setup. 
- testBinary := "/tmp/test_pgroup" - if _, err := os.Stat(testBinary); os.IsNotExist(err) { - t.Skip("Test binary /tmp/test_pgroup not found - run: gcc -o /tmp/test_pgroup /tmp/test_pgroup.c") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start the test binary with PTY mode - // It forks 4 child sleep processes and waits for signals - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": testBinary, - "pty": "true", - "background": "true", - }) - require.False(t, result.IsError, "run should succeed: %s", result.ForLLM) - - var resp ExecResponse - err = json.Unmarshal([]byte(result.ForLLM), &resp) - require.NoError(t, err) - - // Give time for child processes to spawn - time.Sleep(500 * time.Millisecond) - - // Kill the session - should kill the entire process group - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": resp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) - - // Verify kill response shows done status - var killResp ExecResponse - err = json.Unmarshal([]byte(killResult.ForLLM), &killResp) - require.NoError(t, err) - require.Equal(t, "done", killResp.Status) - - // Poll should return error since session is removed after kill - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.True(t, pollResult.IsError, "poll should error after kill (session removed)") - require.Contains(t, pollResult.ForLLM, "session not found") -} - -func TestShellTool_PTY_Background_Read(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - 
// Start a fast command with PTY + background mode - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "echo hello", - "pty": "true", - "background": "true", - }) - require.False(t, runResult.IsError, "run should succeed: %s", runResult.ForLLM) - - var runResp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &runResp) - require.NoError(t, err) - require.NotEmpty(t, runResp.SessionID) - require.Equal(t, "running", runResp.Status) - - // Wait for command to complete - time.Sleep(500 * time.Millisecond) - - // Read output - this is the key test: PTY + background mode should preserve output - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": runResp.SessionID, - }) - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - require.Contains(t, readResult.ForLLM, "hello", "output should contain 'hello'") -} - -func TestShellTool_PTY_Background_ReadNoBlock(t *testing.T) { - if runtime.GOOS == "windows" { - t.Skip("PTY not supported on Windows") - } - - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - // Start a long-running command with PTY + background mode - // This command produces no output, just sleeps - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 10", - "pty": "true", - "background": "true", - }) - require.False(t, runResult.IsError, "run should succeed: %s", runResult.ForLLM) - - var runResp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &runResp) - require.NoError(t, err) - require.NotEmpty(t, runResp.SessionID) - - // Read immediately - should NOT block even though process is running and has no output - // This tests that Read() returns quickly (within 1 second) instead of blocking for 10 seconds - start := time.Now() - readResult := tool.Execute(ctx, map[string]any{ - 
"action": "read", - "sessionId": runResp.SessionID, - }) - elapsed := time.Since(start) - - require.False(t, readResult.IsError, "read should succeed: %s", readResult.ForLLM) - require.Less(t, elapsed.Seconds(), 1.0, "read should not block, should return within 1 second") - - // Kill the session to clean up - killResult := tool.Execute(ctx, map[string]any{ - "action": "kill", - "sessionId": runResp.SessionID, - }) - require.False(t, killResult.IsError, "kill should succeed: %s", killResult.ForLLM) -} - -func TestShellTool_Poll_Status(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - sm := NewSessionManager() - tool.sessionManager = sm - - ctx := WithToolContext(context.Background(), "cli", "test") - - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "sleep 1", - "background": "true", - }) - require.False(t, runResult.IsError) - - var resp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &resp) - require.NoError(t, err) - - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.False(t, pollResult.IsError) - - var pollResp ExecResponse - err = json.Unmarshal([]byte(pollResult.ForLLM), &pollResp) - require.NoError(t, err) - require.Equal(t, "running", pollResp.Status) - - time.Sleep(1200 * time.Millisecond) - - pollResult = tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": resp.SessionID, - }) - require.False(t, pollResult.IsError) - - err = json.Unmarshal([]byte(pollResult.ForLLM), &pollResp) - require.NoError(t, err) - require.Equal(t, "done", pollResp.Status) -} - -func TestShellTool_Action_Run_Sync(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - ctx := context.Background() - - result := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "echo hello", - }) - - require.False(t, result.IsError) - require.Contains(t, result.ForLLM, "hello") -} - -// 
TestShellTool_Background_ReadAfterExit verifies that we can read -// buffered output even after the background process has exited. -func TestShellTool_Background_ReadAfterExit(t *testing.T) { - tool, err := NewExecTool("", false) - require.NoError(t, err) - - ctx := context.Background() - - // Start a background command that produces output and exits quickly - runResult := tool.Execute(ctx, map[string]any{ - "action": "run", - "command": "echo hello && sleep 1 && echo world", - "background": "true", - }) - require.False(t, runResult.IsError, "run should succeed: %s", runResult.ForUser) - - // Parse session ID from response - var resp ExecResponse - err = json.Unmarshal([]byte(runResult.ForLLM), &resp) - require.NoError(t, err) - require.NotEmpty(t, resp.SessionID) - sessionID := resp.SessionID - - // Wait for process to exit (sleep 1 + some buffer) - time.Sleep(1500 * time.Millisecond) - - // Poll to verify process is done - pollResult := tool.Execute(ctx, map[string]any{ - "action": "poll", - "sessionId": sessionID, - }) - require.False(t, pollResult.IsError, "poll should succeed: %s", pollResult.ForLLM) - var pollResp ExecResponse - err = json.Unmarshal([]byte(pollResult.ForLLM), &pollResp) - require.NoError(t, err) - require.Equal(t, "done", pollResp.Status, "process should be done") - - // Try to read output AFTER process has exited - readResult := tool.Execute(ctx, map[string]any{ - "action": "read", - "sessionId": sessionID, - }) - require.False(t, readResult.IsError, "read should succeed after exit: %s", readResult.ForLLM) - - var readResp ExecResponse - err = json.Unmarshal([]byte(readResult.ForLLM), &readResp) - require.NoError(t, err) - - // Output should contain both "hello" and "world" - require.Contains(t, readResp.Output, "hello", "should contain hello") - require.Contains(t, readResp.Output, "world", "should contain world after sleep") -} - -func TestSendKeys_CtrlC(t *testing.T) { - // Note: Ctrl-C as a signal requires sending SIGINT to the process 
group, - // which requires elevated privileges. Writing "\x03" to PTY passes the byte - // to the process but doesn't generate SIGINT for processes that don't read stdin. - // For interrupting processes, use the kill action instead. - t.Skip("Ctrl-C as signal not supported - use kill action for interruption") -} - -func TestEncodeKeyToken(t *testing.T) { - tests := []struct { - token string - expected string - hasError bool - }{ - // Named keys - {"enter", "\r", false}, - {"return", "\r", false}, - {"tab", "\t", false}, - {"escape", "\x1b", false}, - {"esc", "\x1b", false}, - {"backspace", "\x7f", false}, - {"up", "\x1b[A", false}, - {"down", "\x1b[B", false}, - {"left", "\x1b[D", false}, - {"right", "\x1b[C", false}, - {"home", "\x1b[1~", false}, - {"end", "\x1b[4~", false}, - {"pageup", "\x1b[5~", false}, - {"pagedown", "\x1b[6~", false}, - {"delete", "\x1b[3~", false}, - {"f1", "\x1bOP", false}, - {"f12", "\x1b[24~", false}, - - // Ctrl keys - {"ctrl-c", "\x03", false}, - {"ctrl-d", "\x04", false}, - {"ctrl-a", "\x01", false}, - {"ctrl-z", "\x1a", false}, - {"c-c", "\x03", false}, - {"c-d", "\x04", false}, - - // Alt keys - {"alt-x", "\x1bx", false}, - {"m-x", "\x1bx", false}, - - // Case insensitive tests - {"ENTER", "\r", false}, - {"TAB", "\t", false}, - {"CTRL-C", "\x03", false}, - {"Ctrl-D", "\x04", false}, - {"ALT-X", "\x1bx", false}, - {"M-X", "\x1bx", false}, - {"UP", "\x1b[A", false}, - {"DOWN", "\x1b[B", false}, - - // Unknown keys should return error (use write action for text input) - {"unknown-key", "", true}, - } - - for _, tt := range tests { - t.Run(tt.token, func(t *testing.T) { - result, err := encodeKeyToken(tt.token, PtyKeyModeCSI) - if tt.hasError { - require.Error(t, err, "expected error for %s", tt.token) - } else { - require.NoError(t, err, "unexpected error for %s", tt.token) - require.Equal(t, tt.expected, result, "wrong encoding for %s", tt.token) - } - }) - } -} - -// TestDetectPtyKeyMode tests smkx/rmkx detection in PTY output -func 
TestDetectPtyKeyMode(t *testing.T) { - tests := []struct { - name string - raw string - expected PtyKeyMode - }{ - {"no toggle", "hello world", PtyKeyModeNotFound}, - {"smkx only", "\x1b[?1h\x1b=", PtyKeyModeSS3}, - {"rmkx only", "\x1b[?1l\x1b>", PtyKeyModeCSI}, - {"both smkx first", "\x1b[?1h\x1b=...\x1b[?1l\x1b>", PtyKeyModeCSI}, - {"both rmkx first", "\x1b[?1l\x1b>...\x1b[?1h\x1b=", PtyKeyModeSS3}, - {"multiple toggles smkx last", "\x1b[?1h\x1b=...\x1b[?1l\x1b>...\x1b[?1h\x1b=", PtyKeyModeSS3}, - {"multiple toggles rmkx last", "\x1b[?1l\x1b>...\x1b[?1h\x1b=...\x1b[?1l\x1b>", PtyKeyModeCSI}, - {"partial smkx", "\x1b[?1h", PtyKeyModeSS3}, - {"partial rmkx", "\x1b[?1l", PtyKeyModeCSI}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - result := detectPtyKeyMode(tt.raw) - require.Equal(t, tt.expected, result, "wrong mode for %s", tt.name) - }) - } -} - -func TestEncodeKeyTokenWithPtyKeyMode(t *testing.T) { - tests := []struct { - name string - token string - mode PtyKeyMode - expected string - hasError bool - }{ - // CSI mode - {"up csi", "up", PtyKeyModeCSI, "\x1b[A", false}, - {"down csi", "down", PtyKeyModeCSI, "\x1b[B", false}, - {"left csi", "left", PtyKeyModeCSI, "\x1b[D", false}, - {"right csi", "right", PtyKeyModeCSI, "\x1b[C", false}, - - // SS3 mode - {"up ss3", "up", PtyKeyModeSS3, "\x1bOA", false}, - {"down ss3", "down", PtyKeyModeSS3, "\x1bOB", false}, - {"left ss3", "left", PtyKeyModeSS3, "\x1bOD", false}, - {"right ss3", "right", PtyKeyModeSS3, "\x1bOC", false}, - {"home ss3", "home", PtyKeyModeSS3, "\x1bOH", false}, - {"end ss3", "end", PtyKeyModeSS3, "\x1bOF", false}, - - // Other keys unaffected by mode - {"enter ss3", "enter", PtyKeyModeSS3, "\r", false}, - {"tab ss3", "tab", PtyKeyModeSS3, "\t", false}, - {"ctrl-c ss3", "ctrl-c", PtyKeyModeSS3, "\x03", false}, - - // NotFound behaves like CSI - {"up notfound", "up", PtyKeyModeNotFound, "\x1b[A", false}, - {"down notfound", "down", PtyKeyModeNotFound, "\x1b[B", false}, - } 
- - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - result, err := encodeKeyToken(tt.token, tt.mode) - if tt.hasError { - require.Error(t, err, "expected error for %s", tt.name) - } else { - require.NoError(t, err, "unexpected error for %s", tt.name) - require.Equal(t, tt.expected, result, "wrong encoding for %s", tt.name) - } - }) - } -} diff --git a/pkg/tools/shell_timeout_unix_test.go b/pkg/tools/shell_timeout_unix_test.go index dfd28454c..357e1276e 100644 --- a/pkg/tools/shell_timeout_unix_test.go +++ b/pkg/tools/shell_timeout_unix_test.go @@ -30,7 +30,6 @@ func TestShellTool_TimeoutKillsChildProcess(t *testing.T) { tool.SetTimeout(500 * time.Millisecond) args := map[string]any{ - "action": "run", // Spawn a child process that would outlive the shell unless process-group kill is used. "command": "sleep 60 & echo $! > child.pid; wait", } diff --git a/pkg/tools/types.go b/pkg/tools/types.go index 4d1a18d5a..a6015cde3 100644 --- a/pkg/tools/types.go +++ b/pkg/tools/types.go @@ -56,24 +56,3 @@ type ToolFunctionDefinition struct { Description string `json:"description"` Parameters map[string]any `json:"parameters"` } - -type ExecRequest struct { - Action string `json:"action"` - Command string `json:"command,omitempty"` - PTY bool `json:"pty,omitempty"` - Background bool `json:"background,omitempty"` - Timeout int `json:"timeout,omitempty"` - Env map[string]string `json:"env,omitempty"` - Cwd string `json:"cwd,omitempty"` - SessionID string `json:"sessionId,omitempty"` - Data string `json:"data,omitempty"` -} - -type ExecResponse struct { - SessionID string `json:"sessionId,omitempty"` - Status string `json:"status,omitempty"` - ExitCode int `json:"exitCode,omitempty"` - Output string `json:"output,omitempty"` - Error string `json:"error,omitempty"` - Sessions []SessionInfo `json:"sessions,omitempty"` -} diff --git a/workspace/skills/agent-browser/SKILL.md b/workspace/skills/agent-browser/SKILL.md new file mode 100644 index 000000000..43505996d 
--- /dev/null +++ b/workspace/skills/agent-browser/SKILL.md @@ -0,0 +1,129 @@ +--- +name: agent-browser +description: "Browser automation via agent-browser CLI. Use when the user needs to navigate websites, fill forms, click buttons, take screenshots, extract data, or test web apps." +metadata: {"nanobot":{"emoji":"🌐","requires":{"bins":["agent-browser"]},"install":[{"id":"npm","kind":"npm","package":"agent-browser","global":true,"bins":["agent-browser"],"label":"Install agent-browser (npm)"}]}} +--- + +# Agent Browser + +CLI browser automation via Chrome/Chromium CDP. Install: `npm i -g agent-browser && agent-browser install`. + +**Before using this skill**, verify the tool is available by running `which agent-browser`. If the command is not found, tell the user that browser automation requires the `agent-browser` CLI and Chromium, which are only available in the heavy container image. Do not attempt to install it at runtime. + +## Core Workflow + +1. `agent-browser open ` — navigate +2. `agent-browser snapshot -i` — get interactive elements with refs (`@e1`, `@e2`, ...) +3. Interact using refs — `click @e1`, `fill @e2 "text"` +4. 
Re-snapshot after any navigation or DOM change — refs are invalidated + +```bash +agent-browser open https://example.com/form +agent-browser snapshot -i +# @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Submit" +agent-browser fill @e1 "user@example.com" +agent-browser fill @e2 "secret" +agent-browser click @e3 +agent-browser wait --load networkidle +agent-browser snapshot -i +``` + +Chain commands with `&&` when you don't need intermediate output: +```bash +agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i +``` + +## Commands + +```bash +# Navigation +agent-browser open +agent-browser close + +# Snapshot +agent-browser snapshot -i # Interactive elements with refs +agent-browser snapshot -s "#selector" # Scope to CSS selector + +# Interaction (use @refs from snapshot) +agent-browser click @e1 +agent-browser fill @e2 "text" # Clear + type +agent-browser type @e2 "text" # Type without clearing +agent-browser select @e1 "option" +agent-browser check @e1 +agent-browser press Enter +agent-browser scroll down 500 + +# Get info +agent-browser get text @e1 +agent-browser get url +agent-browser get title + +# Wait +agent-browser wait @e1 # Wait for element +agent-browser wait --load networkidle # Wait for network idle +agent-browser wait --url "**/dashboard" # Wait for URL pattern +agent-browser wait --text "Welcome" # Wait for text +agent-browser wait 2000 # Wait ms + +# Capture +agent-browser screenshot # Screenshot to temp dir +agent-browser screenshot --full # Full page +agent-browser screenshot --annotate # With numbered element labels ([N] -> @eN) +agent-browser pdf output.pdf + +# Semantic locators (when refs unavailable) +agent-browser find text "Sign In" click +agent-browser find label "Email" fill "user@test.com" +agent-browser find role button click --name "Submit" +``` + +## Authentication + +```bash +# Option 1: Import from user's running Chrome +agent-browser --auto-connect state save 
./auth.json +agent-browser --state ./auth.json open https://app.example.com + +# Option 2: Persistent profile +agent-browser --profile ~/.myapp open https://app.example.com/login +# ... login once, all future runs are authenticated + +# Option 3: Session name (auto-save/restore) +agent-browser --session-name myapp open https://app.example.com/login +# ... login, close, next run state is restored + +# Option 4: State file +agent-browser state save auth.json +agent-browser state load auth.json +``` + +## Iframes + +Iframe content is inlined in snapshots. Interact with iframe refs directly — no frame switch needed. + +## Parallel Sessions + +```bash +agent-browser --session s1 open https://site-a.com +agent-browser --session s2 open https://site-b.com +agent-browser session list +``` + +## JavaScript Eval + +```bash +agent-browser eval 'document.title' + +# Complex JS — use --stdin to avoid shell quoting issues +agent-browser eval --stdin <<'EVALEOF' +JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => a.href)) +EVALEOF +``` + +## Cleanup + +Always close sessions when done: +```bash +agent-browser close +agent-browser --session s1 close +```
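
The availability check the skill asks for before any browser command could be wrapped in a small guard. This is a minimal sketch, not part of the skill itself: `require_cli` is a hypothetical helper name, and it uses the shell builtin `command -v` in place of `which`, since `which` is not installed on every image.

```bash
#!/usr/bin/env bash

# require_cli (hypothetical helper, not shipped with agent-browser):
# succeeds if the named binary is on PATH; otherwise prints a notice
# to stderr and returns non-zero. It never attempts an install.
require_cli() {
  local bin="$1"
  if command -v "$bin" >/dev/null 2>&1; then
    return 0
  fi
  echo "browser automation requires the '$bin' CLI and Chromium (heavy container image only)" >&2
  return 1
}
```

A call such as `require_cli agent-browser && agent-browser open https://example.com` then runs the browser command only when the CLI is present, and otherwise leaves a message to relay to the user.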