diff --git a/docs/configuration.md b/docs/configuration.md index f15a14c9a..4e77300cf 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -754,6 +754,7 @@ Scheduled tasks persist across restarts and are stored in `~/.picoclaw/workspace | Topic | Description | | ----- | ----------- | +| [Sensitive Data Filtering](sensitive_data_filtering.md) | Filter API keys and tokens from tool results before sending to LLM | | [Hook System](hooks/README.md) | Event-driven hooks: observers, interceptors, approval hooks | | [Steering](steering.md) | Inject messages into a running agent loop between tool calls | | [SubTurn](subturn.md) | Subagent coordination, concurrency control, lifecycle | diff --git a/docs/sensitive_data_filtering.md b/docs/sensitive_data_filtering.md new file mode 100644 index 000000000..0c10ff01d --- /dev/null +++ b/docs/sensitive_data_filtering.md @@ -0,0 +1,107 @@ +# Sensitive Data Filtering + +PicoClaw can filter sensitive values (API keys, tokens, secrets, passwords) from tool call results before they are sent to the LLM. This prevents the LLM from seeing its own credentials, which could otherwise leak through tool output or cause confusing behavior. + +--- + +## Overview + +When the LLM uses a tool that returns its own credentials (e.g., a tool that echoes the API key being used), those values are automatically replaced with `[FILTERED]` in the message sent to the LLM. + +Sensitive values are collected from [`.security.yml`](./credential_encryption.md) — the centralized storage for all sensitive configuration (API keys, tokens, secrets stored alongside `config.json`). This includes: + +- Model API keys +- Channel tokens (Telegram, Discord, Slack, Matrix, etc.) +- Web tool API keys (Brave, Tavily, Perplexity, etc.) 
+- Skills tokens (GitHub, ClawHub) + +--- + +## Configuration + +Sensitive data filtering is configured in the `tools` section of `config.json`: + +| Config | Type | Default | Description | +|--------|------|---------|-------------| +| `filter_sensitive_data` | bool | `true` | Enable/disable filtering. When `false`, no filtering is performed. | +| `filter_min_length` | int | `8` | Minimum content length, in bytes, to trigger filtering. Shorter content is skipped for performance. | + +```json +{ + "tools": { + "filter_sensitive_data": true, + "filter_min_length": 8 + } +} +``` + +### Environment Variable + +| Variable | Description | +|----------|-------------| +| `PICOCLAW_TOOLS_FILTER_SENSITIVE_DATA` | Set to `true` or `false` to override the config value (`PICOCLAW_TOOLS_FILTER_MIN_LENGTH` likewise overrides `filter_min_length`) | + +--- + +## How It Works + +1. **On first use**: All sensitive values are collected from `.security.yml` using reflection and compiled into a `strings.Replacer` (built once via `sync.Once`, then reused). + +2. **Per tool result**: Before sending any tool result content to the LLM: + - If `filter_sensitive_data` is `false`, content is passed through unchanged + - If content length < `filter_min_length`, content is passed through unchanged (fast path) + - Otherwise, every sensitive value longer than 3 characters is replaced with `[FILTERED]` + +3. **Replacement**: Uses `strings.Replacer` for efficient O(n+m) string substitution, where n = content length and m = total sensitive value length. 
+ +--- + +## Example + +Given the following `.security.yml`: + +```yaml +model_list: + my-model: + api_keys: + - sk-secret-key-12345 + +channels: + telegram: + token: "123456:ABC-DEF" +``` + +And a tool result containing: + +``` +The model is using API key sk-secret-key-12345 and Telegram bot 123456:ABC-DEF +``` + +The LLM will receive: + +``` +The model is using API key [FILTERED] and Telegram bot [FILTERED] +``` + +--- + +## Performance + +- **Fast path**: Content shorter than `filter_min_length` (default 8) is returned unchanged without any string scanning +- **Efficient replacement**: Uses `strings.Replacer` with O(n+m) complexity instead of regex +- **Lazy initialization**: The `strings.Replacer` is built once on first access via `sync.Once` + +--- + +## Security Considerations + +- **Credential exposure prevention**: Without filtering, tools that echo credentials could cause the LLM to see its own API keys, potentially leading to confusion or credential leakage in logs +- **Defense in depth**: Filtering complements (but does not replace) credential encryption; both features should be used together +- **Exact-match filtering**: Only literal occurrences of values stored in `.security.yml` are replaced; all other content passes through unchanged + +--- + +## Related + +- [Credential Encryption](./credential_encryption.md): encrypting API keys in config +- [Tools Configuration](./tools_configuration.md) diff --git a/docs/tools_configuration.md b/docs/tools_configuration.md index 0528fe714..b5907b991 100644 --- a/docs/tools_configuration.md +++ b/docs/tools_configuration.md @@ -26,6 +26,17 @@ PicoClaw's tools configuration is located in the `tools` field of `config.json`. } ``` +## Sensitive Data Filtering + +Before tool results are sent to the LLM, PicoClaw can filter sensitive values (API keys, tokens, secrets) from the output. This prevents the LLM from seeing its own credentials. + +See [Sensitive Data Filtering](sensitive_data_filtering.md) for full documentation. 
+ +| Config | Type | Default | Description | +|--------|------|---------|-------------| +| `filter_sensitive_data` | bool | `true` | Enable/disable filtering | +| `filter_min_length` | int | `8` | Minimum content length to trigger filtering | + ## Web Tools Web tools are used for web search and fetching. diff --git a/docs/zh/configuration.md b/docs/zh/configuration.md index 695e22829..335566d36 100644 --- a/docs/zh/configuration.md +++ b/docs/zh/configuration.md @@ -623,6 +623,7 @@ PicoClaw 通过 `cron` 工具支持 cron 风格的定时任务。Agent 可以设 | 主题 | 说明 | | ---- | ---- | +| [敏感数据过滤](../sensitive_data_filtering.md) | 在发送给 LLM 前,从工具结果中过滤 API 密钥和令牌 | | [Hook 系统](../hooks/README.zh.md) | 事件驱动 Hook:观察者、拦截器、审批 Hook | | [Steering](../steering.md) | 在工具调用间向运行中的 Agent 注入消息 | | [SubTurn](../subturn.md) | 子 Agent 协调、并发控制、生命周期管理 | diff --git a/docs/zh/sensitive_data_filtering.md b/docs/zh/sensitive_data_filtering.md new file mode 100644 index 000000000..4382706ed --- /dev/null +++ b/docs/zh/sensitive_data_filtering.md @@ -0,0 +1,107 @@ +# 敏感数据过滤 + +PicoClaw 可以从工具调用结果中过滤敏感值(API 密钥、令牌、密码等),然后再发送给 LLM。这可以防止 LLM 看到自己的凭据,避免通过工具输出泄露或产生混淆行为。 + +--- + +## 概述 + +当 LLM 使用的工具返回其自身的凭据时(例如,一个回显正在使用的 API 密钥的工具),这些值会自动替换为 `[FILTERED]` 再发送给 LLM。 + +敏感值从 `.security.yml` 中收集 —— 这是所有敏感配置的集中存储,包括: + +- 模型 API 密钥 +- 频道令牌(Telegram、Discord、Slack、Matrix 等) +- Web 工具 API 密钥(Brave、Tavily、Perplexity 等) +- 技能令牌(GitHub、ClawHub) + +--- + +## 配置 + +敏感数据过滤在 `config.json` 的 `tools` 部分配置: + +| 配置 | 类型 | 默认值 | 说明 | +|------|------|--------|------| +| `filter_sensitive_data` | bool | `true` | 启用/禁用过滤。为 `false` 时,不进行任何过滤。 | +| `filter_min_length` | int | `8` | 触发过滤的最小内容长度。短内容会被跳过以提高性能。 | + +```json +{ + "tools": { + "filter_sensitive_data": true, + "filter_min_length": 8 + } +} +``` + +### 环境变量 + +| 变量 | 说明 | +|------|------| +| `PICOCLAW_TOOLS_FILTER_SENSITIVE_DATA` | 设置为 `true` 或 `false` 以覆盖配置值 | + +--- + +## 工作原理 + +1. **启动时**:使用反射从 `.security.yml` 中收集所有敏感值,并编译成 `strings.Replacer`(O(n+m) 性能,仅计算一次)。 + +2. 
**每个工具结果**:在将任何工具结果发送给 LLM 之前: + - 如果 `filter_sensitive_data` 为 `false`,内容原样传递 + - 如果内容长度 < `filter_min_length`,内容原样传递(快速路径) + - 否则,所有敏感值都会被替换为 `[FILTERED]` + +3. **替换**:使用 `strings.Replacer` 进行高效的 O(n+m) 字符串替换,其中 n = 内容长度,m = 敏感值总长度。 + +--- + +## 示例 + +给定以下 `.security.yml`: + +```yaml +model_list: + my-model: + api_keys: + - sk-secret-key-12345 + +channels: + telegram: + token: "123456:ABC-DEF" +``` + +以及包含以下内容的工具结果: + +``` +The model is using API key sk-secret-key-12345 and Telegram bot 123456:ABC-DEF +``` + +LLM 将收到: + +``` +The model is using API key [FILTERED] and Telegram bot [FILTERED] +``` + +--- + +## 性能 + +- **快速路径**:短于 `filter_min_length`(默认 8)的内容会直接返回,不进行任何字符串扫描 +- **高效替换**:使用 `strings.Replacer`,复杂度为 O(n+m),而非正则表达式 +- **延迟初始化**:替换映射通过 `sync.Once` 在首次访问时构建一次 + +--- + +## 安全注意事项 + +- **凭据泄露防护**:如果没有过滤,返回凭据的工具可能导致 LLM 看到自己的 API 密钥,可能导致日志中泄露凭据或产生混淆 +- **纵深防御**:过滤是对凭据加密的补充(而非替代)—— 应同时使用这两个功能 +- **无误报**:只有明确存储在 `.security.yml` 中的值才会被过滤;LLM 的通用知识不受影响 + +--- + +## 相关文档 + +- [凭据加密](../credential_encryption.md) — 配置中 API 密钥的加密 +- [工具配置](../tools_configuration.md) diff --git a/docs/zh/tools_configuration.md b/docs/zh/tools_configuration.md index a3816a35a..63ac5000b 100644 --- a/docs/zh/tools_configuration.md +++ b/docs/zh/tools_configuration.md @@ -28,6 +28,17 @@ PicoClaw 的工具配置位于 `config.json` 的 `tools` 字段中。 } ``` +## 敏感数据过滤 + +在将工具结果发送给 LLM 之前,PicoClaw 可以从输出中过滤敏感值(API 密钥、令牌、密码)。这可以防止 LLM 看到自己的凭据。 + +详细说明请参阅[敏感数据过滤](../sensitive_data_filtering.md)。 + +| 配置项 | 类型 | 默认值 | 描述 | +|--------|------|--------|------| +| `filter_sensitive_data` | bool | `true` | 启用/禁用过滤 | +| `filter_min_length` | int | `8` | 触发过滤的最小内容长度 | + ## Web 工具 Web 工具用于网页搜索和抓取。 diff --git a/pkg/agent/loop.go b/pkg/agent/loop.go index 72c78c729..24d628d66 100644 --- a/pkg/agent/loop.go +++ b/pkg/agent/loop.go @@ -1733,7 +1733,8 @@ turnLoop: select { case result, ok := <-ts.pendingResults: if ok && result != nil && result.ForLLM != "" { - msg := providers.Message{Role: "user", Content: 
fmt.Sprintf("[SubTurn Result] %s", result.ForLLM)} + content := al.cfg.FilterSensitiveData(result.ForLLM) + msg := providers.Message{Role: "user", Content: fmt.Sprintf("[SubTurn Result] %s", content)} pendingMessages = append(pendingMessages, msg) } default: @@ -2336,6 +2337,9 @@ turnLoop: return } + // Filter sensitive data before publishing + content = al.cfg.FilterSensitiveData(content) + logger.InfoCF("agent", "Async tool completed, publishing result", map[string]any{ "tool": asyncToolName, @@ -2451,6 +2455,11 @@ turnLoop: contentForLLM = toolResult.Err.Error() } + // Filter sensitive data (API keys, tokens, secrets) before sending to LLM + if al.cfg.Tools.IsFilterSensitiveDataEnabled() { + contentForLLM = al.cfg.FilterSensitiveData(contentForLLM) + } + toolResultMsg := providers.Message{ Role: "tool", Content: contentForLLM, @@ -2528,7 +2537,8 @@ turnLoop: select { case result, ok := <-ts.pendingResults: if ok && result != nil && result.ForLLM != "" { - msg := providers.Message{Role: "user", Content: fmt.Sprintf("[SubTurn Result] %s", result.ForLLM)} + content := al.cfg.FilterSensitiveData(result.ForLLM) + msg := providers.Message{Role: "user", Content: fmt.Sprintf("[SubTurn Result] %s", content)} messages = append(messages, msg) ts.agent.Sessions.AddFullMessage(ts.sessionKey, msg) } diff --git a/pkg/config/config.go b/pkg/config/config.go index 33919d9d7..68cfdcb54 100644 --- a/pkg/config/config.go +++ b/pkg/config/config.go @@ -114,6 +114,25 @@ func (c *Config) WithSecurity(sec *SecurityConfig) *Config { return c } +// FilterSensitiveData filters sensitive values from content before sending to LLM. +// This prevents the LLM from seeing its own credentials. +// Uses strings.Replacer for O(n+m) performance (computed once per SecurityConfig). +// Short content (below FilterMinLength) is returned unchanged for performance. 
+func (c *Config) FilterSensitiveData(content string) string { + if c.security == nil || content == "" { + return content + } + // Check if filtering is enabled (default: true) + if !c.Tools.IsFilterSensitiveDataEnabled() { + return content + } + // Fast path: skip filtering for short content + if len(content) < c.Tools.GetFilterMinLength() { + return content + } + return c.security.SensitiveDataReplacer().Replace(content) +} + type HooksConfig struct { Enabled bool `json:"enabled"` Defaults HookDefaultsConfig `json:"defaults,omitempty"` @@ -1201,8 +1220,16 @@ type ReadFileToolConfig struct { } type ToolsConfig struct { - AllowReadPaths []string `json:"allow_read_paths" env:"PICOCLAW_TOOLS_ALLOW_READ_PATHS"` - AllowWritePaths []string `json:"allow_write_paths" env:"PICOCLAW_TOOLS_ALLOW_WRITE_PATHS"` + AllowReadPaths []string `json:"allow_read_paths" env:"PICOCLAW_TOOLS_ALLOW_READ_PATHS"` + AllowWritePaths []string `json:"allow_write_paths" env:"PICOCLAW_TOOLS_ALLOW_WRITE_PATHS"` + // FilterSensitiveData controls whether to filter sensitive values (API keys, + // tokens, secrets) from tool results before sending to the LLM. + // Default: true (enabled) + FilterSensitiveData bool `json:"filter_sensitive_data" env:"PICOCLAW_TOOLS_FILTER_SENSITIVE_DATA"` + // FilterMinLength is the minimum content length required for filtering. + // Content shorter than this will be returned unchanged for performance. 
+ // Default: 8 + FilterMinLength int `json:"filter_min_length" env:"PICOCLAW_TOOLS_FILTER_MIN_LENGTH"` Web WebToolsConfig `json:"web"` Cron CronToolsConfig `json:"cron"` Exec ExecConfig `json:"exec"` @@ -1226,6 +1253,19 @@ type ToolsConfig struct { WriteFile ToolConfig `json:"write_file" envPrefix:"PICOCLAW_TOOLS_WRITE_FILE_"` } +// IsFilterSensitiveDataEnabled returns true if sensitive data filtering is enabled +func (c *ToolsConfig) IsFilterSensitiveDataEnabled() bool { + return c.FilterSensitiveData +} + +// GetFilterMinLength returns the minimum content length for filtering (default: 8) +func (c *ToolsConfig) GetFilterMinLength() int { + if c.FilterMinLength <= 0 { + return 8 + } + return c.FilterMinLength +} + type SearchCacheConfig struct { MaxSize int `json:"max_size" env:"PICOCLAW_SKILLS_SEARCH_CACHE_MAX_SIZE"` TTLSeconds int `json:"ttl_seconds" env:"PICOCLAW_SKILLS_SEARCH_CACHE_TTL_SECONDS"` diff --git a/pkg/config/config_test.go b/pkg/config/config_test.go index 3f8ec6150..88a48fc21 100644 --- a/pkg/config/config_test.go +++ b/pkg/config/config_test.go @@ -436,6 +436,40 @@ func TestDefaultConfig_ExecAllowRemoteEnabled(t *testing.T) { } } +func TestDefaultConfig_FilterSensitiveDataEnabled(t *testing.T) { + cfg := DefaultConfig() + if !cfg.Tools.FilterSensitiveData { + t.Fatal("DefaultConfig().Tools.FilterSensitiveData should be true") + } +} + +func TestDefaultConfig_FilterMinLength(t *testing.T) { + cfg := DefaultConfig() + if cfg.Tools.FilterMinLength != 8 { + t.Fatalf("DefaultConfig().Tools.FilterMinLength = %d, want 8", cfg.Tools.FilterMinLength) + } +} + +func TestToolsConfig_GetFilterMinLength(t *testing.T) { + tests := []struct { + name string + minLen int + expected int + }{ + {"zero returns default", 0, 8}, + {"negative returns default", -1, 8}, + {"positive returns value", 16, 16}, + } + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + cfg := &ToolsConfig{FilterMinLength: tt.minLen} + if got := cfg.GetFilterMinLength(); got != 
tt.expected { + t.Errorf("GetFilterMinLength() = %v, want %v", got, tt.expected) + } + }) + } +} + func TestDefaultConfig_CronAllowCommandEnabled(t *testing.T) { cfg := DefaultConfig() if !cfg.Tools.Cron.AllowCommand { @@ -1252,3 +1286,179 @@ func TestDefaultConfig_MinimaxExtraBody(t *testing.T) { t.Fatalf("Minimax ExtraBody[reasoning_split] = %v, want true", got) } } + +func TestFilterSensitiveData(t *testing.T) { + // Test with nil security config + cfg := &Config{} + if got := cfg.FilterSensitiveData("hello sk-key123 world"); got != "hello sk-key123 world" { + t.Errorf("nil security: got %q, want original", got) + } + + // Test with empty content + cfg.security = &SecurityConfig{} + if got := cfg.FilterSensitiveData(""); got != "" { + t.Errorf("empty content: got %q, want empty", got) + } + + // Test short content (less than FilterMinLength=8, should skip filtering) + cfg.security.ModelList = map[string]ModelSecurityEntry{ + "test": {APIKeys: []string{"sk-long-key-12345"}}, + } + cfg.Tools.FilterSensitiveData = true + cfg.Tools.FilterMinLength = 8 + + // Debug: check if sensitive values are collected + values := cfg.security.collectSensitiveValues() + t.Logf("collected %d sensitive values: %v", len(values), values) + + if got := cfg.FilterSensitiveData("sk-key"); got != "sk-key" { + t.Errorf("short content should not be filtered: got %q", got) + } + + // Test filtering works + content := "Your API key is sk-long-key-12345 and token abc123" + // abc123 is not in sensitive values, only sk-long-key-12345 should be filtered + expected := "Your API key is [FILTERED] and token abc123" + if got := cfg.FilterSensitiveData(content); got != expected { + t.Errorf("filtering failed: got %q, want %q", got, expected) + } + + // Test disabled filtering + cfg.Tools.FilterSensitiveData = false + if got := cfg.FilterSensitiveData(content); got != content { + t.Errorf("disabled filtering: got %q, want original %q", got, content) + } +} + +func 
TestFilterSensitiveData_MultipleKeys(t *testing.T) { + cfg := &Config{ + Tools: ToolsConfig{ + FilterSensitiveData: true, + FilterMinLength: 8, + }, + } + cfg.security = &SecurityConfig{ + ModelList: map[string]ModelSecurityEntry{ + "model1": {APIKeys: []string{"key-one", "key-two"}}, + "model2": {APIKeys: []string{"key-three"}}, + }, + } + + content := "key-one and key-two and key-three should be filtered" + expected := "[FILTERED] and [FILTERED] and [FILTERED] should be filtered" + if got := cfg.FilterSensitiveData(content); got != expected { + t.Errorf("multiple keys: got %q, want %q", got, expected) + } +} + +func TestFilterSensitiveData_AllTokenTypes(t *testing.T) { + cfg := &Config{ + Tools: ToolsConfig{ + FilterSensitiveData: true, + FilterMinLength: 8, + }, + } + cfg.security = &SecurityConfig{ + // Model API keys + ModelList: map[string]ModelSecurityEntry{ + "test-model": {APIKeys: []string{"sk-model-key-12345"}}, + }, + // Channel tokens + Channels: ChannelsSecurity{ + Telegram: &TelegramSecurity{Token: "telegram-bot-token-abcdef"}, + Discord: &DiscordSecurity{Token: "discord-bot-token-xyz789"}, + Slack: &SlackSecurity{BotToken: "xoxb-slack-bot-token", AppToken: "xapp-slack-app-token"}, + Matrix: &MatrixSecurity{AccessToken: "matrix-access-token-abc"}, + Feishu: &FeishuSecurity{AppSecret: "feishu-app-secret-123", EncryptKey: "feishu-encrypt-key"}, + DingTalk: &DingTalkSecurity{ClientSecret: "dingtalk-client-secret"}, + OneBot: &OneBotSecurity{AccessToken: "onebot-access-token"}, + WeCom: &WeComSecurity{Token: "wecom-token", EncodingAESKey: "wecom-aes-key"}, + WeComApp: &WeComAppSecurity{CorpSecret: "wecom-app-secret", Token: "wecom-app-token"}, + Pico: &PicoSecurity{Token: "pico-token-abc123"}, + IRC: &IRCSecurity{Password: "irc-password", NickServPassword: "nickserv-pass", SASLPassword: "sasl-pass"}, + }, + // Web tool API keys + Web: WebToolsSecurity{ + Brave: &BraveSecurity{APIKeys: []string{"brave-api-key"}}, + Tavily: &TavilySecurity{APIKeys: 
[]string{"tavily-api-key"}}, + Perplexity: &PerplexitySecurity{APIKeys: []string{"perplexity-api-key"}}, + GLMSearch: &GLMSearchSecurity{APIKey: "glm-search-key"}, + BaiduSearch: &BaiduSearchSecurity{APIKey: "baidu-search-key"}, + }, + // Skills tokens + Skills: SkillsSecurity{ + Github: &GithubSecurity{Token: "github-token-xyz"}, + ClawHub: &ClawHubSecurity{AuthToken: "clawhub-auth-token"}, + }, + } + + tests := []struct { + name string + content string + want string + }{ + { + name: "model_api_key", + content: "Using model with key sk-model-key-12345", + want: "Using model with key [FILTERED]", + }, + { + name: "telegram_token", + content: "Telegram token: telegram-bot-token-abcdef", + want: "Telegram token: [FILTERED]", + }, + { + name: "discord_token", + content: "Discord token: discord-bot-token-xyz789", + want: "Discord token: [FILTERED]", + }, + { + name: "slack_tokens", + content: "Slack bot: xoxb-slack-bot-token, app: xapp-slack-app-token", + want: "Slack bot: [FILTERED], app: [FILTERED]", + }, + { + name: "matrix_token", + content: "Matrix access token: matrix-access-token-abc", + want: "Matrix access token: [FILTERED]", + }, + { + name: "brave_api_key", + content: "Brave key: brave-api-key", + want: "Brave key: [FILTERED]", + }, + { + name: "tavily_api_key", + content: "Tavily key: tavily-api-key", + want: "Tavily key: [FILTERED]", + }, + { + name: "github_token", + content: "GitHub token: github-token-xyz", + want: "GitHub token: [FILTERED]", + }, + { + name: "irc_passwords", + content: "IRC password: irc-password, nickserv: nickserv-pass", + want: "IRC password: [FILTERED], nickserv: [FILTERED]", + }, + { + name: "mixed_content", + content: "Model key sk-model-key-12345 and Telegram token telegram-bot-token-abcdef", + want: "Model key [FILTERED] and Telegram token [FILTERED]", + }, + { + name: "unlisted_value_not_filtered", + content: "Key abc is not filtered because it is not a stored secret", + want: "Key abc is not filtered because it is not a stored secret", + }, + } + + for _, tt := 
range tests { + t.Run(tt.name, func(t *testing.T) { + if got := cfg.FilterSensitiveData(tt.content); got != tt.want { + t.Errorf("got %q, want %q", got, tt.want) + } + }) + } +} diff --git a/pkg/config/defaults.go b/pkg/config/defaults.go index ccfd5732a..48c03f988 100644 --- a/pkg/config/defaults.go +++ b/pkg/config/defaults.go @@ -378,6 +378,8 @@ func DefaultConfig() *Config { LogLevel: "fatal", }, Tools: ToolsConfig{ + FilterSensitiveData: true, + FilterMinLength: 8, MediaCleanup: MediaCleanupConfig{ ToolConfig: ToolConfig{ Enabled: true, diff --git a/pkg/config/security.go b/pkg/config/security.go index fe2111280..c6641f099 100644 --- a/pkg/config/security.go +++ b/pkg/config/security.go @@ -10,6 +10,9 @@ import ( "fmt" "os" "path/filepath" + "reflect" + "strings" + "sync" "github.com/caarlos0/env/v11" "github.com/tencent-connect/botgo/log" @@ -35,6 +38,9 @@ type SecurityConfig struct { Web WebToolsSecurity `yaml:"web,omitempty"` Skills SkillsSecurity `yaml:"skills,omitempty"` + + // cache for the compiled strings.Replacer of sensitive values (computed once) + sensitiveCache *SensitiveDataCache } // ModelSecurityEntry stores security data for a model @@ -218,3 +224,91 @@ func saveSecurityConfig(securityPath string, sec *SecurityConfig) error { } return fileutil.WriteFileAtomic(securityPath, buf.Bytes(), 0o600) } + +// SensitiveDataCache caches the strings.Replacer used to filter sensitive data. +// The replacer is built from the values collected from SecurityConfig and is +// computed once on first access via sync.Once. +type SensitiveDataCache struct { + replacer *strings.Replacer + once sync.Once +} + +// SensitiveDataReplacer returns the strings.Replacer for filtering sensitive data. +// It is computed once on first access via sync.Once. +func (sec *SecurityConfig) SensitiveDataReplacer() *strings.Replacer { + sec.initSensitiveCache() + return sec.sensitiveCache.replacer +} + +// initSensitiveCache initializes the sensitive data cache if not already done. 
+func (sec *SecurityConfig) initSensitiveCache() { + if sec.sensitiveCache == nil { + sec.sensitiveCache = &SensitiveDataCache{} + } + sec.sensitiveCache.once.Do(func() { + values := sec.collectSensitiveValues() + if len(values) == 0 { + sec.sensitiveCache.replacer = strings.NewReplacer() + return + } + + // Build old/new pairs for strings.Replacer + var pairs []string + for _, v := range values { + if len(v) > 3 { + pairs = append(pairs, v, "[FILTERED]") + } + } + if len(pairs) == 0 { + sec.sensitiveCache.replacer = strings.NewReplacer() + return + } + sec.sensitiveCache.replacer = strings.NewReplacer(pairs...) + }) +} + +// collectSensitiveValues collects all sensitive strings from SecurityConfig using reflection. +func (sec *SecurityConfig) collectSensitiveValues() []string { + var values []string + collectSensitive(reflect.ValueOf(sec), &values) + return values +} + +// collectSensitive recursively traverses the value and collects all non-empty string fields. +func collectSensitive(v reflect.Value, values *[]string) { + // Dereference pointers/interfaces to get the underlying value + for v.Kind() == reflect.Ptr || v.Kind() == reflect.Interface { + if v.IsNil() { + return + } + v = v.Elem() + } + + switch v.Kind() { + case reflect.Struct: + for i := 0; i < v.NumField(); i++ { + field := v.Field(i) + fieldType := v.Type().Field(i) + if !fieldType.IsExported() { + continue + } + collectSensitive(field, values) + } + case reflect.String: + if v.String() != "" { + *values = append(*values, v.String()) + } + case reflect.Slice: + if v.Type().Elem().Kind() == reflect.String { + for i := 0; i < v.Len(); i++ { + if s := v.Index(i).String(); s != "" { + *values = append(*values, s) + } + } + } + case reflect.Map: + for _, key := range v.MapKeys() { + collectSensitive(v.MapIndex(key), values) + } + } +} diff --git a/pkg/logger/panic_win.go b/pkg/logger/panic_win.go index 29d3f21d8..1e6eead02 100644 --- a/pkg/logger/panic_win.go +++ b/pkg/logger/panic_win.go @@ -12,7 +12,7 
@@ import ( ) func initPanicFile(panicFile string) io.WriteCloser { - file, err := os.OpenFile(panicFile, os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_APPEND, 0600) + file, err := os.OpenFile(panicFile, os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_APPEND, 0o600) if err != nil { panic(fmt.Sprintf("error in open panic: %v", err)) }
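Review note: the filtering path this diff adds in `pkg/config` can be sketched in isolation as follows. This is a minimal approximation, not the code in the diff: `buildReplacer` and `filterContent` are hypothetical names standing in for `initSensitiveCache` and `Config.FilterSensitiveData`, and the secrets are made-up sample values. It mirrors two checks visible in the diff, the `len(v) > 3` guard and the `filter_min_length` fast path.

```go
package main

import (
	"fmt"
	"strings"
)

// buildReplacer is a hypothetical helper (not part of the diff). It turns a
// list of secrets into a strings.Replacer, skipping values of 3 characters
// or fewer, as the len(v) > 3 check in initSensitiveCache does.
func buildReplacer(secrets []string) *strings.Replacer {
	pairs := make([]string, 0, 2*len(secrets))
	for _, s := range secrets {
		if len(s) > 3 {
			pairs = append(pairs, s, "[FILTERED]")
		}
	}
	// NewReplacer accepts zero pairs; the result then leaves input unchanged.
	return strings.NewReplacer(pairs...)
}

// filterContent is a hypothetical stand-in for Config.FilterSensitiveData:
// content shorter than minLen bytes takes the fast path and is returned as-is.
func filterContent(r *strings.Replacer, content string, minLen int) string {
	if len(content) < minLen {
		return content
	}
	return r.Replace(content)
}

func main() {
	r := buildReplacer([]string{"sk-secret-key-12345", "key-one", "abc"})

	// A stored secret inside longer content is replaced.
	fmt.Println(filterContent(r, "API key sk-secret-key-12345", 8)) // API key [FILTERED]

	// A 7-byte secret returned on its own is shorter than minLen, so it is not scanned.
	fmt.Println(filterContent(r, "key-one", 8)) // key-one

	// "abc" is 3 characters long, so it never enters the replacer.
	fmt.Println(filterContent(r, "the value abc stays visible", 8)) // the value abc stays visible
}
```

The second call suggests an edge case worth confirming against the real implementation: a stored secret shorter than `filter_min_length` that a tool returns on its own would take the fast path and reach the LLM unfiltered.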