* feat: add extended thinking support for Anthropic models
Support configurable thinking levels (off/low/medium/high/xhigh/adaptive)
via `agents.defaults.thinking_level` config field.
- "adaptive": uses Anthropic's adaptive thinking API (Claude 4.6+)
- "low/medium/high/xhigh": uses budget_tokens (all thinking-capable models)
- "off": disables thinking (default)
API constraints handled:
- Temperature cleared when thinking is enabled
- budget_tokens clamped to max_tokens-1
- Thinking response blocks parsed into Reasoning field
Relates to #645, #966
* fix: address PR review feedback for thinking support
- Add ThinkingCapable interface for provider capability detection
- Warn when thinking_level is set but provider doesn't support it
- Warn when temperature is cleared due to thinking enabled
- Adjust budget values per Anthropic best practices (medium=16K, xhigh=64K)
- Add budget clamp warning and 80% threshold warning
- Add parseResponse thinking block tests
- Add thinking_level field to config.example.json
* refactor: move ThinkingLevel from AgentDefaults to ModelConfig
Thinking is a model-level capability, not a global agent property.
Per-model config avoids silent ignoring on non-Anthropic providers
and eliminates spurious warning logs in multi-provider setups.
Addresses PR #1076 review feedback from @yinwm.
Avoid rebuilding the entire system prompt on every BuildMessages() call
by caching the static portion (identity, bootstrap, skills summary,
memory) and only recomputing it when workspace source files change.
Key changes:
- ContextBuilder caches the static prompt behind an RWMutex with
double-checked locking. Source file changes are detected via cheap
os.Stat mtime checks so no explicit invalidation is needed.
- Track file existence at cache time (existedAtCache map) so that
newly created or deleted bootstrap/memory files also trigger a
rebuild — the old modifiedSince() silently returned false on
os.IsNotExist.
- Walk the skills directory recursively with filepath.WalkDir to
catch content-only edits at any nesting depth; directory mtime
alone misses in-place file modifications on most filesystems.
- ToolRegistry.sortedToolNames() sorts tool names before iteration,
ensuring deterministic tool definition order across calls — a
prerequisite for LLM-side prefix/KV cache reuse.
- Merge all context (static + dynamic + summary) into a single
system message for provider compatibility: the Anthropic adapter
extracts messages[0] as the top-level system parameter, and Codex
reads only the first system message as instructions.
- Fix a data race in BuildMessages() where cachedSystemPrompt was
read without holding the lock in a debug log statement.
- Add tests: single system message invariant, mtime auto-invalidation,
new-file creation detection, skill file content change, explicit
InvalidateCache, cache stability, concurrent access (20 goroutines
x 50 iterations, passes go test -race), and a benchmark.
Add support for persisting thought_signature metadata from Google/Gemini 3
models. This introduces ExtraContent and GoogleExtra types to handle
provider-specific metadata, and ensures thought signatures are properly
preserved through the tool call lifecycle.
Resolve conflicts in pkg/providers/types.go and pkg/agent/loop.go:
- types.go: use protocoltypes aliases from PR #213, keep fallback types
- loop.go: drop old single-agent createToolRegistry (replaced by multi-agent pattern)
Refactor to align with PR #213 patterns:
- instance.go: use NewExecToolWithConfig (accept full config for deny patterns)
- registry.go: pass full config to NewAgentInstance
- loop.go: add Perplexity web search options to registerSharedTools
Phase 1: centralize protocol message/tool/response types in protocoltypes and keep compatibility aliases in providers and protocol packages.
Phase 1: preserve HTTPProvider constructor compatibility and route Anthropic api_base through factory auth/provider constructors with base URL normalization.
Phase 2: expand provider routing/auth tests (deepseek/nvidia/shengsuanyun, codex/claude oauth/codex-cli) and add openai_compat + anthropic coverage for proxy transport, model normalization, numeric option coercion, token-source refresh, and base URL behavior.
Phase 3: apply gofmt and validate with Dockerized tests (go test ./pkg/providers/... ./pkg/migrate and go test ./...).