Integration Test Suites
This directory contains the integration test suites that CI runs before merging PRs and before building main.
These tests exist to catch regressions that are easy to miss with unit tests alone, especially when different PRs touch adjacent code paths and the breakage only appears once the pieces are wired together. Typical examples are:
- protocol compatibility across package boundaries
- request and response behavior over real transports
- subprocess, CLI, or container wiring
- configuration passed through environment variables
- startup, discovery, and teardown flows involving more than one component
Two Layers of Integration Testing
PicoClaw currently uses two related mechanisms:
- Go integration tests, usually in
*_integration_test.gofiles and guarded by//go:build integration - Docker-backed suites in
integration/suites/that start real dependencies and run one or more of those Go tests in CI
That distinction matters:
- a tagged Go integration test is the test implementation
- a Docker-backed suite is how we make that test reproducible and CI-safe
If a test should actively protect merges between PRs, it should be reachable from scripts/run-integration-tests.sh, either by extending an existing suite or by adding a new one.
Some integration-tagged tests are intentionally opt-in and not part of the Docker suites. For example, real CLI smoke tests under pkg/providers/cli/ depend on external binaries being installed locally. Those are useful for manual verification, but they do not gate PR merges.
What CI Runs
The integration jobs in .github/workflows/pr.yml and .github/workflows/build.yml both execute:
bash ./scripts/run-integration-tests.sh
The runner auto-discovers every suite under integration/suites/, so adding a suite does not require editing the GitHub Actions workflow.
How the Runner Works
- The shared runner is defined in
integration/docker-compose.runner.yml. - Each suite lives in
integration/suites/<suite-name>/. - The runner script loads
suite.env, merges the shared compose file with the suite-specific compose files, starts dependency services, runs the suite command, and then tears everything down. - The shared runner container sets
GOFLAGS=-tags=goolm,stdjson,integration, so tests run with the same build tags used by CI.
In practice, each suite gives us:
- a deterministic dependency graph
- a stable execution environment
- automatic cleanup after the run
- a clean place to encode the exact regression we want to prevent
Current Reference Suite
integration/suites/mcp-streamable/ is the reference example today.
It does three things:
- builds and starts a fixture MCP server from
integration/fixtures/mcp-streamable-server/ - injects connection details into the runner container through environment variables
- runs
TestIntegration_RealConfiguredServerto verify that PicoClaw can connect to a real server, discover tools, invoke one, and validate the response payload
That suite complements TestIntegration_StreamableHTTPCompatibility, which exercises the same area in-process. Together they cover both protocol behavior and real service wiring.
Suite Layout
Each suite directory must contain:
suite.env- at least one
docker-compose.ymlordocker-compose.*.yml
Example:
integration/suites/my-suite/
├── docker-compose.yml
└── suite.env
Required Manifest Fields
suite.env is sourced by the runner script and must define:
TEST_COMMAND: shell command executed inside the integration runner container
Optional fields:
RUNNER_SERVICE: override the default runner service name (integration-runner)
Example:
TEST_COMMAND='go test ./pkg/mcp -run TestIntegration_RealConfiguredServer -v'
Running Integration Tests Locally
Prerequisites
- Docker with the
docker composeplugin for Docker-backed suites - Go 1.25+ only if you want to run tagged integration tests directly on your host instead of through Docker
Run Everything That CI Runs
make integration-test
Equivalent direct command:
bash ./scripts/run-integration-tests.sh
Run a Single Suite
bash ./scripts/run-integration-tests.sh mcp-streamable
This is the fastest way to reproduce exactly what the CI integration job does for one suite.
Run a Tagged Integration Test Directly
For faster iteration while writing the test, you can run the Go test itself without Docker:
go test -tags=goolm,stdjson,integration ./pkg/mcp -run TestIntegration_StreamableHTTPCompatibility -v
You can also run the real-server smoke test directly if you provide the same environment variables that the Docker suite would inject:
PICOCLAW_MCP_REAL_SERVER_JSON='{"enabled":true,"type":"http","url":"http://127.0.0.1:8080/mcp"}' \
PICOCLAW_MCP_REAL_TOOL_NAME=echo \
PICOCLAW_MCP_REAL_TOOL_ARGS_JSON='{"message":"hello"}' \
PICOCLAW_MCP_REAL_EXPECT_SUBSTRING=hello \
go test -tags=goolm,stdjson,integration ./pkg/mcp -run TestIntegration_RealConfiguredServer -v
Notes:
- avoid
-short: the current integration tests skip in short mode - use direct
go testfor tight feedback loops, then validate the Docker suite before committing
When to Add an Integration Test
Reach for an integration test when the risk lives in the interaction, not just in the function body. Good candidates include:
- code that crosses process or container boundaries
- transport-specific behavior such as HTTP, SSE, stdio, or streamable MCP flows
- CLI parsing and wiring that depends on real subprocess execution
- configuration propagation through files, env vars, headers, or service discovery
- regressions that usually appear only after merging separately reasonable PRs
Prefer a unit test when the behavior is pure, local, and fully controllable in-process.
How to Add a New Integration Test
1. Start from the regression you want to prevent
Describe the real workflow that could break after a merge. The sharper the scenario, the better the test will age.
Examples:
- "PicoClaw can still connect to a streamable MCP server after transport normalization changes."
- "A provider wrapper still parses the real CLI output format after refactors in response handling."
2. Implement the Go test
Add or extend a *_integration_test.go file in the package that owns the behavior and gate it with:
//go:build integration
Guidelines:
- keep the assertions focused on observable behavior
- use bounded timeouts
- prefer deterministic fixtures over internet access or shared external state
- skip only when the dependency is intentionally optional, such as a locally installed third-party CLI
3. Decide whether it must gate CI merges
Use this rule of thumb:
- if the test is only a manual smoke check, a tagged Go test may be enough
- if the test should prevent regressions from landing through PR merges, wire it into a Docker-backed suite
4. Reuse or add a suite
If an existing suite already exercises the same subsystem, extend it. Otherwise create:
integration/suites/<name>/
├── docker-compose.yml
└── suite.env
Use integration/fixtures/ for reusable helper services or fake servers.
5. Define the suite command
In suite.env, point TEST_COMMAND at the Go test you want CI to run.
Examples:
TEST_COMMAND='go test ./pkg/mcp -run TestIntegration_RealConfiguredServer -v'
TEST_COMMAND='go test ./pkg/somepkg -run TestIntegration_MyScenario -v'
You can also run multiple tests if they share the same environment, but keep suites cohesive and easy to diagnose when they fail.
6. Model the dependencies in Docker Compose
Suite compose files can:
- define dependency services needed by the tests
- extend or override the shared
integration-runnerservice - inject environment variables into the runner for the tests to consume
Practical advice:
- prefer Docker service names over hard-coded host ports
- add health checks when the runner depends on service readiness
- keep the suite self-contained and deterministic
7. Validate locally before committing
At minimum:
go test -tags=goolm,stdjson,integration ./path/to/package -run TestIntegration_Name -v
bash ./scripts/run-integration-tests.sh <suite-name>
The first command helps while authoring. The second proves that the CI path works end to end.
Review Checklist for New Suites
Before opening the PR, check that the new suite:
- reproduces a realistic cross-component failure mode
- is deterministic and isolated
- does not require manual setup in CI
- has clear failure output
- finishes in a reasonable amount of time
- cleans up after itself through the normal runner teardown
Once committed, the suite will be auto-discovered by the CI integration job.