refactor(bus): fix deadlock and concurrency issues in MessageBus

PublishInbound/PublishOutbound held an RLock across blocking channel
sends, so when the buffer was full they deadlocked against Close(),
which needs the write lock. ConsumeInbound/SubscribeOutbound used bare
receives instead of the comma-ok form, causing zero-value processing or
busy loops after close.

Replace sync.RWMutex+bool with atomic.Bool+done channel so Publish
methods use a lock-free 3-way select (send / done / ctx.Done). Add a
context.Context parameter to both Publish methods so callers can cancel
or time out blocked sends. Close() now only sets the atomic flag and
closes the done channel; it never closes the data channels, which
eliminates send-on-closed-channel panics.

- Remove dead code: RegisterHandler, GetHandler, handlers map,
  MessageHandler type (zero callers across the whole repo)
- Add ErrBusClosed sentinel error
- Update all 10 caller sites to pass context
- Add msgBus.Close() to gateway and agent shutdown flows
- Add pkg/bus/bus_test.go with 11 test cases covering basic round-trip,
  context cancellation, closed-bus behavior, concurrent publish+close,
  full-buffer timeout, and idempotent Close
Hoshina
2026-02-23 00:44:45 +08:00
parent 38a26d702c
commit afc7a1988f
11 changed files with 283 additions and 54 deletions
@@ -48,6 +48,7 @@ func agentCmd(message, sessionKey, model string, debug bool) error {
}
msgBus := bus.NewMessageBus()
+defer msgBus.Close()
agentLoop := agent.NewAgentLoop(cfg, msgBus, provider)
// Print agent startup info (only for interactive mode)
@@ -223,6 +223,7 @@ func gatewayCmd(debug bool) error {
cp.Close()
}
cancel()
+msgBus.Close()
healthServer.Stop(context.Background())
deviceService.Stop()
heartbeatService.Stop()