fix(feishu): fix image download with API fallback and post image support (#2708)

* fix(feishu): fix image download with API fallback and post image support

- Add Image.Get API fallback when MessageResource.Get fails (the two APIs
  need different permission scopes: im:resource for Image.Get, im:message
  or im:message:readonly for MessageResource.Get)
- Extract and download images from post (rich text) messages
- Extract images from interactive card messages
- Deduplicate post image keys across locales
- Add comprehensive tests for new helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(media): add image path tags alongside base64 for LLM file access

Images are still base64-encoded into msg.Media for multimodal LLMs,
but now also get [image:path] tags injected into message content so
the LLM knows the local file path for save/forward operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(media): only auto-inject images for tool results, not user messages

Channel-received images (role=user) now get path tags only, letting
the LLM decide whether to view via load_image or just operate on
the file. Tool result images (role=tool, e.g. load_image) are
base64-encoded into a synthetic user message appended after the tool
message, since many LLM APIs don't support image_url in tool messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(media): preserve tool-message ordering for multi-tool-call scenarios

Move synthetic user message (carrying base64 tool images) to after the
entire contiguous tool-message block instead of immediately after each
tool message. This preserves the assistant→tool→tool ordering required
by OpenAI-compatible APIs.
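
A minimal sketch of the reordering described above, not the code from this
commit. The Message type, the hoistToolImages name and the "[tool images]"
text are hypothetical; only the ordering rule (one synthetic user message
after the whole contiguous tool block) comes from the commit message.

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message used only for this sketch.
type Message struct {
	Role   string   // "user", "assistant", or "tool"
	Text   string
	Images []string // base64 payloads attached to the message (illustrative)
}

// hoistToolImages removes images from tool messages and re-attaches them to a
// single synthetic user message placed after each contiguous block of tool
// messages, so the assistant→tool→tool ordering is left intact.
func hoistToolImages(in []Message) []Message {
	out := make([]Message, 0, len(in)+1)
	var pending []string

	// flush emits the synthetic user message if any tool images were collected.
	flush := func() {
		if len(pending) > 0 {
			out = append(out, Message{Role: "user", Text: "[tool images]", Images: pending})
			pending = nil
		}
	}

	for _, m := range in {
		if m.Role != "tool" {
			flush() // a non-tool message ends the current tool block, if any
		}
		if m.Role == "tool" && len(m.Images) > 0 {
			pending = append(pending, m.Images...)
			m.Images = nil // m is a copy; the caller's slice is not modified
		}
		out = append(out, m)
	}
	flush() // the conversation may end with a tool block
	return out
}

func main() {
	msgs := []Message{
		{Role: "assistant", Text: "calling two tools"},
		{Role: "tool", Text: "result A", Images: []string{"imgA"}},
		{Role: "tool", Text: "result B", Images: []string{"imgB"}},
	}
	for _, m := range hoistToolImages(msgs) {
		fmt.Println(m.Role, len(m.Images))
	}
}
```

With the input in main, both tool messages stay adjacent to the assistant
message and the single user message carrying two images comes last.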

Also fix load_image to use generic [image: photo] placeholder so
injectPathTags can properly replace it with the actual path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update load_image test for [image: photo] placeholder

The test was checking ForLLM for the media:// ref, but load_image now
emits the generic [image: photo] placeholder instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(media): match all channel image placeholders in injectPathTags

Different channels emit different placeholder formats: Telegram/Feishu
use [image: photo], WeCom/WeChat/Line use bare [image], QQ/Discord use
[image: <filename>]. The previous string-match code only handled
[image: photo], so for the other channels the path tag was appended as
a duplicate, producing content like "[image] [image:/path]".

Switch to per-type regex that matches all generic placeholder shapes
while leaving path tags ([image:/path]) untouched. Also fixes the same
issue for [audio], [video], [file] tags. Added test coverage for the
various placeholder shapes.
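
For illustration, one way such a per-type pattern could look. This is a
sketch under assumptions, not the regex used in the commit: placeholderRe
and replaceFirst are hypothetical names, and the pattern assumes that path
tags always have a "/" directly after the colon, as in [image:/path].

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRe builds a pattern for one media type ("image", "audio", ...).
// It matches [image], [image: photo] and [image: <filename>], but not a path
// tag such as [image:/tmp/a.jpg]: the first non-space character after the
// colon must not be '/'. Go's RE2 engine has no lookahead, so the exclusion
// is expressed with a negated character class instead.
func placeholderRe(kind string) *regexp.Regexp {
	return regexp.MustCompile(`\[` + regexp.QuoteMeta(kind) + `(?::\s*[^\]/\s][^\]]*)?\]`)
}

// replaceFirst swaps the first generic placeholder of the given type for a
// path tag. It reports false when no placeholder was found, so the caller
// can decide on a fallback (for example appending the tag).
func replaceFirst(content, kind, path string) (string, bool) {
	loc := placeholderRe(kind).FindStringIndex(content)
	if loc == nil {
		return content, false
	}
	tag := "[" + kind + ":" + path + "]"
	return content[:loc[0]] + tag + content[loc[1]:], true
}

func main() {
	for _, in := range []string{
		"look [image: photo]",
		"look [image]",
		"look [image: cat.png]",
		"look [image:/tmp/old.jpg]",
	} {
		out, ok := replaceFirst(in, "image", "/tmp/new.jpg")
		fmt.Printf("%q -> %q (replaced=%v)\n", in, out, ok)
	}
}
```

The last input is already a path tag, so it is left unchanged and the
function reports that nothing was replaced.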

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(media): skip path tag append for JSON content (Feishu cards/posts)

When content is structured JSON (interactive cards, post messages),
injectPathTags now skips the fallback append — only placeholder
replacement is attempted. This prevents corrupting JSON payloads
like {"schema":"2.0",...} with appended [image:/path] tags.

Adds looksLikeJSON() helper and three test cases covering JSON
objects, arrays, and an end-to-end resolveMediaRefs scenario with
Feishu card content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(media): prepend path tags for JSON content, narrow looksLikeJSON

Two fixes from code review:

1. looksLikeJSON now only checks for '{' prefix (not '['), avoiding
   false positives on regular text like "[update] see attached".

2. For JSON content (Feishu cards/posts), path tags are prepended
   before the JSON instead of being silently dropped. This ensures
   the LLM can discover attached images via the path tag while the
   JSON payload stays valid for downstream parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Guoguo
2026-04-30 11:08:00 +08:00
committed by GitHub
parent a36472b55f
commit cb1e1a3595
9 changed files with 625 additions and 122 deletions
+56
@@ -64,6 +64,62 @@ func extractJSONStringField(content, field string) string {
// Format: {"image_key": "img_xxx"}
func extractImageKey(content string) string { return extractJSONStringField(content, "image_key") }
// extractPostImageKeys extracts all image_key values from a Feishu post (rich text)
// message. Post messages have nested arrays of elements where images appear as
// {"tag":"img","image_key":"img_xxx"}.
func extractPostImageKeys(rawContent string) []string {
if rawContent == "" {
return nil
}
var post map[string]json.RawMessage
if err := json.Unmarshal([]byte(rawContent), &post); err != nil {
return nil
}
var keys []string
seen := make(map[string]struct{})
collectFromRows := func(contentRaw json.RawMessage) {
var rows [][]map[string]any
if err := json.Unmarshal(contentRaw, &rows); err != nil {
return
}
for _, row := range rows {
for _, elem := range row {
if tag, _ := elem["tag"].(string); tag == "img" {
if ik, _ := elem["image_key"].(string); ik != "" {
if _, dup := seen[ik]; !dup {
seen[ik] = struct{}{}
keys = append(keys, ik)
}
}
}
}
}
}
// Flat format: {"title":"...", "content":[[...]]}
if contentRaw, ok := post["content"]; ok {
collectFromRows(contentRaw)
}
// Localized format: {"zh_cn": {"title":"...", "content":[[...]]}, ...}
for _, raw := range post {
var locale map[string]json.RawMessage
if err := json.Unmarshal(raw, &locale); err != nil {
continue
}
contentRaw, ok := locale["content"]
if !ok {
continue
}
collectFromRows(contentRaw)
}
return keys
}
// extractFileKey extracts the file_key from a Feishu file/audio message content JSON.
// Format: {"file_key": "file_xxx", "file_name": "...", ...}
func extractFileKey(content string) string { return extractJSONStringField(content, "file_key") }
+94
@@ -291,6 +291,100 @@ func TestStripMentionPlaceholders(t *testing.T) {
}
}
func TestExtractPostImageKeys(t *testing.T) {
tests := []struct {
name string
content string
want []string
}{
{
name: "empty content",
content: "",
want: nil,
},
{
name: "invalid JSON",
content: "not json",
want: nil,
},
{
name: "post with no images",
content: `{"zh_cn":{"title":"Title","content":[[{"tag":"text","text":"hello"}]]}}`,
want: nil,
},
{
name: "post with one image",
content: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":"img_v3_001"}]]}}`,
want: []string{"img_v3_001"},
},
{
name: "post with multiple images",
content: `{"zh_cn":{"title":"","content":[[{"tag":"text","text":"see"},{"tag":"img","image_key":"img_001"}],[{"tag":"img","image_key":"img_002"}]]}}`,
want: []string{"img_001", "img_002"},
},
{
name: "post with text and image mixed in row",
content: `{"zh_cn":{"title":"","content":[[{"tag":"text","text":"hi"},{"tag":"img","image_key":"img_mix"}]]}}`,
want: []string{"img_mix"},
},
{
name: "en_us locale",
content: `{"en_us":{"title":"","content":[[{"tag":"img","image_key":"img_en"}]]}}`,
want: []string{"img_en"},
},
{
name: "multiple locales with distinct images",
content: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":"img_zh"}]]},"en_us":{"title":"","content":[[{"tag":"img","image_key":"img_en"}]]}}`,
want: []string{"img_zh", "img_en"},
},
{
name: "duplicate image_key across locales is deduplicated",
content: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":"img_same"}]]},"en_us":{"title":"","content":[[{"tag":"img","image_key":"img_same"}]]}}`,
want: []string{"img_same"},
},
{
name: "image with empty image_key",
content: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":""}]]}}`,
want: nil,
},
{
name: "flat format without locale wrapper",
content: `{"title":"","content":[[{"tag":"img","image_key":"img_v3_flat","width":1826,"height":338}],[{"tag":"text","text":" check this image","style":[]}]]}`,
want: []string{"img_v3_flat"},
},
{
name: "flat format multiple images",
content: `{"title":"","content":[[{"tag":"img","image_key":"img_flat_1"}],[{"tag":"img","image_key":"img_flat_2"},{"tag":"text","text":"desc"}]]}`,
want: []string{"img_flat_1", "img_flat_2"},
},
{
name: "flat format no images",
content: `{"title":"Test","content":[[{"tag":"text","text":"just text"}]]}`,
want: nil,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := extractPostImageKeys(tt.content)
if len(got) != len(tt.want) {
t.Errorf("extractPostImageKeys() = %v, want %v", got, tt.want)
return
}
// Use set comparison to avoid map iteration order dependency
gotSet := make(map[string]bool, len(got))
for _, v := range got {
gotSet[v] = true
}
for _, v := range tt.want {
if !gotSet[v] {
t.Errorf("extractPostImageKeys() missing expected key %q; got %v", v, got)
}
}
})
}
}
func TestExtractCardImageKeys(t *testing.T) {
tests := []struct {
name string
+100 -24
@@ -803,6 +803,14 @@ func (c *FeishuChannel) downloadInboundMedia(
refs = append(refs, ref)
}
case larkim.MsgTypePost:
for _, imageKey := range extractPostImageKeys(rawContent) {
ref := c.downloadResource(ctx, messageID, imageKey, "image", ".jpg", store, scope)
if ref != "" {
refs = append(refs, ref)
}
}
case larkim.MsgTypeInteractive:
// Extract and download images embedded in interactive cards
feishuKeys, _ := extractCardImageKeys(rawContent)
@@ -842,12 +850,41 @@ func (c *FeishuChannel) downloadInboundMedia(
// downloadResource downloads a message resource (image/file) from Feishu,
// writes it to the project media directory, and stores the reference in MediaStore.
// fallbackExt (e.g. ".jpg") is appended when the resolved filename has no extension.
//
// For image resources, if the primary MessageResource.Get API fails (which
// requires im:message or im:message:readonly scope), a fallback to the
// Image.Get API (which requires im:resource scope) is attempted. This ensures
// image downloads succeed regardless of which permission the user has granted.
func (c *FeishuChannel) downloadResource(
ctx context.Context,
messageID, fileKey, resourceType, fallbackExt string,
store media.MediaStore,
scope string,
) string {
file, filename := c.fetchResourceData(ctx, messageID, fileKey, resourceType)
if file == nil {
return ""
}
if closer, ok := file.(io.Closer); ok {
defer closer.Close()
}
if filename == "" {
filename = fileKey
}
if filepath.Ext(filename) == "" && fallbackExt != "" {
filename += fallbackExt
}
return c.storeResourceFile(ctx, messageID, fileKey, filename, file, store, scope)
}
// fetchResourceData tries to download a resource from Feishu, first via
// MessageResource.Get, then falling back to Image.Get for image resources.
func (c *FeishuChannel) fetchResourceData(
ctx context.Context,
messageID, fileKey, resourceType string,
) (io.Reader, string) {
req := larkim.NewGetMessageResourceReqBuilder().
MessageId(messageID).
FileKey(fileKey).
@@ -855,41 +892,80 @@ func (c *FeishuChannel) downloadResource(
Build()
resp, err := c.client.Im.V1.MessageResource.Get(ctx, req)
if err == nil && resp.Success() && resp.File != nil {
return resp.File, resp.FileName
}
if err != nil {
-logger.ErrorCF("feishu", "Failed to download resource", map[string]any{
logger.WarnCF("feishu", "MessageResource.Get failed", map[string]any{
"message_id": messageID,
"file_key": fileKey,
"error": err.Error(),
})
-return ""
} else if !resp.Success() {
c.invalidateTokenOnAuthError(resp.Code)
logger.WarnCF("feishu", "MessageResource.Get api error", map[string]any{
"message_id": messageID,
"file_key": fileKey,
"code": resp.Code,
"msg": resp.Msg,
})
} else {
logger.WarnCF("feishu", "MessageResource.Get returned empty file body", map[string]any{
"message_id": messageID,
"file_key": fileKey,
})
}
if resourceType != "image" {
return nil, ""
}
return c.fetchImageDirect(ctx, fileKey)
}
// fetchImageDirect downloads an image using the Image.Get API
// (/open-apis/im/v1/images/:image_key), which requires the im:resource scope.
func (c *FeishuChannel) fetchImageDirect(ctx context.Context, imageKey string) (io.Reader, string) {
req := larkim.NewGetImageReqBuilder().
ImageKey(imageKey).
Build()
resp, err := c.client.Im.V1.Image.Get(ctx, req)
if err != nil {
logger.ErrorCF("feishu", "Image.Get fallback failed", map[string]any{
"image_key": imageKey,
"error": err.Error(),
})
return nil, ""
}
if !resp.Success() {
c.invalidateTokenOnAuthError(resp.Code)
-logger.ErrorCF("feishu", "Resource download api error", map[string]any{
-"code": resp.Code,
-"msg": resp.Msg,
logger.ErrorCF("feishu", "Image.Get fallback api error", map[string]any{
"image_key": imageKey,
"code": resp.Code,
"msg": resp.Msg,
})
-return ""
return nil, ""
}
if resp.File == nil {
-return ""
-}
-// Safely close the underlying reader if it implements io.Closer (e.g. HTTP response body).
-if closer, ok := resp.File.(io.Closer); ok {
-defer closer.Close()
return nil, ""
}
-filename := resp.FileName
-if filename == "" {
-filename = fileKey
-}
-// If filename still has no extension, append the fallback (like Telegram's ext parameter).
-if filepath.Ext(filename) == "" && fallbackExt != "" {
-filename += fallbackExt
-}
logger.DebugCF("feishu", "Image downloaded via Image.Get fallback", map[string]any{
"image_key": imageKey,
})
return resp.File, resp.FileName
}
-// Write to the shared picoclaw_media directory using a unique name to avoid collisions.
// storeResourceFile writes downloaded resource data to disk and registers it in the MediaStore.
func (c *FeishuChannel) storeResourceFile(
ctx context.Context,
messageID, fileKey, filename string,
file io.Reader,
store media.MediaStore,
scope string,
) string {
mediaDir := media.TempDir()
if mkdirErr := os.MkdirAll(mediaDir, 0o700); mkdirErr != nil {
logger.ErrorCF("feishu", "Failed to create media directory", map[string]any{
@@ -908,7 +984,7 @@ func (c *FeishuChannel) downloadResource(
return ""
}
-if _, copyErr := io.Copy(out, resp.File); copyErr != nil {
if _, copyErr := io.Copy(out, file); copyErr != nil {
out.Close()
os.Remove(localPath)
logger.ErrorCF("feishu", "Failed to write resource to file", map[string]any{
@@ -943,8 +1019,8 @@ func appendMediaTags(content, messageType string, mediaRefs []string) string {
return content
}
-// Don't append tags to JSON content (interactive cards) - would produce invalid JSON
-if messageType == larkim.MsgTypeInteractive {
// Don't append tags to JSON content - would produce invalid JSON
if messageType == larkim.MsgTypeInteractive || messageType == larkim.MsgTypePost {
return content
}
+7
@@ -180,6 +180,13 @@ func TestAppendMediaTags(t *testing.T) {
mediaRefs: []string{"ref1"},
want: `{"schema":"2.0","body":{"elements":[{"tag":"img","img_key":"img_123"}]}}`,
},
{
name: "post message with images returns content unchanged",
content: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":"img_001"}]]}}`,
messageType: "post",
mediaRefs: []string{"ref1"},
want: `{"zh_cn":{"title":"","content":[[{"tag":"img","image_key":"img_001"}]]}}`,
},
}
for _, tt := range tests {