Title: 🐞 HTTP Node Fails with Large Binary Files Due to Unsanitized Process Data Storage #27533

@AuditAIH

Description

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & non-English users] Please submit in English, otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.9.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

1. Create a workflow with an HTTP Request node.
2. Configure the node to send a POST request with a binary body (e.g., upload a 100 MB+ MP3 file).
3. Trigger the workflow by uploading the large file via chat or the API (a reproduction sketch follows below).
4. Observe the execution result.

📸 Screenshot attached: the workflow appears to complete, but the logs show a database storage error and the UI displays "任务已完成" ("Task completed") while the backend fails silently.
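
For reference, a minimal reproduction sketch is below. It generates a 150 MB dummy file and uploads it through the Service API. The `BASE_URL`, API key, endpoint path, and multipart field names are assumptions based on Dify's published API and a default self-hosted setup, so adjust them to your deployment and app type.

```python
# Reproduction sketch (assumptions: default self-hosted API prefix, Service API
# file-upload endpoint with "file" + "user" multipart fields; adjust as needed).
import os
import requests

BASE_URL = "http://localhost/v1"   # assumed default for a self-hosted instance
API_KEY = "app-..."                # your app's Service API key (placeholder)

# 1. Create a file larger than the ~50 MB threshold where the issue appears.
with open("test.mp3", "wb") as f:
    f.write(os.urandom(150 * 1024 * 1024))

# 2. Upload it, then pass the returned file id to the workflow whose
#    HTTP Request node sends the file as a binary POST body.
with open("test.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/files/upload",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("test.mp3", f, "audio/mpeg")},
        data={"user": "repro-user"},
    )
print(resp.status_code, resp.text)
```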

[Workflow Execution Log]

  • HTTP Request node starts...
  • File uploaded: test.mp3 (141.09 MB)
  • Node status changes to "SUCCEEDED" visually
  • But database throws: "storage limit exceeded" or "value too long for type jsonb"

✔️ Expected Behavior

  • The HTTP node should execute successfully and return the response.
  • The binary file content should NOT be stored in process_data.
  • Only metadata (file size, filename, transfer method) should be logged (see the sketch below).
  • The workflow should not fail due to database constraints.
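
For illustration only, here is a minimal sketch of what a metadata-only log entry could look like. The helper name `build_binary_log_entry` is hypothetical and not part of Dify's actual code.

```python
# Hypothetical helper illustrating the expected behavior: log metadata about a
# binary body without ever decoding or storing the raw bytes.
def build_binary_log_entry(filename: str, content: bytes, transfer_method: str) -> dict:
    """Return a small, JSON-serializable summary of a binary request body."""
    return {
        "type": "binary",
        "filename": filename,
        "size_bytes": len(content),
        "transfer_method": transfer_method,
    }


# Example: a 141 MB upload collapses to a few dozen bytes of log data.
entry = build_binary_log_entry("test.mp3", b"\x00" * (141 * 1024 * 1024), "local_file")
print(entry)  # {'type': 'binary', 'filename': 'test.mp3', 'size_bytes': 147849216, ...}
```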

❌ Actual Behavior

When uploading large binary files (>50 MB), the HTTP node appears to succeed in the UI ("任务已完成" / "Task completed"), but:

  • The actual database insertion fails due to the oversized JSONB field.
  • No clear error is shown to the user; the workflow continues silently.
  • Subsequent nodes may receive incomplete or corrupted context.
  • System performance degrades due to massive log entries.

💡 Root Cause: In core/workflow/nodes/http_request/executor.py, the to_log() method decodes the binary content to a UTF-8 string using .decode("utf-8", errors="replace"), which can expand a ~100 MB binary into roughly 1 GB+ of text before it is stored in process_data.
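
The expansion is easy to confirm in isolation. The snippet below is a standalone illustration (random bytes stand in for MP3 content, so the exact ratios will vary with the file):

```python
# Standalone demonstration of the blow-up caused by decode("utf-8", errors="replace").
import json
import os

raw = os.urandom(1 * 1024 * 1024)             # 1 MB of binary data
text = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD

utf8_size = len(text.encode("utf-8"))  # U+FFFD is 3 bytes in UTF-8, so ~2-3x growth
json_size = len(json.dumps(text))      # JSON-escaping (\ufffd) grows it further still

print(f"raw: {len(raw):,} B, re-encoded UTF-8: {utf8_size:,} B, JSON string: {json_size:,} B")
```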

Additional Context

This issue is critical for production use cases involving:

  • Audio/video transcription
  • Document processing (PDFs, images)
  • File uploads via API or chat

The current behavior leads to:

  • Database bloat
  • Silent failures
  • Security risk (raw binary data stored in logs)
