
PyMuPDF 1.26.4: Multiple bugs when opening PDFs from byte streams #4723

@chaudronmagic

Description of the bug

Description

When opening PDF documents from byte streams using fitz.open(stream=pdf_content, filetype="pdf") in PyMuPDF version 1.26.4, two bugs occur that prevent normal operation.

Environment

  • PyMuPDF Version: 1.26.4
  • Python Version: 3.12.11
  • Operating System: Linux (Docker container)
  • Installation Method: pip/uv

Bug 1: AttributeError in needs_pass property

Error Message

AttributeError: 'FzDocument' object has no attribute 'super'

Stack Trace

Traceback (most recent call last):
  File "pdf_processor.py", line 325, in _compress_with_settings
    doc = fitz.open(stream=pdf_content, filetype="pdf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/pymupdf/__init__.py", line 3008, in __init__
    if self.needs_pass:
       ^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/pymupdf/__init__.py", line 5021, in needs_pass
    document = self.this if isinstance(self.this, mupdf.FzDocument) else self.this.super()
                                                                         ^^^^^^^^^^^^^^^
AttributeError: 'FzDocument' object has no attribute 'super'

Root Cause

In pymupdf/__init__.py line 5021, the isinstance(self.this, mupdf.FzDocument) check evaluates to False, so the code falls through to self.this.super(). The object's class is nevertheless named FzDocument, and FzDocument doesn't have a super() method.

Bug 2: AssertionError in _loadOutline

Error Message

AssertionError

Stack Trace

Traceback (most recent call last):
  File "pdf_processor.py", line 362, in _compress_with_settings
    doc = fitz.open(stream=pdf_content, filetype="pdf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/pymupdf/__init__.py", line 3011, in __init__
    self.init_doc()
  File "/app/.venv/lib/python3.12/site-packages/pymupdf/__init__.py", line 4463, in init_doc
    self._outline = self._loadOutline()
                    ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/pymupdf/__init__.py", line 3491, in _loadOutline
    assert isinstance( doc, mupdf.FzDocument)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Root Cause

In pymupdf/__init__.py line 3491, the assertion isinstance(doc, mupdf.FzDocument) fails, even though a document that has just been opened should satisfy it.
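
Both tracebacks involve an isinstance(..., mupdf.FzDocument) check that does not succeed: in Bug 1 the else branch runs, and in Bug 2 the assertion fails. The sketch below is a pure-Python illustration of one way such a check can fail for an object whose class is named FzDocument: the class comes from a different module object than the one being compared against. The module names mupdf_a and mupdf_b are hypothetical stand-ins, and this does not establish that this is what happens inside PyMuPDF.

```python
import types


def make_module(name):
    """Build a throwaway module that defines its own FzDocument class."""
    module = types.ModuleType(name)
    module.FzDocument = type("FzDocument", (), {})
    return module


# Two hypothetical modules, each with a distinct class called FzDocument.
mupdf_a = make_module("mupdf_a")
mupdf_b = make_module("mupdf_b")

doc = mupdf_b.FzDocument()

# The class names match, but the class objects are different,
# so isinstance() against the other module's class is False.
same_name = type(doc).__name__ == mupdf_a.FzDocument.__name__
is_instance = isinstance(doc, mupdf_a.FzDocument)
print(f"same class name: {same_name}, isinstance: {is_instance}")

# Calling a method the class does not define raises AttributeError,
# with a message that refers to the class only by its name.
try:
    doc.super()
except AttributeError as exc:
    message = str(exc)
    print(message)
```

If something similar is going on here, comparing id(type(doc.this)) with id(mupdf.FzDocument) in the failing environment would show whether the two class objects differ.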

Reproduction Code

import fitz

# Read any PDF file
with open("sample.pdf", "rb") as f:
    pdf_content = f.read()

# This fails with the bugs described above
try:
    doc = fitz.open(stream=pdf_content, filetype="pdf")
    print(f"Document opened: {len(doc)} pages")
    doc.close()
except (AttributeError, AssertionError) as e:
    print(f"Error: {type(e).__name__}: {e}")

Expected Behavior

The PDF should open successfully from a byte stream without errors.

Actual Behavior

Opening PDFs from byte streams fails with either:

  1. AttributeError: 'FzDocument' object has no attribute 'super'
  2. AssertionError in _loadOutline()

Workaround

Opening the PDF from a file path works correctly:

import fitz
import tempfile
import os

# pdf_content holds the PDF bytes read in the reproduction code above
# Save to temporary file and open from path
with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as tmp:
    tmp.write(pdf_content)
    tmp_path = tmp.name

try:
    doc = fitz.open(tmp_path)  # This works
    print(f"Document opened: {len(doc)} pages")
    doc.close()
finally:
    os.unlink(tmp_path)
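
For callers that want to keep the stream path where it works, the workaround can be wrapped in a fallback helper. The following is a minimal sketch, not part of PyMuPDF: open_pdf_bytes is a hypothetical name, and the opener argument is assumed to be a callable with the two call shapes used in this report (opener(stream=..., filetype="pdf") and opener(path)), such as fitz.open.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def open_pdf_bytes(pdf_content, opener):
    """Yield a document opened from bytes, falling back to a temp file.

    `opener` is assumed to behave like fitz.open for the two call
    shapes used below; it is injected so the helper has no hard
    dependency on a particular PyMuPDF version.
    """
    tmp_path = None
    try:
        doc = opener(stream=pdf_content, filetype="pdf")
    except (AttributeError, AssertionError):
        # Stream opening failed with one of the two reported errors:
        # write the bytes to disk and open by path instead.
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            tmp.write(pdf_content)
            tmp_path = tmp.name
        try:
            doc = opener(tmp_path)
        except BaseException:
            os.unlink(tmp_path)
            raise
    try:
        yield doc
    finally:
        # Close the document before removing the file it may still read from.
        close = getattr(doc, "close", None)
        if callable(close):
            close()
        if tmp_path is not None:
            os.unlink(tmp_path)
```

With PyMuPDF installed, usage would look like `with open_pdf_bytes(pdf_content, fitz.open) as doc:`. The temporary file is kept until the with block exits because the document may read from it lazily. This still requires a writable temporary directory, so it does not help on read-only filesystems.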

Impact

These bugs prevent using PyMuPDF with in-memory PDF data, which is critical for:

  • Web applications processing uploaded PDFs
  • Microservices handling PDF streams
  • PDF processing pipelines that avoid disk I/O
  • Cloud functions with read-only filesystems

Additional Information

  • The bugs appear to be related to how PyMuPDF handles the internal document object when created from streams vs. file paths
  • Both fitz.open() and fitz.Document() constructors are affected
  • The issue does NOT occur when opening PDFs from file paths
  • This affects document processing workflows in production environments

Suggested Fix

  1. In line 5021, check if self.this has a super attribute before calling it
  2. In line 3491, review the assertion logic for stream-opened documents
  3. Ensure consistent behavior between file-based and stream-based document opening
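
Suggestion 1 could look roughly like the sketch below. It uses stand-in classes rather than PyMuPDF's real ones, and as_base_document is a hypothetical helper name, so treat it as an illustration of the guard and not as a patch. Note that a guard of this kind would only avoid the AttributeError; it would not explain why the isinstance check fails in the first place.

```python
def as_base_document(this, base_cls):
    """Return `this` as a base document without assuming super() exists.

    Mirrors the expression from the traceback, but only calls super()
    when the object actually provides it.
    """
    if isinstance(this, base_cls):
        return this
    super_method = getattr(this, "super", None)
    if callable(super_method):
        return super_method()
    # No super() available: hand the object back unchanged.
    return this


# Stand-in classes for illustration only.
class BaseDoc:
    pass


class DerivedDoc:
    """Wrapper that exposes its base document through super()."""

    def __init__(self, base):
        self._base = base

    def super(self):
        return self._base


class Unrelated:
    """Object that is neither a BaseDoc nor has a super() method."""


base = BaseDoc()
derived = DerivedDoc(base)
unrelated = Unrelated()

print(as_base_document(base, BaseDoc) is base)
print(as_base_document(derived, BaseDoc) is base)
print(as_base_document(unrelated, BaseDoc) is unrelated)
```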

Related Issues

Please let me know if this is related to any existing issues or if additional debugging information would be helpful.
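
As a starting point for additional debugging information, the following sketch collects the interpreter version and the installed versions of a few distributions using only the standard library. The distribution names in CANDIDATE_DISTRIBUTIONS are an assumption about what might be relevant in this environment, and collect_debug_info is a hypothetical helper. A missing distribution is reported as None.

```python
import importlib.metadata
import platform
import sys

# Assumed list of distribution names worth checking; adjust as needed.
CANDIDATE_DISTRIBUTIONS = ("PyMuPDF", "PyMuPDFb", "fitz")


def collect_debug_info(names=CANDIDATE_DISTRIBUTIONS):
    """Return a dict with Python/platform info and installed versions."""
    info = {
        "python": platform.python_version(),
        "platform": sys.platform,
    }
    for name in names:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_debug_info().items():
        print(f"{key}: {value}")
```

Running this inside the Docker container would also confirm which PyMuPDF version is actually installed, since this report lists both 1.26.4 and 1.26.5.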

How to reproduce the bug

PyMuPDF version

1.26.5

Operating system

Linux

Python version

3.12
