Memory leaking in insert_htmlbox? #4727

@quachtinh761

Description of the bug

I noticed that memory usage keeps increasing in my application and traced this to the insert_htmlbox call. If I remove that call, memory usage stays stable.

How to reproduce the bug

Here is a minimal script that shows the memory growth:

import os
import psutil
import pymupdf
import gc
import fitz

print("Initial memory used (MB):", psutil.Process(os.getpid()).memory_info().rss / 1024**2)
text_block = (
    {
        "text": "Table 19: Human Development Index (HDI)",
        "translated_text": "Table 19: Human Development Index (HDI)",
        "font": "Verdana,Bold",
        "size": 10.979999542236328,
        "bold": True,
        "italic": False,
        "underline": False,
        "span": {
            "size": 10.979999542236328,
            "flags": 16,
            "bidi": 0,
            "char_flags": 24,
            "font": "Verdana,Bold",
            "color": 0,
            "alpha": 255,
            "ascender": 1.0720000267028809,
            "descender": -0.30300000309944153,
            "text": "Table 19: Human Development Index (HDI)  ",
            "origin": [90.0, 86.4000244140625],
            "bbox": [90.0, 74.62946319580078, 368.0684814453125, 89.72696685791016],
        },
        "color": 0,
    },
)
rect = pymupdf.Rect(text_block[0]["span"]["bbox"])
doc = pymupdf.open()

page = doc.new_page(width=rect.width, height=rect.height)
for runtime in range(200):
    page.add_redact_annot(rect, fill=None)
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # leave images untouched
    page.insert_htmlbox(rect, f"<div style='font-size: {text_block[0]['size']}px; font-weight: {'bold' if text_block[0]['bold'] else 'normal'}; font-style: {'italic' if text_block[0]['italic'] else 'normal'}; text-decoration: {'underline' if text_block[0]['underline'] else 'none'}; font-family: {text_block[0]['font']}; color: #{text_block[0]['color']:06x};'>{text_block[0]['translated_text']}</div>")
    if runtime % 10 == 0:
        print(f"Memory used (MB) after {runtime} insertions:", psutil.Process(os.getpid()).memory_info().rss / 1024**2)
    doc.subset_fonts()
    fitz.TOOLS.store_shrink(100)
    gc.collect()  # Force garbage collection

print("Final memory used (MB):", psutil.Process(os.getpid()).memory_info().rss / 1024**2)
doc.ez_save("leak_memory.pdf")

The growth is caused by line 41 of the script above (the page.insert_htmlbox(...) call).
Below are the results before and after commenting it out:

[Screenshots: memory usage before removing insert_htmlbox (left) and after removing insert_htmlbox (right)]
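
To compare the two runs with a single number instead of eyeballing the printed readings, one option is to fit a straight line through the (iteration, memory) samples and report its slope. This is only a sketch: `growth_per_iteration` is a hypothetical helper, not part of PyMuPDF or psutil, and the sample values are made up for illustration rather than taken from the runs above.

```python
from statistics import mean


def growth_per_iteration(samples):
    """Least-squares slope of memory (MB) against iteration count.

    `samples` is a list of (iteration, rss_mb) pairs, for example the
    readings the script prints every 10 insertions.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("iteration values must not all be equal")
    return sum((x - x_bar) * (y - y_bar) for x, y in samples) / denom


# Made-up readings for illustration only (not measurements from this issue).
hypothetical = [(0, 50.0), (10, 55.0), (20, 60.0), (30, 65.0)]
print(growth_per_iteration(hypothetical))  # MB gained per insertion
```

A slope close to zero with the call commented out and a clearly positive slope with it enabled would point at insert_htmlbox, whereas a similar slope in both runs would suggest the growth comes from elsewhere.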

PyMuPDF version

1.26.4

Operating system

MacOS

Python version

3.13

Labels: postpone (postpone to a future version)
