Open
### Description of the bug
I found memory building up in my application and traced it to `page.widgets()`.
What's puzzling is that merely iterating over the generator it returns triggers the leak, even when each widget is only printed and never referenced afterwards. In the simplified script I provided, the `analysis` function makes memory grow continuously without ever being released, while `analysis_test` does not.
This occurs in both 1.24.7 and 1.26.0, but when I roll back to 1.20.2 the memory leak disappears.
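To compare the two functions numerically rather than by reading tracebacks, a small helper along these lines might be useful. It is only a sketch using the standard library: `measure_growth` and `is_growing` are hypothetical names, not part of PyMuPDF, and `tracemalloc` only sees allocations made through Python's allocators, so a leak purely inside native MuPDF memory would not show up here.

```python
import gc
import tracemalloc


def measure_growth(func, iterations=5):
    """Call func repeatedly and return the traced heap size (bytes) after each call."""
    sizes = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            func()
            gc.collect()  # drop collectable garbage before sampling
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


def is_growing(sizes, min_step=1024):
    """True if every sample exceeds the previous one by at least min_step bytes."""
    if len(sizes) < 2:
        return False
    return all(b - a >= min_step for a, b in zip(sizes, sizes[1:]))
```

With the reproduction script's names, this could be called as `is_growing(measure_growth(lambda: analysis(bytes_data)))` and again with `analysis_test`; the `min_step` threshold is an arbitrary assumption and may need tuning.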
### How to reproduce the bug
```python
# coding = utf-8
"""
Created on 2025/10/15 17:08
@File : pdf_mem_leak
@Author: Y
Description :
"""
import gc
import time
import fitz
import tracemalloc


def analysis(stream_data):
    """Iterates over page.widgets(); memory keeps growing."""
    pdf_info = fitz.Document(stream=stream_data, filetype='pdf')
    tmp_list = range(len(pdf_info))
    for page_num in tmp_list:
        page = pdf_info[page_num]
        raw_info = page.get_text('rawdict')['blocks']
        page_widgets_list = page.widgets()
        for widget_info in page_widgets_list:
            print(widget_info)
        del page_widgets_list
    pdf_info.close()
    pdf_info = None
    Tools_ = fitz.TOOLS
    Tools_.store_shrink(100)
    gc.collect()


def analysis_test(stream_data):
    """Same as analysis() but never iterates the generator; no leak."""
    pdf_info = fitz.Document(stream=stream_data, filetype='pdf')
    tmp_list = range(len(pdf_info))
    for page_num in tmp_list:
        page = pdf_info[page_num]
        raw_info = page.get_text('rawdict')['blocks']
        page_widgets_list = page.widgets()
        # for widget_info in page_widgets_list:
        #     print(widget_info)
        del page_widgets_list
    pdf_info.close()
    pdf_info = None
    Tools_ = fitz.TOOLS
    Tools_.store_shrink(100)
    gc.collect()


if __name__ == '__main__':
    file_path = r'2407.10671v4.pdf'
    tracemalloc.start(30)
    snapshot1 = tracemalloc.take_snapshot()
    last_record = []
    for i in range(100):
        print('iter is :{}'.format(i))
        with open(file_path, 'rb') as f:
            bytes_data = f.read()
        analysis(bytes_data)  # leaks memory
        # analysis_test(bytes_data)  # does not leak memory
        gc.collect()
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot1, 'traceback')
        # top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        snapshot1 = tracemalloc.take_snapshot()
        top_stats = sorted(top_stats, key=lambda x: -x.size_diff)
        print("-----begin comp-----")
        for nums, stat in enumerate(top_stats[0:10]):
            # Skip shrinking, already reported, brand-new, and tracemalloc-internal entries
            if stat.size_diff <= 0 or stat in last_record or stat.size_diff == stat.size or "tracemalloc" in stat.traceback.format()[0]:
                continue
            else:
                print("index is :{}\nstat info:{}".format(nums, stat))
                print("\n".join(stat.traceback.format()))
        print("-----stop comp-----\n")
        last_record = top_stats
```
[2407.10671v4.pdf](https://github.com/user-attachments/files/22955884/2407.10671v4.pdf)
### PyMuPDF version
1.26.0
### Operating system
Linux
### Python version
3.9