Forum Image Scraper

Download images from paginated forum threads with parallel processing and duplicate detection.

Limitations

No support for forums that require login or forums protected by services like Cloudflare.

Quick Start

Install dependencies: pip install -r requirements.txt
Configure input.properties. This file is required and holds all settings (forum URL, threads, page range, filters).
Run: python forum-image-scraper.py
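
As a rough illustration of the configuration step, the sketch below parses key=value text in the style of a .properties file. Every key name here (forum_url, threads, start_page, end_page, min_width, min_height) is hypothetical and chosen only for the example; the real names are whatever the project's own input.properties defines. The helper load_properties is also an assumption, not part of the scraper.

```python
import configparser

# Hypothetical example content; the real key names are defined by the
# project's own input.properties and may differ.
SAMPLE_PROPERTIES = """
forum_url = https://forum.example.com
threads = /path/to/thread
start_page = 1
end_page = 3
min_width = 400
min_height = 400
"""


def load_properties(text):
    """Parse key=value text without section headers into a dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser requires a section, so wrap the text in a dummy one.
    parser.read_string("[settings]\n" + text)
    return dict(parser["settings"])


settings = load_properties(SAMPLE_PROPERTIES)
# Inclusive page range built from the two hypothetical page keys.
pages = range(int(settings["start_page"]), int(settings["end_page"]) + 1)
```

All values come back as strings, so numeric settings such as the page range need an explicit int() conversion.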

Output Structure

Images are saved in a hierarchical folder structure matching the thread path:

../hostname/
  └── path/
      └── to/
          └── thread/
              ├── p1_abc12345.jpg
              ├── p1_def67890.jpg
              ├── p2_ghi11223.jpg
              └── ...

hostname: Domain name extracted from the forum URL
path/to/thread: Folder hierarchy created from the thread URL path
pX_hash.jpg: Page number (X) followed by a unique hash identifier
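
The mapping from a thread URL to an output file can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the function name output_path, the use of SHA-256, and the truncation to 8 hex characters are assumptions made to match the naming pattern shown above.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def output_path(thread_url, page, image_bytes, base_dir=".."):
    """Build <base_dir>/<hostname>/<thread path>/p<page>_<hash>.jpg."""
    parsed = urlparse(thread_url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    # Assumption: the identifier is the first 8 hex characters of a
    # SHA-256 digest of the image content.
    digest = hashlib.sha256(image_bytes).hexdigest()[:8]
    return Path(base_dir, parsed.hostname, *segments, f"p{page}_{digest}.jpg")


example = output_path("https://forum.example.com/path/to/thread/", 1, b"image data")
```

Deriving the file name from the image content means the same image always maps to the same name on a given page, which fits naturally with hash-based duplicate detection.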

Features

Parallel downloads (10 workers) for speed
Resolution filtering to skip small images
Hash-based duplicate detection across all pages
Progress feedback and statistics
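
The features above can be combined roughly as in the sketch below, which uses a 10-worker thread pool, a minimum-resolution check, and a set of content hashes. It is an illustration under stated assumptions rather than the project's implementation: download_unique, fetch, and get_size are hypothetical names, and the caller supplies both the download function and the function that reads image dimensions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def download_unique(urls, fetch, get_size, min_width=0, min_height=0, workers=10):
    """Fetch urls in parallel and return (kept, stats).

    fetch(url) -> bytes and get_size(data) -> (width, height) are
    supplied by the caller (hypothetical interfaces for this sketch).
    """
    urls = list(urls)
    seen = set()  # content hashes of images already kept
    kept = []
    stats = {"downloaded": 0, "too_small": 0, "duplicates": 0}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order; filtering happens in
        # the calling thread, so the shared set needs no lock.
        for url, data in zip(urls, pool.map(fetch, urls)):
            stats["downloaded"] += 1
            width, height = get_size(data)
            if width < min_width or height < min_height:
                stats["too_small"] += 1
                continue
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                stats["duplicates"] += 1
                continue
            seen.add(digest)
            kept.append((url, data))
    return kept, stats
```

In practice, fetch could wrap an HTTP client and get_size could open the bytes with an imaging library such as Pillow; whether the scraper does exactly this is not stated here.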

rclakmal/forum-image-scraper
