🔍 DeepSeek-OCR-WebUI

Intelligent OCR System · Batch Processing · Multi-Mode Support · Bounding Box Visualization

Features • Quick Start • Version History • Documentation • Contributing

📖 Introduction

DeepSeek-OCR-WebUI is a web application for image and document recognition built on the DeepSeek-OCR model. It provides a browser interface with seven recognition modes, batch processing, and bounding box visualization.

🖼️ UI Preview

Modern user interface with multilingual support, batch processing, and bounding box visualization

✨ Core Highlights

🎯 7 Recognition Modes - Document, OCR, Chart, Find, Freeform, etc.
🖼️ Bounding Box Visualization - Find mode automatically annotates positions
📦 Batch Processing - Recognize multiple images sequentially
🎨 Modern UI - Cool gradient backgrounds and animation effects
🌐 Multilingual Support - Simplified Chinese, Traditional Chinese, English, Japanese
🐳 Docker Deployment - One-click startup, ready to use
⚡ GPU Acceleration - High-performance inference on NVIDIA GPUs

🚀 Features

7 Recognition Modes

Mode	Icon	Description	Use Cases
Doc to Markdown	📄	Preserve format and layout	Contracts, papers, reports
General OCR	📝	Extract all visible text	Image text extraction
Plain Text	📋	Pure text without format	Simple text recognition
Chart Parser	📊	Recognize charts and formulas	Data charts, math formulas
Image Description	🖼️	Generate detailed descriptions	Image understanding, accessibility
Find & Locate ⭐	🔍	Find and annotate positions	Invoice field locating
Custom Prompt ⭐	✨	Customize recognition needs	Flexible recognition tasks

🎨 Find Mode Features

Left-Right Split Layout:

┌──────────────────────┬─────────────────────────────┐
│  Left: Control Panel │    Right: Result Display    │
├──────────────────────┼─────────────────────────────┤
│ 📤 Image Upload      │ 🖼️ Result Image (with boxes) │
│ 🎯 Search Input      │ 📊 Statistics               │
│ 🚀 Action Buttons    │ 📝 Recognition Text         │
│                      │ 📦 Match List                │
└──────────────────────┴─────────────────────────────┘

Bounding Box Visualization:

🟢 Colorful neon border auto-annotation
🎨 6 colors in rotation
📍 Precise coordinate positioning
🔄 Responsive auto-redraw
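
The list above can be illustrated with a small sketch of how boxes might be mapped onto the displayed image. This is not the WebUI's actual code: the palette values, the function names, and the assumption that the model reports coordinates normalized to a 0-999 range are all illustrative. The sketch scales each box to the current display size, clamps it to the image bounds, and assigns colors in rotation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]

# Hypothetical palette: the README only states that six colors rotate.
NEON_COLORS = ["#39ff14", "#ff00ff", "#00ffff", "#ffff00", "#ff6600", "#ff0066"]

# Assumption: box coordinates arrive normalized to the range 0..999.
NORM_MAX = 999


def scale_box(box: Box, width: int, height: int) -> Box:
    """Map a normalized (x1, y1, x2, y2) box to pixels, clamped to the image."""

    def to_px(value: int, size: int) -> int:
        px = round(value / NORM_MAX * size)
        return max(0, min(size, px))  # never draw outside the image

    x1, y1, x2, y2 = box
    return (to_px(x1, width), to_px(y1, height), to_px(x2, width), to_px(y2, height))


def annotate(boxes: List[Box], width: int, height: int) -> List[Tuple[Box, str]]:
    """Pair every scaled box with a color, cycling through the palette."""
    return [
        (scale_box(box, width, height), NEON_COLORS[i % len(NEON_COLORS)])
        for i, box in enumerate(boxes)
    ]
```

Because the boxes stay in normalized form, a responsive redraw only needs to call `annotate` again with the new display width and height.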

Feature Demo:

Find & Locate mode in action: Upload on left, auto-annotated results on right

🌐 Multilingual Support

Supported Languages

🇨🇳 Simplified Chinese (zh-CN)
🇹🇼 Traditional Chinese (zh-TW)
🇺🇸 English (en-US) - Default
🇯🇵 Japanese (ja-JP)

How to Switch Language

Web UI:

1. Click the language selector in the top-right corner
2. Select your desired language
3. The interface switches immediately and the selection is saved automatically

📦 Quick Start

Prerequisites

Docker & Docker Compose
NVIDIA GPU + Drivers (recommended)
8GB+ RAM
20GB+ Disk Space

One-Click Startup

# 1. Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# 2. Start service
docker compose up -d

# 3. Wait for model loading (about 1-2 minutes)
docker logs -f deepseek-ocr-webui

# 4. Access Web UI
# http://localhost:8001

Verify Installation

# Check container status
docker compose ps

# Check health status
curl http://localhost:8001/health

# View logs
docker logs deepseek-ocr-webui
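
Since the model takes a minute or two to load, it can be convenient to poll the health endpoint instead of checking by hand. The following is a minimal sketch using only the Python standard library. It assumes that a ready service answers `/health` with HTTP 200; the response body is not inspected because its format is not documented here. The function names and default timings are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, reset, or timed out


def wait_until_healthy(
    url: str = "http://localhost:8001/health",
    timeout: float = 180.0,
    interval: float = 5.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_healthy()` after `docker compose up -d` returns `True` once the endpoint responds with 200, or `False` if it does not do so within three minutes.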

📊 Version History

v3.1 (2025-10-22) - Multilingual & Bug Fixes

🌐 New Features:

✅ Added multilingual support (Simplified Chinese, Traditional Chinese, English, Japanese)
✅ Language selector UI component
✅ Persistent storage of the selected language
✅ Multilingual documentation (README)

🐛 Bug Fixes:

✅ Fixed mode switching issues
✅ Fixed bounding boxes exceeding image boundaries
✅ Optimized image container layout
✅ Added rendering delay for alignment

🎨 UI Optimization:

✅ Centered image display
✅ Responsive bounding box redraw
✅ Language switcher integration

v3.0 (2025-10-22) - Find Mode & Split Layout

✨ Major Updates:

✅ New Find mode (find & locate)
✅ Dedicated left-right split layout
✅ Canvas bounding box visualization
✅ Colorful neon annotation effects

🔧 Technical Improvements:

✅ Switched the inference engine to transformers (replacing vLLM)
✅ Precise coordinate conversion algorithm
✅ Responsive design optimization

📖 Documentation

User Documentation

Technical Documentation

🎯 Usage Examples

Find Mode Example

Scenario: Find the "Total" amount in an invoice

Steps:
1. Select "🔍 Find & Locate" mode
2. Upload invoice image
3. Enter search term: Total
4. Click "🚀 Start Search"

Results:
✓ "Total" marked with green border on image
✓ Shows 1-2 matches found
✓ Provides precise coordinate information
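
To make the "matches" and "coordinate information" in this example concrete, here is a hedged sketch of how raw model output could be turned into a match list. It assumes the model marks each hit with `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>` tags. That tag format and the function name are assumptions for illustration, and the WebUI's own parsing may differ.

```python
import re
from typing import List, Tuple

Match = Tuple[str, Tuple[int, int, int, int]]

# Assumed grounding format: <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2], ...]<|/det|>
REF_DET = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)
BOX = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_matches(raw: str) -> List[Match]:
    """Extract (label, box) pairs from raw output; one entry per box."""
    matches: List[Match] = []
    for label, det in REF_DET.findall(raw):
        for x1, y1, x2, y2 in BOX.findall(det):
            matches.append((label, (int(x1), int(y1), int(x2), int(y2))))
    return matches
```

For an invoice where "Total" appears twice, the function would return two entries with the same label and different boxes, which corresponds to the "1-2 matches" shown in the results above.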

Batch Processing Example

Scenario: Recognize a batch of 20 contracts

Steps:
1. Select "📄 Doc to Markdown" mode
2. Drag and upload 20 images
3. Adjust order (optional)
4. Click "🚀 Start Recognition"

Results:
✓ Process each image sequentially
✓ Real-time progress display
✓ Auto-merge all results
✓ One-click copy or download
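
The batch flow above (sequential processing, progress updates, merged output) can be sketched in a few lines. This is an illustrative outline rather than the project's implementation: `recognize` stands in for whatever call performs recognition on a single image, and the merged layout with per-file headings and `---` separators is an assumption.

```python
from typing import Callable, Iterable, List, Optional


def recognize_batch(
    image_paths: Iterable[str],
    recognize: Callable[[str], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> str:
    """Run recognize on each image in order and merge the results."""
    paths: List[str] = list(image_paths)
    sections: List[str] = []
    for index, path in enumerate(paths, start=1):
        text = recognize(path)  # one image at a time, in the given order
        sections.append(f"## {path}\n\n{text.strip()}")
        if on_progress is not None:
            on_progress(index, len(paths))  # e.g. update a progress bar
    return "\n\n---\n\n".join(sections)
```

Passing the recognizer in as a parameter keeps the loop independent of the backend, so the same function works with a local model call or an HTTP request to the service.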

🔧 Configuration

Environment Variables

# docker-compose.yml
API_HOST=0.0.0.0              # Listen address
MODEL_NAME=deepseek-ai/DeepSeek-OCR  # Model name
CUDA_VISIBLE_DEVICES=0        # GPU device
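
As an illustration of how a service could consume these variables, the sketch below reads them into a small configuration object. The `ServiceConfig` class and `load_config` function are hypothetical, and using the values shown above as fallback defaults is an assumption, not documented behavior of `web_service.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceConfig:
    api_host: str
    model_name: str
    cuda_visible_devices: str


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Read the three documented variables, falling back to the values shown above."""
    return ServiceConfig(
        api_host=env.get("API_HOST", "0.0.0.0"),
        model_name=env.get("MODEL_NAME", "deepseek-ai/DeepSeek-OCR"),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES", "0"),
    )
```

Accepting the environment as a mapping makes the function easy to exercise with a plain dictionary, for example `load_config({"CUDA_VISIBLE_DEVICES": "1"})` to target a second GPU.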

Performance Tuning

# Memory configuration
shm_size: "8g"                # Shared memory

# GPU configuration
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

🤝 Contributing

Contributions welcome! Please check the Contributing Guide.

How to Contribute

1. Fork this repository
2. Create a feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📞 Support

Having Issues?

Check Troubleshooting
Check Known Issues
Submit an Issue

Feature Suggestions?

Check Roadmap
Submit a Feature Request

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

DeepSeek-AI - DeepSeek-OCR model
deepseek_ocr_app - Reference project
All contributors and users

🔗 Related Links

⭐ If this project helps you, please give it a Star! ⭐

Made with ❤️ by neosun100
