🎨 Ready-to-use DeepSeek-OCR Web UI | Modern Interface | 7 Recognition Modes | Batch Processing | Real-time Logging | Fully Responsive

neosun100/DeepSeek-OCR-WebUI

🔍 DeepSeek-OCR-WebUI

Visit Application →

🌐 English | 简体中文 | 繁體中文 | 日本語


Intelligent OCR System · Batch Processing · Multi-Mode Support · Bounding Box Visualization

Features · Quick Start · Version History · Documentation · Contributing


📖 Introduction

DeepSeek-OCR-WebUI is a web application for image and document recognition built on the DeepSeek-OCR model. It wraps the model in a browser interface that offers seven recognition modes, batch processing, and bounding box visualization.

🖼️ UI Preview

DeepSeek-OCR-WebUI Interface

Modern user interface with multilingual support, batch processing, and bounding box visualization

✨ Core Highlights

  • 🎯 7 Recognition Modes - Document, OCR, Chart, Find, Freeform, etc.
  • 🖼️ Bounding Box Visualization - Find mode automatically annotates positions
  • 📦 Batch Processing - Recognizes multiple images one after another
  • 🎨 Modern UI - Cool gradient backgrounds and animation effects
  • 🌐 Multilingual Support - Simplified Chinese, Traditional Chinese, English, Japanese
  • 🐳 Docker Deployment - One-click startup, ready to use
  • GPU Acceleration - High-performance inference based on NVIDIA GPU

🚀 Features

7 Recognition Modes

| Mode | Icon | Description | Use Cases |
|------|------|-------------|-----------|
| Doc to Markdown | 📄 | Preserve format and layout | Contracts, papers, reports |
| General OCR | 📝 | Extract all visible text | Image text extraction |
| Plain Text | 📋 | Pure text without format | Simple text recognition |
| Chart Parser | 📊 | Recognize charts and formulas | Data charts, math formulas |
| Image Description | 🖼️ | Generate detailed descriptions | Image understanding, accessibility |
| Find & Locate | 🔍 | Find and annotate positions | Invoice field locating |
| Custom Prompt | | Customize recognition needs | Flexible recognition tasks |
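
Each mode presumably corresponds to a different instruction sent to the model. The sketch below shows one way such a mapping could look. The mode keys and the `build_prompt` helper are hypothetical, and the prompt strings are modeled on the example prompts published with the DeepSeek-OCR model; the mapping this Web UI actually uses may differ.

```python
# Hypothetical mode keys mapped to prompt templates. The strings follow the
# example prompts published with the DeepSeek-OCR model; the mapping used by
# this Web UI is an assumption.
PROMPTS = {
    "doc_to_markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "general_ocr": "<image>\n<|grounding|>OCR this image.",
    "plain_text": "<image>\nFree OCR.",
    "chart_parser": "<image>\nParse the figure.",
    "image_description": "<image>\nDescribe this image in detail.",
    "find": "<image>\nLocate <|ref|>{term}<|/ref|> in the image.",
}


def build_prompt(mode: str, text: str = "") -> str:
    """Return the prompt for a mode; `text` is the search term or custom prompt."""
    text = text.strip()
    if mode == "custom":
        if not text:
            raise ValueError("custom mode needs a prompt")
        return f"<image>\n{text}"
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "find":
        if not text:
            raise ValueError("find mode needs a search term")
        return PROMPTS[mode].format(term=text)
    return PROMPTS[mode]
```

Keeping the templates in one dictionary means a new mode only needs a new entry, while the two modes that take user input are validated explicitly.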

🎨 Find Mode Features

Left-Right Split Layout:

┌──────────────────────┬─────────────────────────────┐
│   Left: Control Panel │    Right: Result Display    │
├──────────────────────┼─────────────────────────────┤
│ 📤 Image Upload      │ 🖼️ Result Image (with boxes) │
│ 🎯 Search Input      │ 📊 Statistics               │
│ 🚀 Action Buttons    │ 📝 Recognition Text         │
│                      │ 📦 Match List                │
└──────────────────────┴─────────────────────────────┘

Bounding Box Visualization:

  • 🟢 Colorful neon border auto-annotation
  • 🎨 6 colors in rotation
  • 📍 Precise coordinate positioning
  • 🔄 Responsive auto-redraw
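
To illustrate the coordinate positioning and color rotation described above, here is a minimal sketch in Python. It assumes the model reports boxes on a normalized 0-999 grid, which is an assumption about the model output rather than something this README states. The palette values and the function names `to_pixel_box` and `assign_colors` are placeholders, not the project's own.

```python
from itertools import cycle
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]

# Six placeholder colors; the project's actual palette is not documented here.
PALETTE = ["#39ff14", "#00ffff", "#ff00ff", "#ffff00", "#ff6600", "#7df9ff"]
NORM_MAX = 999  # assumed upper bound of the model's normalized coordinates


def _clamp(value: float, upper: int) -> int:
    """Round to the nearest pixel and keep it inside [0, upper]."""
    return int(min(max(round(value), 0), upper))


def to_pixel_box(box: Box, width: int, height: int,
                 norm_max: int = NORM_MAX) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixels, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    left, right = sorted((x1 * width / norm_max, x2 * width / norm_max))
    top, bottom = sorted((y1 * height / norm_max, y2 * height / norm_max))
    return (_clamp(left, width), _clamp(top, height),
            _clamp(right, width), _clamp(bottom, height))


def assign_colors(boxes: Sequence[Box]) -> List[Tuple[Box, str]]:
    """Pair each box with a palette color, wrapping around after six."""
    return list(zip(boxes, cycle(PALETTE)))
```

Clamping after scaling is what keeps a box from being drawn outside the image. For a responsive redraw, the same conversion would be rerun with the displayed width and height whenever the layout changes.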

Feature Demo:

Find Mode Demo

Find & Locate mode in action: Upload on left, auto-annotated results on right


🌐 Multilingual Support

Supported Languages

  • 🇨🇳 Simplified Chinese (zh-CN)
  • 🇹🇼 Traditional Chinese (zh-TW)
  • 🇺🇸 English (en-US) - Default
  • 🇯🇵 Japanese (ja-JP)

How to Switch Language

Web UI:

  1. Click the language selector in the top-right corner
  2. Select your desired language
  3. The interface switches immediately and the selection is saved automatically
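
The sketch below illustrates how a saved or requested language code could be resolved against the four supported locales, falling back to the en-US default. The `resolve_language` function and its matching rules are assumptions for illustration, not the project's implementation, and the UI itself most likely does this in the browser.

```python
from typing import Optional, Sequence

SUPPORTED = ("zh-CN", "zh-TW", "en-US", "ja-JP")
DEFAULT = "en-US"


def resolve_language(requested: Optional[str],
                     supported: Sequence[str] = SUPPORTED,
                     default: str = DEFAULT) -> str:
    """Pick a supported locale for a requested code, else the default."""
    if not requested:
        return default
    wanted = requested.strip().replace("_", "-").lower()
    # An exact match (case-insensitive) wins.
    for code in supported:
        if code.lower() == wanted:
            return code
    # Otherwise match on the primary language subtag, e.g. "ja" -> "ja-JP".
    primary = wanted.split("-")[0]
    for code in supported:
        if code.lower().split("-")[0] == primary:
            return code
    return default
```

With this rule a bare `zh` resolves to the first Chinese locale in the list, so the order of `SUPPORTED` decides which variant is preferred.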

📦 Quick Start

Prerequisites

  • Docker & Docker Compose
  • NVIDIA GPU + Drivers (recommended)
  • 8GB+ RAM
  • 20GB+ Disk Space

One-Click Startup

# 1. Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# 2. Start service
docker compose up -d

# 3. Wait for model loading (about 1-2 minutes)
docker logs -f deepseek-ocr-webui

# 4. Access Web UI
# http://localhost:8001

Verify Installation

# Check container status
docker compose ps

# Check health status
curl http://localhost:8001/health

# View logs
docker logs deepseek-ocr-webui
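
Because the model takes a minute or two to load, a script that depends on the service may want to poll the health endpoint instead of checking it once. The following is a minimal sketch using only the Python standard library. It treats any 2xx response from `/health` as healthy, since the response body format is not documented here, and the helper names and retry defaults are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_healthy(url: str, attempts: int = 40, delay: float = 3.0,
                       probe: Callable[[str], bool] = http_ok,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until it is healthy or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call, assuming the service runs on the port from the quick start:
# wait_until_healthy("http://localhost:8001/health")
```

The `probe` and `sleep` parameters exist so the retry logic can be exercised without a running container.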

📊 Version History

v3.1 (2025-10-22) - Multilingual & Bug Fixes

🌐 New Features:

  • ✅ Added multilingual support (Simplified Chinese, Traditional Chinese, English, Japanese)
  • ✅ Language selector UI component
  • ✅ Persistent storage of the selected language
  • ✅ Multilingual documentation (README)

🐛 Bug Fixes:

  • ✅ Fixed mode switching issues
  • ✅ Fixed bounding boxes exceeding image boundaries
  • ✅ Optimized image container layout
  • ✅ Added rendering delay for alignment

🎨 UI Optimization:

  • ✅ Centered image display
  • ✅ Responsive bounding box redraw
  • ✅ Language switcher integration

v3.0 (2025-10-22) - Find Mode & Split Layout

✨ Major Updates:

  • ✅ New Find mode (find & locate)
  • ✅ Dedicated left-right split layout
  • ✅ Canvas bounding box visualization
  • ✅ Colorful neon annotation effects

🔧 Technical Improvements:

  • ✅ Switched the inference engine to transformers (replacing vLLM)
  • ✅ Precise coordinate conversion algorithm
  • ✅ Responsive design optimization

📖 Documentation

User Documentation

Technical Documentation


🎯 Usage Examples

Find Mode Example

Scenario: Find "Total" amount in invoice

Steps:
1. Select "🔍 Find & Locate" mode
2. Upload invoice image
3. Enter search term: Total
4. Click "🚀 Start Search"

Results:
✓ "Total" marked with green border on image
✓ Shows 1-2 matches found
✓ Provides precise coordinate information
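
To show how match coordinates might be extracted from the model's text output, here is a minimal parsing sketch. It assumes the grounded output wraps each match as `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`; that format, the `parse_matches` helper, and the sample coordinates in the usage comment are all assumptions for illustration.

```python
import ast
import re
from typing import List, Tuple

Match = Tuple[str, Tuple[int, int, int, int]]

# Assumed output format: <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|>\s*<\|det\|>(.*?)<\|/det\|>", re.DOTALL
)


def parse_matches(text: str) -> List[Match]:
    """Extract (label, box) pairs; malformed entries are skipped."""
    results: List[Match] = []
    for label, raw in _PATTERN.findall(text):
        try:
            boxes = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError):
            continue
        if not isinstance(boxes, (list, tuple)):
            continue
        for box in boxes:
            if isinstance(box, (list, tuple)) and len(box) == 4:
                results.append((label.strip(), tuple(box)))
    return results


# Made-up sample, not real model output:
# parse_matches("<|ref|>Total<|/ref|><|det|>[[120, 640, 210, 672]]<|/det|>")
```

`ast.literal_eval` is used instead of `eval` so that only plain literals in the model output are accepted.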

Batch Processing Example

Scenario: Batch recognize 20 contracts

Steps:
1. Select "📄 Doc to Markdown" mode
2. Drag and upload 20 images
3. Adjust order (optional)
4. Click "🚀 Start Recognition"

Results:
✓ Process each image sequentially
✓ Real-time progress display
✓ Auto-merge all results
✓ One-click copy or download
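
The batch flow above (sequential processing, progress reporting, merged output) can be sketched as a small loop. The `recognize` callable stands in for whatever performs recognition on a single image, and the merged layout with numbered sections is an assumption, not the format the Web UI produces.

```python
from typing import Callable, Optional, Sequence


def run_batch(paths: Sequence[str],
              recognize: Callable[[str], str],
              on_progress: Optional[Callable[[int, int], None]] = None) -> str:
    """Recognize images one by one and merge the results in upload order."""
    sections = []
    total = len(paths)
    for index, path in enumerate(paths, start=1):
        try:
            text = recognize(path).strip()
        except Exception as exc:
            # One failed image should not abort the rest of the batch.
            text = f"[recognition failed: {exc}]"
        sections.append(f"## {index}. {path}\n\n{text}")
        if on_progress is not None:
            on_progress(index, total)
    return "\n\n---\n\n".join(sections)
```

Processing sequentially keeps GPU memory use predictable, and recording failures inline means the merged document still lists every input in order.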

🔧 Configuration

Environment Variables

# docker-compose.yml
API_HOST=0.0.0.0              # Listen address
MODEL_NAME=deepseek-ai/DeepSeek-OCR  # Model name
CUDA_VISIBLE_DEVICES=0        # GPU device
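
As an illustration of how a Python backend could consume these variables, the sketch below reads them with the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names and not taken from the project's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_host: str
    model_name: str
    cuda_visible_devices: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        api_host=env.get("API_HOST", "0.0.0.0"),
        model_name=env.get("MODEL_NAME", "deepseek-ai/DeepSeek-OCR"),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES", "0"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.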

Performance Tuning

# Memory configuration
shm_size: "8g"                # Shared memory

# GPU configuration
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

🤝 Contributing

Contributions welcome! Please check the Contributing Guide.

How to Contribute

  1. Fork this repository
  2. Create feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add some AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open Pull Request

📞 Support

Having Issues?

  1. Check Troubleshooting
  2. Check Known Issues
  3. Submit an Issue

Feature Suggestions?

  1. Check Roadmap
  2. Submit a Feature Request


📄 License

This project is licensed under the MIT License.


🙏 Acknowledgments


🔗 Related Links


⭐ If this project helps you, please give it a Star! ⭐

Made with ❤️ by neosun100

DeepSeek-OCR-WebUI v3.1 | © 2025