Intelligent OCR System · Batch Processing · Multi-Mode Support · Bounding Box Visualization
DeepSeek-OCR-WebUI is a web application for image recognition built on the DeepSeek-OCR model. It provides a browser-based interface with seven recognition modes, batch processing, and bounding box visualization.
- 🎯 7 Recognition Modes - Document, OCR, Chart, Find, Freeform, etc.
- 🖼️ Bounding Box Visualization - Find mode automatically annotates positions
- 📦 Batch Processing - Recognize multiple images sequentially
- 🎨 Modern UI - Cool gradient backgrounds and animation effects
- 🌐 Multilingual Support - Simplified Chinese, Traditional Chinese, English, Japanese
- 🐳 Docker Deployment - One-click startup, ready to use
- ⚡ GPU Acceleration - High-performance inference on NVIDIA GPUs
| Mode | Icon | Description | Use Cases | 
|---|---|---|---|
| Doc to Markdown | 📄 | Preserve format and layout | Contracts, papers, reports | 
| General OCR | 📝 | Extract all visible text | Image text extraction | 
| Plain Text | 📋 | Pure text without format | Simple text recognition | 
| Chart Parser | 📊 | Recognize charts and formulas | Data charts, math formulas | 
| Image Description | 🖼️ | Generate detailed descriptions | Image understanding, accessibility | 
| Find & Locate ⭐ | 🔍 | Find and annotate positions | Invoice field locating | 
| Custom Prompt ⭐ | ✨ | Customize recognition needs | Flexible recognition tasks | 
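
As a rough illustration of how the modes in the table could translate into model prompts, here is a minimal Python sketch. The mode keys and the `build_prompt` helper are hypothetical names chosen for this example. The prompt strings follow the examples published with the DeepSeek-OCR model; whether this WebUI sends exactly these strings is an assumption.

```python
# Hypothetical mode keys mapped to prompt strings. The strings follow the
# examples published with the DeepSeek-OCR model; the WebUI's own prompts
# may differ.
PROMPTS = {
    "doc_to_markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "general_ocr": "<image>\n<|grounding|>OCR this image.",
    "plain_text": "<image>\nFree OCR.",
    "chart_parser": "<image>\nParse the figure.",
    "image_description": "<image>\nDescribe this image in detail.",
}


def build_prompt(mode, query=None):
    """Return the prompt for a mode; 'find' and 'custom' need a query."""
    if mode in ("find", "custom"):
        if not query:
            raise ValueError(f"mode '{mode}' requires a query")
        if mode == "find":
            # Find & Locate wraps the search term in reference tags.
            return f"<image>\nLocate <|ref|>{query}<|/ref|> in the image."
        # Custom Prompt passes the user's text through unchanged.
        return f"<image>\n{query}"
    return PROMPTS[mode]
```

For example, `build_prompt("find", "Total")` produces the prompt for locating the word "Total" in an invoice image.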
Left-Right Split Layout:

```
┌──────────────────────┬──────────────────────────────┐
│ Left: Control Panel  │ Right: Result Display        │
├──────────────────────┼──────────────────────────────┤
│ 📤 Image Upload      │ 🖼️ Result Image (with boxes) │
│ 🎯 Search Input      │ 📊 Statistics                │
│ 🚀 Action Buttons    │ 📝 Recognition Text          │
│                      │ 📦 Match List                │
└──────────────────────┴──────────────────────────────┘
```

Bounding Box Visualization:
- 🟢 Colorful neon border auto-annotation
- 🎨 6 colors in rotation
- 📍 Precise coordinate positioning
- 🔄 Responsive auto-redraw
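
The coordinate positioning and color rotation listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes the model reports boxes on a normalized 0..999 grid, and the `to_pixel_boxes` helper and the six hex colors are placeholders invented for this example.

```python
from itertools import cycle

# Six placeholder colors used in rotation; the WebUI's real palette is not
# documented here.
COLORS = ["#00ff00", "#00ffff", "#ff00ff", "#ffff00", "#ff6600", "#00ff99"]


def to_pixel_boxes(boxes, width, height, grid=999):
    """Scale (x1, y1, x2, y2) boxes from an assumed 0..grid space to pixels.

    Coordinates are clamped so that no box extends past the image, and each
    box is paired with the next color in the rotation.
    """
    def clamp(value, upper):
        return max(0, min(upper, value))

    result = []
    for (x1, y1, x2, y2), color in zip(boxes, cycle(COLORS)):
        result.append((
            clamp(round(x1 / grid * width), width),
            clamp(round(y1 / grid * height), height),
            clamp(round(x2 / grid * width), width),
            clamp(round(y2 / grid * height), height),
            color,
        ))
    return result
```

Calling the function again with the image's new displayed width and height is one way to approximate a responsive redraw: the normalized boxes stay the same and only the pixel coordinates change.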
Supported Languages:
- 🇨🇳 Simplified Chinese (zh-CN)
- 🇹🇼 Traditional Chinese (zh-TW)
- 🇺🇸 English (en-US) - Default
- 🇯🇵 Japanese (ja-JP)
Web UI:
- Click the language selector in the top-right corner
- Select your desired language
- The interface switches immediately and the choice is saved automatically
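
To make the language fallback concrete, here is a small Python sketch of how a requested language tag could be resolved to one of the four supported locales, with en-US as the default. The `resolve_locale` helper and its matching rules are assumptions made for illustration; the WebUI's actual selection logic runs in the browser and is not described in this README.

```python
SUPPORTED = ["zh-CN", "zh-TW", "en-US", "ja-JP"]
DEFAULT = "en-US"


def resolve_locale(requested):
    """Map a requested language tag to a supported locale (illustrative)."""
    if not requested:
        return DEFAULT
    tag = requested.replace("_", "-").lower()
    # 1. An exact match wins, ignoring case and "_" vs "-".
    for locale in SUPPORTED:
        if locale.lower() == tag:
            return locale
    # 2. Otherwise fall back to the first locale with the same language part.
    language = tag.split("-")[0]
    for locale in SUPPORTED:
        if locale.lower().split("-")[0] == language:
            return locale
    # 3. Unknown languages use the default.
    return DEFAULT
```

With these rules, `resolve_locale("ja")` returns `ja-JP`, while an unsupported tag such as `fr-FR` falls back to `en-US`.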
Prerequisites:
- Docker & Docker Compose
- NVIDIA GPU + Drivers (recommended)
- 8 GB+ RAM
- 20 GB+ Disk Space

```bash
# 1. Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# 2. Start service
docker compose up -d

# 3. Wait for model loading (about 1-2 minutes)
docker logs -f deepseek-ocr-webui

# 4. Access Web UI
# http://localhost:8001
```

```bash
# Check container status
docker compose ps

# Check health status
curl http://localhost:8001/health

# View logs
docker logs deepseek-ocr-webui
```

🌐 New Features:
- ✅ Added multilingual support (Simplified Chinese, Traditional Chinese, English, Japanese)
- ✅ Language selector UI component
- ✅ Persistent storage of the selected language
- ✅ Multilingual documentation (README)
🐛 Bug Fixes:
- ✅ Fixed mode switching issues
- ✅ Fixed bounding boxes exceeding image boundaries
- ✅ Optimized image container layout
- ✅ Added rendering delay for alignment
🎨 UI Optimization:
- ✅ Centered image display
- ✅ Responsive bounding box redraw
- ✅ Language switcher integration
✨ Major Updates:
- ✅ New Find mode (find & locate)
- ✅ Dedicated left-right split layout
- ✅ Canvas bounding box visualization
- ✅ Colorful neon annotation effects
🔧 Technical Improvements:
- ✅ Switched to the transformers engine (replacing vLLM)
- ✅ Precise coordinate conversion algorithm
- ✅ Responsive design optimization

Scenario: Find "Total" amount in invoice

Steps:
1. Select "🔍 Find & Locate" mode
2. Upload invoice image
3. Enter search term: Total
4. Click "🚀 Start Search"

Results:
- ✓ "Total" marked with green border on image
- ✓ Shows 1-2 matches found
- ✓ Provides precise coordinate information

Scenario: Batch recognize 20 contracts

Steps:
1. Select "📄 Doc to Markdown" mode
2. Drag and upload 20 images
3. Adjust order (optional)
4. Click "🚀 Start Recognition"

Results:
- ✓ Process each image sequentially
- ✓ Real-time progress display
- ✓ Auto-merge all results
- ✓ One-click copy or download

docker-compose.yml:

```yaml
# Environment variables
environment:
  - API_HOST=0.0.0.0                     # Listen address
  - MODEL_NAME=deepseek-ai/DeepSeek-OCR  # Model name
  - CUDA_VISIBLE_DEVICES=0               # GPU device
```

```yaml
# Memory configuration
shm_size: "8g"                # Shared memory

# GPU configuration
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

Contributions welcome! Please check the Contributing Guide.
- Fork this repository
- Create feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add some AmazingFeature')
- Push to branch (git push origin feature/AmazingFeature)
- Open Pull Request
- Check Troubleshooting
- Check Known Issues
- Submit an Issue
- Check Roadmap
- Submit a Feature Request
This project is licensed under the MIT License.
- DeepSeek-AI - DeepSeek-OCR model
- deepseek_ocr_app - Reference project
- All contributors and users
⭐ If this project helps you, please give it a Star! ⭐
Made with ❤️ by neosun100
DeepSeek-OCR-WebUI v3.1 | © 2025


