A comprehensive, production-ready framework for multi-task deep learning in surgical video analysis, featuring instance segmentation, phase recognition, skill assessment, and video processing capabilities.
Cataract-LMM is an enterprise-grade AI framework designed for large-scale, multi-center surgical video analysis. Built on modern software engineering principles, this repository provides state-of-the-art deep learning models for comprehensive analysis of cataract surgery videos.
This framework implements methodologies from cutting-edge research in computer-assisted surgery, providing validated approaches for:
- Surgical Instance Segmentation using YOLO, Mask R-CNN, and SAM architectures
 - Surgical Phase Recognition with Video Transformers, 3D CNNs, and temporal models
 - Surgical Skill Assessment through multi-modal analysis and performance metrics
 - Video Processing with GPU-accelerated pipelines for medical video data
 
- Production-Ready: Enterprise-grade architecture with comprehensive testing and CI/CD
 - Multi-Task Learning: Unified framework supporting four core surgical analysis tasks
 - Scalable Design: Microservices-ready architecture with containerization support
 - Medical Compliance: HIPAA-aware design patterns and secure data handling
 - Research-to-Production: Seamless transition from research notebooks to production deployment
 
- Quick Start
- Features
- Architecture
- Installation
- Usage Examples
- Development
- Model Zoo
- Configuration
- Testing
- Documentation
- Contributing
- License
- Citation
- Author
- Support & Community
- Roadmap
 
- Python 3.8+
 - CUDA 11.8+ (for GPU acceleration)
 - FFmpeg (for video processing)
 - Docker (optional, for containerized deployment)
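Before installing, you can sanity-check these prerequisites. The snippet below is an illustrative check (not part of the framework) that verifies the Python version and looks for the external tools on your PATH:

```python
# Illustrative prerequisite check -- not shipped with the framework.
import shutil
import sys

assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print(f"Python {sys.version.split()[0]} OK")

# FFmpeg is required; nvcc (CUDA) and docker are optional depending on your setup.
for tool in ("ffmpeg", "nvcc", "docker"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'not found on PATH'}")
```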
 
```bash
# Clone the repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM

# Install using Poetry (recommended)
cd codes
poetry install

# Activate virtual environment
poetry shell

# Or install using pip
pip install -r requirements.txt

# Validate installation
python setup.py --validate-only
```

```bash
# Video processing
cd surgical-video-processing
python main.py --input path/to/video.mp4 --output ./results --config configs/default.yaml

# Instance segmentation
cd surgical-instance-segmentation
python inference/predictor.py --model yolo --input data/images/

# Phase recognition
cd surgical-phase-recognition
python validation/training_framework.py --config configs/default.yaml --mode train

# Skill assessment
cd surgical-skill-assessment
python main.py --config configs/comprehensive.yaml --mode evaluate
```

| Component | Models | Key Features |
|---|---|---|
| Instance Segmentation | YOLO v8/11, Mask R-CNN, SAM | Real-time surgical instrument detection and segmentation | 
| Phase Recognition | Video Transformers, 3D CNNs, TeCNO | 11-phase surgical workflow analysis | 
| Skill Assessment | Multi-modal CNNs, Attention Models | Objective surgical skill evaluation | 
| Video Processing | GPU-Accelerated Pipelines | Medical-grade video preprocessing and enhancement | 
- Modular Architecture: Microservices-ready design with clear separation of concerns
- Security First: HIPAA-compliant patterns, secure credential management
- Comprehensive Testing: 85%+ test coverage with unit, integration, and E2E tests
- CI/CD Pipeline: Automated testing, security scanning, and deployment workflows
- Monitoring & Observability: Structured logging, metrics collection, and health checks
- Containerization: Multi-stage Docker builds with security hardening
 
- Rich Documentation: Comprehensive guides, API references, and examples
- Configuration Management: YAML-based configuration with validation
- Development Tools: Pre-commit hooks, linting, formatting, and type checking
- Dependency Management: Poetry-based modern Python packaging
- Development Environment: VS Code integration with debugging support
 
```mermaid
graph TB
    A[Video Input] --> B[Video Processing Pipeline]
    B --> C[Frame Extraction & Preprocessing]
    C --> D[Multi-Task Analysis Engine]
    
    D --> E[Instance Segmentation]
    D --> F[Phase Recognition] 
    D --> G[Skill Assessment]
    
    E --> H[Surgical Instruments]
    F --> I[Surgery Phases]
    G --> J[Skill Metrics]
    
    H --> K[Clinical Decision Support]
    I --> K
    J --> K
```
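To make the diagram concrete, the sketch below wires the four components together using the same class names and arguments as the usage examples later in this README. It is an illustration of the data flow, not a supported entry point:

```python
# Hypothetical end-to-end driver mirroring the pipeline diagram above.
# Class names and arguments follow the usage examples later in this README.
from surgical_video_processing import VideoProcessor
from surgical_instance_segmentation import SegmentationPredictor
from surgical_phase_recognition import PhaseClassifier
from surgical_skill_assessment import SkillEvaluator

# Video input -> processing pipeline (frame extraction & preprocessing)
processed = VideoProcessor("configs/default.yaml").process_video(
    input_path="data/surgery_video.mp4", output_dir="outputs/processed/"
)

# Multi-task analysis engine: instruments, phases, and skill metrics
predictor = SegmentationPredictor(model_type="yolo_v8", device="cuda")
instruments = predictor.predict_batch(image_paths=["outputs/processed/frame001.jpg"])
phases = PhaseClassifier(
    model_name="video_transformer", config_path="configs/phase_recognition.yaml"
).classify_sequence(video_path="data/surgery_video.mp4")
assessment = SkillEvaluator("configs/skill_assessment.yaml").evaluate_surgery(
    video_path="data/surgery_video.mp4", phase_annotations="data/phases.json"
)
# Instrument tracks, phase timeline, and skill metrics then feed
# clinical decision support (node K in the diagram).
```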
```
Cataract_LMM/
├── README.md                      # Project overview and documentation
├── LICENSE                        # CC-BY-4.0 license
├── CONTRIBUTING.md                # Contribution guidelines
├── .gitignore                     # Git ignore patterns
├── codes/                         # Main codebase
│   ├── surgical-video-processing/         # Video preprocessing and enhancement
│   │   ├── core/                  # Core processing algorithms
│   │   ├── pipelines/             # Processing pipelines
│   │   ├── metadata/              # Video metadata management
│   │   ├── quality_control/       # Quality assurance tools
│   │   └── configs/               # Configuration files
│   ├── surgical-instance-segmentation/    # Instance segmentation models
│   │   ├── models/                # YOLO, Mask R-CNN, SAM implementations
│   │   ├── training/              # Training pipelines
│   │   ├── inference/             # Real-time inference engines
│   │   ├── evaluation/            # Model evaluation tools
│   │   └── data/                  # Dataset utilities
│   ├── surgical-phase-recognition/        # Phase classification models
│   │   ├── models/                # Video Transformers, 3D CNNs, TeCNO
│   │   ├── validation/            # Training and validation frameworks
│   │   ├── preprocessing/         # Video preprocessing
│   │   ├── analysis/              # Result analysis tools
│   │   └── configs/               # Model configurations
│   ├── surgical-skill-assessment/         # Skill evaluation framework
│   │   ├── models/                # Skill assessment models
│   │   ├── engine/                # Training and inference engines
│   │   ├── utils/                 # Analysis utilities
│   │   └── configs/               # Assessment configurations
│   ├── tests/                     # Comprehensive test suite
│   ├── docs/                      # Documentation source
│   ├── docker/                    # Docker configurations
│   ├── reports/                   # Analysis reports
│   ├── pyproject.toml             # Python project configuration
│   ├── Dockerfile                 # Container definition
│   ├── Makefile                   # Development automation
│   └── setup.py                   # Project setup script
├── .github/                       # GitHub configurations
│   └── workflows/                 # CI/CD pipelines
└── security_scanning_demo.ipynb   # Security analysis notebook
```
| Component | Minimum | Recommended | 
|---|---|---|
| Python | 3.8 | 3.11+ | 
| RAM | 16GB | 32GB+ | 
| GPU Memory | 8GB | 24GB+ | 
| Storage | 50GB | 500GB+ | 
| CUDA | 11.8 | 12.0+ | 
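Once PyTorch is installed, you can compare your GPU against this table with a quick check (illustrative, not part of the setup script):

```python
# Illustrative check of GPU memory against the requirements table above.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")  # minimum 8 GiB
else:
    print("No CUDA device visible")
```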
```bash
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Clone and setup
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes

# Install dependencies
poetry install --extras "dev docs"

# Activate environment
poetry shell
```

```bash
# Create environment
conda create -n cataract-lmm python=3.11
conda activate cataract-lmm

# Clone and install
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
pip install -r requirements.txt
```

```bash
# Build container
docker build -t cataract-lmm:latest .

# Run interactive container
docker run -it --gpus all -v $(pwd)/data:/app/data cataract-lmm:latest
```

```bash
# Run comprehensive validation
python setup.py --validate-only

# Run tests
pytest tests/ -v

# Check GPU availability
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```

```python
from surgical_video_processing import VideoProcessor, QualityController

# Initialize processor with configuration
processor = VideoProcessor("configs/high_quality.yaml")
# Process surgical video
result = processor.process_video(
    input_path="data/surgery_video.mp4",
    output_dir="outputs/processed/",
    apply_deidentification=True,
    quality_threshold=0.8
)
print(f"Processed {result.frame_count} frames")
print(f"Quality score: {result.average_quality:.3f}")from surgical_instance_segmentation import SegmentationPredictor
# Load pre-trained model
predictor = SegmentationPredictor(
    model_type="yolo_v8",
    device="cuda"
)
# Segment surgical instruments
results = predictor.predict_batch(
    image_paths=["frame001.jpg", "frame002.jpg"],
    confidence_threshold=0.7,
    save_visualizations=True
)
# Extract detections
for result in results:
    print(f"Detected {len(result.boxes)} instruments")
    print(f"Classes: {result.class_names}")from surgical_phase_recognition import PhaseClassifier
# Initialize phase recognition model
classifier = PhaseClassifier(
    model_name="video_transformer",
    config_path="configs/phase_recognition.yaml"
)
# Classify surgical phases in video sequence
phases = classifier.classify_sequence(
    video_path="data/surgery_complete.mp4",
    sequence_length=16,
    overlap=0.5
)
# Display phase timeline
for phase in phases:
    print(f"Time: {phase.timestamp:.2f}s - Phase: {phase.name}")from surgical_skill_assessment import SkillEvaluator
```python
from surgical_skill_assessment import SkillEvaluator

# Initialize skill assessment framework
evaluator = SkillEvaluator("configs/skill_assessment.yaml")
# Assess surgical performance
assessment = evaluator.evaluate_surgery(
    video_path="data/complete_surgery.mp4",
    phase_annotations="data/phases.json",
    surgeon_level="resident"  # resident, fellow, attending
)
# Generate skill report
report = evaluator.generate_report(assessment)
print(f"Overall Score: {report.overall_score}/100")
print(f"Efficiency: {report.efficiency_score}/10")
print(f"Precision: {report.precision_score}/10")# Clone repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
# Install development dependencies
poetry install --extras "dev"
# Setup pre-commit hooks
pre-commit install
# Run development server
make dev-server
```

```bash
# Format code
make format
# Run linting
make lint
# Type checking
make type-check
# Security scanning
make security-scan
# Run all quality checks
make quality
```

```bash
# Run unit tests
make test
# Run with coverage
make test-coverage
# Run integration tests
make test-integration
# Run end-to-end tests
make test-e2e
# Generate coverage report
make coverage-report
```

```bash
make help              # Show all available commands
make install           # Install dependencies
make clean             # Clean build artifacts
make build             # Build distribution packages
make docker-build      # Build Docker image
make docker-run        # Run Docker container
make docs-build        # Build documentation
make docs-serve        # Serve documentation locally
```

**Instance Segmentation**

| Model | mAP@0.5:0.95 |
|---|---|
| YOLOv11 โญ | 73.9% | 
| YOLOv8 | 73.8% | 
| SAM | 56.0% | 
| SAM2 | 55.2% | 
| Mask R-CNN | 53.7% | 

**Phase Recognition**

| Model | Backbone | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| MViT-B โญ | - | 85.7% | 77.1% | 77.1% | 78.5% | 
| Swin-T | - | 85.5% | 76.2% | 77.5% | 77.2% | 
| CNN + GRU | EfficientNet-B5 | 82.1% | 71.3% | 76.0% | 70.4% | 
| CNN + TeCNO | EfficientNet-B5 | 81.7% | 71.2% | 75.1% | 71.2% | 
| CNN + LSTM | EfficientNet-B5 | 81.5% | 70.0% | 76.4% | 69.4% | 

**Skill Assessment**

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| TimeSformer โญ | 82.5% | 86.0% | 82.0% | 83.9% | 
| R3D-18 | 81.7% | 82.4% | 84.9% | 83.6% | 
| Slow R50 | 80.0% | 81.8% | 81.8% | 81.8% | 
| X3D-M | 80.0% | 83.9% | 78.8% | 81.3% | 
| R(2+1)D-18 | 72.9% | 79.3% | 76.7% | 78.0% | 
The framework uses YAML-based configuration for all components:
```yaml
processing:
  target_resolution: [1920, 1080]
  fps: 30
  quality_threshold: 0.75

deidentification:
  enabled: true
  blur_faces: true
  remove_text: true

output:
  format: "mp4"
  compression: "h264"
  quality: "high"
```

```yaml
model:
  architecture: "yolov8"
  size: "medium"
  pretrained: true

training:
  epochs: 100
  batch_size: 16
  learning_rate: 0.001

data:
  classes: ["forceps", "scissors", "needle_holder", "suction"]
  augmentation:
    enabled: true
    rotation: 15
    scaling: [0.8, 1.2]
```

```bash
# Create .env file
cp .env.example .env
# Edit configuration
CUDA_VISIBLE_DEVICES=0,1
WANDB_PROJECT=cataract-lmm
DATA_ROOT=/path/to/data
OUTPUT_DIR=/path/to/outputs
LOG_LEVEL=INFO
```
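A minimal sketch of loading and sanity-checking one of these YAML files with PyYAML (the required-key list is illustrative, not the framework's actual validation schema):

```python
# Minimal config loader sketch using PyYAML; the key checks are
# illustrative, not the framework's actual validation schema.
import yaml

def load_config(path: str, required=("model", "training", "data")) -> dict:
    with open(path) as f:
        config = yaml.safe_load(f)
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"{path} is missing required sections: {missing}")
    return config

config = load_config("configs/default.yaml")
print(config["training"]["batch_size"])
```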
```
tests/
├── unit/                   # Unit tests for individual components
├── integration/            # Integration tests for module interactions
├── e2e/                    # End-to-end workflow tests
├── performance/            # Performance and benchmarking tests
├── security/               # Security and vulnerability tests
├── fixtures/               # Test data and fixtures
└── conftest.py             # Pytest configuration
```
```bash
# Run all tests
pytest

# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/

# Run with coverage
pytest --cov=. --cov-report=html

# Run performance tests
pytest tests/performance/ --benchmark-only

# Run with specific markers
pytest -m "gpu" --gpu-required
pytest -m "slow" --timeout=300
```
```ini
# pytest.ini
[tool:pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    unit: Unit tests
    integration: Integration tests
    e2e: End-to-end tests
    gpu: Tests requiring GPU
    slow: Slow running tests
    security: Security tests
addopts = 
    --strict-markers
    --verbose
    --tb=short
    --cov-report=term-missing
```

- User Guide: Getting started, tutorials, and examples
- API Reference: Comprehensive API documentation
- Developer Guide: Contributing, architecture, and development setup
- Model Documentation: Model architectures, performance metrics, and usage
- Security Guide: Security considerations and best practices
 
```bash
# Install documentation dependencies
poetry install --extras "docs"

# Build documentation
cd docs
make html

# Serve documentation locally
make serve

# Build PDF documentation
make latexpdf
```

- Documentation Site: https://cataract-lmm.readthedocs.io
- API Reference: https://cataract-lmm.readthedocs.io/api/
- Tutorials: https://cataract-lmm.readthedocs.io/tutorials/
- Model Zoo: https://cataract-lmm.readthedocs.io/models/
 
We welcome contributions from the surgical AI community! Please see our CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
 
```bash
# Setup development environment
make dev-setup

# Run pre-commit checks
pre-commit run --all-files

# Run tests before committing
make test-all

# Submit pull request
gh pr create --title "Feature: Add amazing feature"
```

- Python Style: Black formatter
- Import Sorting: isort
- Linting: Flake8 with medical AI conventions
- Type Checking: MyPy for type safety
- Documentation: Google-style docstrings
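
For reference, an illustrative function documented in the Google docstring style required above:

```python
def normalize_frame(frame, mean: float = 0.0, std: float = 1.0):
    """Normalize a video frame. (Illustrative example only.)

    Args:
        frame: Array-like frame of pixel intensities.
        mean: Value subtracted from every pixel.
        std: Value the centered pixels are divided by.

    Returns:
        The normalized frame.

    Raises:
        ValueError: If ``std`` is zero.
    """
    if std == 0:
        raise ValueError("std must be non-zero")
    return (frame - mean) / std
```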
 
The framework and code in this project are licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). See the LICENSE file for details.
The dataset has specific ownership and licensing requirements. See DATA_LICENSE.md for detailed information about:
- Data ownership by Farabi Eye Hospital and Noor Eye Hospital
 - Annotation ownership by participating institutions
 - Attribution requirements under CC-BY 4.0
 - Proper usage guidelines
 
If you use this benchmark dataset or framework in your research, please cite our work. The benchmark has been submitted to Scientific Data (Nature Portfolio).
```bibtex
@misc{ahmadi2025cataractlmmlargescalemultitask,
      title={Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis}, 
      author={Mohammad Javad Ahmadi and Iman Gandomi and Parisa Abdi and Seyed-Farzad Mohammadi and Amirhossein Taslimi and Mehdi Khodaparast and Hassan Hashemi and Mahdi Tavakoli and Hamid D. Taghirad},
      year={2025},
      eprint={2510.16371},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.16371},
      doi={10.48550/arXiv.2510.16371}
}
```

**APA:** Ahmadi, M. J., Gandomi, I., Abdi, P., Mohammadi, S.-F., Taslimi, A., Khodaparast, M., Hashemi, H., Tavakoli, M., & Taghirad, H. D. (2025). Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. arXiv. https://doi.org/10.48550/arXiv.2510.16371

**IEEE:** M. J. Ahmadi et al., "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis," 2025, arXiv:2510.16371. [Online]. Available: https://arxiv.org/abs/2510.16371

**Chicago:** Ahmadi, Mohammad Javad, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, and Hamid D. Taghirad. 2025. "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis." arXiv. https://doi.org/10.48550/arXiv.2510.16371.
```bibtex
@software{cataract_lmm_repo_2025,
  title={{Cataract-LMM}: Large-Scale, Multi-Source, Multi-Task Benchmark and Framework for Surgical Video Analysis},
  author={Ahmadi, Mohammad Javad and Gandomi, Iman and Abdi, Parisa and Mohammadi, Seyed-Farzad and Taslimi, Amirhossein and Khodaparast, Mehdi and Hashemi, Hassan and Tavakoli, Mahdi and Taghirad, Hamid D.},
  year={2025},
  url={https://github.com/MJAHMADEE/Cataract-LMM},
  version={1.0.0}
}
```

Mohammad Javad Ahmadi
- Documentation: Refer to the individual README files in each module
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: mjahmadee@gmail.com
 
- ✅ Multi-task surgical video analysis framework
- ✅ Instance segmentation with YOLO/Mask R-CNN/SAM
- ✅ Phase recognition with Video Transformers
- ✅ Skill assessment framework
- ✅ Production-ready CI/CD pipeline
 
- Real-time inference optimization
- Multi-GPU distributed training
- Model quantization and pruning
- REST API and web interface
- Advanced analytics dashboard
 
- Multi-modal learning (video + audio + sensor data)
- Federated learning across institutions
- Real-time surgical guidance system
- Integration with surgical robots
- Multi-language support