
๐Ÿฅ Cataract-LMM: Surgical Video Analysis

Python 3.8+ PyTorch Poetry Docker License: CC-BY-4.0 CI/CD Code Quality Medical AI

A comprehensive, production-ready framework for multi-task deep learning in surgical video analysis, featuring instance segmentation, phase recognition, skill assessment, and video processing capabilities.


🎯 Overview

Cataract-LMM is an enterprise-grade AI framework designed for large-scale, multi-center surgical video analysis. Built on modern software engineering principles, this repository provides state-of-the-art deep learning models for comprehensive analysis of cataract surgery videos.

🔬 Research Foundation

This framework implements methodologies from cutting-edge research in computer-assisted surgery, providing validated approaches for:

  • Surgical Instance Segmentation using YOLO, Mask R-CNN, and SAM architectures
  • Surgical Phase Recognition with Video Transformers, 3D CNNs, and temporal models
  • Surgical Skill Assessment through multi-modal analysis and performance metrics
  • Video Processing with GPU-accelerated pipelines for medical video data

๐Ÿ† Key Differentiators

  • Production-Ready: Enterprise-grade architecture with comprehensive testing and CI/CD
  • Multi-Task Learning: Unified framework supporting four core surgical analysis tasks
  • Scalable Design: Microservices-ready architecture with containerization support
  • Medical Compliance: HIPAA-aware design patterns and secure data handling
  • Research-to-Production: Seamless transition from research notebooks to production deployment

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CUDA 11.8+ (for GPU acceleration)
  • FFmpeg (for video processing)
  • Docker (optional, for containerized deployment)

Installation

# Clone the repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM

# Install using Poetry (recommended)
cd codes
poetry install

# Activate virtual environment
poetry shell

# Or install using pip
pip install -r requirements.txt

# Validate installation
python setup.py --validate-only

Basic Usage

# Video processing
cd surgical-video-processing
python main.py --input path/to/video.mp4 --output ./results --config configs/default.yaml

# Instance segmentation  
cd surgical-instance-segmentation
python inference/predictor.py --model yolo --input data/images/

# Phase recognition
cd surgical-phase-recognition
python validation/training_framework.py --config configs/default.yaml --mode train

# Skill assessment
cd surgical-skill-assessment
python main.py --config configs/comprehensive.yaml --mode evaluate

✨ Features

🧠 AI/ML Capabilities

| Component | Models | Key Features |
|---|---|---|
| Instance Segmentation | YOLO v8/11, Mask R-CNN, SAM | Real-time surgical instrument detection and segmentation |
| Phase Recognition | Video Transformers, 3D CNNs, TeCNO | 11-phase surgical workflow analysis |
| Skill Assessment | Multi-modal CNNs, Attention Models | Objective surgical skill evaluation |
| Video Processing | GPU-Accelerated Pipelines | Medical-grade video preprocessing and enhancement |

๐Ÿ› ๏ธ Engineering Excellence

  • ๐Ÿ—๏ธ Modular Architecture: Microservices-ready design with clear separation of concerns
  • ๐Ÿ”’ Security First: HIPAA-compliant patterns, secure credential management
  • ๐Ÿ“Š Comprehensive Testing: 85%+ test coverage with unit, integration, and E2E tests
  • ๐Ÿš€ CI/CD Pipeline: Automated testing, security scanning, and deployment workflows
  • ๐Ÿ“ˆ Monitoring & Observability: Structured logging, metrics collection, and health checks
  • ๐Ÿณ Containerization: Multi-stage Docker builds with security hardening

🔧 Developer Experience

  • 📚 Rich Documentation: Comprehensive guides, API references, and examples
  • 🎯 Configuration Management: YAML-based configuration with validation
  • 🧪 Development Tools: Pre-commit hooks, linting, formatting, and type checking
  • 📦 Dependency Management: Poetry-based modern Python packaging
  • 🔧 Development Environment: VS Code integration with debugging support

๐Ÿ—๏ธ Architecture

System Overview

graph TB
    A[Video Input] --> B[Video Processing Pipeline]
    B --> C[Frame Extraction & Preprocessing]
    C --> D[Multi-Task Analysis Engine]
    
    D --> E[Instance Segmentation]
    D --> F[Phase Recognition] 
    D --> G[Skill Assessment]
    
    E --> H[Surgical Instruments]
    F --> I[Surgery Phases]
    G --> J[Skill Metrics]
    
    H --> K[Clinical Decision Support]
    I --> K
    J --> K

Project Structure

Cataract_LMM/
├── 🏠 README.md                    # Project overview and documentation
├── 📄 LICENSE                      # CC-BY-4.0 license
├── 🤝 CONTRIBUTING.md              # Contribution guidelines
├── 🔒 .gitignore                   # Git ignore patterns
├── 📊 codes/                       # Main codebase
│   ├── 🎬 surgical-video-processing/       # Video preprocessing and enhancement
│   │   ├── core/                  # Core processing algorithms
│   │   ├── pipelines/             # Processing pipelines
│   │   ├── metadata/              # Video metadata management
│   │   ├── quality_control/       # Quality assurance tools
│   │   └── configs/               # Configuration files
│   ├── 🎯 surgical-instance-segmentation/  # Instance segmentation models
│   │   ├── models/                # YOLO, Mask R-CNN, SAM implementations
│   │   ├── training/              # Training pipelines
│   │   ├── inference/             # Real-time inference engines
│   │   ├── evaluation/            # Model evaluation tools
│   │   └── data/                  # Dataset utilities
│   ├── 🔄 surgical-phase-recognition/      # Phase classification models
│   │   ├── models/                # Video Transformers, 3D CNNs, TeCNO
│   │   ├── validation/            # Training and validation frameworks
│   │   ├── preprocessing/         # Video preprocessing
│   │   ├── analysis/              # Result analysis tools
│   │   └── configs/               # Model configurations
│   ├── 📊 surgical-skill-assessment/       # Skill evaluation framework
│   │   ├── models/                # Skill assessment models
│   │   ├── engine/                # Training and inference engines
│   │   ├── utils/                 # Analysis utilities
│   │   └── configs/               # Assessment configurations
│   ├── 🧪 tests/                  # Comprehensive test suite
│   ├── 📚 docs/                   # Documentation source
│   ├── 🐳 docker/                 # Docker configurations
│   ├── 📊 reports/                # Analysis reports
│   ├── ⚙️ pyproject.toml          # Python project configuration
│   ├── 🔒 Dockerfile              # Container definition
│   ├── 🚀 Makefile                # Development automation
│   └── 🔧 setup.py                # Project setup script
├── 🤖 .github/                    # GitHub configurations
│   └── workflows/                 # CI/CD pipelines
└── 📓 security_scanning_demo.ipynb # Security analysis notebook

📦 Installation

System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.11+ |
| RAM | 16 GB | 32 GB+ |
| GPU Memory | 8 GB | 24 GB+ |
| Storage | 50 GB | 500 GB+ |
| CUDA | 11.8 | 12.0+ |
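
A quick way to check a machine against these numbers (a hypothetical helper script, not shipped with the repository):

import shutil
import sys

import torch

# Compare the local environment against the minimum requirements above.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"
print(f"Python:         {sys.version.split()[0]}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU memory:     {gpu_gb:.1f} GB (minimum 8 GB)")
print(f"Free storage:   {shutil.disk_usage('.').free / 1e9:.0f} GB (minimum 50 GB)")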

Installation Methods

Method 1: Poetry (Recommended)

# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Clone and setup
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes

# Install dependencies
poetry install --extras "dev docs"

# Activate environment
poetry shell

Method 2: Conda Environment

# Create environment
conda create -n cataract-lmm python=3.11
conda activate cataract-lmm

# Clone and install
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
pip install -r requirements.txt

Method 3: Docker Deployment

# Build container
docker build -t cataract-lmm:latest .

# Run interactive container
docker run -it --gpus all -v $(pwd)/data:/app/data cataract-lmm:latest

Verification

# Run comprehensive validation
python setup.py --validate-only

# Run tests
pytest tests/ -v

# Check GPU availability
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"

🎯 Usage Examples

Video Processing Pipeline

from surgical_video_processing import VideoProcessor, QualityController

# Initialize processor with configuration
processor = VideoProcessor("configs/high_quality.yaml")

# Process surgical video
result = processor.process_video(
    input_path="data/surgery_video.mp4",
    output_dir="outputs/processed/",
    apply_deidentification=True,
    quality_threshold=0.8
)

print(f"Processed {result.frame_count} frames")
print(f"Quality score: {result.average_quality:.3f}")

Instance Segmentation

from surgical_instance_segmentation import SegmentationPredictor

# Load pre-trained model
predictor = SegmentationPredictor(
    model_type="yolo_v8",
    device="cuda"
)

# Segment surgical instruments
results = predictor.predict_batch(
    image_paths=["frame001.jpg", "frame002.jpg"],
    confidence_threshold=0.7,
    save_visualizations=True
)

# Extract detections
for result in results:
    print(f"Detected {len(result.boxes)} instruments")
    print(f"Classes: {result.class_names}")

Phase Recognition

from surgical_phase_recognition import PhaseClassifier

# Initialize phase recognition model
classifier = PhaseClassifier(
    model_name="video_transformer",
    config_path="configs/phase_recognition.yaml"
)

# Classify surgical phases in video sequence
phases = classifier.classify_sequence(
    video_path="data/surgery_complete.mp4",
    sequence_length=16,
    overlap=0.5
)

# Display phase timeline
for phase in phases:
    print(f"Time: {phase.timestamp:.2f}s - Phase: {phase.name}")

Skill Assessment

from surgical_skill_assessment import SkillEvaluator

# Initialize skill assessment framework
evaluator = SkillEvaluator("configs/skill_assessment.yaml")

# Assess surgical performance
assessment = evaluator.evaluate_surgery(
    video_path="data/complete_surgery.mp4",
    phase_annotations="data/phases.json",
    surgeon_level="resident"  # resident, fellow, attending
)

# Generate skill report
report = evaluator.generate_report(assessment)
print(f"Overall Score: {report.overall_score}/100")
print(f"Efficiency: {report.efficiency_score}/10")
print(f"Precision: {report.precision_score}/10")

๐Ÿ› ๏ธ Development

Development Setup

# Clone repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes

# Install development dependencies
poetry install --extras "dev"

# Setup pre-commit hooks
pre-commit install

# Run development server
make dev-server

Code Quality Tools

# Format code
make format

# Run linting
make lint

# Type checking
make type-check

# Security scanning
make security-scan

# Run all quality checks
make quality

Testing

# Run unit tests
make test

# Run with coverage
make test-coverage

# Run integration tests
make test-integration

# Run end-to-end tests
make test-e2e

# Generate coverage report
make coverage-report

Available Make Commands

make help              # Show all available commands
make install           # Install dependencies
make clean             # Clean build artifacts
make build             # Build distribution packages
make docker-build      # Build Docker image
make docker-run        # Run Docker container
make docs-build        # Build documentation
make docs-serve        # Serve documentation locally

📊 Model Zoo

Instance Segmentation Models (Task 3: 12-class)

| Model | mAP@0.5:0.95 |
|---|---|
| YOLOv11 ⭐ | 73.9% |
| YOLOv8 | 73.8% |
| SAM | 56.0% |
| SAM2 | 55.2% |
| Mask R-CNN | 53.7% |

Phase Recognition Models (In-Domain - Farabi Test Set)

| Model | Backbone | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| MViT-B ⭐ | - | 85.7% | 77.1% | 77.1% | 78.5% |
| Swin-T | - | 85.5% | 76.2% | 77.5% | 77.2% |
| CNN + GRU | EfficientNet-B5 | 82.1% | 71.3% | 76.0% | 70.4% |
| CNN + TeCNO | EfficientNet-B5 | 81.7% | 71.2% | 75.1% | 71.2% |
| CNN + LSTM | EfficientNet-B5 | 81.5% | 70.0% | 76.4% | 69.4% |

Skill Assessment Models

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| TimeSformer ⭐ | 82.5% | 86.0% | 82.0% | 83.9% |
| R3D-18 | 81.7% | 82.4% | 84.9% | 83.6% |
| Slow R50 | 80.0% | 81.8% | 81.8% | 81.8% |
| X3D-M | 80.0% | 83.9% | 78.8% | 81.3% |
| R(2+1)D-18 | 72.9% | 79.3% | 76.7% | 78.0% |
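
The F1 scores in these tables are the harmonic mean of precision and recall, so rows can be sanity-checked directly:

# F1 = 2 * P * R / (P + R)
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.824, 0.849):.1%}")  # 83.6% -- matches the R3D-18 row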

🔧 Configuration

Configuration Files

The framework uses YAML-based configuration for all components:

Video Processing (surgical-video-processing/configs/default.yaml)

processing:
  target_resolution: [1920, 1080]
  fps: 30
  quality_threshold: 0.75
  
deidentification:
  enabled: true
  blur_faces: true
  remove_text: true
  
output:
  format: "mp4"
  compression: "h264"
  quality: "high"

Instance Segmentation (surgical-instance-segmentation/configs/yolo_config.yaml)

model:
  architecture: "yolov8"
  size: "medium"
  pretrained: true

training:
  epochs: 100
  batch_size: 16
  learning_rate: 0.001
  
data:
  classes: ["forceps", "scissors", "needle_holder", "suction"]
  augmentation:
    enabled: true
    rotation: 15
    scaling: [0.8, 1.2]
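
Configs like these can be loaded and minimally validated with PyYAML (a sketch of the general pattern; the framework's own loader may apply stricter schema validation):

import yaml

# Load a component config and check that the sections shown above exist.
with open("surgical-instance-segmentation/configs/yolo_config.yaml") as f:
    config = yaml.safe_load(f)

for section in ("model", "training", "data"):
    assert section in config, f"missing required section: {section}"

print(config["model"]["architecture"])  # "yolov8"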

Environment Variables

# Create .env file
cp .env.example .env

# Edit configuration
CUDA_VISIBLE_DEVICES=0,1
WANDB_PROJECT=cataract-lmm
DATA_ROOT=/path/to/data
OUTPUT_DIR=/path/to/outputs
LOG_LEVEL=INFO
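
In Python code, these variables are typically read with os.getenv, optionally after loading the .env file via python-dotenv (assumed to be installed):

import os

from dotenv import load_dotenv  # python-dotenv; assumed dependency

load_dotenv()  # pull variables from .env into the process environment

# Fall back to sensible defaults when a variable is unset.
data_root = os.getenv("DATA_ROOT", "./data")
output_dir = os.getenv("OUTPUT_DIR", "./outputs")
log_level = os.getenv("LOG_LEVEL", "INFO")
print(data_root, output_dir, log_level)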

🧪 Testing

Test Structure

tests/
├── unit/                   # Unit tests for individual components
├── integration/            # Integration tests for module interactions
├── e2e/                    # End-to-end workflow tests
├── performance/            # Performance and benchmarking tests
├── security/               # Security and vulnerability tests
├── fixtures/               # Test data and fixtures
└── conftest.py             # Pytest configuration

Running Tests

# Run all tests
pytest

# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/

# Run with coverage
pytest --cov=. --cov-report=html

# Run performance tests
pytest tests/performance/ --benchmark-only

# Run with specific markers
pytest -m "gpu" --gpu-required
pytest -m "slow" --timeout=300

Test Configuration

# pytest.ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    unit: Unit tests
    integration: Integration tests
    e2e: End-to-end tests
    gpu: Tests requiring GPU
    slow: Slow running tests
    security: Security tests
addopts = 
    --strict-markers
    --verbose
    --tb=short
    --cov-report=term-missing
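
A test written against these markers might look like the following (a hypothetical example file; the GPU test skips cleanly on machines without CUDA):

# tests/unit/test_example.py (hypothetical)
import pytest
import torch

@pytest.mark.unit
def test_quality_threshold_range():
    # Config sanity check: quality thresholds must stay in [0, 1].
    assert 0.0 <= 0.75 <= 1.0

@pytest.mark.gpu
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_tensor_moves_to_gpu():
    assert torch.zeros(1, device="cuda").is_cuda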

📖 Documentation

Documentation Structure

  • 📚 User Guide: Getting started, tutorials, and examples
  • 🔧 API Reference: Comprehensive API documentation
  • 🏗️ Developer Guide: Contributing, architecture, and development setup
  • 📊 Model Documentation: Model architectures, performance metrics, and usage
  • 🔍 Security Guide: Security considerations and best practices

Building Documentation

# Install documentation dependencies
poetry install --extras "docs"

# Build documentation
cd docs
make html

# Serve documentation locally
make serve

# Build PDF documentation
make latexpdf


๐Ÿค Contributing

We welcome contributions from the surgical AI community! Please see our CONTRIBUTING.md for detailed guidelines.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Workflow

# Setup development environment
make dev-setup

# Run pre-commit checks
pre-commit run --all-files

# Run tests before committing
make test-all

# Submit pull request
gh pr create --title "Feature: Add amazing feature"

Code Standards

  • Python Style: Black formatter
  • Import Sorting: isort
  • Linting: Flake8 with medical AI conventions
  • Type Checking: MyPy for type safety
  • Documentation: Google style docstrings

📄 License

Framework License

This project framework and code are licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). See the LICENSE file for details.

Data License

The dataset has specific ownership and licensing requirements. See DATA_LICENSE.md for detailed information about:

  • Data ownership by Farabi Eye Hospital and Noor Eye Hospital
  • Annotation ownership by participating institutions
  • Attribution requirements under CC-BY 4.0
  • Proper usage guidelines

📣 Citation

If you use this benchmark dataset or framework in your research, please cite our work. The benchmark has been submitted to Scientific Data (Nature Portfolio).

BibTeX

@misc{ahmadi2025cataractlmmlargescalemultitask,
      title={Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis}, 
      author={Mohammad Javad Ahmadi and Iman Gandomi and Parisa Abdi and Seyed-Farzad Mohammadi and Amirhossein Taslimi and Mehdi Khodaparast and Hassan Hashemi and Mahdi Tavakoli and Hamid D. Taghirad},
      year={2025},
      eprint={2510.16371},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.16371},
      doi={10.48550/arXiv.2510.16371}
}

APA Style

Ahmadi, M. J., Gandomi, I., Abdi, P., Mohammadi, S.-F., Taslimi, A., Khodaparast, M., Hashemi, H., Tavakoli, M., & Taghirad, H. D. (2025). Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. arXiv. https://doi.org/10.48550/arXiv.2510.16371

IEEE Style

M. J. Ahmadi et al., "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis," 2025, arXiv:2510.16371. [Online]. Available: https://arxiv.org/abs/2510.16371

Chicago Style

Ahmadi, Mohammad Javad, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, and Hamid D. Taghirad. 2025. "Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis." arXiv. https://doi.org/10.48550/arXiv.2510.16371.

Repository Citation

@software{cataract_lmm_repo_2025,
  title={{Cataract-LMM}: Large-Scale, Multi-Source, Multi-Task Benchmark and Framework for Surgical Video Analysis},
  author={Ahmadi, Mohammad Javad and Gandomi, Iman and Abdi, Parisa and Mohammadi, Seyed-Farzad and Taslimi, Amirhossein and Khodaparast, Mehdi and Hashemi, Hassan and Tavakoli, Mahdi and Taghirad, Hamid D.},
  year={2025},
  url={https://github.com/MJAHMADEE/Cataract-LMM},
  version={1.0.0}
}

👨‍💻 Author

Mohammad Javad Ahmadi

Resume


🚀 Roadmap

Current Version (v1.0.0)

  • ✅ Multi-task surgical video analysis framework
  • ✅ Instance segmentation with YOLO/Mask R-CNN/SAM
  • ✅ Phase recognition with Video Transformers
  • ✅ Skill assessment framework
  • ✅ Production-ready CI/CD pipeline

Upcoming Features (v1.1.0)

  • 🔄 Real-time inference optimization
  • 🔄 Multi-GPU distributed training
  • 🔄 Model quantization and pruning
  • 🔄 REST API and web interface
  • 🔄 Advanced analytics dashboard

Future Vision (v2.0.0+)

  • 🔮 Multi-modal learning (video + audio + sensor data)
  • 🔮 Federated learning across institutions
  • 🔮 Real-time surgical guidance system
  • 🔮 Integration with surgical robots
  • 🔮 Multi-language support

๐Ÿฅ Advancing Surgical AI Through Open Science ๐Ÿค–

Built with โค๏ธ by the Surgical AI Research Community
Empowering the next generation of computer-assisted surgery