
πŸ€– News Classification MLOps

πŸš€ Production-Ready MLOps Pipeline for News Classification

An end-to-end MLOps pipeline demonstrating best practices in machine learning operations, from data ingestion to model deployment and monitoring.

🎯 Overview

This project implements a complete MLOps pipeline for news classification using BBC articles from Kaggle. It showcases industry-standard practices for:

  • Data Pipeline: Automated data ingestion, preprocessing, and validation
  • Model Training: Experiment tracking, hyperparameter tuning, and model versioning
  • Model Serving: RESTful API with authentication and monitoring
  • Orchestration: Workflow management with dependency resolution
  • Monitoring: Real-time metrics, performance tracking, and alerting
  • Testing: Unit tests, integration tests, and load testing
  • Deployment: Container-based deployment with Docker and Kubernetes support

✨ Features

πŸ”„ Complete MLOps Pipeline

  • Automated data processing and feature engineering
  • Model training with hyperparameter optimization
  • Experiment tracking and model registry
  • CI/CD pipeline with automated testing

πŸš€ Production-Ready API

  • FastAPI-based REST service with async support
  • API key authentication and rate limiting
  • Model versioning and A/B testing support
  • Comprehensive API documentation with Swagger/OpenAPI

πŸ“Š Monitoring & Observability

  • Real-time metrics collection with Prometheus
  • Interactive dashboards with Grafana
  • Model performance monitoring and drift detection
  • System health and resource utilization tracking

🎯 Workflow Orchestration

  • Prefect-based workflow management
  • Dependency resolution and error handling
  • Scheduled and event-driven execution
  • Flow visualization and debugging

πŸ§ͺ Comprehensive Testing

  • Unit tests for all core components
  • Integration tests for API endpoints
  • Load testing with Locust
  • Model validation and performance testing

🐳 Container-Based Deployment

  • Multi-stage Docker builds for optimization
  • Docker Compose for local development
  • Kubernetes manifests for production deployment
  • Environment-specific configurations

πŸ” Security Best Practices

  • Secure secret management
  • API authentication and authorization
  • Network policies and access control
  • Container security scanning

πŸ—οΈ Architecture

graph TB
    subgraph "Data Layer"
        A[Kaggle Dataset] --> B[Data Processing]
        B --> C[Feature Engineering]
        C --> D[Data Validation]
    end

    subgraph "Model Layer"
        E[Model Training] --> F[Hyperparameter Tuning]
        F --> G[Model Evaluation]
        G --> H[Model Registry]
    end

    subgraph "API Layer"
        I[FastAPI Service] --> J[Model Inference]
        I --> K[Authentication]
        I --> L[Rate Limiting]
    end

    subgraph "Monitoring Layer"
        M[Prometheus] --> N[Metrics Collection]
        N --> O[Grafana Dashboard]
        P[MLflow Tracking] --> Q[Experiment Tracking]
    end

    subgraph "Orchestration Layer"
        R[Prefect] --> S[Workflow Management]
        S --> T[Scheduled Jobs]
        S --> U[Error Handling]
    end

    D --> E
    H --> I
    I --> M
    I --> P
    E --> P
    R --> E
    R --> B

    style A fill:#e1f5fe
    style I fill:#f3e5f5
    style R fill:#e8f5e8
    style M fill:#fff3e0

πŸ“‹ Prerequisites

Required Software

  • Docker: 20.10+ and Docker Compose: 2.0+
  • Python: 3.10+ (for local development)
  • Git: For version control

Optional (for Kubernetes deployment)

  • Kubernetes: 1.20+ cluster
  • kubectl: Configured to access your cluster
  • Ingress Controller: For external access (nginx/traefik)

Required Accounts

  • Kaggle: For dataset access
  • GitHub: For CI/CD (if using GitHub Actions)
  • Container Registry: Docker Hub, GHCR, or similar

πŸš€ Quick Start

Docker Compose (Recommended) 🐳

The fastest way to get started is using Docker Compose, which spins up all services with pre-configured monitoring and orchestration.

1. Clone the Repository

git clone https://github.com/kingabzpro/A-to-Z-MLOps.git
cd A-to-Z-MLOps

2. Environment Setup

# Copy the environment template
cp .env.example .env

# Edit the environment file with your settings
nano .env

Required environment variables:

# API Configuration
API_KEY=your_secure_api_key_here
CACHE_TTL=3600

# Kaggle Configuration
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_API_KEY=your_kaggle_api_key

# Model Configuration
MODEL_NAME=news_classifier_logistic
MODEL_VERSION=1

# Optional: Custom Ports
PROMETHEUS_PORT=9090
GRAFANA_PORT=3000
MLFLOW_PORT=5000
PREFECT_PORT=4200
API_PORT=7860
LOCUST_PORT=8089

⚠️ Security Note: Never commit your .env file to version control. Add it to .gitignore.

3. Launch Services

# Start all services in detached mode
docker-compose up -d

# Verify all services are running
docker-compose ps

# View logs for all services
docker-compose logs -f

4. Access Services

Once running, access the services at the ports configured above (defaults shown):

  • API: http://localhost:7860 (Swagger docs at /docs)
  • Prefect UI: http://localhost:4200
  • MLflow UI: http://localhost:5000
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090
  • Locust: http://localhost:8089

5. Run the Pipeline

# Trigger the MLOps pipeline via API
curl -X POST "http://localhost:7860/run-pipeline" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json"

# Or trigger via Prefect UI
# Navigate to http://localhost:4200 and run the "mlops_pipeline" flow
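The same trigger can be scripted. A minimal Python sketch using the requests library; the /run-pipeline endpoint and X-API-Key header mirror the curl call above, while API_URL and the helper names are illustrative:

```python
import os

API_URL = os.getenv("API_URL", "http://localhost:7860")  # assumed base URL

def build_headers(api_key: str) -> dict:
    """Headers expected by the API's key-based authentication."""
    return {"X-API-Key": api_key, "Content-Type": "application/json"}

def trigger_pipeline(api_key: str) -> int:
    """POST to /run-pipeline, mirroring the curl call above."""
    import requests  # third-party; kept local so the helpers import cleanly
    resp = requests.post(
        f"{API_URL}/run-pipeline", headers=build_headers(api_key), timeout=30
    )
    resp.raise_for_status()
    return resp.status_code
```

For example, `trigger_pipeline(os.environ["API_KEY"])` could run from a cron job or CI step.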

6. Stop Services

# Stop and remove all containers
docker-compose down

# Stop and remove volumes (data will be lost)
docker-compose down -v

☸️ Kubernetes (Experimental) ⚠️

Note: Kubernetes deployment is currently experimental and intended for advanced users. Production deployment requires additional configuration for persistence, security, and networking.

Prerequisites for K8s

  • Working Kubernetes cluster (local: minikube, kind; cloud: EKS, GKE, AKS)
  • kubectl configured and cluster access
  • Container registry access for image pushing

Quick K8s Deployment

  1. Create Namespace and Secrets
# Create the namespace
kubectl apply -f k8s/00-namespace.yml

# Create secrets from environment file
cat << EOF > .kube-secrets
API_KEY=your_api_key
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_API_KEY=your_kaggle_api_key
GF_SECURITY_ADMIN_PASSWORD=your_grafana_password
EOF

kubectl create secret generic mlops-secrets --from-env-file=.kube-secrets -n mlops
  2. Deploy Infrastructure
# Deploy core services
kubectl apply -f k8s/02-configmap.yml
kubectl apply -f k8s/03-prometheus-deployment.yml
kubectl apply -f k8s/04-grafana-deployment.yml
kubectl apply -f k8s/05-mlflow-deployment.yml
kubectl apply -f k8s/06-prefect-deployment.yml

# Deploy application
kubectl apply -f k8s/07-api-deployment.yml

# Optional: Load testing
kubectl apply -f k8s/08-locust-deployment.yml
  3. Access Services
# Use port forwarding for local access
kubectl port-forward svc/api 7860:7860 -n mlops &
kubectl port-forward svc/mlflow 5000:5000 -n mlops &
kubectl port-forward svc/grafana 3000:3000 -n mlops &
kubectl port-forward svc/prometheus 9090:9090 -n mlops &
kubectl port-forward svc/prefect 4200:4200 -n mlops &

For detailed Kubernetes instructions, see k8s/DEPLOYMENT-GUIDE.md.

πŸ“ Project Structure

A-to-Z-MLOps/
β”œβ”€β”€ πŸ“‚ src/                          # Source code
β”‚   β”œβ”€β”€ πŸ“‚ data/                     # Data processing modules
β”‚   β”‚   β”œβ”€β”€ download.py              # Kaggle data download
β”‚   β”‚   β”œβ”€β”€ preprocessing.py         # Data cleaning and preprocessing
β”‚   β”‚   └── validation.py            # Data quality checks
β”‚   β”œβ”€β”€ πŸ“‚ models/                   # Model training and evaluation
β”‚   β”‚   β”œβ”€β”€ train.py                 # Model training with MLflow tracking
β”‚   β”‚   β”œβ”€β”€ evaluate.py              # Model evaluation metrics
β”‚   β”‚   └── predict.py               # Model inference utilities
β”‚   β”œβ”€β”€ πŸ“‚ api/                      # FastAPI application
β”‚   β”‚   β”œβ”€β”€ main.py                  # FastAPI app and endpoints
β”‚   β”‚   β”œβ”€β”€ auth.py                  # Authentication middleware
β”‚   β”‚   β”œβ”€β”€ middleware.py            # Custom middleware
β”‚   β”‚   └── monitoring.py            # Prometheus metrics
β”‚   └── πŸ“‚ pipelines/                # Prefect workflows
β”‚       β”œβ”€β”€ pipeline.py              # Main MLOps pipeline
β”‚       β”œβ”€β”€ flows.py                 # Individual workflow components
β”‚       └── tasks.py                 # Workflow task definitions
β”œβ”€β”€ πŸ“‚ tests/                        # Test suite
β”‚   β”œβ”€β”€ πŸ“‚ unit/                     # Unit tests
β”‚   β”œβ”€β”€ πŸ“‚ integration/              # Integration tests
β”‚   β”œβ”€β”€ πŸ“‚ stress/                   # Load testing scripts
β”‚   └── conftest.py                  # Pytest configuration
β”œβ”€β”€ πŸ“‚ configs/                      # Configuration files
β”‚   β”œβ”€β”€ mlflow_config.yaml           # MLflow settings
β”‚   β”œβ”€β”€ model_params.yaml            # Model hyperparameters
β”‚   β”œβ”€β”€ grafana/                     # Grafana dashboards and datasources
β”‚   └── prometheus.yml               # Prometheus configuration
β”œβ”€β”€ πŸ“‚ k8s/                          # Kubernetes manifests (Experimental)
β”‚   β”œβ”€β”€ 00-namespace.yml             # Namespace definition
β”‚   β”œβ”€β”€ 02-configmap.yml             # Configuration maps
β”‚   β”œβ”€β”€ 03-prometheus-deployment.yml # Prometheus deployment
β”‚   β”œβ”€β”€ 04-grafana-deployment.yml    # Grafana deployment
β”‚   β”œβ”€β”€ 05-mlflow-deployment.yml     # MLflow deployment
β”‚   β”œβ”€β”€ 06-prefect-deployment.yml    # Prefect deployment
β”‚   β”œβ”€β”€ 07-api-deployment.yml        # API deployment
β”‚   β”œβ”€β”€ 08-locust-deployment.yml     # Locust deployment
β”‚   └── DEPLOYMENT-GUIDE.md          # Detailed K8s guide
β”œβ”€β”€ πŸ“‚ workflows/                    # CI/CD workflows
β”‚   └── ci-cd.yml                    # GitHub Actions workflow
β”œβ”€β”€ πŸ“‚ notebooks/                    # Jupyter notebooks
β”‚   β”œβ”€β”€ 01-exploratory-data-analysis.ipynb
β”‚   β”œβ”€β”€ 02-model-experimentation.ipynb
β”‚   └── 03-performance-evaluation.ipynb
β”œβ”€β”€ πŸ“‚ data/                         # Data directory
β”‚   β”œβ”€β”€ raw/                         # Raw downloaded data
β”‚   β”œβ”€β”€ processed/                   # Processed training data
β”‚   └── validation/                  # Validation datasets
β”œβ”€β”€ πŸ“‚ models/                       # Trained models storage
β”œβ”€β”€ πŸ“‚ images/                       # Documentation images
β”œβ”€β”€ πŸ“„ Dockerfile                    # Multi-stage Docker build
β”œβ”€β”€ πŸ“„ docker-compose.yml            # Local development setup
β”œβ”€β”€ πŸ“„ pyproject.toml                # Python project configuration
β”œβ”€β”€ πŸ“„ .env.example                  # Environment variables template
└── πŸ“„ README.md                     # This file

πŸ”§ Configuration

Environment Variables

The application uses environment variables for configuration. Key variables include:

| Variable | Description | Default | Required |
|---|---|---|---|
| API_KEY | API authentication key | None | βœ… |
| KAGGLE_USERNAME | Kaggle username for data download | None | βœ… |
| KAGGLE_API_KEY | Kaggle API key | None | βœ… |
| MODEL_NAME | MLflow model name | news_classifier_logistic | ❌ |
| MODEL_VERSION | Model version to deploy | 1 | ❌ |
| CACHE_TTL | Cache time-to-live (seconds) | 3600 | ❌ |
| MLFLOW_TRACKING_URI | MLflow server URI | http://mlflow:5000 | ❌ |
| PREFECT_API_URL | Prefect server URL | http://prefect:4200/api | ❌ |
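As a sketch of how these variables might be read at startup, here is a stdlib-only settings loader; the names and defaults mirror the table above, but the Settings class itself is illustrative, not the project's actual config code:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Reads configuration from the environment; required keys raise KeyError."""
    api_key: str = field(default_factory=lambda: os.environ["API_KEY"])
    kaggle_username: str = field(default_factory=lambda: os.environ["KAGGLE_USERNAME"])
    kaggle_api_key: str = field(default_factory=lambda: os.environ["KAGGLE_API_KEY"])
    model_name: str = field(
        default_factory=lambda: os.getenv("MODEL_NAME", "news_classifier_logistic"))
    model_version: int = field(
        default_factory=lambda: int(os.getenv("MODEL_VERSION", "1")))
    cache_ttl: int = field(
        default_factory=lambda: int(os.getenv("CACHE_TTL", "3600")))
    mlflow_tracking_uri: str = field(
        default_factory=lambda: os.getenv("MLFLOW_TRACKING_URI", "http://mlflow:5000"))
    prefect_api_url: str = field(
        default_factory=lambda: os.getenv("PREFECT_API_URL", "http://prefect:4200/api"))
```

Instantiating `Settings()` once at startup makes a missing required variable fail fast instead of surfacing mid-pipeline.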

Model Configuration

Model hyperparameters are configured in configs/model_params.yaml:

logistic:
  classifier__C: [0.1, 1.0, 10.0]
  classifier__penalty: ['l2']
  tfidf__ngram_range: [[1, 1], [1, 2]]
  tfidf__max_features: [5000, 10000]

svm:
  classifier__C: [0.1, 1.0, 10.0]
  classifier__penalty: ['l2']
  tfidf__ngram_range: [[1, 1], [1, 2]]

rf:
  classifier__n_estimators: [50, 100, 200]
  classifier__max_depth: [10, 20, None]
  tfidf__max_features: [5000, 10000]
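Each section above is a scikit-learn style parameter grid: the `step__param` keys address a tfidf step feeding a classifier step in a Pipeline. To make the search space concrete, here the logistic grid (inlined as a plain dict) is expanded the way GridSearchCV enumerates candidates:

```python
from itertools import product

# The "logistic" section of configs/model_params.yaml, inlined for illustration.
logistic_grid = {
    "classifier__C": [0.1, 1.0, 10.0],
    "classifier__penalty": ["l2"],
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__max_features": [5000, 10000],
}

def expand_grid(grid: dict) -> list[dict]:
    """Enumerate every hyperparameter combination, as GridSearchCV would."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]

candidates = expand_grid(logistic_grid)
# 3 * 1 * 2 * 2 = 12 candidate models for the logistic section
```

Keeping grids small matters: every extra value multiplies the number of fits (times the number of cross-validation folds).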

πŸ” Monitoring & Observability

πŸ“Š Grafana Dashboards

The project includes pre-configured Grafana dashboards for:

  • System Overview: CPU, memory, and disk usage
  • API Performance: Request rates, response times, error rates
  • Model Metrics: Accuracy, precision, recall, F1 scores
  • MLflow Tracking: Experiment metrics and model performance

πŸ“ˆ Prometheus Metrics

Key metrics collected:

# API Metrics
api_request_total{endpoint, method, status}
api_request_duration_seconds{endpoint, method}
api_active_connections

# Model Metrics
model_prediction_total{model_name, version}
model_prediction_accuracy{model_name, version}
model_inference_duration_seconds{model_name}

# System Metrics
container_cpu_usage_seconds_total
container_memory_usage_bytes
container_network_receive_bytes_total
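To make the label sets concrete, here is a stdlib stand-in for what a labeled counter such as api_request_total tracks; the production service would use the prometheus_client library's Counter/Histogram types, not this sketch:

```python
from collections import Counter

# Each (endpoint, method, status) label combination gets its own count,
# mirroring api_request_total{endpoint, method, status} above.
api_request_total: Counter = Counter()

def observe_request(endpoint: str, method: str, status: int) -> None:
    """Increment the counter for one labeled request."""
    api_request_total[(endpoint, method, str(status))] += 1

observe_request("/predict", "POST", 200)
observe_request("/predict", "POST", 200)
observe_request("/predict", "POST", 401)
# api_request_total[("/predict", "POST", "200")] is now 2
```

Prometheus aggregates across these label combinations at query time, e.g. summing over status to get total request rate per endpoint.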

πŸ§ͺ MLflow Experiment Tracking

MLflow tracks:

  • Parameters: Hyperparameters and model settings
  • Metrics: Accuracy, precision, recall, F1, log loss, ROC AUC
  • Artifacts: Trained models, confusion matrices, ROC curves
  • Model Registry: Model versions, stages, and metadata
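As a sketch of how one run might be recorded: the mlflow calls below are the library's standard tracking API, while the F1 helper, the metric values, and the local tracking URI are illustrative, not the project's actual training code:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def log_run(params: dict, metrics: dict) -> None:
    """Record one training run's parameters and metrics to MLflow."""
    import mlflow  # third-party; local import keeps the helper above standalone
    mlflow.set_tracking_uri("http://localhost:5000")  # assumed local server
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

For example, `log_run({"classifier__C": 1.0}, {"f1": f1_score(8, 2, 2)})` would appear as one run in the MLflow UI.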

πŸ§ͺ Testing

Run All Tests

# Run unit and integration tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=src --cov-report=html

# Run load tests
locust -f tests/stress_test.py --host=http://localhost:7860

Test Categories

  1. Unit Tests (tests/unit/):

    • Model training and evaluation
    • Data processing functions
    • API endpoint logic
  2. Integration Tests (tests/integration/):

    • API endpoint testing
    • Database operations
    • External service integration
  3. Load Tests (tests/stress_test.py):

    • API performance under load
    • Concurrent request handling
    • Resource utilization monitoring
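A minimal Locust script for this pattern might look like the following; the request body and X-API-Key header mirror the API examples elsewhere in this README, and the import fallback exists only so the sketch stays importable without locust installed:

```python
# Hypothetical tests/stress_test.py; the project's real script may differ.
try:
    from locust import HttpUser, task, between
except ImportError:  # fallback stubs so this sketch imports without locust
    HttpUser, between = object, (lambda a, b: (a, b))
    task = lambda f: f

class PredictUser(HttpUser):
    """Simulated client repeatedly hitting the /predict endpoint."""
    wait_time = between(1, 3)  # seconds each user waits between requests

    @task
    def predict(self):
        self.client.post(
            "/predict",
            json={"text": "Test news article"},
            headers={"X-API-Key": "your_api_key"},
        )
```

Run it with the command shown above; the Locust UI (port 8089) then lets you set user count and spawn rate.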

☁️ Cloud Deployment

AWS EKS Deployment

# Create EKS cluster
eksctl create cluster --name mlops-cluster --region us-west-2

# Store container registry credentials
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-west-2.amazonaws.com

# Deploy with AWS-specific configurations
kubectl apply -f k8s/aws-storage-class.yml
kubectl apply -f k8s/

Google GKE Deployment

# Create GKE cluster
gcloud container clusters create mlops-cluster --num-nodes=3

# Configure container registry access
gcloud auth configure-docker

# Deploy services
kubectl apply -f k8s/

Azure AKS Deployment

# Create AKS cluster
az aks create --resource-group mlops-rg --name mlops-cluster --node-count 3

# Get cluster credentials
az aks get-credentials --resource-group mlops-rg --name mlops-cluster

# Deploy services
kubectl apply -f k8s/

πŸ› οΈ Development

Local Development Setup

  1. Install Dependencies
# Install uv for fast package management
pip install uv

# Install project dependencies
uv pip install -r requirements.txt

# Install development dependencies
uv pip install -r requirements-dev.txt
  2. Pre-commit Hooks
# Install pre-commit hooks
pre-commit install

# Run hooks manually
pre-commit run --all-files
  3. Code Quality
# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Adding New Models

  1. Update Model Configuration (configs/model_params.yaml)
  2. Modify Training Script (src/models/train.py)
  3. Add Tests (tests/unit/test_models.py)
  4. Update API Documentation (src/api/main.py)

Extending API Endpoints

  1. Add New Endpoint (src/api/main.py)
  2. Add Authentication (if required)
  3. Add Monitoring Metrics
  4. Write Integration Tests (tests/integration/)
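The steps above might come together roughly like this; a hypothetical /categories endpoint with key auth, where the route, the category list, and the helper names are illustrative rather than the project's actual code:

```python
import hmac
import os

def verify_api_key(provided: str, expected: str) -> bool:
    """Constant-time comparison for the X-API-Key header value."""
    return hmac.compare_digest(provided.encode(), expected.encode())

def create_app():
    """Wire a hypothetical authenticated endpoint into a FastAPI app."""
    from fastapi import FastAPI, Header, HTTPException  # third-party

    app = FastAPI()

    @app.get("/categories")
    def categories(x_api_key: str = Header(...)):  # maps the X-API-Key header
        if not verify_api_key(x_api_key, os.environ["API_KEY"]):
            raise HTTPException(status_code=401, detail="invalid API key")
        return {"categories": ["business", "entertainment",
                               "politics", "sport", "tech"]}

    return app
```

After wiring the route, add a counter for it (see the monitoring section) and an integration test that exercises both the 200 and 401 paths.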

πŸ”§ Troubleshooting

🚨 Common Issues

Docker Issues

# Check Docker daemon
docker info

# Check container logs
docker-compose logs <service-name>

# Restart services
docker-compose restart

# Clean up unused resources
docker system prune -a

API Issues

# Test API connectivity
curl -X GET "http://localhost:7860/info" \
  -H "X-API-Key: your_api_key"

# Check API logs
docker-compose logs api

# Test model prediction
curl -X POST "http://localhost:7860/predict" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Test news article"}'

MLflow Issues

# Check MLflow server status
curl http://localhost:5000/health

# Verify experiment exists
curl -X GET "http://localhost:5000/api/2.0/mlflow/experiments/get-by-name?experiment_name=news-classifier"

# Check MLflow logs
docker-compose logs mlflow

Kubernetes Issues

# Check pod status
kubectl get pods -n mlops

# Describe pod issues
kubectl describe pod <pod-name> -n mlops

# View pod logs
kubectl logs <pod-name> -n mlops -f

# Check service connectivity
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  sh -c "nslookup mlflow.mlops"

πŸ› Debug Mode

Enable debug logging by setting the environment variable:

LOG_LEVEL=DEBUG

Or temporarily enable for Docker Compose:

docker-compose run --rm -e LOG_LEVEL=DEBUG api python -m src.api.main
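Inside the application, honoring LOG_LEVEL can be as simple as the following stdlib sketch; the function name and log format are illustrative, not the project's actual logging setup:

```python
import logging
import os

def configure_logging() -> logging.Logger:
    """Configure root logging from the LOG_LEVEL environment variable."""
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        level=getattr(logging, level_name, logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # reconfigure even if a handler was already attached
    )
    return logging.getLogger("api")
```

Calling `configure_logging()` once at startup means `LOG_LEVEL=DEBUG` takes effect without code changes.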

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite (pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • BBC News Dataset: Provided via Kaggle for research purposes
  • Open Source Community: Thanks to all contributors of the libraries used
  • MLOps Community: For best practices and patterns inspiration

πŸ“ž Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section above
  2. Search existing issues on GitHub
  3. Create a new issue with detailed information
  4. Join our discussions for community support

🌟 Star this repository if you find it helpful for your MLOps journey!