# 🚀 Production-Ready MLOps Pipeline for News Classification

An end-to-end MLOps pipeline demonstrating best practices in machine learning operations, from data ingestion to model deployment and monitoring.

## 📑 Table of Contents
- 🎯 Overview
- ✨ Features
- 🏗️ Architecture
- 📋 Prerequisites
- 🚀 Quick Start
- 📁 Project Structure
- 🔧 Configuration
- 📊 Monitoring & Observability
- 🧪 Testing
- ☁️ Cloud Deployment
- 🛠️ Development
- 🔧 Troubleshooting
- 🤝 Contributing
- 📄 License
## 🎯 Overview

This project implements a complete MLOps pipeline for news classification using BBC articles from Kaggle. It showcases industry-standard practices for:
- **Data Pipeline**: Automated data ingestion, preprocessing, and validation
- **Model Training**: Experiment tracking, hyperparameter tuning, and model versioning
- **Model Serving**: RESTful API with authentication and monitoring
- **Orchestration**: Workflow management with dependency resolution
- **Monitoring**: Real-time metrics, performance tracking, and alerting
- **Testing**: Unit tests, integration tests, and load testing
- **Deployment**: Container-based deployment with Docker and Kubernetes support
## ✨ Features

### ML Pipeline

- Automated data processing and feature engineering
- Model training with hyperparameter optimization
- Experiment tracking and model registry
- CI/CD pipeline with automated testing
### Model Serving

- FastAPI-based REST service with async support
- API key authentication and rate limiting
- Model versioning and A/B testing support
- Comprehensive API documentation with Swagger/OpenAPI
### Monitoring & Observability

- Real-time metrics collection with Prometheus
- Interactive dashboards with Grafana
- Model performance monitoring and drift detection
- System health and resource utilization tracking
### Orchestration

- Prefect-based workflow management
- Dependency resolution and error handling
- Scheduled and event-driven execution
- Flow visualization and debugging
### Testing

- Unit tests for all core components
- Integration tests for API endpoints
- Load testing with Locust
- Model validation and performance testing
### Deployment

- Multi-stage Docker builds for optimization
- Docker Compose for local development
- Kubernetes manifests for production deployment
- Environment-specific configurations
### Security

- Secure secret management
- API authentication and authorization
- Network policies and access control
- Container security scanning
## 🏗️ Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[Kaggle Dataset] --> B[Data Processing]
        B --> C[Feature Engineering]
        C --> D[Data Validation]
    end

    subgraph "Model Layer"
        E[Model Training] --> F[Hyperparameter Tuning]
        F --> G[Model Evaluation]
        G --> H[Model Registry]
    end

    subgraph "API Layer"
        I[FastAPI Service] --> J[Model Inference]
        I --> K[Authentication]
        I --> L[Rate Limiting]
    end

    subgraph "Monitoring Layer"
        M[Prometheus] --> N[Metrics Collection]
        N --> O[Grafana Dashboard]
        P[MLflow Tracking] --> Q[Experiment Tracking]
    end

    subgraph "Orchestration Layer"
        R[Prefect] --> S[Workflow Management]
        S --> T[Scheduled Jobs]
        S --> U[Error Handling]
    end

    D --> E
    H --> I
    I --> M
    I --> P
    E --> P
    R --> E
    R --> B

    style A fill:#e1f5fe
    style I fill:#f3e5f5
    style R fill:#e8f5e8
    style M fill:#fff3e0
```
## 📋 Prerequisites

### Required

- **Docker**: 20.10+ and **Docker Compose**: 2.0+
- **Python**: 3.10+ (for local development)
- **Git**: For version control

### For Kubernetes Deployment (Optional)

- **Kubernetes**: 1.20+ cluster
- **kubectl**: Configured to access your cluster
- **Ingress Controller**: For external access (nginx/traefik)

### Accounts & Services

- **Kaggle**: For dataset access
- **GitHub**: For CI/CD (if using GitHub Actions)
- **Container Registry**: Docker Hub, GHCR, or similar
## 🚀 Quick Start

The fastest way to get started is using Docker Compose, which spins up all services with pre-configured monitoring and orchestration.

### 1. Clone and Configure

```bash
git clone https://github.com/kingabzpro/A-to-Z-MLOps.git
cd A-to-Z-MLOps

# Copy the environment template
cp .env.example .env

# Edit the environment file with your settings
nano .env
```

Required environment variables:
```bash
# API Configuration
API_KEY=your_secure_api_key_here
CACHE_TTL=3600

# Kaggle Configuration
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_API_KEY=your_kaggle_api_key

# Model Configuration
MODEL_NAME=news_classifier_logistic
MODEL_VERSION=1

# Optional: Custom Ports
PROMETHEUS_PORT=9090
GRAFANA_PORT=3000
MLFLOW_PORT=5000
PREFECT_PORT=4200
API_PORT=7860
LOCUST_PORT=8089
```
> ⚠️ **Security Note**: Never commit your `.env` file to version control. Add it to `.gitignore`.
### 2. Launch Services

```bash
# Start all services in detached mode
docker-compose up -d

# Verify all services are running
docker-compose ps

# View logs for all services
docker-compose logs -f
```

Once running, access the services at:
- 🚀 **FastAPI App**: http://localhost:7860 - API documentation available
- 📊 **MLflow**: http://localhost:5000 - Model tracking and registry
- 📈 **Grafana**: http://localhost:3000 - Monitoring dashboards (admin/admin)
- 🔍 **Prometheus**: http://localhost:9090 - Metrics collection
- ⚙️ **Prefect**: http://localhost:4200 - Workflow orchestration
- 🦗 **Locust**: http://localhost:8089 - Load testing interface
### 3. Run the Pipeline

```bash
# Trigger the MLOps pipeline via API
curl -X POST "http://localhost:7860/run-pipeline" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json"

# Or trigger via the Prefect UI:
# navigate to http://localhost:4200 and run the "mlops_pipeline" flow
```

### 4. Stop Services

```bash
# Stop and remove all containers
docker-compose down

# Stop and remove volumes (data will be lost)
docker-compose down -v
```

### Kubernetes Deployment (Experimental)

> **Note**: Kubernetes deployment is currently experimental and intended for advanced users. Production deployment requires additional configuration for persistence, security, and networking.
Prerequisites:

- Working Kubernetes cluster (local: minikube, kind; cloud: EKS, GKE, AKS)
- `kubectl` configured with access to your cluster
- Container registry access for image pushing
- **Create Namespace and Secrets**

```bash
# Create the namespace
kubectl apply -f k8s/00-namespace.yml

# Create secrets from an environment file
cat << EOF > .kube-secrets
API_KEY=your_api_key
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_API_KEY=your_kaggle_api_key
GF_SECURITY_ADMIN_PASSWORD=your_grafana_password
EOF

kubectl create secret generic mlops-secrets --from-env-file=.kube-secrets -n mlops
```
- **Deploy Infrastructure**

```bash
# Deploy core services
kubectl apply -f k8s/02-configmap.yml
kubectl apply -f k8s/03-prometheus-deployment.yml
kubectl apply -f k8s/04-grafana-deployment.yml
kubectl apply -f k8s/05-mlflow-deployment.yml
kubectl apply -f k8s/06-prefect-deployment.yml

# Deploy the application
kubectl apply -f k8s/07-api-deployment.yml

# Optional: load testing
kubectl apply -f k8s/08-locust-deployment.yml
```
- **Access Services**

```bash
# Use port forwarding for local access
kubectl port-forward svc/api 7860:7860 -n mlops &
kubectl port-forward svc/mlflow 5000:5000 -n mlops &
kubectl port-forward svc/grafana 3000:3000 -n mlops &
kubectl port-forward svc/prometheus 9090:9090 -n mlops &
kubectl port-forward svc/prefect 4200:4200 -n mlops &
```

For detailed Kubernetes instructions, see `k8s/DEPLOYMENT-GUIDE.md`.
## 📁 Project Structure

```
A-to-Z-MLOps/
├── 📁 src/                          # Source code
│   ├── 📁 data/                     # Data processing modules
│   │   ├── download.py              # Kaggle data download
│   │   ├── preprocessing.py         # Data cleaning and preprocessing
│   │   └── validation.py            # Data quality checks
│   ├── 📁 models/                   # Model training and evaluation
│   │   ├── train.py                 # Model training with MLflow tracking
│   │   ├── evaluate.py              # Model evaluation metrics
│   │   └── predict.py               # Model inference utilities
│   ├── 📁 api/                      # FastAPI application
│   │   ├── main.py                  # FastAPI app and endpoints
│   │   ├── auth.py                  # Authentication middleware
│   │   ├── middleware.py            # Custom middleware
│   │   └── monitoring.py            # Prometheus metrics
│   └── 📁 pipelines/                # Prefect workflows
│       ├── pipeline.py              # Main MLOps pipeline
│       ├── flows.py                 # Individual workflow components
│       └── tasks.py                 # Workflow task definitions
├── 📁 tests/                        # Test suite
│   ├── 📁 unit/                     # Unit tests
│   ├── 📁 integration/              # Integration tests
│   ├── 📁 stress/                   # Load testing scripts
│   └── conftest.py                  # Pytest configuration
├── 📁 configs/                      # Configuration files
│   ├── mlflow_config.yaml           # MLflow settings
│   ├── model_params.yaml            # Model hyperparameters
│   ├── grafana/                     # Grafana dashboards and datasources
│   └── prometheus.yml               # Prometheus configuration
├── 📁 k8s/                          # Kubernetes manifests (Experimental)
│   ├── 00-namespace.yml             # Namespace definition
│   ├── 02-configmap.yml             # Configuration maps
│   ├── 03-prometheus-deployment.yml # Prometheus deployment
│   ├── 04-grafana-deployment.yml    # Grafana deployment
│   ├── 05-mlflow-deployment.yml     # MLflow deployment
│   ├── 06-prefect-deployment.yml    # Prefect deployment
│   ├── 07-api-deployment.yml        # API deployment
│   ├── 08-locust-deployment.yml     # Locust deployment
│   └── DEPLOYMENT-GUIDE.md          # Detailed K8s guide
├── 📁 workflows/                    # CI/CD workflows
│   └── ci-cd.yml                    # GitHub Actions workflow
├── 📁 notebooks/                    # Jupyter notebooks
│   ├── 01-exploratory-data-analysis.ipynb
│   ├── 02-model-experimentation.ipynb
│   └── 03-performance-evaluation.ipynb
├── 📁 data/                         # Data directory
│   ├── raw/                         # Raw downloaded data
│   ├── processed/                   # Processed training data
│   └── validation/                  # Validation datasets
├── 📁 models/                       # Trained models storage
├── 📁 images/                       # Documentation images
├── 📄 Dockerfile                    # Multi-stage Docker build
├── 📄 docker-compose.yml            # Local development setup
├── 📄 pyproject.toml                # Python project configuration
├── 📄 .env.example                  # Environment variables template
└── 📄 README.md                     # This file
```
## 🔧 Configuration

### Environment Variables

The application uses environment variables for configuration. Key variables include:
| Variable | Description | Default | Required |
|---|---|---|---|
| `API_KEY` | API authentication key | None | ✅ |
| `KAGGLE_USERNAME` | Kaggle username for data download | None | ✅ |
| `KAGGLE_API_KEY` | Kaggle API key | None | ✅ |
| `MODEL_NAME` | MLflow model name | `news_classifier_logistic` | ❌ |
| `MODEL_VERSION` | Model version to deploy | `1` | ❌ |
| `CACHE_TTL` | Cache time-to-live (seconds) | `3600` | ❌ |
| `MLFLOW_TRACKING_URI` | MLflow server URI | `http://mlflow:5000` | ❌ |
| `PREFECT_API_URL` | Prefect server URL | `http://prefect:4200/api` | ❌ |
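For orientation, here is a minimal sketch of how these variables might be loaded and validated in application code. It assumes `pydantic-settings` is available, and the `Settings` class below is illustrative rather than the project's actual config module:

```python
# Illustrative sketch only -- assumes pydantic-settings is installed;
# field names mirror the table above, not the project's real config module.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about model_* field names
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    api_key: str                      # required -- no default
    kaggle_username: str              # required
    kaggle_api_key: str               # required
    model_name: str = "news_classifier_logistic"
    model_version: int = 1
    cache_ttl: int = 3600             # seconds
    mlflow_tracking_uri: str = "http://mlflow:5000"
    prefect_api_url: str = "http://prefect:4200/api"

settings = Settings()  # raises a ValidationError if a required variable is missing
```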
### Model Parameters

Model hyperparameters are configured in `configs/model_params.yaml`:
```yaml
logistic:
  classifier__C: [0.1, 1.0, 10.0]
  classifier__penalty: ['l2']
  tfidf__ngram_range: [[1, 1], [1, 2]]
  tfidf__max_features: [5000, 10000]

svm:
  classifier__C: [0.1, 1.0, 10.0]
  classifier__penalty: ['l2']
  tfidf__ngram_range: [[1, 1], [1, 2]]

rf:
  classifier__n_estimators: [50, 100, 200]
  classifier__max_depth: [10, 20, None]
  tfidf__max_features: [5000, 10000]
```
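The `tfidf__` and `classifier__` prefixes correspond to named steps of a scikit-learn `Pipeline`. Below is a minimal sketch of how the `logistic` grid above could drive a hyperparameter search, assuming the common `Pipeline` plus `GridSearchCV` pattern; the project's actual `src/models/train.py` may differ:

```python
# Illustrative sketch: feeding the 'logistic' grid above into scikit-learn.
# Step names 'tfidf' and 'classifier' match the double-underscore prefixes
# in configs/model_params.yaml; this is not the project's actual train.py.
import yaml
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

with open("configs/model_params.yaml") as f:
    param_grid = yaml.safe_load(f)["logistic"]

# YAML lists load as Python lists; TfidfVectorizer expects ngram_range tuples.
param_grid["tfidf__ngram_range"] = [tuple(r) for r in param_grid["tfidf__ngram_range"]]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
# search.fit(train_texts, train_labels)  # texts: list[str], labels: list[str]
```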
## 📊 Monitoring & Observability

### Grafana Dashboards

The project includes pre-configured Grafana dashboards for:

- **System Overview**: CPU, memory, and disk usage
- **API Performance**: Request rates, response times, error rates
- **Model Metrics**: Accuracy, precision, recall, F1 scores
- **MLflow Tracking**: Experiment metrics and model performance
### Prometheus Metrics

Key metrics collected:
```text
# API Metrics
api_request_total{endpoint, method, status}
api_request_duration_seconds{endpoint, method}
api_active_connections

# Model Metrics
model_prediction_total{model_name, version}
model_prediction_accuracy{model_name, version}
model_inference_duration_seconds{model_name}

# System Metrics
container_cpu_usage_seconds_total
container_memory_usage_bytes
container_network_receive_bytes_total
```
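As a hedged example of how custom metrics like these are typically emitted, here is a minimal `prometheus_client` sketch. The `predict_with_metrics` helper and its label values are hypothetical, not the project's `src/api/monitoring.py`:

```python
# Illustrative sketch using prometheus_client; the helper below is hypothetical.
import time

from prometheus_client import Counter, Histogram

REQUEST_TOTAL = Counter(
    "api_request_total", "Total API requests",
    ["endpoint", "method", "status"],
)
INFERENCE_SECONDS = Histogram(
    "model_inference_duration_seconds", "Model inference latency in seconds",
    ["model_name"],
)

def predict_with_metrics(model, text: str, model_name: str) -> str:
    """Run one prediction, recording request count and latency metrics."""
    start = time.perf_counter()
    try:
        label = model.predict([text])[0]
        REQUEST_TOTAL.labels(endpoint="/predict", method="POST", status="200").inc()
        return label
    except Exception:
        REQUEST_TOTAL.labels(endpoint="/predict", method="POST", status="500").inc()
        raise
    finally:
        INFERENCE_SECONDS.labels(model_name=model_name).observe(time.perf_counter() - start)
```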
### Experiment Tracking

MLflow tracks the following (a usage sketch follows this list):

- **Parameters**: Hyperparameters and model settings
- **Metrics**: Accuracy, precision, recall, F1, log loss, ROC AUC
- **Artifacts**: Trained models, confusion matrices, ROC curves
- **Model Registry**: Model versions, stages, and metadata
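For orientation, a minimal sketch of a tracked run with the MLflow Python API. The parameter and metric values are placeholders; the experiment name matches the one queried in the Troubleshooting section:

```python
# Illustrative MLflow tracking sketch; parameter/metric values are placeholders.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("news-classifier")

with mlflow.start_run():
    mlflow.log_params({"classifier__C": 1.0, "tfidf__max_features": 10000})
    mlflow.log_metrics({"accuracy": 0.97, "f1_macro": 0.96})
    # Log and register the fitted pipeline in the model registry:
    # mlflow.sklearn.log_model(search.best_estimator_, "model",
    #                          registered_model_name="news_classifier_logistic")
```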
## 🧪 Testing

```bash
# Run unit and integration tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=src --cov-report=html

# Run load tests
locust -f tests/stress_test.py --host=http://localhost:7860
```
The test suite is organized into three tiers (a sample locustfile sketch follows this list):

- **Unit Tests** (`tests/unit/`):
  - Model training and evaluation
  - Data processing functions
  - API endpoint logic

- **Integration Tests** (`tests/integration/`):
  - API endpoint testing
  - Database operations
  - External service integration

- **Load Tests** (`tests/stress_test.py`):
  - API performance under load
  - Concurrent request handling
  - Resource utilization monitoring
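For reference, a minimal locustfile in the spirit of `tests/stress_test.py`. The user class and task weights are illustrative; the endpoint paths and `X-API-Key` header are taken from the API examples in this README:

```python
# Illustrative locustfile; run with:
#   locust -f tests/stress_test.py --host=http://localhost:7860
import os

from locust import HttpUser, between, task

class ApiUser(HttpUser):
    wait_time = between(1, 3)  # seconds of think time between tasks
    headers = {"X-API-Key": os.getenv("API_KEY", "your_api_key")}

    @task(3)  # weighted: predictions fire 3x as often as info checks
    def predict(self):
        self.client.post(
            "/predict",
            json={"text": "Test news article"},
            headers=self.headers,
        )

    @task(1)
    def info(self):
        self.client.get("/info", headers=self.headers)
```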
## ☁️ Cloud Deployment

### AWS (EKS)

```bash
# Create EKS cluster
eksctl create cluster --name mlops-cluster --region us-west-2

# Store container registry credentials
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-west-2.amazonaws.com

# Deploy with AWS-specific configurations
kubectl apply -f k8s/aws-storage-class.yml
kubectl apply -f k8s/
```

### Google Cloud (GKE)

```bash
# Create GKE cluster
gcloud container clusters create mlops-cluster --num-nodes=3

# Configure container registry access
gcloud auth configure-docker

# Deploy services
kubectl apply -f k8s/
```

### Azure (AKS)

```bash
# Create AKS cluster
az aks create --resource-group mlops-rg --name mlops-cluster --node-count 3

# Get cluster credentials
az aks get-credentials --resource-group mlops-rg --name mlops-cluster

# Deploy services
kubectl apply -f k8s/
```

## 🛠️ Development

### Local Setup

- **Install Dependencies**
```bash
# Install uv for fast package management
pip install uv

# Install project dependencies
uv pip install -r requirements.txt

# Install development dependencies
uv pip install -r requirements-dev.txt
```

- **Pre-commit Hooks**
```bash
# Install pre-commit hooks
pre-commit install

# Run hooks manually
pre-commit run --all-files
```

- **Code Quality**
```bash
# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/
```

### Adding a New Model

- **Update Model Configuration** (`configs/model_params.yaml`)
- **Modify Training Script** (`src/models/train.py`)
- **Add Tests** (`tests/unit/test_models.py`)
- **Update API Documentation** (`src/api/main.py`)
### Adding a New API Endpoint

- **Add New Endpoint** (`src/api/main.py`), as sketched below
- **Add Authentication** (if required)
- **Add Monitoring Metrics**
- **Write Integration Tests** (`tests/integration/`)
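Below is a hedged sketch of what a new authenticated endpoint might look like, combining FastAPI's `APIKeyHeader` with the `X-API-Key` header used throughout this README. The request schema, dependency, and response shape are illustrative, not the project's actual `src/api/main.py`:

```python
# Illustrative endpoint sketch; names and response shape are placeholders.
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader
from pydantic import BaseModel

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key")

def verify_api_key(key: str = Depends(api_key_header)) -> str:
    # Compare against the API_KEY environment variable loaded from .env
    if key != os.environ.get("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API key")
    return key

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(body: PredictRequest, _: str = Depends(verify_api_key)) -> dict:
    # Model loading and inference elided; see src/models/predict.py
    return {"label": "business", "model": "news_classifier_logistic"}
```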
## 🔧 Troubleshooting

### Docker Issues

```bash
# Check Docker daemon
docker info

# Check container logs
docker-compose logs <service-name>

# Restart services
docker-compose restart

# Clean up unused resources
docker system prune -a
```

### API Issues

```bash
# Test API connectivity
curl -X GET "http://localhost:7860/info" \
-H "X-API-Key: your_api_key"

# Check API logs
docker-compose logs api

# Test model prediction
curl -X POST "http://localhost:7860/predict" \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Test news article"}'
```

### MLflow Issues

```bash
# Check MLflow server status
curl http://localhost:5000/health
# Verify experiment exists
curl -X GET "http://localhost:5000/api/2.0/mlflow/experiments/get-by-name?experiment_name=news-classifier"
# Check MLflow logs
docker-compose logs mlflow
```

### Kubernetes Issues

```bash
# Check pod status
kubectl get pods -n mlops
# Describe pod issues
kubectl describe pod <pod-name> -n mlops
# View pod logs
kubectl logs <pod-name> -n mlops -f
# Check service connectivity
kubectl run -it --rm debug --image=busybox --restart=Never -- \
sh -c "nslookup mlflow.mlops"Enable debug logging by setting the environment variable:
```bash
LOG_LEVEL=DEBUG
```

Or enable it temporarily for Docker Compose:
```bash
docker-compose run --rm -e LOG_LEVEL=DEBUG api python -m src.api.main
```

## 🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Run the test suite (`pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- **BBC News Dataset**: Provided via Kaggle for research purposes
- **Open Source Community**: Thanks to all contributors of the libraries used
- **MLOps Community**: For best practices and patterns inspiration
## 💬 Support

If you encounter any issues or have questions:
- Check the troubleshooting section above
- Search existing issues on GitHub
- Create a new issue with detailed information
- Join our discussions for community support
🌟 **Star this repository if you find it helpful for your MLOps journey!**