Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, Rebellions, FuriosaAI, HyperAccel, Google TPU, Graphcore IPU and other NPUs.
It allocates and isolates the underlying computing resources for multi-tenant computation sessions, either on demand or in batches, using customizable job schedulers and its own orchestrator named "Sokovan".
All its functions are exposed as REST and GraphQL APIs.
- Python: 3.13.x (main branch requires CPython 3.13.7)
- Pantsbuild: 2.27.x
- See full version compatibility table
Required:
- Docker 20.10+ (with Compose v2)
- PostgreSQL 16+ (tested with 16.3)
- Redis 7.2+ (tested with 7.2.11)
- etcd 3.5+ (tested with 3.5.14)
- Prometheus 3.x (tested with 3.1.0)
Recommended (for observability):
- Grafana 11.x (tested with 11.4.0)
- Loki 3.x (tested with 3.5.0)
- Tempo 2.x (tested with 2.7.2)
- OpenTelemetry Collector
→ Detailed infrastructure setup: Infrastructure Documentation
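As a quick sanity check against the minimum versions listed above, a comparison could be sketched as follows. This is only an illustrative helper, not part of Backend.AI: the names `MINIMUMS`, `parse_version`, `meets_minimum`, and `check` are hypothetical, and the sketch treats every entry as a lower bound only (it does not enforce the "3.x"-style major-version ceilings).

```python
# Minimum versions copied from the requirements list; helper names are hypothetical.
MINIMUMS = {
    "Docker": "20.10",
    "PostgreSQL": "16",
    "Redis": "7.2",
    "etcd": "3.5",
    "Prometheus": "3",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '16.3' into (16, 3). Non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(installed: dict[str, str]) -> dict[str, bool]:
    """Report, per known service, whether the installed version is new enough."""
    return {
        name: meets_minimum(version, MINIMUMS[name])
        for name, version in installed.items()
        if name in MINIMUMS
    }
```

Comparing numerically rather than as strings matters here: Docker `20.9` is older than `20.10`, which a plain string comparison would get wrong.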
- OS: Linux (Debian/RHEL-based) or macOS
- Permissions: sudo access for installation
- Resources: 4+ CPU cores, 8GB+ RAM recommended for development
git clone https://github.com/lablup/backend.ai.git
cd backend.ai
./scripts/install-dev.sh

This script will:
- Check required dependencies (Docker, Python, etc.)
- Set up Python virtual environment with Pantsbuild
- Start halfstack infrastructure (PostgreSQL, Redis, etcd, Grafana, etc.)
- Initialize database schemas
- Create default API keypairs and user accounts
Start each component in separate terminals:
Manager (Terminal 1):
./backend.ai mgr start-server --debug

Agent (Terminal 2):
./backend.ai ag start-server --debug

Storage Proxy (Terminal 3):
./py -m ai.backend.storage.server

Web Server (Terminal 4):
./py -m ai.backend.web.server

App Proxy (Terminal 5-6, optional for in-container service access):
./backend.ai app-proxy-coordinator start-server --debug
./backend.ai app-proxy-worker start-server --debug

Set up client environment:
source env-local-user-session.sh

Run a simple Python session:
./backend.ai run python -c "print('Hello Backend.AI!')"

Or access Web UI at http://localhost:8090 with credentials from env-local-*.sh files.
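Before opening the Web UI, it can help to confirm that something is actually listening on port 8090. The following is a minimal sketch using only the Python standard library; the function name `is_port_open` is hypothetical and not part of the Backend.AI tooling, and a successful TCP connection only shows that the port is open, not that the web server is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("localhost", 8090)` would be expected to return `True` once the Web Server from Terminal 4 has started.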
Backend.AI provides websocket tunneling into individual computation sessions (containers), so that users can use their browsers and client CLI to access in-container applications directly in a secure way.
- Jupyter: data scientists' favorite tool
  - Most container images have intrinsic Jupyter and JupyterLab support.
- Web-based terminal
  - All container sessions have intrinsic ttyd support.
- SSH
  - All container sessions have intrinsic SSH/SFTP/SCP support with auto-generated per-user SSH keypair. PyCharm and other IDEs can use on-demand sessions using SSH remote interpreters.
- VSCode
  - Most container sessions have intrinsic web-based VSCode support.
Backend.AI provides an abstraction layer on top of existing network-based storages (e.g., NFS/SMB), called vfolders (virtual folders). Each vfolder works like a cloud storage that can be mounted into any computation sessions and shared between users and user groups with differentiated privileges.
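To make the idea of sharing with differentiated privileges concrete, here is a purely conceptual sketch. It is not Backend.AI's data model or API: the `Permission` levels, the `VFolder` class, and its methods are all hypothetical and exist only to illustrate how an owner might grant other users different levels of access to the same folder.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    # Hypothetical privilege levels, chosen for illustration only.
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


@dataclass
class VFolder:
    name: str
    owner: str
    # Maps a user name to the privilege granted by the owner.
    shared_with: dict[str, Permission] = field(default_factory=dict)

    def share(self, user: str, permission: Permission) -> None:
        """Grant (or change) a user's privilege on this folder."""
        self.shared_with[user] = permission

    def can_read(self, user: str) -> bool:
        """The owner and anyone the folder is shared with may read."""
        return user == self.owner or user in self.shared_with

    def can_write(self, user: str) -> bool:
        """The owner and users holding READ_WRITE may write."""
        if user == self.owner:
            return True
        return self.shared_with.get(user) is Permission.READ_WRITE
```

In this toy model, sharing a `datasets` folder with `bob` as `READ_ONLY` lets him mount and read it, while writes remain limited to the owner and to users granted `READ_WRITE`.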
Please consult our documentation for community-supported materials. Contact the sales team (contact@lablup.com) for professional paid support and deployment options.
For comprehensive system architecture, component interactions, and infrastructure details, see:
Component Architecture Documentation
This document covers:
- System architecture diagrams and component flow
- Port numbers and infrastructure setup
- Component dependencies and communication protocols
- Development and production environment configuration
This repository contains all open-source server-side components and the client SDK for Python as a reference implementation of API clients.
- src/ai/backend/: Source codes
  - manager/: Manager as the cluster control-plane
    - manager/api: Manager API handlers
  - account_manager/: Unified user profile and SSO management
  - agent/: Agent as per-node controller
    - agent/docker/: Agent's Docker backend
    - agent/k8s/: Agent's Kubernetes backend
    - agent/dummy/: Agent's dummy backend
  - kernel/: Agent's kernel runner counterpart
  - runner/: Agent's in-kernel prebuilt binaries
  - helpers/: Agent's in-kernel helper package
  - common/: Shared utilities
  - client/: Client SDK
  - cli/: Unified CLI for all components
  - install/: SCIE-based TUI installer
  - storage/: Storage proxy for offloading storage operations
    - storage/api: Storage proxy's manager-facing and client-facing APIs
  - appproxy/: App proxy for accessing container apps from outside
    - appproxy/coordinator: App proxy coordinator who provisions routing circuits
    - appproxy/worker: App proxy worker who forwards the traffic
  - web/: Web UI server
    - static/: Backend.AI WebUI release artifacts
  - logging/: Logging subsystem
  - plugin/: Plugin subsystem
  - test/: Integration test suite
  - testutils/: Shared utilities used by unit tests
  - meta/: Legacy meta package
  - accelerator/: Intrinsic accelerator plugins
- docs/: Unified documentation
- tests/
  - manager/, agent/, ...: Per-component unit tests
- configs/
  - manager/, agent/, ...: Per-component sample configurations
- docker/: Dockerfiles for auxiliary containers
- fixtures/
  - manager/, ...: Per-component fixtures for development setup and tests
- plugins/: A directory to place plugins such as accelerators, monitors, etc.
- scripts/: Scripts to assist development workflows
  - install-dev.sh: The single-node development setup script from the working copy
- stubs/: Type annotation stub packages written by us
- tools/: A directory to host Pants-related tooling
- dist/: A directory to put build artifacts (.whl files) and Pants-exported virtualenvs
- changes/: News fragments for towncrier
- pants.toml: The Pants configuration
- pyproject.toml: Tooling configuration (towncrier, pytest, mypy)
- BUILD: The root build config file
- **/BUILD: Per-directory build config files
- BUILD_ROOT: An indicator to mark the build root directory for Pants
- CLAUDE.md: The steering guide for agent-assisted development
- requirements.txt: The unified requirements file
- *.lock, tools/*.lock: The dependency lock files
- docker-compose.*.yml: Per-version recommended halfstack container configs
- README.md: This file
- MIGRATION.md: The migration guide for updating between major releases
- VERSION: The unified version declaration
Server-side components are licensed under LGPLv3 to promote non-proprietary open innovation in the open-source community, while other shared libraries and client SDKs are distributed under the MIT license.
There is no obligation to open your service or system code if you just run the server-side components as-is (e.g., run them as daemons or import the components without modification in your code). Please contact us (contact-at-lablup-com) for commercial consulting and more licensing details/options about individual use-cases.
Backend.AI consists of the following core components:
Manager - Central API gateway and orchestrator
- Routes REST/GraphQL requests and orchestrates cluster operations
- Session scheduling via Sokovan orchestrator
- User authentication and RBAC authorization
- Plugin interfaces: backendai_scheduler_v10, backendai_agentselector_v10, backendai_hook_v20, backendai_webapp_v20, backendai_monitor_stats_v10, backendai_monitor_error_v10
- Legacy repo: https://github.com/lablup/backend.ai-manager
Agent - Kernel lifecycle management on compute nodes
- Manages Docker containers (kernels) on individual nodes
- Self-registers to cluster via heartbeats
- Plugin interfaces: backendai_accelerator_v21, backendai_monitor_stats_v10, backendai_monitor_error_v10
- Legacy repo: https://github.com/lablup/backend.ai-agent
Storage Proxy - Virtual folder and storage backend abstraction
- Unified interface for multiple storage backends
- Real-time performance metrics and acceleration APIs
- Legacy repo: https://github.com/lablup/backend.ai-storage-proxy
Webserver - Web UI hosting and session management
- Hosts Backend.AI WebUI (SPA)
- Session management and API request signing
- Legacy repo: https://github.com/lablup/backend.ai-webserver
Synchronizing the static Backend.AI WebUI version:
$ scripts/download-webui-release.sh <target version to download>

App Proxy - Service routing and load balancing
- Routes traffic to in-container services (Jupyter, VSCode, etc.)
- Dynamic circuit provisioning and health monitoring
Kernels - Container image recipes
- Dockerfile-based computing environment recipes
- Support for popular ML frameworks and programming languages
Jail - Programmable sandbox (Rust)
- ptrace-based system call filtering
- Resource control and security enforcement
Hook - In-container runtime library
- libc overrides for resource control
- Web-based interactive stdin support
We offer client SDKs in popular programming languages (MIT License):
- Python - pip install backend.ai-client | GitHub | Includes CLI
- Java - Releases
- JavaScript - npm install backend.ai-client | GitHub
- PHP - (under preparation) composer require lablup/backend.ai-client | GitHub
Backend.AI supports plugin-based extensibility via Python package entrypoints:
Accelerator Plugins (backendai_accelerator_v21)
- CUDA - NVIDIA GPU support
- CUDA Mock - Development without actual GPUs
- ROCm - AMD GPU support
- More available in the enterprise edition
Monitoring Plugins
- backendai_monitor_stats_v10 - Datadog statistics collector
- backendai_monitor_error_v10 - Sentry exception collector
Media Library - Multi-media output support (no longer maintained)
IDE Extensions - (Deprecated: Use in-kernel Jupyter Lab, VSCode Server, or SSH instead)
Build Python wheels or SCIE (Self-Contained Installable Executables):
./scripts/build-wheels.sh  # Build .whl packages
./scripts/build-scies.sh   # Build SCIE packages

Packages are placed in dist/ directory.
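To confirm what a build produced, the wheel files in `dist/` can be listed with a few lines of Python. This is a minimal sketch using only the standard library; the function name `list_wheels` is hypothetical, and it looks only at `.whl` files directly inside the given directory.

```python
from pathlib import Path


def list_wheels(dist_dir: str = "dist") -> list[str]:
    """Return the sorted file names of .whl packages found in dist_dir."""
    root = Path(dist_dir)
    if not root.is_dir():
        # Nothing has been built yet, or the path is wrong.
        return []
    return sorted(path.name for path in root.glob("*.whl"))
```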
Backend.AI uses Git pre-commit hooks to maintain code quality:
# Automatically runs on every commit:
# - Linting (pants lint)
# - Type checking (pants check)
# Bypass hooks if needed (use sparingly)
git commit --no-verify

The pre-commit hook validates:
- Code style and formatting
- Type annotations
Tests run in CI for comprehensive coverage.
See CLAUDE.md for detailed hook system documentation.
For detailed development setup, build system usage, and contribution guidelines:
- Development Setup - Python versions, Pantsbuild, dependency management
- CONTRIBUTING.md - Contribution guidelines and development workflow
- MIGRATION.md - Migration guide for major version updates
Refer to the LICENSE file.