A secure, Docker-based Python sandbox server using the Model Context Protocol (MCP) for isolated code execution and advanced healthcare analytics. This project enables secure processing of Synthea synthetic healthcare data with PostgreSQL OMOP CDM integration and LLM-powered analytics.
- Secure Sandboxing: Isolated Docker containers with resource limits and user isolation
- Healthcare Data Pipeline: Synthea-to-PostgreSQL with OMOP CDM mapping
- LLM Integration: Natural language queries for healthcare analytics
- Advanced Analytics: Structured and LLM-friendly data exploration
- MCP Protocol: Model Context Protocol for AI agent integration
- Docker Integration: Containerized PostgreSQL database with data persistence
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │───▶│  FastMCP Server  │───▶│  Docker Sandbox │
│   (AI Agent)    │    │    (main.py)     │    │   (Isolated)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │  PostgreSQL DB   │    │   Synthea CSV   │
                       │   (OMOP CDM)     │    │ (Mounted Data)  │
                       └──────────────────┘    └─────────────────┘
- Python 3.8+ with pip
- Docker & Docker Compose
- Synthea CSV files (optional, for healthcare data processing)
This project is configured to use uv for environment management. uv creates and manages Python virtual environments and can install the dependencies declared in pyproject.toml under tool.uv.
Quick start using uv:
# Install uv (see https://astral.sh/uv for instructions)
# Then create a uv-managed venv and install dependencies:
scripts/setup_uv.sh
source .venv/bin/activate

If you prefer not to use uv, you can still create a regular venv and install the packages listed in pyproject.toml or requirements.txt.
git clone https://github.com/fastomop/omcp_py.git
cd omcp_py
# Install dependencies
pip install -r requirements.txt

# Start the OMOP database
docker-compose up -d db
# Verify it's running
docker-compose ps

Place your Synthea CSV files in the synthetic_data/ directory:
synthetic_data/
├── patients.csv     # Patient demographics
├── encounters.csv   # Healthcare encounters
├── conditions.csv   # Medical conditions
└── ...
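Before loading, it can help to confirm that the expected files are actually in place. The following is a minimal sketch, not part of the project: the helper name `missing_synthea_files` is hypothetical, and it checks only the three files named in the listing above, which may not match everything the loader expects.

```python
from pathlib import Path

# Assumption: only the three files named in the listing above are checked;
# the real loader may expect a different set.
EXPECTED_FILES = ("patients.csv", "encounters.csv", "conditions.csv")


def missing_synthea_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected CSV names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_synthea_files("synthetic_data")` from the repository root returns an empty list when all three files are present.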
# Set Python path
export PYTHONPATH=src
# Start the server
python src/omcp_py/main.py

Use MCP Inspector or your preferred MCP client:
# Install MCP Inspector
npm install -g @modelcontextprotocol/inspector
# Connect to the server
mcp-inspector python src/omcp_py/main.py

Then open http://127.0.0.1:6274 in your browser.
# 1. Create sandbox and install packages
sandbox_id = await mcp.create_sandbox()
await mcp.install_package(sandbox_id, "pandas psycopg2-binary sqlalchemy")
# 2. Create OMOP CDM schema
await mcp.create_omop_schema(sandbox_id)
# 3. Load Synthea data
await mcp.load_synthea_to_postgres(sandbox_id, "/synthetic_data")
# 4. Run analytics
await mcp.analyze_omop_data(sandbox_id, "basic")
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")

| Tool | Description | Example |
|---|---|---|
| create_sandbox | Create isolated Python environment | create_sandbox() |
| install_package | Install Python packages | install_package(sandbox_id, "pandas") |
| create_omop_schema | Create OMOP CDM database schema | create_omop_schema(sandbox_id) |
| load_synthea_to_postgres | Load Synthea CSV to PostgreSQL | load_synthea_to_postgres(sandbox_id, "/synthetic_data") |
| analyze_omop_data | Run structured analytics | analyze_omop_data(sandbox_id, "basic") |
| llm_dataframe_operation | Natural language queries | llm_dataframe_operation(sandbox_id, "Count patients") |
| execute_sql_in_sandbox | Direct SQL execution | execute_sql_in_sandbox(sandbox_id, "SELECT COUNT(*) FROM person") |
| remove_sandbox | Clean up sandbox | remove_sandbox(sandbox_id, force=True) |
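Because sandboxes are created and removed explicitly, wrapping the work in try/finally keeps a failed query from leaving a sandbox behind. The sketch below illustrates only that pattern. `StubClient` is a hypothetical in-memory stand-in that borrows three tool names from the table; it is not the project's client, and real return values will differ.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for an MCP client; it only records calls."""

    def __init__(self):
        self.calls = []

    async def create_sandbox(self):
        self.calls.append("create_sandbox")
        return "sandbox-1"

    async def execute_sql_in_sandbox(self, sandbox_id, sql):
        self.calls.append("execute_sql_in_sandbox")
        return {"sandbox_id": sandbox_id, "sql": sql}

    async def remove_sandbox(self, sandbox_id, force=False):
        self.calls.append("remove_sandbox")


async def run_query(client, sql):
    """Create a sandbox, run one query, and always remove the sandbox."""
    sandbox_id = await client.create_sandbox()
    try:
        return await client.execute_sql_in_sandbox(sandbox_id, sql)
    finally:
        # Runs even if the query raises, so no sandbox is left behind.
        await client.remove_sandbox(sandbox_id, force=True)


client = StubClient()
result = asyncio.run(run_query(client, "SELECT COUNT(*) FROM person"))
print(client.calls)
```

With a real client, the same `run_query` shape applies as long as the client exposes awaitable methods with these names.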
{
"total_patients": 1000,
"total_visits": 5000,
"total_conditions": 8000
}

[
{
"gender_concept_id": 8507,
"patient_count": 500,
"avg_age": 45.2
}
]

# These work with natural language
await mcp.llm_dataframe_operation(sandbox_id, "Count total patients")
await mcp.llm_dataframe_operation(sandbox_id, "Show age distribution")
await mcp.llm_dataframe_operation(sandbox_id, "Count unique conditions")
await mcp.llm_dataframe_operation(sandbox_id, "Show gender distribution")

Create a .env file or set environment variables:
# Sandbox Configuration
SANDBOX_TIMEOUT=300
MAX_SANDBOXES=10
# Recommended prebuilt sandbox image
DOCKER_IMAGE=fastomop/sandbox:python-3.11-slim
DEBUG=false
LOG_LEVEL=INFO
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_USER=omop_user
DB_PASSWORD=omop_pass
DB_NAME=omop

The docker-compose.yml provides:
- PostgreSQL 15 with OMOP database
- Persistent data storage
- Synthea data directory mounting
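The DB_* variables from the configuration block can be combined into a single connection URL. This is a minimal sketch under assumptions: the function name `database_url` is hypothetical, the defaults simply mirror the example values shown in this README, and the server's own configuration loading is not documented here and may work differently.

```python
import os
from urllib.parse import quote_plus


def database_url(env=None):
    """Build a PostgreSQL URL from DB_* variables, using the README's example defaults."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = int(env.get("DB_PORT", "5432"))
    # Percent-encode credentials so characters such as '@' do not break the URL.
    user = quote_plus(env.get("DB_USER", "omop_user"))
    password = quote_plus(env.get("DB_PASSWORD", "omop_pass"))
    name = env.get("DB_NAME", "omop")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dict instead of relying on `os.environ` makes the function easy to exercise without touching the real environment.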
python tests/test_synthea_integration.py
./scripts/demo.sh

We provide a prebuilt sandbox Dockerfile and a convenience demo script to run an end-to-end local demo.
- Build the prebuilt sandbox image (optional but recommended):
docker build -t fastomop/sandbox:python-3.11-slim -f docker/sandbox/Dockerfile .
- Run the demo (builds the image, starts the DB, launches the server, runs a local client, and prints DB counts):
./scripts/demo.sh

If you have a DuckDB snapshot at synthetic_data/synthea.duckdb and want the demo to load Synthea into Postgres, run:
./scripts/demo.sh --load-duckdb

If port 5432 on your host is already in use, pass an alternate host port to the demo script or set DB_PORT in your environment (or .env) before running:
# Use port 5433 for the host mapping
./scripts/demo.sh --db-port 5433 --load-duckdb
# or export DB_PORT beforehand
export DB_PORT=5433
./scripts/demo.sh --load-duckdb

Notes:
- The sandbox manager will auto-join the docker-compose network (if detected) so sandboxes can resolve the db service name when running under docker compose.
- If you use a host Postgres instance, set DB_HOST=host.docker.internal or enable host-gateway resolution.
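To decide whether an alternate host port is needed for the demo, you can probe the port first. This is a small standard-library sketch; the helper names and the candidate port list are illustrative and not part of the demo script.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a connection succeeds, meaning the port is taken.
        return sock.connect_ex((host, port)) != 0


def first_free_port(candidates=(5432, 5433, 5434)):
    """Return the first candidate port that appears unused, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None
```

The value returned by `first_free_port()` can then be passed to `--db-port` or exported as DB_PORT. The check is advisory only, since another process could claim the port between the probe and the demo start.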
# Test file structure
python -c "import src.omcp_py.main; print('✅ Main module loads successfully')"
# Test Docker Compose
docker-compose config

- Container Isolation: Each sandbox runs in an isolated Docker container
- Resource Limits: CPU and memory restrictions per sandbox
- User Isolation: Non-root user execution
- Network Security: Controlled network access
- File System: Read-only filesystem with temporary mounts
- Capability Dropping: Removed dangerous Linux capabilities
- Auto-cleanup: Automatic removal of inactive sandboxes
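Several of the measures listed above correspond to options of `containers.run()` in the Docker SDK for Python. The sketch below only builds the keyword arguments, so it runs without Docker installed. The function name, the limits, the UID/GID, and the tmpfs size are illustrative assumptions, not the project's actual settings.

```python
def hardened_run_options(image, memory="512m", cpus=1.0, network=None):
    """Build keyword arguments for docker-py's containers.run() (illustrative values)."""
    options = {
        "image": image,
        "detach": True,
        "user": "1000:1000",                     # non-root user (assumed UID:GID)
        "mem_limit": memory,                     # memory restriction
        "nano_cpus": int(cpus * 1_000_000_000),  # CPU restriction in units of 1e-9 CPUs
        "read_only": True,                       # read-only root filesystem
        "tmpfs": {"/tmp": "rw,size=64m"},        # writable scratch space only
        "cap_drop": ["ALL"],                     # drop all Linux capabilities
    }
    if network is None:
        options["network_disabled"] = True       # no network unless one is named
    else:
        options["network"] = network             # e.g. the docker-compose network
    return options
```

With the `docker` package installed and a daemon running, the options could be used as `docker.from_env().containers.run(**hardened_run_options("fastomop/sandbox:python-3.11-slim"))`.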
- Synthea Usage Guide - Detailed workflow documentation
- API Reference - Complete tool documentation
- Configuration Guide - Environment and deployment setup
- Architecture Overview - System design and components
Extend the Synthea-to-OMOP mapping in load_synthea_to_postgres:
synthea_mappings = {
'custom_data.csv': {
'table': 'omop_cdm.custom_table',
'columns': {
'custom_id': 'person_id',
'custom_date': 'birth_datetime'
}
}
}

Extend the schema to include more OMOP CDM tables:
- drug_exposure
- procedure_occurrence
- measurement
- observation
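To illustrate how a columns entry in synthea_mappings might be applied, the sketch below renames CSV columns to their target names and drops unmapped ones. It assumes that columns maps source CSV column to target column, as the custom_data.csv example suggests. How load_synthea_to_postgres actually consumes the mapping is not shown in this README, and `map_rows` is a hypothetical helper.

```python
import csv
import io


def map_rows(csv_text, columns):
    """Rename mapped CSV columns to their target names and drop the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {target: row[source] for source, target in columns.items()}
        for row in reader
    ]


mapping = {"custom_id": "person_id", "custom_date": "birth_datetime"}
sample = "custom_id,custom_date,ignored\n1,1980-05-17,x\n"
rows = map_rows(sample, mapping)
print(rows)  # [{'person_id': '1', 'birth_datetime': '1980-05-17'}]
```

Values stay as strings here; type conversion (for example, parsing dates) would be a separate step.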
Create domain-specific analytics:
# Custom Python code in sandbox
code = '''
import pandas as pd
from sqlalchemy import create_engine
# Note: these credentials differ from the DB_* example values above; adjust to your setup
engine = create_engine('postgresql://omcp:postgres@db:5432/omcp')
df = pd.read_sql("SELECT * FROM omop_cdm.person", engine)
# Your custom analysis here
result = df.groupby('gender_concept_id').agg({
'person_id': 'count',
'birth_datetime': lambda x: pd.Timestamp.now().year - pd.to_datetime(x).dt.year.mean()
}).to_dict()
print(result)
'''
await mcp.execute_python_code(sandbox_id, code)

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
MIT License - see LICENSE file for details.
- Model Context Protocol for the MCP specification
- FastMCP for the Python MCP implementation
- Synthea for synthetic healthcare data
- OMOP CDM for healthcare data standards
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Built by Zhangshu Joshua Jiang and the wider FastOMCP team