- FastAPI backend for serving GLiNER models (NER).
- Gradio frontend (optional) for interactive use.
- Prometheus metrics endpoint (`/metrics`).
- Configurable via YAML, CLI, or environment variables.
- Docker and Docker Compose support.
- ONNX inference support (including quantized models).
- API key authentication (optional).
- Custom metrics port and enable/disable option for Prometheus metrics.
For detailed documentation, see DeepWiki.
You can try the live demo of the GLiNER API container in its Huggingface Space: GLiNER API Demo.
It uses a minimally changed image to make it work in the Huggingface Space environment.
You can either build the container yourself or use a prebuilt image from GitHub Container Registry.
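If you just want the prebuilt CPU image, you can pull it ahead of time:

```bash
docker pull ghcr.io/freinold/gliner-api:latest
```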
CPU version:

```bash
docker run \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  ghcr.io/freinold/gliner-api:latest
```

GPU version:
```bash
docker run \
  --gpus all \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  ghcr.io/freinold/gliner-api-gpu:latest
```

Mounting volumes:

- `-v $(pwd)/config.yaml:/app/config.yaml` mounts your config file (edit as needed)
- `-v $HOME/.cache/huggingface:/app/huggingface` mounts your Huggingface cache for faster model loading
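The run commands above expect a local `config.yaml`. If you have the repository checked out, one simple option is to start from a bundled example:

```bash
# Copy the general NER example config and edit it as needed
cp example_configs/general.yaml config.yaml
```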
To build the CPU image yourself:

```bash
docker build \
  -f cpu.Dockerfile \
  --build-arg IMAGE_CREATED="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" \
  --build-arg IMAGE_VERSION="$(git describe --tags --always)" \
  -t gliner-api .
```
Then run it with an example config:

```bash
docker run --rm \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/example_configs/general.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  gliner-api
```

PowerShell version:
```powershell
docker build `
  -f cpu.Dockerfile `
  --build-arg IMAGE_CREATED="$((Get-Date).ToUniversalTime().ToString('yyyy-MM-ddTHH:mm:ssZ'))" `
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" `
  --build-arg IMAGE_VERSION="$(git describe --tags --always)" `
  -t gliner-api .
```

```powershell
docker run --rm `
  -p 8080:8080 `
  -p 9090:9090 `
  -v "$PWD/example_configs/general.yaml:/app/config.yaml" `
  -v "$HOME/.cache/huggingface:/app/huggingface" `
  gliner-api
```

GPU version:

```bash
docker build \
  -f gpu.Dockerfile \
  --build-arg IMAGE_CREATED="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" \
  --build-arg IMAGE_VERSION="$(git describe --tags --always)" \
  -t gliner-api-gpu .
```

```bash
docker run --rm \
  --gpus all \
  -p 8080:8080 \
  -p 9090:9090 \
  -v $(pwd)/example_configs/general.yaml:/app/config.yaml \
  -v $HOME/.cache/huggingface:/app/huggingface \
  gliner-api-gpu
```

PowerShell version:
```powershell
docker build `
  -f gpu.Dockerfile `
  --build-arg IMAGE_CREATED="$((Get-Date).ToUniversalTime().ToString('yyyy-MM-ddTHH:mm:ssZ'))" `
  --build-arg IMAGE_REVISION="$(git rev-parse HEAD)" `
  --build-arg IMAGE_VERSION="$(git describe --tags --always)" `
  -t gliner-api-gpu .
```

```powershell
docker run --rm `
  --gpus all `
  -p 8080:8080 `
  -p 9090:9090 `
  -v "$PWD/example_configs/general.yaml:/app/config.yaml" `
  -v "$HOME/.cache/huggingface:/app/huggingface" `
  gliner-api-gpu
```

Edit `cpu.compose.yaml` / `gpu.compose.yaml` to select the config you want (see `example_configs/`).
Then run:
```bash
# For CPU version
docker compose -f cpu.compose.yaml up

# For GPU version
docker compose -f gpu.compose.yaml up
```

Be sure to check the installation instructions first.
```bash
uv run main.py [OPTIONS]
```

Or with FastAPI CLI:

```bash
fastapi run main.py --host localhost
```

To see all options:

```bash
uv run main.py --help
```

| Option | Description | Default |
|---|---|---|
| `--use-case` / `--name` | Use case for the GLiNER model (application/domain) | `general` |
| `--model-id` | Huggingface model ID (browse models) | `knowledgator/gliner-x-base` |
| `--onnx-enabled` | Use ONNX for inference | `False` |
| `--onnx-model-path` | Path to ONNX model file | `model.onnx` |
| `--default-entities` | Default entities to detect | `['person', 'organization', 'location', 'date']` |
| `--default-threshold` | Default detection threshold | `0.5` |
| `--api-key` | API key for authentication (if set, required in requests) | `null` |
| `--host` | Host address | `""` (bind to all interfaces) |
| `--port` | Port | `8080` |
| `--metrics-enabled` | Enable Prometheus metrics endpoint | `True` |
| `--metrics-port` | Port for Prometheus metrics endpoint | `9090` |
| `--frontend-enabled` | Enable Gradio frontend | `True` |
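For example, to serve a PII-oriented setup on a non-default port with a lower threshold (illustrative values; the flags are the ones listed above):

```bash
uv run main.py \
  --use-case pii \
  --default-threshold 0.4 \
  --port 8081
```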
| Description | Path | Demo Link |
|---|---|---|
| Gradio Frontend (if enabled) | `/` | Frontend |
| API Docs (Swagger) | `/docs` | Swagger UI |
| API Docs (ReDoc) | `/redoc` | ReDoc |
| Prometheus Metrics | `/metrics` | (no public demo link; available on metrics port if enabled) |
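Note that metrics are served on the separate metrics port, not the main API port; with the defaults above:

```bash
# Scrape the Prometheus metrics endpoint (default metrics port 9090)
curl http://localhost:9090/metrics
```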
```bash
curl -X POST "http://localhost:8080/api/invoke" \
  -H "Content-Type: application/json" \
  -d '{"text": "Steve Jobs founded Apple in Cupertino."}'
```

Prerequisites:
- Python 3.13.9
- uv (for dependency management)
Install dependencies:
```bash
# CPU version
uv sync --extra cpu [--extra frontend]

# GPU version
uv sync --extra gpu [--extra frontend]
```

The frontend is optional but recommended for interactive use.
Install from source:
```bash
git clone https://github.com/freinold/gliner-api.git
cd gliner-api
uv sync --extra cpu # or --extra gpu
```

You can configure the app via:

- `config.yaml` (default; see `example_configs/`)
- CLI options (see above)
- Environment variables (prefix: `GLINER_API_`)
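For instance, assuming the environment variables mirror the CLI options with the `GLINER_API_` prefix (names here are illustrative; check `uv run main.py --help` for the exact options):

```bash
# Hypothetical: variable names assumed to follow the GLINER_API_ + option pattern
GLINER_API_PORT=8081 GLINER_API_METRICS_ENABLED=false uv run main.py
```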
Example configs:

- `example_configs/general.yaml` (default NER)
- `example_configs/pii.yaml` (PII detection)
- `example_configs/medical.yaml` (medical NER)
- `example_configs/general_onnx.yaml` (ONNX inference)
- `example_configs/general_onnx_quantized.yaml` (quantized ONNX)
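As a rough sketch of what a custom config might look like, assuming the YAML keys mirror the CLI options above (the files in `example_configs/` are the authoritative reference):

```bash
# Key names below are assumed, not verified; compare with example_configs/
cat > config.yaml <<'EOF'
use_case: general
model_id: knowledgator/gliner-x-base
default_entities: [person, organization, location, date]
default_threshold: 0.5
EOF
uv run main.py
```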
- FastAPI (API backend)
- Gradio (optional frontend)
- Uvicorn (ASGI server)
- Prometheus Client (metrics)
- Huggingface Hub (model loading)
- PyTorch (CPU/GPU inference)
- ONNX (optional, for ONNX models)
- uv (dependency management)
See LICENSE.
