Canary

LLM prompt injection detection.

How it works

User submits a potentially malicious message.
The message is passed through a LLM prompted to format the message plus a unique key into a JSON. In the event the message is a malicious prompt, this output should be compromised. If the output is an invalid JSON, is missing a key, or a key-value doesn't match the expected values, then the integrity may be compromised.
If the integrity check passes, the user message is forwarded to the guarded LLM (e.g.: the application chatbot, etc.).
The API returns the result of the integrity test (boolean) and either the chatbot response (if integrity passes) or an error message (if integrity fails).

graph TD
    A[1: User Inputs Chat Message] --> B[2: Integrity Filter]
    B -->|Integrity check passes.| C[3: Generate Chatbot Response]
    B -->|Integrity check fails. Response is error message.| D
    C -->|Response is chatbot message.| D[4: Return Integrity and Response]

What this solution can do:

Detect inputs that override an LLMs initial / system prompt.

What this solution cannot do:

Neutralise malicious prompts.

Install dependencies

If using poetry:

poetry install

If using vanilla pip:

pip install .

Usage

Set your OpenAI API key in .envrc.

To run the project locally, run

make start

This will launch a webserver on port 8001.

Or via docker compose (does not use hot reload by default):

docker compose up

Query the /chat endpoint, e.g.: using curl:

curl -X POST -H "Content-Type: application/json" -d '{"message": "Hi how are you?"}' http://127.0.0.1:8000/chat

To run unit tests:

make test

Contributing

For information on how to set up your dev environment and contribute, see here.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github		.github
.vscode		.vscode
canary		canary
.envrc.example.sh		.envrc.example.sh
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python_version		.python_version
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
canary.png		canary.png
docker-compose.yml		docker-compose.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Canary

How it works

Install dependencies

Usage

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

Cutwell/canary

Folders and files

Latest commit

History

Repository files navigation

Canary

How it works

Install dependencies

Usage

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages