This project is an effort to create Open-source implementation of a task execution engine based on the TES standard
distributing executions to services exposing Pulsar application. For more details
on TES, see the Task Execution Schemas documentation. Pulsar
is a Python server application that allows a Galaxy server to run jobs on remote systems. The original intention of this
project was to modify the Pulsar project (e.g. via forking) so its Rest API would be compatible with the TES standard.
Later a decision was made that rather a separate microservice will be created, decoupled from the Pulsar, implementing the TES
standard and distributing TES tasks execution to Pulsar applications.
Repository contains docker-compose.yaml file with infrastructure setup for current functionality which can be used to
immediately start the project in DEVELOPMENT environment. This is convenient for users and contributors as there is no need to manually install and
configure all the services which TESP API requires for it to be fully functional. While this is the easiest approach to start
the project, for "first timers" it's recommended to follow this readme to understand all the services and tools used across the project.
Also have a detailed look at Current Docker services section of this readme before starting up the infrastructure for the first time.
!! DISCLAIMER:
Project is currently in the development phase only and should not be used in production environments yet. If you really
wish to set up a production environment despite missing features, tests etc ... then following contents will show what needs to be done.
You can work purely with docker and docker-compose only
instead of starting the project locally without docker. In that case only those two dependencies are relevant for you.
| dependency | version | note |
|---|---|---|
| docker | 20.10.0+ | latest is preferred |
| docker-compose | 1.28.0+ | - |
| python | 3.10.0+ | - |
| pip | python3 | 21.3.1 in case of problems |
| poetry | 1.1.13+ | pip install poetry |
| mongodb | 4.4+ | docker-compose uses latest |
| pulsar | 0.14.13 | actively trying to support latest. Must have access to docker with the same host as pulsar application itself |
| ftp server | - | no real recommendation here. docker-compose uses ftpserver so local alternative should support same fpt commands. |
TESP API uses dynaconf for its configuration. Configuration is currently set up by using
./settings.toml file. This file declares sections which represent different environments for TESP API. Default section
is currently used for local development without docker. Also, all the properties from default section are propagated
to other sections as well unless they are overridden in the specific section itself. So for example if following settings.toml
file is used
[default]
db.mongodb_uri = "mongodb://localhost:27017"
logging.level = "DEBUG"
[dev-docker]
db.mongodb_uri = "mongodb://tesp-db:27017"
then dev-docker environment will use property logging.level = DEBUG as well, while property db.mongodb_uri
gets overridden to url of mongodb in the docker environment. dev-docker section in current ./settings.toml
file is set up to support ./docker-compose.yaml for development infrastructure.
To apply different environment (i.e. to switch which section will be picked by TESP API) environment variable
FASTAPI_PROFILE must be set to the concrete name of such section (e.g. FASTAPI_PROFILE=dev-docker which can be seen
in the ./docker/tesp_api/Dockerfile)
You can have a look at ./docker-compose.yaml to see how
the infrastructure for development should look like. Of course, you can configure those services in your preferred way if you are
going to start the project without docker or if you are trying to create other than development environment but some things
must remain as they are. For example, TESP API currently supports communication with Pulsar only through its Rest API and
therefore Pulsar must be configured in such a way.
All the current Docker services which will be used when the project is started with docker-compose have common directory
./docker for configurations, data, logs and Dockerfiles if required.
docker-compose should run out of the box, but sometimes it might happen that a problem with privileges occurs while for
example trying to create data folder for given service. Such issues should be resolved easily manually. Always look into
./docker-compose.yaml to see what directories need to mapped
which ports to be used etc. Following services are currently defined by ./docker-compose.yaml
- tesp-api - This project itself. Depends on mongodb
- tesp-db - MongoDB instance for persistence layer
- pulsar_rest -
Pulsarconfigured to use Rest API with access to a docker instance thanks to DIND. - rabbitmq - currently disabled, will be used in the future development
- pulsar_amqp - currently disabled, will be used in the future development
- ftpserver - online storage for
TEStasks input/output content - minio - currently acting only as a storage backend for the
ftpserverwith simple web interface to access data.
Folder ./docker/minio/initial_data contains startup
folders for minio service which must be copied to the ./docker/minio/data folder before starting up the infrastructure. Those data
configure minio to start with already created bucket and user which will be used by ftpserver for access.
This project uses Poetry for dependency management and packaging. Poetry makes it easy
to install libraries required by TESP API. It uses ./pyproject.toml
file to obtain current project orchestration configuration. Poetry automatically creates virtualenv, so it's easy to run
application immediately. You can use command poetry config virtualenvs.in-project true which globally configures
creation of virtualenv directories directly in the project instead of the default cache folder. Then all you need to do
to run TESP API deployed to uvicorn for example is:
poetry install
poetry run uvicorn tesp_api.tesp_api:app --reload --host localhost --port 8000Otherwise, as was already mentioned, you can instead use docker-compose to start whole development infrastructure.
Service representing TESP API is configured to mount this project sources as a volume and TESP API is run with the very
same command as is mentioned above. Therefore, any changes made to the sources in this repository will be immediately applied to the docker
service as well, enabling live reloading which makes development within the docker environment very easy.
docker-compose up -d
docker-compose sets up whole development infrastructure. There will be two important endpoints to explore if you wish to
execute some TES tasks. Before doing any action, don't forget to run docker-compose logs command to see if each service
initialized properly or whether any errors occurred.
- http://localhost:8080/ - will redirect to Swagger documentation of
TESP API. This endpoint also currently acts as a frontend. You can use it to execute REST based calls expected by theTESP API. Swagger is automatically generated from the sources, and therefore it corresponds to the very current state of theTESP APIinterface. - http://localhost:40949/ -
minioweb interface. Useadminand!Password123credentials to login. Make sure that buckettesp-ftpis already present, otherwise see Current Docker services section of this readme to properly prepare infrastructure before the startup.
This section will demonstrate execution of simple TES task which will calculate md5sum
hash of given input. There are more approaches of how I/O can be handled by TES but main goal here is to demonstrate ftp server as well.
- Head over to http://localhost:40949/buckets/tesp-ftp/browse and upload a new file with your preferred name and content (e.g. name
holy_fileand contentHello World!). This file will now be accessible troughftpserverservice and will be used as an input file for this demonstration. - Go to http://localhost:8080/ and use
POST /v1/tasksrequest to create followingTEStask (task is sent in the request body). In the"inputs.url"replace<file_uploaded_to_minio>with the file name you chose in the previous step. If http status of returned response is 200, the response will containidof created task in the response body which will be used to reference this task later on.
{
"inputs": [
{
"url": "ftp://ftpserver:2121/<file_uploaded_to_minio>",
"path": "/data/file1",
"type": "FILE"
}
],
"outputs": [
{
"path": "/data/outfile",
"url": "ftp://ftpserver:2121/outfile-1",
"type": "FILE"
}
],
"executors": [
{
"image": "alpine",
"command": [
"md5sum"
],
"stdin": "/data/file1",
"stdout": "/data/outfile"
}
]
}- Use
GET /v1/tasks/{id}request to view task you have created. Useidfrom the response you have obtained in the previous step. This request also supportsviewquery parameter which can be used to limit the view of the task. By default,TESP APIwill returnMINIMALview which only includesidandstateof the requested task. Wait until task state is set to the stateCOMPLETEor one of the error states. In case of an error state, depending on its type, the error will be part of the task logs in the response (useFULLview), or you can inspect the logs ofTESP APIservice, where error should be logged with respective message. - Once the task completes you can head over back to http://localhost:40949/buckets/tesp-ftp/browse where you should find
uploaded
outfile-1with output content of executed md5sum. You can play around by creating different tasks, just be sure to only use functionality which is currently supported - see Known limitations. For example, you can omitinputs.urland instead useinputs.contentwhich allows you to create input in place, or you can also omitoutputsandexecutors.stdoutin which case the output will be present in thelogs.logs.stdoutas executor is no longer configured to redirect stdout into the file.
| Domain | Limitation |
|---|---|
| Pulsar | TESP API communicates with Pulsar only through its REST API, missing functionality for message queues |
| Pulsar | TESP API should be able to dispatch executions to multiple Pulsar services via different types of Pulsar interfaces. Currently, only one Pulsar service is supported |
| Pulsar | Pulsar must be "polled" for job state. Preferably Pulsar should notify TESP API about state change. This is already default behavior when using Pulsar with message queues |
| TES | Canceling TES task does not immediately stop the task. Task even cannot be canceled while it is running. |
| TES | TES does not state specific urls to be supported for file transfer (e.g. tasks inputs.url). Only FTP is supported for now |
| TES | tasks inputs.type and outputs.type can be either DIRECTORY or FILE. Only FILE is supported, DIRECTORY will lead to undefined behavior for now |
| TES | tasks resources currently do not change execution behavior in any way. This configuration will take effect once Pulsar limitations are resolved |
| TES | tasks executors.workdir and executors.env functionality is not yet implemented. You can use them but they will have no effect |
| TES | tasks volumes and tags functionality is not yet implemented. You use them but they will have no effect |
| TES | tasks logs.outputs functionality is not yet implemented. However this limitation can be bypassed with tasks outputs |
Current main branch is origin/main. This happens to be also a release branch for now. Developers should typically derive their
own feature branches such as e.g. feature/TESP-111-task-monitoring. This project has not yet configured any CI/CD. Releases are
done manually by creating a tag in the current release branch. There is not yet configured any issue tracking software but for
any possible future integration this project should reference commits, branches PR's etc ... with prefix TESP-0 as a reference
to a work that has been done before such integration. Pull request should be merged using Squash and merge option with message format Merge pull request #<PRnum> from <branch-name>.
Since there is no CI/CD setup this is only opinionated view on how branching policies should work and for now everything is possible.
Copyright (c) 2022 Norbert Dopjera
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.