These are services that are part of the TacoQ core system. Users don't interact with them directly in their code; they exist purely to transport and store task data.
The broker is responsible for transporting task objects between services. A central exchange routes all task objects to the relay and routes new tasks to the appropriate worker for execution.
The broker is implemented in RabbitMQ. Its structure is as follows:
- `relay_queue`: a queue consumed by the Relay to continuously update the task database. When a worker starts or finishes executing a task, it sends an update directly to this queue.
- `task_exchange`: an exchange to which new tasks are published. Tasks are routed to the appropriate worker queue based on their routing key, which is dictated by the worker kind in the task object.

Queues and exchanges are not customizable because RabbitMQ crashes when different services declare the same queue or exchange with conflicting configurations. Therefore:
- All queues are declared with `{"x-max-priority": 255}` to allow for maximum flexibility in task priority and to have the priority feature available by default.

We do not plan on supporting additional brokers in the near future, but we are open to making the broker an abstract class and accepting contributions for other message brokers if there is enough demand.
This is because we rely on RabbitMQ's routing and priority features, which are not always present in other brokers.
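For illustration, a topology like the one described above can be declared in a few lines of pika. The queue and exchange names and the priority argument follow the text above; the connection URL, exchange type, durability flags, and the example worker kind are assumptions rather than TacoQ's actual declarations.

```python
import pika

# Illustrative connection URL; not part of TacoQ's configuration.
connection = pika.BlockingConnection(
    pika.URLParameters("amqp://guest:guest@localhost:5672/%2f")
)
channel = connection.channel()

# Exchange that new tasks are published to; tasks are routed by worker kind.
# The direct exchange type is an assumption based on the routing-key behaviour above.
channel.exchange_declare(exchange="task_exchange", exchange_type="direct", durable=True)

# Queue consumed by the relay; task updates end up here.
channel.queue_declare(
    queue="relay_queue", durable=True, arguments={"x-max-priority": 255}
)

# One queue per worker kind, bound with the worker kind as the routing key.
worker_kind = "image_resizer"  # hypothetical worker kind
channel.queue_declare(
    queue=worker_kind, durable=True, arguments={"x-max-priority": 255}
)
channel.queue_bind(queue=worker_kind, exchange="task_exchange", routing_key=worker_kind)

connection.close()
```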
The latest state of each task is stored in the database. The database is implemented in Postgres and managed by the relay via Rust's sqlx library.
Some Postgres-backed task queues like Hatchet store every event and use triggers to keep a materialized view with the latest state of each task up to date.
We do not do this for a few reasons:
- As the Hatchet team has noted, it is no easy feat to get the triggers right.
- Unlike Hatchet, we are only a task queue, not a workflow orchestrator - it is not as important for us to store information about every step of a workflow.
- We already support OTEL tracing, which we believe is enough to get observability into the system. The task object also carries information about the timeline of its execution.
We use some Postgres-specific features like `LISTEN` and `NOTIFY` to implement event-based task updates with the clients. Because of this, we do not plan on supporting other databases in the near future.
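As a rough sketch, event-based updates on top of `LISTEN`/`NOTIFY` look like the snippet below: open a connection, listen on a channel, and react whenever a notification arrives. The channel name, payload shape, and connection string are illustrative assumptions, not TacoQ's actual implementation.

```python
import json
import select

import psycopg2

conn = psycopg2.connect("dbname=tacoq user=postgres")  # illustrative DSN
conn.autocommit = True  # LISTEN takes effect immediately outside a transaction

cur = conn.cursor()
cur.execute("LISTEN task_updates;")  # hypothetical channel name

while True:
    # Block for up to 5 seconds waiting for a notification to arrive.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notification = conn.notifies.pop(0)
        update = json.loads(notification.payload)  # assumes a JSON payload
        print(f"task {update.get('id')} is now {update.get('status')}")
```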
The relay, as the name implies, is responsible for relaying information between the core services and the user's application. It is implemented in Rust and has the following capabilities:
The relay consumes task updates from the broker and stores them in the database as they come in.
The relay also serves a REST API for retrieving task data from the database. You can read the API swagger definition in Relay Endpoints. The REST API is implemented in Axum.
The relay runs a job that deletes tasks that have been in the database for longer than a user-specified period of time. An index exists on the TTL column of the database to make this operation efficient.
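Conceptually, the cleanup boils down to a single indexed delete. The sketch below uses a hypothetical `tasks` table and treats the TTL column as an expiry timestamp; TacoQ's actual schema and job live in the relay.

```python
import psycopg2

conn = psycopg2.connect("dbname=tacoq user=postgres")  # illustrative DSN

with conn, conn.cursor() as cur:
    # Done once at setup: the index lets Postgres find expired rows
    # without scanning the whole table.
    cur.execute("CREATE INDEX IF NOT EXISTS tasks_ttl_idx ON tasks (ttl)")

with conn, conn.cursor() as cur:
    # Run periodically: drop every task whose time-to-live has elapsed.
    cur.execute("DELETE FROM tasks WHERE ttl < now()")
    print(f"deleted {cur.rowcount} expired tasks")

conn.close()
```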
The relay is stateless and can be scaled horizontally if you need to load balance requests between multiple relays or increase the consuming rate of tasks.
The relay has a lot of features packaged into a single service for the sake of simplicity.
If you only want to scale the consuming rate of tasks horizontally but don't need additional API instances, you can use the environment variables `ENABLE_RELAY_TASK_CONSUMER`, `ENABLE_RELAY_CLEANUP`, and `ENABLE_RELAY_API` to disable the features you don't need. Read more about environment variables in the Relay environment variables section.
These services are set up by users themselves and can safely go online and offline as needed.
The worker is responsible for executing tasks. Each worker has a `worker_kind` and multiple `task_kind`s that it is capable of executing.
Why are worker kinds a thing?
It is not uncommon for task queues to have all their workers consume from the same queue. If you were to have two different workers with different task capabilities, they would often consume tasks they are unable to execute, NACK them, and send the task to the back of the queue. This could happen repeatedly and cause the task to never be executed, or at least be greatly delayed.
Another possible implementation would be to have one queue per task kind, allowing workers to consume only the queues for task kinds they are able to execute. This would, however, require the worker to have a strategy for deciding which queue to prioritize consuming from, increasing complexity and making the behaviour more opaque.
So, we make the user explicitly decide which worker to route their task to, and we make the worker kind part of the task object.
The drawbacks to the current approach are:
- An additional setup parameter that must be known before runtime.
- There cannot be two different worker kinds with shared task kind capabilities; the user must always choose which worker kind to route a task to.
Given that these drawbacks are extremely specific, apply to almost no one, and can easily be worked around, we've decided to use worker kinds to route tasks. If you have a better idea, please let us know! :)
A task publisher client isn't a service but a client for submitting tasks to your workers. It connects directly to the broker.
As task results are completed, they get stored in the database. To access them, an application must communicate with the relay via its REST API. The client SDKs therefore have a Relay Client built in, which is capable of retrieving task results via the REST API.
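Put together, the publisher-side flow looks roughly like the sketch below: publish a task to the broker with the worker kind as the routing key, then ask the relay's REST API for its latest state. The payload shape, endpoint path, and priority value are illustrative assumptions; the SDK clients handle these details for you (and serialize with Avro rather than JSON, as described below).

```python
import json
import uuid

import pika
import requests

# Hypothetical task payload; the real objects are defined by TacoQ's schemas.
task = {
    "id": str(uuid.uuid4()),
    "worker_kind": "image_resizer",   # decides which worker queue the task is routed to
    "task_kind": "resize_thumbnail",  # decides which handler the worker runs
    "input": {"url": "https://example.com/cat.png"},
}

# Publish directly to the broker, using the worker kind as the routing key.
connection = pika.BlockingConnection(
    pika.URLParameters("amqp://guest:guest@localhost:5672/%2f")
)
channel = connection.channel()
channel.basic_publish(
    exchange="task_exchange",
    routing_key=task["worker_kind"],
    body=json.dumps(task),
    properties=pika.BasicProperties(priority=10),
)
connection.close()

# Later, retrieve the task's latest state from the relay's REST API.
response = requests.get(f"http://localhost:3000/tasks/{task['id']}")  # hypothetical endpoint
print(response.json())
```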
Services communicate with each other by sending `Task`, `TaskAssignmentUpdate`, `TaskRunningUpdate`, and `TaskCompletedUpdate` objects. We split the updates into different objects so we can be as compact as possible when sending small updates.
Objects are serialized and schematized using Apache Avro; the schemas live in `schema/avro`.
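As a sketch of what that looks like in practice, the snippet below serializes a made-up `TaskRunningUpdate`-style record with `fastavro`. The field list is hypothetical; the authoritative schemas are the ones in `schema/avro`.

```python
import io

from fastavro import parse_schema, schemaless_reader, schemaless_writer

# Hypothetical schema standing in for one of the update objects.
schema = parse_schema({
    "type": "record",
    "name": "TaskRunningUpdate",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "worker_kind", "type": "string"},
        {"name": "started_at", "type": "string"},
    ],
})

update = {
    "id": "7f9c1c3e-0000-0000-0000-000000000000",
    "worker_kind": "image_resizer",
    "started_at": "2024-01-01T00:00:00Z",
}

# Serialize without embedding the schema, keeping the message small.
buffer = io.BytesIO()
schemaless_writer(buffer, schema, update)
payload = buffer.getvalue()

# The consuming service decodes the bytes with the same schema.
buffer.seek(0)
assert schemaless_reader(buffer, schema) == update
```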