Jobs

The job system is how Eyrie executes Deployment Books. Every lifecycle action (deploy, pause, backup, etc.) creates a job that serializes the book’s steps and runs them sequentially.

Job Model

eyrie.deployment.job is the runtime representation of a Deployment Book execution. Key fields:

  • book_id — the Deployment Book being executed.

  • deployment_id / deployment_group_id / cluster_id — the scoped resources this job operates on.

  • state: draft → scheduled → running → done / error / cancel.

  • step_ids — serialized copies of the book’s steps, created at job creation time.

  • current_step_id — the step currently being executed.
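For orientation, the fields above can be mirrored in plain Python. This is only a sketch: the real eyrie.deployment.job is an Odoo model with relational fields. The JobSketch dataclass, the JobState enum, and the use of integer ids are assumptions made so the example runs standalone.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class JobState(str, Enum):
    # States listed for eyrie.deployment.job; string values assumed to match the labels.
    DRAFT = "draft"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"
    CANCEL = "cancel"


@dataclass
class JobSketch:
    # Hypothetical plain-Python stand-in for eyrie.deployment.job (ids as ints).
    book_id: int
    deployment_id: Optional[int] = None
    deployment_group_id: Optional[int] = None
    cluster_id: Optional[int] = None
    state: JobState = JobState.DRAFT
    step_ids: List[int] = field(default_factory=list)  # serialized step copies
    current_step_id: Optional[int] = None  # unset until a step starts


job = JobSketch(book_id=7, deployment_id=42, step_ids=[101, 102, 103])
print(job.state.value)  # draft
```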

Job Lifecycle

  1. Draft — the job record is created.

  2. Scheduled — the job is enqueued via OCA queue_job. The date_scheduled timestamp is recorded.

  3. Running — the job runner picks up the job and begins executing steps in sequence. date_running is set.

  4. Done — all steps completed successfully. date_done is set.

  5. Error — a step raised a non-retryable error.

  6. Cancel — the job was manually cancelled.
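The lifecycle above can be read as a small state machine. The sketch below encodes it in plain Python and records the timestamp fields named in the text. It assumes that cancel is allowed from any non-terminal state and that done, error, and cancel are terminal; the TRANSITIONS table and the advance helper are illustrative, not Eyrie's code.

```python
from datetime import datetime, timezone

# Assumed transition table: cancel from any non-terminal state,
# done/error only from running; terminal states allow no further moves.
TRANSITIONS = {
    "draft": {"scheduled", "cancel"},
    "scheduled": {"running", "cancel"},
    "running": {"done", "error", "cancel"},
    "done": set(),
    "error": set(),
    "cancel": set(),
}

# Timestamp fields named in the lifecycle description.
TIMESTAMP_FIELD = {
    "scheduled": "date_scheduled",
    "running": "date_running",
    "done": "date_done",
}


def advance(job: dict, new_state: str) -> dict:
    """Move a job dict to new_state, recording the matching timestamp if any."""
    current = job["state"]
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_state}")
    job["state"] = new_state
    ts_field = TIMESTAMP_FIELD.get(new_state)
    if ts_field:
        job[ts_field] = datetime.now(timezone.utc)
    return job


job = {"state": "draft"}
for state in ("scheduled", "running", "done"):
    advance(job, state)
print(job["state"], sorted(k for k in job if k.startswith("date_")))
# done ['date_done', 'date_running', 'date_scheduled']
```

Rejecting illegal transitions up front (for example done → running) keeps a finished job from being silently restarted.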

Step Serialization

When a job is created, Eyrie copies the book’s steps into eyrie.deployment.job.step records. Each step’s python_serialize_job_step_data is evaluated at this point, injecting computed data into the step’s context. This means the job is a snapshot — later changes to the book do not affect in-flight jobs.
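A minimal sketch of the snapshot behaviour, assuming steps are plain dicts and that python_serialize_job_step_data can be modelled as a Python callable (here the hypothetical "serialize" key) that returns extra context. The serialize_steps helper and the dict layout are illustrative; Eyrie evaluates its own serialization code against the book's step records.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

Serializer = Callable[[Dict[str, Any]], Dict[str, Any]]


def serialize_steps(
    book_steps: List[Dict[str, Any]], job_context: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Snapshot book steps into job steps, evaluating each serializer once."""
    job_steps = []
    for step in book_steps:
        serializer: Optional[Serializer] = step.get("serialize")
        # Deep copy so later edits to the book cannot reach the job's steps.
        context = copy.deepcopy(step.get("context", {}))
        if serializer is not None:
            # Stand-in for python_serialize_job_step_data: inject computed data now.
            context.update(serializer(job_context))
        job_steps.append({"name": step["name"], "context": context})
    return job_steps


book = [
    {"name": "pull_image", "context": {"tag": "v1"}},
    {"name": "restart", "context": {}, "serialize": lambda ctx: {"target": ctx["deployment"]}},
]
job_steps = serialize_steps(book, {"deployment": "shop-prod"})

# Editing the book afterwards does not change the in-flight job.
book[0]["context"]["tag"] = "v2"
print(job_steps[0]["context"]["tag"], job_steps[1]["context"]["target"])  # v1 shop-prod
```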

Retry Mechanism

Steps can signal that they should be retried by raising a RetryableJobError (from OCA queue_job) or returning a JobStepResultRetryable. The job runner re-enqueues the step and tries again after a delay.

Non-retryable failures raise FailedJobError or return JobStepResultFailed, which moves the job to the error state.
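How a runner might map these outcomes, as a standalone sketch. Because queue_job and Eyrie cannot be imported outside Odoo, the exception and result classes below are local stand-ins that only borrow their names from the text. Their fields, the JobStepResultOk success type, the run_step helper, and the max_attempts limit are all assumptions, and the delay between attempts is left out.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Local stand-ins so the sketch runs outside Odoo. In Odoo, the two exceptions
# come from odoo.addons.queue_job.exception; the JobStepResult* classes are
# Eyrie's, and the fields given to them here are assumed.
class RetryableJobError(Exception):
    pass


class FailedJobError(Exception):
    pass


@dataclass
class JobStepResultRetryable:
    message: str


@dataclass
class JobStepResultFailed:
    message: str


@dataclass
class JobStepResultOk:  # hypothetical success result
    pass


Result = Union[JobStepResultOk, JobStepResultRetryable, JobStepResultFailed]


def run_step(step: Callable[[], Result], max_attempts: int = 3) -> str:
    """Run one step; return "success", or "error" for the job's error state."""
    for _ in range(max_attempts):
        try:
            result = step()
        except RetryableJobError:
            continue  # real runner: re-enqueue and retry after a delay
        except FailedJobError:
            return "error"  # non-retryable: job moves to error
        if isinstance(result, JobStepResultRetryable):
            continue
        if isinstance(result, JobStepResultFailed):
            return "error"
        return "success"
    return "error"  # assumed: running out of attempts counts as a failure


attempts = {"n": 0}


def flaky_step() -> Result:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableJobError("service not ready")
    return JobStepResultOk()


print(run_step(flaky_step), attempts["n"])  # success 3
print(run_step(lambda: JobStepResultFailed("disk full")))  # error
```

Raising and returning are handled symmetrically, so a step author can choose whichever style fits the code it wraps.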

Integration with OCA queue_job

Eyrie delegates job scheduling and execution to the OCA queue_job module. This provides:

  • Configurable concurrency channels.

  • Automatic retries with exponential backoff.

  • Dead-letter handling for permanently failed jobs.

  • Database-level locking to prevent duplicate execution.

Estimated Durations and Timeouts

Each step has an estimated_duration (seconds) and an estimated_duration_over_stop multiplier (default: 10×). If a step runs longer than estimated_duration × estimated_duration_over_stop, it is considered timed out. Historical averages from previous job runs override the static estimate during serialization.
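The timeout rule is a single multiplication: with estimated_duration = 30 s and the default multiplier of 10, a step times out after 300 s. The sketch below also applies the historical-average override. The function names, the use of a plain arithmetic mean, and the fallback to the static estimate when no history exists are assumptions about details the text does not spell out.

```python
from statistics import mean
from typing import Sequence


def effective_estimate(static_estimate: float, history: Sequence[float]) -> float:
    """Prefer the average of past run durations; fall back to the static estimate."""
    return mean(history) if history else static_estimate


def timeout_seconds(estimated_duration: float, over_stop: float = 10.0) -> float:
    """Threshold: estimated_duration x estimated_duration_over_stop."""
    return estimated_duration * over_stop


def is_timed_out(elapsed: float, estimated_duration: float, over_stop: float = 10.0) -> bool:
    """A step is timed out once it runs strictly longer than the threshold."""
    return elapsed > timeout_seconds(estimated_duration, over_stop)


estimate = effective_estimate(30.0, [40.0, 50.0, 60.0])  # history wins
print(estimate, timeout_seconds(estimate))  # 50.0 500.0
print(is_timed_out(450.0, estimate), is_timed_out(501.0, estimate))  # False True
```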

Bus Notifications

When a job or step changes state, Eyrie sends a bus notification (eyrie.deploy_event) so that the portal UI can update in real time. Users see job progress, step completion, and errors without refreshing the page.

Jobs with should_notify = True (inherited from the book’s should_notify flag) also trigger user-facing notifications.
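A sketch of the two notification paths, with the Odoo bus and the user-facing channel replaced by plain lists so it runs standalone. Only the event name eyrie.deploy_event and the inheritance of should_notify from the book come from the text. The payload keys, the create_job and notify_state_change helpers, and the choice to notify users on every state change are hypothetical.

```python
from typing import Any, Dict, List

bus_messages: List[Dict[str, Any]] = []  # stand-in for the Odoo bus
user_notifications: List[str] = []  # stand-in for user-facing notifications


def create_job(book: Dict[str, Any]) -> Dict[str, Any]:
    # should_notify is inherited from the book when the job is created.
    return {"id": 1, "book": book["name"], "state": "draft",
            "should_notify": book["should_notify"]}


def notify_state_change(job: Dict[str, Any], new_state: str, step: str = "") -> None:
    job["state"] = new_state
    # Every change is pushed to the portal UI as an eyrie.deploy_event.
    bus_messages.append({"type": "eyrie.deploy_event", "job_id": job["id"],
                         "state": new_state, "step": step})
    # Only jobs flagged should_notify also reach users directly.
    if job["should_notify"]:
        user_notifications.append(f"Job {job['id']} ({job['book']}) is now {new_state}")


job = create_job({"name": "backup", "should_notify": True})
notify_state_change(job, "running")
notify_state_change(job, "done")
print(len(bus_messages), user_notifications[-1])  # 2 Job 1 (backup) is now done
```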