====
Jobs
====

The **job system** is how Eyrie executes Deployment Books. Every lifecycle action (deploy, pause, backup, etc.) creates a job that serializes the book's steps and runs them sequentially.

Job Model
=========

``eyrie.deployment.job`` is the runtime representation of a Deployment Book execution. Key fields:

- **book_id** — the Deployment Book being executed.
- **deployment_id** / **deployment_group_id** / **cluster_id** — the scoped resources this job operates on.
- **state** — ``draft`` → ``scheduled`` → ``running`` → ``done`` / ``error`` / ``cancel``.
- **step_ids** — serialized copies of the book's steps, created at job creation time.
- **current_step_id** — the step currently being executed.

Job Lifecycle
=============

1. **Draft** — the job record is created.
2. **Scheduled** — the job is enqueued via OCA ``queue_job``. The ``date_scheduled`` timestamp is recorded.
3. **Running** — the job runner picks up the job and begins executing steps in sequence. ``date_running`` is set.
4. **Done** — all steps completed successfully. ``date_done`` is set.
5. **Error** — a step raised a non-retryable error.
6. **Cancel** — the job was manually cancelled.

Step Serialization
==================

When a job is created, Eyrie copies the book's steps into ``eyrie.deployment.job.step`` records. Each step's ``python_serialize_job_step_data`` is evaluated at this point, injecting computed data into the step's context. This means the job is a snapshot — later changes to the book do not affect in-flight jobs.

Retry Mechanism
===============

Steps can signal that they should be retried by raising a ``RetryableJobError`` (from OCA ``queue_job``) or returning a ``JobStepResultRetryable``. The job runner re-enqueues the step and tries again after a delay.

Non-retryable failures raise ``FailedJobError`` or return ``JobStepResultFailed``, which moves the job to the *error* state.

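
The retry flow described under *Retry Mechanism* can be sketched in plain Python. This is a minimal illustration, not Eyrie's job runner: the two exception classes are local stand-ins that only mirror the names of the OCA ``queue_job`` exceptions so the snippet runs outside Odoo, and ``run_step``, ``max_retries``, ``flaky_step`` and ``broken_step`` are hypothetical names. The loop retries immediately, whereas the real runner re-enqueues the step and tries again after a delay.

```python
class RetryableJobError(Exception):
    """Local stand-in for the queue_job exception of the same name."""


class FailedJobError(Exception):
    """Local stand-in for the queue_job exception of the same name."""


def run_step(step, max_retries=3):
    """Run ``step`` and return a ``(state, attempts)`` tuple.

    A retryable error triggers another attempt, up to ``max_retries``
    retries. A failed error, or running out of retries, yields "error".
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            step()
        except RetryableJobError:
            if attempts > max_retries:
                return "error", attempts
            continue  # a real runner would re-enqueue after a delay
        except FailedJobError:
            return "error", attempts
        return "done", attempts


calls = {"count": 0}


def flaky_step():
    """Hypothetical step that succeeds on its third call."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableJobError("resource not ready yet")


def broken_step():
    """Hypothetical step that fails permanently."""
    raise FailedJobError("invalid configuration")


flaky_result = run_step(flaky_step)
broken_result = run_step(broken_step)
```

In this sketch the flaky step ends in ``done`` after three attempts, while the broken step moves straight to ``error`` on its first attempt.
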
Integration with OCA queue_job
==============================

Eyrie delegates job scheduling and execution to the OCA ``queue_job`` module. This provides:

- Configurable concurrency channels.
- Automatic retries with exponential backoff.
- Dead-letter handling for permanently failed jobs.
- Database-level locking to prevent duplicate execution.

Estimated Durations and Timeouts
=================================

Each step has an ``estimated_duration`` (seconds) and an ``estimated_duration_over_stop`` multiplier (default: 10×). If a step runs longer than ``estimated_duration × estimated_duration_over_stop``, it is considered timed out. Historical averages from previous job runs override the static estimate during serialization.

Bus Notifications
=================

When a job or step changes state, Eyrie sends a bus notification (``eyrie.deploy_event``) so that the portal UI can update in real time. Users see job progress, step completion, and errors without refreshing the page.

Jobs with ``should_notify = True`` (inherited from the book's ``should_notify`` flag) also trigger user-facing notifications.

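
The timeout rule described under *Estimated Durations and Timeouts* comes down to one multiplication and one comparison. The following is a minimal sketch, assuming all durations are in seconds. The function names are hypothetical, and passing the historical average as an optional argument is an assumption about how the override could be modelled, not a description of Eyrie's code.

```python
def timeout_threshold(estimated_duration,
                      estimated_duration_over_stop=10.0,
                      historical_average=None):
    """Return the number of seconds after which a step counts as timed out.

    When a historical average is given, it replaces the static estimate.
    """
    if historical_average is not None:
        base = historical_average
    else:
        base = estimated_duration
    return base * estimated_duration_over_stop


def is_timed_out(elapsed,
                 estimated_duration,
                 estimated_duration_over_stop=10.0,
                 historical_average=None):
    """Return True when ``elapsed`` seconds exceed the timeout threshold."""
    threshold = timeout_threshold(
        estimated_duration,
        estimated_duration_over_stop,
        historical_average,
    )
    return elapsed > threshold


# A step estimated at 30 seconds, with the default 10x multiplier.
static_limit = timeout_threshold(30)
# The same step once previous runs have averaged 45 seconds.
historical_limit = timeout_threshold(30, historical_average=45)
```

With the default multiplier, a step estimated at 30 seconds is allowed 300 seconds before it is treated as timed out. If earlier runs averaged 45 seconds, the limit in this sketch becomes 450 seconds.
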