etr  4.0-pre
etr.plugins.nomad Namespace Reference

Classes

class  Plugin
 Nomad plugin that enables deployment of Nomad jobs.
 

Functions

def wait_until_healthy (nomad_client, job_id)
 A running job is not necessarily healthy.
 
def host_url (host)
 Validate the --nomad-host argument.
 

Function Documentation

◆ host_url()

def etr.plugins.nomad.host_url (   host)

Validate the --nomad-host argument.

A valid value is a URI with a netloc and an optional scheme.

Examples
- //localhost is equivalent to http://localhost:4646
- https://localhost is equivalent to https://localhost:4646
- //localhost:1234 is equivalent to http://localhost:1234
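The normalization implied by these examples can be sketched with the standard library. This is a minimal sketch and not the actual implementation of `host_url`: the name `host_url_sketch`, the `http` default scheme, and the `4646` default port are assumptions inferred from the examples above.

```python
from urllib.parse import urlsplit

# Assumed defaults, inferred from the documented examples.
DEFAULT_SCHEME = "http"
DEFAULT_PORT = 4646


def host_url_sketch(host):
    """Hypothetical helper: normalize a --nomad-host style value.

    Raises ValueError when the value has no netloc (for example a bare
    "localhost" without the leading "//").
    """
    parts = urlsplit(host)
    if not parts.hostname:
        raise ValueError(
            f"expected a URI with a netloc, e.g. //localhost, got {host!r}"
        )
    scheme = parts.scheme or DEFAULT_SCHEME
    port = parts.port or DEFAULT_PORT
    return f"{scheme}://{parts.hostname}:{port}"
```

Note that a bare `localhost` is parsed by `urlsplit` as a path rather than a netloc, which is why the leading `//` is needed when the scheme is omitted.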

◆ wait_until_healthy()

def etr.plugins.nomad.wait_until_healthy (   nomad_client,
  job_id 
)

A running job is not necessarily healthy.

While a job is running, its allocations may still be under way or failing. Allocation statuses:

- Queued
- Running
- Starting
- Failed
- Complete
- Lost

Allocations can be considered "run attempts", so the status counts will not necessarily add up to the number of tasks.

- Starting: an allocation is under way. This seems to hold for the duration of retry attempts.
- Failed: an allocation has failed (restart attempts exceeded). The task group might still end up running due to rescheduling, so this number cannot be relied upon to determine health.
- Running: a task allocation is running (but not necessarily healthy).
- Complete, Queued, Lost: meaning not yet determined.

Services

For services, Failed can increment indefinitely, depending on the limits set in the restart and reschedule stanzas.

Q How do I determine when a job has been fully deployed?
A At this point, a healthy job deployment means having as many Running tasks as the sum of all task group counts (to be verified). Note that a task may be Running for a short period before exiting, causing false positives.
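The comparison described in this answer can be sketched as a pure function over plain dictionaries. This is a sketch under stated assumptions, not the logic of `wait_until_healthy` itself: the function name `looks_deployed` is hypothetical, and the input shapes (a list of task groups with a `Count` key, and a per-group mapping of status counts) are assumed to mirror what Nomad reports for a job and its summary.

```python
def looks_deployed(task_groups, summary):
    """Hypothetical check: Running count equals the sum of group counts.

    task_groups: list of dicts, each with a "Count" key (assumed shape).
    summary: dict mapping task group name to a dict of status counts,
             e.g. {"web": {"Running": 2, "Failed": 0}} (assumed shape).
    """
    expected = sum(group["Count"] for group in task_groups)
    running = sum(counts.get("Running", 0) for counts in summary.values())
    return running == expected


# Example: two groups expecting 3 tasks in total, only 2 running so far.
groups = [{"Name": "web", "Count": 2}, {"Name": "worker", "Count": 1}]
status = {"web": {"Running": 2, "Failed": 1}, "worker": {"Starting": 1}}
print(looks_deployed(groups, status))
```

Because a task may be Running briefly before exiting, a single positive result from a check like this is only a snapshot and can be a false positive.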

Q How do I know when to give up waiting for a Nomad job to be healthy?
A There is no obvious way to know when to give up except to monitor the allocations.

If an allocation is `dead`, Nomad has given up on it for a given scheduling.
Nomad may then try to reschedule it, at which point it creates a new allocation, which
may or may not fail again.
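Since Nomad gives no clear signal for when to stop waiting, one generic fallback is to poll a health check until a caller-chosen deadline. The sketch below illustrates that pattern only; it is not taken from the source, and the name `wait_with_deadline`, its parameters, and the default timings are all hypothetical.

```python
import time


def wait_with_deadline(check, timeout_s=300.0, interval_s=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Hypothetical polling loop: call check() until it returns True.

    Returns True as soon as check() succeeds, or False once timeout_s
    seconds have elapsed. The clock and sleep functions are injectable
    so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `check` callable would wrap whatever allocation monitoring is available. The timeout is an arbitrary cut-off chosen by the caller and does not indicate that Nomad itself has stopped rescheduling.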

Q How do I know if Nomad has given up rescheduling an allocation?
A There seems to be no easy way to see this.