Consul and Nomad usage

Nomad and Consul have been selected as mechanisms for process deployment and monitoring.

Note

This documentation covers only the RTC Toolkit-relevant aspects of Nomad and Consul. For more details please refer to the general Nomad and Consul documentation.

Nomad usage

Nomad is used to deploy processes, e.g. RTC Components, in the RTC system.

Nomad job files

Nomad uses a job description file, which describes how to start the processes. Since RTC Components use a common command-line format, the components' Nomad job files can be generated automatically. The details of the job file are likely of interest to advanced users only.

The template job file uses Jinja2 placeholders for the entries to be replaced and looks like the following:

job {{ template_component_name }} {
#Specify Nomad datacenters to run job on
datacenters = ["{{ template_datacenter }}"]

type = "batch"
{% if template_node %}
constraint {
    attribute = "${meta.node}"
    operator  = "set_contains"
    value = "{{ template_node }}"
}
{% endif %}

group "{{ template_component_name }}_group" {
    restart{
        attempts = 0
        mode = "fail"
    }
    reschedule {
        attempts  = 0
        unlimited = false
    }
    ephemeral_disk{
        size = 150
    }
    network {
        port "req_rep" {}
        port "pub_sub" {}
    }
    {% if template_no_services %}
    service {
        name = "{{ template_component_name_dashes }}"
        port = "req_rep"
        task = "{{ template_component_name }}_task"
        meta {
            rtc_component_name = "{{ template_component_name }}"
            endpoint_type = "req_rep_endpoint"
            endpoint_uri = "zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/"
        }
        check {
            name      = "{{ template_component_name_dashes }}-GetState"
            type      = "script"
            interval  = "20s"
            timeout   = "2s"
            command   = "/bin/bash"
            args      = [
                "-l",
                "-c",
                {% if template_deployment_daemon %}
                "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/DeploymentCmds ::rtctkif::DeploymentCmds::GetState"
                {% else %}
                "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/StdCmds ::stdif::StdCmds::GetState"
                {% endif %}

            ]
        }
    }
    service {
        name = "{{ template_component_name_dashes }}"
        port = "pub_sub"
        task = "{{ template_component_name }}_task"
        meta {
            rtc_component_name = "{{ template_component_name }}"
            endpoint_type = "pub_sub_endpoint"
            endpoint_uri = "zpb.ps://${NOMAD_IP_pub_sub}:${NOMAD_PORT_pub_sub}/"
        }
    }
    {% endif %}
    task "{{ template_component_name }}_task" {
        kill_timeout = "{{ template_app_kill_timeout }}"
        resources {
            cpu = 20
            memory = 10
        }
        driver = "raw_exec"
        {% if template_user %}
        user = "{{ template_user }}"
        {% endif %}
        config {
            # in this way we get the environment variables
            command = "/usr/bin/bash"
            args = [
                "-l",
                "-c",
                "export HOME=$(eval echo ~$USER); source /etc/profile.d/00-modulepath.sh ; source /etc/profile.d/modules.sh ; source /etc/profile.d/z01_eltdev.sh ; module try-load private ; module try-load private-$HOSTNAME ;{{ template_command }} {{ template_arguments }}",
            ]
        }
    }
}
}
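The substitution performed on this template can be illustrated with a minimal Python sketch. Note that this is an illustration only: the real rtctkDeploymentGen tool uses Jinja2, and the shortened template string and value names here are assumptions for the example.

```python
# Minimal illustration of the placeholder substitution that
# rtctkDeploymentGen performs on the Nomad job template.
# The real tool uses Jinja2; this sketch only shows the idea
# on a reduced template.

TEMPLATE = """job {{ template_component_name }} {
    datacenters = ["{{ template_datacenter }}"]
    type = "batch"
}
"""

def render(template: str, values: dict) -> str:
    """Replace each {{ key }} placeholder with its value."""
    out = template
    for key, value in values.items():
        out = out.replace("{{ " + key + " }}", value)
    return out

job_file = render(TEMPLATE, {
    "template_component_name": "data_task_1",
    "template_datacenter": "dc1",
})
print(job_file)
```

Conditional sections such as `{% if template_node %}` are handled by the Jinja2 engine itself and are omitted in this sketch.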

A simple Python application, rtctkDeploymentGen, is provided that automates replacing the template values with the ones provided as arguments:

$ rtctkDeploymentGen --help
Usage: rtctkDeploymentGen [OPTIONS] COMMAND [ARGS]...

  RTCTK Deployment Daemon Nomad/Consul Job Generator

Options:
  --stdout                 Outputs to stdout instead of file.
  --datacenter TEXT        Constrains running the job to this DATACENTER.
                           [default: dc1]
  --app-kill-timeout TEXT  Nomad job/application kill timeout.  [default: 25s]
  --node TEXT              Constrains running the job to this NODE.
  --user TEXT              Runs the job as this user.
  --help                   Show this message and exit.

Commands:
  job       Generate a nomad job file for running RTC Components.
  services  Generate a nomad job file that provides the RTCTK Service...


$ rtctkDeploymentGen job --help
Usage: rtctkDeploymentGen job [OPTIONS] COMPONENT_NAME COMMAND [ARGUMENTS]...

  Generate a nomad job file for running RTC Components.

  This program requires three strings as input:

  - COMPONENT_NAME: The RTC Component's name, the identifier of the component.

  - COMMAND: Executable application to run as part of this job.

  - ARGUMENTS: A string that contains quoted arguments to pass to the COMMAND.
  If you need to pass arguments starting with the '-' character, please use
  '--' before that argument. Example:

    rtctkDeploymentGen job rtc_sup rtctkRtcSupervisor -- -i rtc_sup --sde
    consul://127.0.0.1:8500

  Exit Codes: * 11: --cid and COMPONENT_NAME values are not the same.

Options:
  --no-services         Creates a nomad job that provides no services
                        [default: True]

  --deployment-daemon   Creates a nomad job for Deployment Daemon  [default:
                        False]

  --help                Show this message and exit.

An example use for a telemetry-based data task called data_task_1 might be:

$ rtctkDeploymentGen job data_task_1 rtctkExampleDataTaskTelemetry -- data_task_1

Note that the tool requires the use of -- to pass options to the command to be executed. Remember to place any rtctkDeploymentGen-specific options before the --; otherwise they will be considered part of the component's options.

The above command creates a Nomad job file called data_task_1.nomad in the working directory containing:

job data_task_1 {
#Specify Nomad datacenters to run job on
datacenters = ["dc1"]

type = "batch"

group "data_task_1_group" {
    restart{
        attempts = 0
        mode = "fail"
    }
    reschedule {
        attempts  = 0
        unlimited = false
    }
    ephemeral_disk{
        size = 150
    }
    network {
        port "req_rep" {}
        port "pub_sub" {}
    }

    service {
        name = "data-task-1"
        port = "req_rep"
        task = "data_task_1_task"
        meta {
            rtc_component_name = "data_task_1"
            endpoint_type = "req_rep_endpoint"
            endpoint_uri = "zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/"
        }
        check {
            name      = "data-task-1-GetState"
            type      = "script"
            interval  = "20s"
            timeout   = "2s"
            command   = "/bin/bash"
            args      = [
                "-l",
                "-c",

                "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/StdCmds ::stdif::StdCmds::GetState"
            ]
        }
    }
    service {
        name = "data-task-1"
        port = "pub_sub"
        task = "data_task_1_task"
        meta {
            rtc_component_name = "data_task_1"
            endpoint_type = "pub_sub_endpoint"
            endpoint_uri = "zpb.ps://${NOMAD_IP_pub_sub}:${NOMAD_PORT_pub_sub}/"
        }
    }

    task "data_task_1_task" {
        kill_timeout = "25s"
        resources {
            cpu = 20
            memory = 10
        }
        driver = "raw_exec"
        config {
            # in this way we get the environment variables
            command = "/bin/bash"
            args = [
                "-l",
                "-c",
                "rtctkExampleDataTaskTelemetry data_task_1"
            ]
        }
    }
}
}

After ensuring that the Nomad agent is running and the file contents appear correct, the resulting job file can be started and checked with the following commands:

$ nomad job run data_task_1.nomad

$ nomad job status

ID           Type   Priority  Status   Submit Date
data_task_1  batch  50        running  2022-06-03T09:18:20Z

Note

Be aware that Nomad deploys components using the username (and its configuration, e.g. environment variables) that was used to start the Nomad agent service. Usually the eltdev user is used.

RTC Components can also be started directly from the command line without using Nomad, provided the component endpoints are registered manually with Consul. This can be useful for debugging purposes and during development. The above Nomad example would be equivalent to executing COMMAND with the given ARGUMENTS as the user who started the Nomad agent, i.e.

$ rtctkExampleDataTaskTelemetry data_task_1
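For such a manual run, the component's endpoints can be registered with Consul by hand using a service definition file. The following is only a sketch: the port, host address, and file name are assumptions for illustration, while the name and meta entries mirror those in the generated job file above.

```hcl
# data_task_1_service.hcl -- hypothetical manual registration sketch.
# Port 5555 and the loopback address are assumptions; when starting
# the component by hand, use the endpoint it actually listens on.
service {
  name = "data-task-1"
  port = 5555
  meta {
    rtc_component_name = "data_task_1"
    endpoint_type      = "req_rep_endpoint"
    endpoint_uri       = "zpb.rr://127.0.0.1:5555/"
  }
}
```

The file can then be registered against a running Consul agent with `consul services register data_task_1_service.hcl` and removed again with `consul services deregister data_task_1_service.hcl`.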

Nomad agent

Nomad agents can be started as a systemd service or manually; in both cases a configuration file (https://developer.hashicorp.com/nomad/docs/v1.6.x/configuration) needs to be provided. A very simple configuration file for running a Nomad agent could look like:

client {
    meta {
        "node" = "hrtc-gw,srtc1"
    }
}

For running RTC components, it is important that the configuration defines the node entry in the meta stanza. Nomad uses this to determine whether a job should run on the current node: the job file needs to specify a matching node value, taken from the Component's Deployment Configuration. With the Nomad agent config above, the node can deploy all components that specify either hrtc-gw or srtc1 in their deployment configuration files. In the following example of a component deployment configuration, the comp_1, comp_2 and comp_3 components (but not comp_4) would be deployed on the machine (using the Nomad agent configuration from above):

...

comp_1:
  node: !cfg.type:string srtc1
  executable: !cfg.type:string rtctkExampleComponent1
comp_2:
  node: !cfg.type:string srtc1
  executable: !cfg.type:string rtctkExampleComponent2
comp_3:
  node: !cfg.type:string hrtc-gw
  executable: !cfg.type:string rtctkExampleComponent3
comp_4:
  node: !cfg.type:string srtc2
  executable: !cfg.type:string rtctkExampleComponent4

For example, the Nomad agent can be run as:

nomad agent -dev -consul-address 127.0.0.1:8500 -config simple-cfg.hcl

If you want to extend this Nomad usage to support a multi-node cluster, it is necessary to run multiple Nomad agents with a different configuration on each machine (in particular with different node values). An example of multi-node Nomad usage can be found in: Distributed Scenario.
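As a sketch of such a cluster, a second machine could use an agent configuration that differs only in its node value (the node name below is an assumption chosen to match the deployment configuration example above):

```hcl
# Hypothetical agent configuration for a second cluster node.
# Only components whose deployment configuration names this
# node value would be deployed on this machine.
client {
    meta {
        "node" = "srtc2"
    }
}
```

With this configuration, the component configured with node srtc2 in the example deployment configuration would be deployed on this second machine instead.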

Note

Files and other deployment details produced during the deployment can be found in the so-called Nomad filesystem (whose location can be set with the configuration option data_dir); please refer to the Nomad documentation for more details: https://developer.hashicorp.com/nomad/docs/v1.4.x/concepts/filesystem.

Nomad provides a web interface running on port 4646 (e.g. http://<nomad_host_address>:4646) with information and details about the deployment, status of jobs etc. A web browser or other tools can be used to access this information.
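The same information is also available programmatically through Nomad's HTTP API on port 4646 (e.g. the /v1/jobs endpoint). The following sketch parses the kind of JSON that endpoint returns; the response string is a canned, reduced example rather than live output, so field values are assumptions.

```python
import json

# Canned, reduced example of the JSON returned by the Nomad HTTP API
# endpoint /v1/jobs (real responses contain many more fields).
RESPONSE = '[{"ID": "data_task_1", "Type": "batch", "Status": "running"}]'

def running_jobs(jobs_json: str) -> list:
    """Return the IDs of all jobs whose Status is 'running'."""
    return [job["ID"] for job in json.loads(jobs_json)
            if job["Status"] == "running"]

print(running_jobs(RESPONSE))  # prints ['data_task_1']
```

In a live system the response string would be fetched from http://<nomad_host_address>:4646/v1/jobs instead of being hard-coded.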

Consul usage

Consul is used as a mechanism for Service Discovery. In order to use Consul it is necessary to use Nomad as a deployment mechanism.

Each RTC component that is started requires two individual endpoints for communication:

  • The RTC Component’s Request-Reply endpoint

  • The RTC Component’s Publish-Subscribe endpoint

Endpoint information is provided by the Nomad job files of the respective component. Have a look at the example in the Nomad section above; it includes these two service definitions.

In addition, all RTC components require three common endpoints for access to shared services:

  • OLDB endpoint

  • Persistent Repository endpoint

  • Runtime Repository endpoint

These service endpoints are needed before any RTC component is started.

The provided rtctkDeploymentGen tool can also be used to generate a Nomad job file for the common services:

$ rtctkDeploymentGen services --help
Usage: rtctkDeploymentGen services [OPTIONS] PTR RTR OLDB

  Generate a nomad job file that provides the RTCTK Service Discovery basic
  entries.

  This program requires three URIs as input:

  - PTR: Location of the Persistent Repository as an URI.

  - RTR: Location of the Runtime Repository as an URI.

  - OLDB: Location of the OLDB as an URI.

Options:
  --as-consul-service  Outputs instead a consul (.hcl) file with the RTCTK
                       common service definition
  --help               Show this message and exit.

Here is an example of how to generate and execute such a common service job:

$ rtctkDeploymentGen services "cii.config://local//persistent_repo" "cii.oldb:/ex_end_to_end/rtr" "cii.oldb:/ex_end_to_end/oldb"
$ nomad job run rtc_discovery_service.nomad

This will generate a Nomad job file, rtc_discovery_service.nomad, that provides a single Consul Service entry for all three services. For the time being, it just registers the services with Consul; there is no process associated with it, just a sleep cycle. In the future, the services themselves will very likely be deployed and run using Nomad. As long as this Nomad job is running, the Consul Service entry remains available.

Note that the service endpoints are stored as meta key-values of a Consul Service entry. These entries rely on Nomad to assign a dynamic port, which is automatically filled into the service's meta stanza by Nomad.
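Resolving a component's endpoint therefore amounts to reading the meta key-values of its Consul Service entry. The sketch below works on a canned catalog entry whose structure mirrors Consul's /v1/catalog/service/<name> response; the concrete address and port values are assumptions.

```python
import json

# Canned, reduced example of a Consul catalog entry for the
# data-task-1 service; real entries would be fetched from
# /v1/catalog/service/data-task-1 and contain more fields.
ENTRY = '''[{
  "ServiceName": "data-task-1",
  "ServiceMeta": {
    "rtc_component_name": "data_task_1",
    "endpoint_type": "req_rep_endpoint",
    "endpoint_uri": "zpb.rr://192.168.1.10:25001/"
  }
}]'''

def endpoint_uri(catalog_json: str, endpoint_type: str) -> str:
    """Return the endpoint_uri stored in the service's meta stanza."""
    for svc in json.loads(catalog_json):
        meta = svc["ServiceMeta"]
        if meta["endpoint_type"] == endpoint_type:
            return meta["endpoint_uri"]
    raise KeyError(endpoint_type)

print(endpoint_uri(ENTRY, "req_rep_endpoint"))
```

A Publish-Subscribe endpoint would be resolved the same way by asking for the pub_sub_endpoint meta entry instead.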

Also note that the Request-Reply service has a msgsend-based check. Consul will automatically check the return value of the command and mark as failed any Request-Reply service that does not pass the check.

The service entries, though generated by Nomad, are also stored by Consul. The meta stanza is used to store these values.

Consul agent

Similarly to Nomad, Consul needs to run one or more agents. They can be run as systemd service(s) or as standalone process(es). A configuration (https://developer.hashicorp.com/consul/docs/v1.14.x/agent/config) can be provided as a file or as command-line options.

Here is an example of how to run a simple Consul agent providing the configuration as command-line options:

consul agent -dev -serf-lan-port 8311 -serf-wan-port -1 -server-port 8310 -http-port 8500 -dns-port 8610 -grpc-port 8512

Consul information can be retrieved, for example, using a web browser (e.g. http://<consul_host_address>:8500).