Consul and Nomad usage

Nomad and Consul have been selected as the mechanism by which processes are launched/deployed and monitored.

Note

This documentation covers only the RTC Toolkit relevant aspects of Nomad and Consul; for more details please refer to the general Nomad and Consul documentation.

Nomad usage

Nomad is used to deploy processes, e.g. RTC components, in the RTC system.

Nomad job files

Nomad uses a job description file, which describes how to start the processes. Since RTC Toolkit components use a common command line format, some automatic generation is possible for creating the component’s Nomad job files. The details of the job file might be useful for advanced users only.

The template job file uses Jinja2 for replacing the appropriate entries and looks like the following:

job {{ template_component_name }} {
#Specify Nomad datacenters to run job on
datacenters = ["{{ template_datacenter }}"]
type = "batch"
{% if template_node %}
constraint {
    attribute = "${meta.node}"
    operator  = "set_contains"
    value = "{{ template_node }}"
}
{% endif %}

group "{{ template_component_name }}_group" {
    restart{
        attempts = 0
        mode = "fail"
    }
    reschedule {
        attempts  = 0
        unlimited = false
    }
    ephemeral_disk{
        size = 150
    }
    network {
        port "req_rep" {}
        port "pub_sub" {}
    }
    {% if template_no_services %}
    service {
        name = "{{ template_component_name_dashes }}"
        port = "req_rep"
        task = "{{ template_component_name }}_task"
        meta {
            rtc_component_name = "{{ template_component_name }}"
            endpoint_type = "req_rep_endpoint"
            endpoint_uri = "zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/"
        }
        check {
            name      = "{{ template_component_name_dashes }}-GetState"
            type      = "script"
            interval  = "20s"
            timeout   = "2s"
            command   = "/bin/bash"
            args      = [
                "-l",
                "-c",
                "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/StdCmds ::stdif::StdCmds::GetState"
            ]
        }
    }
    service {
        name = "{{ template_component_name_dashes }}"
        port = "pub_sub"
        task = "{{ template_component_name }}_task"
        meta {
            rtc_component_name = "{{ template_component_name }}"
            endpoint_type = "pub_sub_endpoint"
            endpoint_uri = "zpb.ps://${NOMAD_IP_pub_sub}:${NOMAD_PORT_pub_sub}/"
        }
    }
    {% endif %}
    task "{{ template_component_name }}_task" {
        resources {
            cpu = 20
            memory = 10
        }
        driver = "raw_exec"
        config {
            # in this way we get the environment variables
            command = "/bin/bash"
            args = [
                "-l",
                "-c",
                "{{ template_command }} {{ template_arguments }}"
            ]
        }
    }
}
}

A simple Python application is provided that automates the replacement of the templated values with the ones provided as arguments.
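Conceptually, the generator performs a straightforward template substitution. The following self-contained sketch illustrates the idea using Python's stdlib string.Template on a reduced two-line fragment; the real tool renders the full Jinja2 template shown above.

```python
from string import Template

# Minimal fragment of the job template, for illustration only; the
# actual tool uses Jinja2 and the complete template shown above.
job_template = Template('''job $component_name {
  datacenters = ["$datacenter"]
  type = "batch"
}''')

# Substitute the placeholders with concrete values, as the tool does
# with the values supplied on its command line.
rendered = job_template.substitute(
    component_name="data_task_1",
    datacenter="dc1",
)
print(rendered)
```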

$ rtctkDeploymentGen --help
Usage: rtctkDeploymentGen [OPTIONS] COMMAND [ARGS]...

  RTCTK Deployment Daemon Nomad/Consul Job Generator

Options:
  --stdout           Outputs to stdout instead of file.  [default: False]
  --datacenter TEXT  Contraints running the job to this DATACENTER.
  --node TEXT        Contraints running the job to this NODE.
  --help             Show this message and exit.

Commands:
  job       Generate a nomad job file for running RTC Components.
  services  Generate a nomad job file that provides the RTCTK Service...


$ rtctkDeploymentGen job --help
Usage: rtctkDeploymentGen job [OPTIONS] COMPONENT_NAME COMMAND [ARGUMENTS]...

  Generate a nomad job file for running RTC Components.

  This program requires three strings as input:

  - COMPONENT_NAME: RTC Components name, the identifier of the component..

  - COMMAND: Executable application to run as part of this job.

  - ARGUMENTS: A string that contains quoted arguments to pass to the
  COMMAND.   If you need to use arguments with  '-' character, please use
  '--' before that   argument. Example:

    rtctkDeploymentGen job rtc_sup rtctkRtcSupervisor -- -i rtc_sup --sde
    consul://127.0.0.1:8500

  Exit Codes: * 11: --cid and COMPONET_NAME values are not the same.

Options:
  --no-services  Creates a nomad job that provides no services  [default:
                 True]

  --help         Show this message and exit.

An example use for a telemetry-based data task called data_task_1 might be:

$ rtctkDeploymentGen job data_task_1 rtctkExampleDataTaskTelemetry -- -i data_task_1 -s file:/path/to/service_disc.yaml

As you can see from the example, the tool requires the use of "--" to pass options to the command to be executed. Please remember to include any rtctkDeploymentGen options before the "--", otherwise they will be considered part of the component's options.

The above command would create a Nomad job file called data_task_1.nomad in the working directory containing:

job data_task_1 {
#Specify Nomad datacenters to run job on
datacenters = ["dc1"]
type = "batch"


group "data_task_1_group" {
    restart{
        attempts = 0
        mode = "fail"
    }
    reschedule {
        attempts  = 0
        unlimited = false
    }
    ephemeral_disk{
        size = 150
    }
    network {
        port "req_rep" {}
        port "pub_sub" {}
    }

    service {
        name = "data-task-1"
        port = "req_rep"
        task = "data_task_1_task"
        meta {
            rtc_component_name = "data_task_1"
            endpoint_type = "req_rep_endpoint"
            endpoint_uri = "zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/"
        }
        check {
            name      = "data-task-1-GetState"
            type      = "script"
            interval  = "20s"
            timeout   = "2s"
            command   = "/bin/bash"
            args      = [
                "-l",
                "-c",
                "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/StdCmds ::stdif::StdCmds::GetState"
            ]
        }
    }
    service {
        name = "data-task-1"
        port = "pub_sub"
        task = "data_task_1_task"
        meta {
            rtc_component_name = "data_task_1"
            endpoint_type = "pub_sub_endpoint"
            endpoint_uri = "zpb.ps://${NOMAD_IP_pub_sub}:${NOMAD_PORT_pub_sub}/"
        }
    }

    task "data_task_1_task" {
        resources {
            cpu = 20
            memory = 10
        }
        driver = "raw_exec"
        config {
            # in this way we get the environment variables
            command = "/bin/bash"
            args = [
                "-l",
                "-c",
                "rtctkExampleDataTaskTelemetry -i data_task_1 -s file:/path/to/service_disc.yaml"
            ]
        }
    }
}
}

After ensuring that the Nomad agent is running and the file contents appear correct, the resulting job file can be started and checked with the following commands:

$ nomad job run data_task_1.nomad

$ nomad job status

ID           Type   Priority  Status   Submit Date
data_task_1  batch  50        running  2022-06-03T09:18:20Z

Note

Be aware that Nomad starts/deploys components using the username (and its configuration, e.g. environment variables) that was used to start the Nomad agent service. Usually the eltdev user is used.

RTC Tk components can still be started directly from the command line without using Nomad. This can be useful for debugging purposes and during development. The above Nomad example would be the equivalent of executing COMMAND with arguments ARGUMENTS as the user who started the Nomad agent, i.e.

$ rtctkExampleDataTaskTelemetry -i data_task_1 -s file:/path/to/service_disc.yaml

Nomad agent

Nomad agents can be started as a systemd service or manually; in both cases a configuration file (https://developer.hashicorp.com/nomad/docs/v1.4.x/configuration) needs to be provided. A very simple configuration file for running a Nomad agent could look like:

client {
    meta {
        "node" = "hrtc-gw,srtc1"
    }
}
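In practice an agent configuration usually sets a few more options than the meta stanza alone. The fragment below is a sketch only; the datacenter name and the data_dir path are illustrative assumptions, not RTC Toolkit defaults:

```hcl
# Sketch of a fuller client configuration; the datacenter name and
# data_dir path are illustrative assumptions.
datacenter = "dc1"
data_dir   = "/var/lib/nomad"   # location of the Nomad file system

client {
    enabled = true
    meta {
        "node" = "hrtc-gw,srtc1"
    }
}
```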

For running RTC components it is important that the configuration defines the node key-value pair in the meta stanza. This value is used by Nomad to determine if a job should be run on the current node; the job file needs to specify a matching node value in its Component's Deployment Configuration. With the Nomad agent configuration above, the node can deploy all components that specify either hrtc-gw or srtc1 in their deployment configuration files. In the following example of a components' deployment configuration, the comp_1, comp_2 and comp_3 (but not comp_4) components would be deployed on the machine (using the Nomad agent configuration from above):

...

comp_1:
  node: !cfg.type:string srtc1
  executable: !cfg.type:string rtctkExampleComponent1
comp_2:
  node: !cfg.type:string srtc1
  executable: !cfg.type:string rtctkExampleComponent2
comp_3:
  node: !cfg.type:string hrtc-gw
  executable: !cfg.type:string rtctkExampleComponent3
comp_4:
  node: !cfg.type:string srtc2
  executable: !cfg.type:string rtctkExampleComponent4
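The matching can be illustrated with a short Python sketch. It is a simplification of Nomad's set_contains operator, which here checks that the agent's comma-separated "node" meta value contains the node value configured for each component:

```python
# Simplified illustration of the "set_contains" constraint: the agent's
# meta "node" value is a comma-separated set, and a component's job is
# eligible when its configured node value is a member of that set.
agent_node_meta = "hrtc-gw,srtc1"
agent_nodes = set(agent_node_meta.split(","))

# Node values taken from the deployment configuration example above.
components = {
    "comp_1": "srtc1",
    "comp_2": "srtc1",
    "comp_3": "hrtc-gw",
    "comp_4": "srtc2",
}

deployable = [name for name, node in components.items() if node in agent_nodes]
print(deployable)  # comp_1, comp_2 and comp_3, but not comp_4
```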

For example, the Nomad agent can be run as:

nomad agent -dev -consul-address 127.0.0.1:8500  -config simple-cfg.hcl

If you want to extend this Nomad usage to support a multi-node cluster, it is necessary to run multiple Nomad agents with a different configuration for each machine (in particular with different node values). An example of multi-node Nomad usage can be found in: Distributed Scenario.

Note

Files and other deployment details produced during the deployment can be found in the so-called Nomad file system (whose location can be set with the configuration option data_dir); for details please refer to the Nomad documentation: https://developer.hashicorp.com/nomad/docs/v1.4.x/concepts/filesystem.

Nomad provides a web interface running on port 4646 (e.g. http://<nomad_host_address>:4646) with information and details about the deployment, the status of jobs, etc. A web browser or other tools can be used to access this information.

Consul usage

Consul is used as the backend for the Service Discovery. In order to use Consul it is necessary to use Nomad as the deployment mechanism.

An RTC component that is started generally requires two endpoints for communication:

  • The RTC Component’s Request-Reply endpoint

  • The RTC Component’s Publish-Subscription endpoint

and three service endpoints:

  • The Deployment’s common OLDB path

  • The Deployment’s common Persistent Repository endpoint

  • The Deployment’s common Runtime Repository endpoint

Using Consul, the endpoints are stored as meta key-values of a Consul Service entry. The three service endpoints that belong to the services deployment are needed before any RTC component is started.

Please use the provided rtctkDeploymentGen tool to generate the common services deployment Nomad job file:

$ rtctkDeploymentGen --help
Usage: rtctkDeploymentGen [OPTIONS] COMMAND [ARGS]...

RTCTK Deployment Daemon Nomad/Consul Job Generator

Options:
--stdout           Outputs to stdout instead of file.  [default: False]
--datacenter TEXT  Contraints running the job to this DATACENTER.
--node TEXT        Contraints running the job to this NODE.
--help             Show this message and exit.

Commands:
job       Generate a nomad job file for running RTC Components.
services  Generate a nomad job file that provides the RTCTK Service...


$ rtctkDeploymentGen services --help
Usage: rtctkDeploymentGen services [OPTIONS] PTR RTR OLDB

Generate a nomad job file that provides the RTCTK Service Discovery basic
entries.

This program requires three URIs as input:

- PTR: Location of the Persistent Repository as an URI.

- RTR: Location of the Runtime Repository as an URI.

- OLDB: Location of the OLDB as an URI.

Options:
--as-consul-service  Outputs instead a consul (.hcl) file with the RTCTK
                    common service definition

--help               Show this message and exit.

This will generate a Nomad job file, rtc_discovery_service.nomad, that provides a single Consul Service entry for all services. For the time being it acts just as a way to register services with Consul: there is no associated process, just a sleep cycle, so it is important that all three services are already running beforehand. In the future the services will very likely actually be deployed and run using Nomad. As long as this Nomad job is running, the Consul Service entry is available.
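For illustration, the resulting Consul Service entry might look roughly like the fragment below; the service name and the meta key names are hypothetical assumptions for illustration, not the exact output of rtctkDeploymentGen (the URIs are taken from the example further down):

```hcl
# Hypothetical sketch of a Consul service entry carrying the three
# common endpoints as meta key-values; the service and key names used
# here are illustrative assumptions.
service {
    name = "rtc-discovery-service"
    meta {
        persistent_repo_endpoint = "cii.config://local//persistent_repo"
        runtime_repo_endpoint    = "cii.oldb:///ex_end_to_end/rtr"
        oldb_endpoint            = "cii.oldb:///ex_end_to_end/oldb"
    }
}
```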

You can start such a generated job file like any other Nomad job definition. Here is an example of how to generate and execute one:

$ rtctkDeploymentGen services "cii.config://local//persistent_repo" "cii.oldb:///ex_end_to_end/rtr" "cii.oldb:///ex_end_to_end/oldb"
$ nomad job run rtc_discovery_service.nomad

The remaining two entries for the RTC Component’s Request-Reply and Publish-Subscribe endpoints are provided by their Nomad job files.

As can be seen from the example in the Nomad section, the job file includes two service definitions:

service {
    name = "{{ template_component_name_dashes }}"
    port = "req_rep"
    task = "{{ template_component_name }}_task"
    meta {
        rtc_component_name = "{{ template_component_name }}"
        endpoint_type = "req_rep_endpoint"
        endpoint_uri = "zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/"
    }
    check {
        name      = "{{ template_component_name_dashes }}-GetState"
        type      = "script"
        interval  = "20s"
        timeout   = "2s"
        command   = "/bin/bash"
        args      = [
            "-l",
            "-c",
            "msgsend --uri zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/StdCmds ::stdif::StdCmds::GetState"
        ]
    }
}
service {
    name = "{{ template_component_name_dashes }}"
    port = "pub_sub"
    task = "{{ template_component_name }}_task"
    meta {
        rtc_component_name = "{{ template_component_name }}"
        endpoint_type = "pub_sub_endpoint"
        endpoint_uri = "zpb.ps://${NOMAD_IP_pub_sub}:${NOMAD_PORT_pub_sub}/"
    }
}

These entries rely on Nomad to assign a dynamic port, which is automatically filled into the service's meta stanza by Nomad. The Request-Reply service has a msgsend-based check. Consul will automatically check the return value of the command and mark as failed any Request-Reply service that does not pass the check.
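The URI substitution can be reproduced with a short Python sketch. The address and port below are illustrative values; in a real deployment Nomad sets the NOMAD_IP_req_rep and NOMAD_PORT_req_rep environment variables when it allocates the dynamic port:

```python
import os

# Illustrative values only; in a real deployment these environment
# variables are set by Nomad when it allocates the dynamic port.
os.environ["NOMAD_IP_req_rep"] = "192.168.1.10"
os.environ["NOMAD_PORT_req_rep"] = "23456"

# Render the endpoint URI exactly as it appears in the service's
# meta stanza: zpb.rr://${NOMAD_IP_req_rep}:${NOMAD_PORT_req_rep}/
endpoint_uri = "zpb.rr://{}:{}/".format(
    os.environ["NOMAD_IP_req_rep"],
    os.environ["NOMAD_PORT_req_rep"],
)
print(endpoint_uri)  # zpb.rr://192.168.1.10:23456/
```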

The service entries, though generated by Nomad, are also stored by Consul; the meta stanza is used to store these values.

Consul agent

Similar to Nomad, Consul needs to run one or more agents. They can be run as systemd service(s) or standalone process(es). A configuration (https://developer.hashicorp.com/consul/docs/v1.14.x/agent/config) can be provided in a file or as command line options.

Here is an example of running a simple Consul agent, providing the configuration as command line options:

consul  agent  -dev  -serf-lan-port  8311  -serf-wan-port  -1  -server-port  8310  -http-port 8500  -dns-port  8610  -grpc-port  8512

Consul information can be retrieved, for example, using a web browser (e.g. http://<consul_host_address>:8500).