NUMA++ 0.11.0
NUMA++

NUMA++ provides a more convenient C++ API to control:

  • memory policy and other memory related functions (mainly provided by libnuma)
  • CPU affinity (libnuma)
  • Kernel scheduler (glibc/pthread)

Policies are described by classes and applied with the free function numapp::Apply(), which is overloaded for each policy type. To apply a policy to the current thread, use the corresponding overloads in the numapp::thisThread namespace: numapp::thisThread::Apply().

Warning
Before using any other library function, numapp::NumaAvailable() must be called and confirmed to return true.
Note
Some policies can only be set on the current thread and therefore only provide overloads in numapp::thisThread.
Warning
Efforts are made to avoid breaking changes, but this library is still under development and the API may change.

Permissions

Certain operations are subject to permission checks by the kernel. Often there are resource limits (see getrlimit(2)) below which no additional permissions are required. Exceeding those limits typically requires the appropriate capabilities(7), running as root, or an executable with the SUID bit set.

Capabilities

Notable capabilities:

See capabilities(7) for details.

Resource Limits

Notable limits:

  • RLIMIT_AS limits the maximum size of the virtual address space.
  • RLIMIT_CPU limits the amount of CPU time a process can consume.
  • RLIMIT_MEMLOCK limits the amount of memory that can be locked.
  • RLIMIT_NICE specifies the ceiling for the nice level.
  • RLIMIT_RTPRIO specifies the ceiling for the real-time scheduler priority.
  • RLIMIT_RTTIME effectively limits how long a real-time process may run without yielding to the kernel.

See getrlimit(2) for details.
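
The limits listed above can be inspected at run time before attempting an operation that depends on them. The following is a minimal sketch that calls getrlimit(2) directly rather than going through NUMA++; the helper names QueryLimit and FormatLimit are hypothetical and not part of the library.

```cpp
#include <sys/resource.h>

#include <optional>
#include <string>

// Hypothetical helper: returns the soft/hard limit pair for `resource`
// (e.g. RLIMIT_MEMLOCK), or std::nullopt if getrlimit(2) fails.
std::optional<rlimit> QueryLimit(int resource) {
    rlimit limit{};
    if (getrlimit(resource, &limit) != 0) {
        return std::nullopt;
    }
    return limit;
}

// Hypothetical helper: formats a limit value, mapping RLIM_INFINITY to
// "unlimited".
std::string FormatLimit(rlim_t value) {
    if (value == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(value);
}
```

For example, `FormatLimit(QueryLimit(RLIMIT_MEMLOCK)->rlim_cur)` gives the number of bytes the process may lock without extra privileges, which is worth checking before locking large ranges.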

Dependencies

Library dependencies:

  • pthread
  • numa (a.k.a. libnuma or numactl)

General Utilities

Before use, ensure that NUMA is available on the host system with numapp::NumaAvailable().

NUMA++ provides a utility for creating threads with NUMA policies applied: numapp::MakeThread() in <numapp/thread.hpp>.

An example of creating a thread that only runs on CPUs 0-3:

#include <chrono>
#include <iostream>
#include <thread>
#include <numapp/numa.hpp>
#include <numapp/thread.hpp>

void ThreadFunc(std::chrono::milliseconds duration) {
    std::cout << "Thread affinity: " << numapp::CpuAffinity::MakeFromActive() << std::endl;
    std::this_thread::sleep_for(duration);
}

int main() {
    using namespace numapp;
    using namespace std::chrono_literals;
    if (!NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    NumaPolicies policies;
    // Restrict the new thread to CPUs 0-3, without considering the current cpuset.
    policies.SetCpuAffinity(CpuAffinity::MakeFromCpuStringAll("0-3"));
    auto thread = MakeThread("myThread", policies, &ThreadFunc, 500ms);
    thread.join();
}

Memory APIs

Attention
If you take the step of managing memory with NUMA in mind, it is important to understand how the primitives provided by NUMA++ work and the context in which they are used. It may be helpful to think of the NUMA++ primitives as block allocators, where the minimum block size is the system page size for normal allocations and the specified huge page size for huge page allocations. Using them as a general-purpose allocator is not recommended. As an example, foonathan-memory has a compatible concept, BlockAllocator.
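
To make the block-allocator view concrete, the sketch below shows the rounding arithmetic such an allocator implies and the resulting overhead for small requests. It assumes nothing about the NUMA++ implementation; RoundUpToBlock and BlockOverhead are hypothetical helper names used only for illustration.

```cpp
#include <cstddef>

// Rounds `size` up to the nearest multiple of `block_size`.
// `block_size` must be non-zero (e.g. 4096 for a typical system page,
// 2 MiB for a 2 MiB huge page).
constexpr std::size_t RoundUpToBlock(std::size_t size, std::size_t block_size) {
    return ((size + block_size - 1) / block_size) * block_size;
}

// Bytes that are reserved but not requested when `size` bytes are served
// from whole blocks of `block_size`.
constexpr std::size_t BlockOverhead(std::size_t size, std::size_t block_size) {
    return RoundUpToBlock(size, block_size) - size;
}
```

With 4096-byte pages, a 64-byte request still occupies one full page, so 4032 bytes are overhead. This is why small, frequent allocations are better served by a pool placed on top of a page-granular allocator.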

Manual Allocation

Allocate pages of memory with specified memory policy (see function group).

For example, to allocate one page from NUMA node 1:

// Creates a bind policy for NUMA node 1 and allocates with that policy.
std::size_t size = numapp::GetPageSize();
void* buffer = numapp::Allocate(size, numapp::MemPolicy::MakeBindNode(1));
/* ... */
numapp::Free(buffer, size);

Hardware Queries

Provides the following functions:

Memory Locking

Prevents all or parts of process memory from being paged to the swap area.

This is an example of how to effectively disable swapping completely for the process:

Note
This example causes all allocated memory to be locked, not only the memory that is currently resident. This increases memory pressure and may lead to out-of-memory situations.
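#include <iostream>
#include <numapp/memory.hpp>
#include <numapp/numa.hpp>

int main() {
    using numapp::operator|;
    if (!numapp::NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    // Lock current and future mappings
    if (auto ec = numapp::MemLockAll(numapp::LockAllFlag::Current | numapp::LockAllFlag::Future); ec) {
        std::cerr << "Failed to lock memory!" << std::endl;
        return -1;
    }
}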

Memory Policy

Controls where physical memory is allocated by default for a thread, or where the memory of an already mapped virtual memory range is placed.

Note
Note that the memory policy is used by the kernel when it allocates physical memory, not when the application allocates virtual memory. The policy that applies is that of the thread that triggers the page fault, which is not necessarily the thread that allocated the virtual memory.
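
The distinction between mapping and first touch can be illustrated with plain mmap(2), without NUMA++. In this minimal sketch, MapAndTouch is a hypothetical helper name. The mapping call only reserves virtual address space, and the physical pages are typically assigned when each page is first written, using the memory policy of whichever thread performs that write.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

// Hypothetical helper: maps `pages` anonymous pages, writes one byte to each
// page and unmaps the range again. Returns the number of pages touched, or 0
// if the mapping could not be created.
std::size_t MapAndTouch(std::size_t pages) {
    auto const page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t const length = pages * page_size;

    // Reserves virtual address space; physical pages are normally not
    // assigned at this point.
    void* addr = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        return 0;
    }

    auto* bytes = static_cast<volatile unsigned char*>(addr);
    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < length; offset += page_size) {
        // The first write to a page triggers the page fault. The kernel
        // selects the NUMA node according to the policy of this thread.
        bytes[offset] = 1;
        ++touched;
    }

    munmap(addr, length);
    return touched;
}
```

If one thread maps the memory and a different thread runs the touching loop, it is the second thread's policy that determines placement.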

A memory policy is described using numapp::MemPolicy in <numapp/mempolicy.hpp>. It is applied to the current thread with e.g. numapp::thisThread::Apply(numapp::MemPolicy const&), or to a memory range with numapp::Apply(void*,std::size_t,numapp::MemPolicy const&,MemPolicyFlag).

To apply a policy temporarily to the current thread, e.g. so that it is in effect while mmap'ing some memory, numapp::ScopedMemPolicy can be used to apply the new policy and automatically restore the old one when the scope ends.
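
The apply-then-restore behaviour is the usual RAII guard pattern. The following generic sketch is not the implementation of numapp::ScopedMemPolicy. ScopedOverride is a hypothetical class that operates on an ordinary variable, shown only to illustrate the idea of saving the old value on construction and restoring it on destruction.

```cpp
#include <utility>

// Hypothetical RAII guard: replaces the value of `target` for the lifetime of
// the guard and restores the previous value when the guard is destroyed.
template <typename T>
class ScopedOverride {
public:
    ScopedOverride(T& target, T value)
        : target_(target), saved_(std::exchange(target, std::move(value))) {}

    ~ScopedOverride() { target_ = std::move(saved_); }

    // Copying a guard would restore the old value twice, so it is disabled.
    ScopedOverride(ScopedOverride const&) = delete;
    ScopedOverride& operator=(ScopedOverride const&) = delete;

private:
    T& target_;
    T saved_;
};
```

Because the restore happens in the destructor, the previous value is reinstated on every exit path from the scope, including exceptions.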

A helper function to apply a memory policy to the current thread stack memory is also provided with numapp::thisThread::ApplyStack().

Memory policies are inherited at fork(), clone() (without CLONE_VM flag) and exec*.

The following example applies a bind policy for NUMA nodes 0-1 to the current thread:

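#include <iostream>
#include <numapp/mempolicy.hpp>
#include <numapp/numa.hpp>

int main() {
    if (!numapp::NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    // Node mask for NUMA nodes 0-1, taking the current cpuset into account
    auto nodes = numapp::Nodemask::MakeFromNodestring("0-1");
    numapp::MemPolicy policy = /* Bind-mode policy restricted to `nodes` */;
    // Set default policy for main thread
    if (auto ec = numapp::thisThread::Apply(policy); ec) {
        std::cout << "Failed to apply policy: " << ec.message() << std::endl;
        return 1;
    }
    // And apply policy to stack
    if (auto ec = numapp::thisThread::ApplyStack(policy); ec) {
        std::cout << "Failed to apply policy to stack: " << ec.message() << std::endl;
        return 1;
    }
}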

The following example applies policy to a memory range, rather than setting the policy to the current thread:

#include <cstddef>
#include <iostream>
#include <system_error>
#include <numapp/memory.hpp>
#include <numapp/mempolicy.hpp>

void BindMemoryRange(int node, void* address, std::size_t size) {
    using numapp::operator|;
    using numapp::LockFlag;
    using numapp::MemPolicy;
    using numapp::MemPolicyFlag;
    auto policy = MemPolicy::MakeBindNode(node);
    auto flags = MemPolicyFlag::Move | MemPolicyFlag::Strict;
    // Apply the bind policy to the memory range
    if (auto ec = numapp::Apply(address, size, policy, flags); ec) {
        std::cerr << "Failed to apply policy: " << ec.message() << std::endl;
        throw std::system_error(ec);
    }
    // Pre-fault and lock the pages in the range
    if (auto ec = numapp::MemLock(address, size, LockFlag::PreFault); ec) {
        std::cerr << "Failed to lock memory: " << ec.message() << std::endl;
        throw std::system_error(ec);
    }
}

Polymorphic Memory Resource

NUMA++ provides an implementation of std::pmr::memory_resource, numapp::PageResource, that can be used to allocate memory with a specified memory policy through the STL allocator std::pmr::polymorphic_allocator.

An example is shown below, where the pool allocator std::pmr::unsynchronized_pool_resource uses memory from numapp::PageResource.

#include <cstdint>
#include <memory_resource>
#include <string>
#include <vector>
#include <numapp/memory.hpp>

void Example() {
    // Create a resource that allocates memory from NUMA node 1 using
    // PageResource.
    auto resource = numapp::PageResource(numapp::MemPolicy::MakeBindNode(1),
                                         numapp::MemPolicyFlag::Strict);
    // Important Note
    // --------------
    //
    // Pool resource likely requires memory for its implementation, which will
    // use the same upstream memory resource `resource`. This is not necessarily
    // optimal as the pool implementation may perform multiple allocations, each
    // smaller than the page size, which will each take at minimum 1 page. This
    // overhead may be amortized by, for example, sharing the pool resource
    // across the full application, or per NUMA node.
    auto pool = std::pmr::unsynchronized_pool_resource(&resource);
    // Now we can use the `pool` with e.g. STL containers that use
    // `std::pmr::polymorphic_allocator`
    [[maybe_unused]] auto vector = std::pmr::vector<std::uint8_t>(1024, &pool);
    [[maybe_unused]] auto string = std::pmr::string(1024, 'a', &pool);
}
Note
Boost.SmartPtr provides allocator-aware utilities for constructing std::unique_ptr with boost::allocate_unique().
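
To illustrate what a page-granular std::pmr::memory_resource involves, the sketch below implements one on top of plain mmap(2). It is not the implementation of numapp::PageResource and applies no NUMA policy. MmapPageResource and PagesUsedByVector are hypothetical names, and the resource only supports alignments up to the system page size.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical stand-in for a page-granular resource: every allocation is
// rounded up to whole system pages and served by an anonymous mapping.
class MmapPageResource : public std::pmr::memory_resource {
public:
    std::size_t PagesInUse() const noexcept { return pages_in_use_; }

private:
    static std::size_t PageSize() noexcept {
        return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    }
    static std::size_t RoundUp(std::size_t bytes) noexcept {
        auto const page = PageSize();
        return ((bytes + page - 1) / page) * page;
    }
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // mmap returns page-aligned memory; larger alignments are rejected.
        if (alignment > PageSize()) throw std::bad_alloc();
        auto const length = RoundUp(bytes == 0 ? 1 : bytes);
        void* ptr = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ptr == MAP_FAILED) throw std::bad_alloc();
        pages_in_use_ += length / PageSize();
        return ptr;
    }
    void do_deallocate(void* ptr, std::size_t bytes, std::size_t) override {
        auto const length = RoundUp(bytes == 0 ? 1 : bytes);
        munmap(ptr, length);
        pages_in_use_ -= length / PageSize();
    }
    bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override {
        return this == &other;
    }

    std::size_t pages_in_use_ = 0;
};

// Creates a pool on top of the page resource, allocates a vector of `count`
// bytes from it and reports how many pages the pool requested upstream.
std::size_t PagesUsedByVector(std::size_t count) {
    MmapPageResource pages;
    std::size_t used = 0;
    {
        std::pmr::unsynchronized_pool_resource pool(&pages);
        std::pmr::vector<std::uint8_t> data(count, &pool);
        used = pages.PagesInUse();
    }
    return used;
}
```

The page count returned by PagesUsedByVector includes whatever the pool allocates for its own bookkeeping, which is the overhead discussed in the example above.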

NUMA++ also provides a std::pmr::memory_resource implementation, numapp::LockResource, that locks memory allocated from an upstream memory resource.

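#include <cstdint>
#include <memory_resource>
#include <vector>
#include <numapp/memory.hpp>

void Example() {
    // Create a resource that allocates memory from NUMA node 1 with strict
    // policy, using PageResource.
    auto resource = numapp::PageResource(numapp::MemPolicy::MakeBindNode(1),
                                         numapp::MemPolicyFlag::Strict);
    // Wrap it in a LockResource that pre-faults and locks each allocation.
    numapp::LockResource locked = /* LockResource using LockFlag::PreFault on top of `resource` */;
    // Allocate a 4 MiB block of memory from NUMA node 1, pre-fault it and
    // lock it into memory.
    [[maybe_unused]] auto vector =
        std::pmr::vector<std::uint8_t>(4 * 1024 * 1024, &locked);
}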

Huge Pages

NUMA++ provides primitives for allocating and freeing huge pages, numapp::AllocateHuge() and numapp::FreeHuge(), as well as the std::pmr::memory_resource implementation numapp::HugePageResource.

An example usage is shown below:

#include <cstddef>
#include <system_error>
#include <numapp/memory.hpp>

void Example() {
    // Allocate 16 MiB from NUMA node 1 in 2 MiB pages.
    std::size_t const size = 16 * 1024 * 1024;
    auto const page_size = numapp::HugePagePreset::Huge2M;
    // 1) AllocateHuge allocates `size` bytes, rounded up to the nearest
    //    multiple of page_size.
    void* ptr = numapp::AllocateHuge(size,
                                     page_size,
                                     numapp::MemPolicy::MakeBindNode(1),
                                     numapp::MemPolicyFlag::Strict);
    // 2) It is particularly important to pre-fault huge pages, as the
    //    likelihood of failure is higher. If a page fault fails when the
    //    memory is accessed, the result is a segmentation fault.
    if (auto ec = numapp::MemLock(ptr, size, numapp::LockFlag::PreFault); ec) {
        // Page-fault failed; this would have caused a segmentation fault if not
        // for `numapp::MemLock()`. Release the mapping before reporting.
        numapp::FreeHuge(ptr, size, page_size);
        throw std::system_error(ec, "Page fault failed");
    }
    // 3) FreeHuge frees the memory, rounding `size` up to the nearest
    //    multiple of page_size, as munmap(2) requires for huge page mappings.
    numapp::FreeHuge(ptr, size, page_size);
}

CPU Affinity API

Control where a thread is executed.

The CPU affinity of the current thread can be controlled using numapp::CpuAffinity in <numapp/cpuaffinity.hpp>.
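
For orientation, the sketch below shows the underlying glibc calls that a CPU affinity wrapper would typically build on. It is a minimal illustration using sched_getaffinity(2) and sched_setaffinity(2) directly, not NUMA++ code; AllowedCpus and PinToCpu are hypothetical helper names.

```cpp
#include <sched.h>

#include <vector>

// Hypothetical helper: returns the indices of the CPUs the calling thread is
// allowed to run on, or an empty vector if the query fails.
std::vector<int> AllowedCpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    // A pid of 0 refers to the calling thread.
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return {};
    }
    std::vector<int> cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Hypothetical helper: restricts the calling thread to a single CPU.
// Returns true on success.
bool PinToCpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Querying the allowed set first, as AllowedCpus does, avoids requesting a CPU that the current cpuset excludes, in which case sched_setaffinity fails.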

For example, to set the CPU affinity of the current thread so that it runs on cores 0-3, regardless of whether they are isolated:

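#include <iostream>
#include <numapp/cpuaffinity.hpp>
#include <numapp/numa.hpp>

int main() {
    if (!numapp::NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    // CPUs 0-3, without considering the current cpuset
    auto policy = numapp::CpuAffinity::MakeFromCpuStringAll("0-3");
    // Set CPU affinity for main thread
    if (auto ec = numapp::thisThread::Apply(policy); ec) {
        std::cout << "Failed to apply policy: " << ec.message() << std::endl;
        return 1;
    }
}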

Scheduler API

Control how a thread is scheduled for execution by the kernel.

The scheduler and priority of threads can be controlled using the following primitives from <numapp/scheduler.hpp>.

Type                      Represents
numapp::IdleScheduler     SCHED_IDLE.
numapp::DynamicScheduler  The dynamic priority schedulers SCHED_OTHER and SCHED_BATCH.
numapp::StaticScheduler   The static priority schedulers SCHED_RR and SCHED_FIFO.
numapp::Scheduler         Any scheduler; internally holds one of the previous.

These APIs are used to create and apply policies, either from currently active policy or explicitly created manually.
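
The grouping in the table follows the kernel's own scheduling policies. The sketch below uses the plain glibc interface from <sched.h> rather than NUMA++, and the helper names PolicyName, IsStaticPriority and CurrentPolicy are hypothetical. It shows how the active policy can be queried and classified.

```cpp
#include <sched.h>

#include <string>

// Hypothetical helper: maps a kernel scheduling policy constant to its name.
std::string PolicyName(int policy) {
    switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER";
        case SCHED_BATCH: return "SCHED_BATCH";
        case SCHED_IDLE:  return "SCHED_IDLE";
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        default:          return "unknown";
    }
}

// Hypothetical helper: true for the static priority (real-time) policies.
bool IsStaticPriority(int policy) {
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

// Hypothetical helper: returns the policy of the calling thread, or -1 on
// error. A pid of 0 refers to the calling thread.
int CurrentPolicy() {
    return sched_getscheduler(0);
}
```

The valid priority range for a static policy can be obtained with sched_get_priority_min() and sched_get_priority_max(). Raising a thread to a real-time policy is subject to RLIMIT_RTPRIO as described under Resource Limits.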

For example, to apply the static FIFO scheduling policy with the highest priority:

#include <iostream>
#include <numapp/numa.hpp>
#include <numapp/scheduler.hpp>

int main() {
    if (!numapp::NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    numapp::StaticScheduler policy = /* Fifo policy with highest priority */;
    // Set scheduler for main thread
    if (auto ec = numapp::thisThread::Apply(policy); ec) {
        std::cout << "Failed to apply policy: " << ec.message() << std::endl;
        return 1;
    }
}

If the type of scheduler is not statically known, the sum type numapp::Scheduler can be used. In this example the current policy is queried:

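#include <iostream>
#include <variant>
#include <numapp/numa.hpp>
#include <numapp/scheduler.hpp>

int main() {
    using namespace numapp;
    if (!NumaAvailable()) {
        std::cout << "NUMA not available";
        return -1;
    }
    // Query the active scheduler policy for this thread
    auto current = Scheduler::MakeFromActive();
    if (current.HoldsScheduler<DynamicScheduler>()) {
        auto sched = current.GetScheduler<numapp::DynamicScheduler>();
        std::cout << "Nice: " << sched.GetNice() << std::endl;
    }
    // Or get underlying variant
    std::variant<DynamicScheduler, StaticScheduler, IdleScheduler> variant = current.Get();
}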
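
Once the underlying variant is available, std::visit can handle every alternative in one place. The sketch below uses hypothetical stand-in types (DynamicSched, StaticSched, IdleSched) rather than the real NUMA++ classes, so that it is self-contained; only the visitation pattern is the point.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for the three scheduler kinds.
struct DynamicSched { int nice; };
struct StaticSched { int priority; };
struct IdleSched {};

using AnySched = std::variant<DynamicSched, StaticSched, IdleSched>;

// Produces a short description of whichever alternative is held.
std::string Describe(AnySched const& sched) {
    return std::visit(
        [](auto const& held) -> std::string {
            using T = std::decay_t<decltype(held)>;
            if constexpr (std::is_same_v<T, DynamicSched>) {
                return "dynamic, nice " + std::to_string(held.nice);
            } else if constexpr (std::is_same_v<T, StaticSched>) {
                return "static, priority " + std::to_string(held.priority);
            } else {
                return "idle";
            }
        },
        sched);
}
```

Compared with checking each type in turn, the visitor has to produce a result for every alternative, so a newly added scheduler kind falls into the final branch unless it is handled explicitly.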