perfc 0.11.0
perfc

perfc is a C++17 library providing lightweight performance counters.

Warning
Library API is unstable and is subject to change.

Build, Install and Use

Use waf to build, run unit tests and install:

waf configure build test install --mode=release

To use perfc in a wtools project, import the pkg-config dependency perfc, e.g.:

from wtools.project import declare_project

def configure(cnf):
    cnf.check_cfg(package="perfc", uselib_store="PERFC", args="--cflags --libs")

declare_project(
    "my-project",
    "0.1.0",
    recurse="lib",
    requires="cxx",
    cxx=dict(cxx_std="c++17"),
)

Trivial use case (see also perfcDemo in roadrunner/extras):

#include <iostream>
#include <perfc/counter.hpp>

int main() {
    perfc::CounterDouble counter1; // Predefined double alias
    perfc::CounterU64 counter2;    // Predefined uint64_t alias
    counter1.Store(128.0);
    counter2.Store(128);
    // Integral counters without timestamp have additional methods:
    counter2.FetchAdd(128);
    counter2 -= 128;
    counter2++;
    std::cout << "Counter 1 value: " << counter1.Load()
              << ", Counter 2 value: " << counter2.Load() << "\n";
}

Motivation

Many performance sensitive applications require metrics with such a low overhead that they can safely be included in release builds and act as a source of application telemetry. The perfc project provides building blocks to do this using atomic performance counters.

The typical scenario is that performance counters are updated by performance-sensitive threads and sampled by a performance-insensitive thread.

Library Components

The core of the library is the perfc::Counter type (see also section Counters below for examples and group Counters for details) which represents a thread safe atomic (possibly lock free) value with or without timestamp.

A common foreseen scenario is that an application has potentially many performance counters that need to be sampled at intervals and e.g. published as application telemetry. To facilitate this perfc provides perfc::Register (see also section Register below and group Counter Register), which allows registration of counters together with metadata and enumeration of registered counters.

Counters

Note
Use #include <perfc/counter.hpp>

perfc provides the templated counter type perfc::Counter<T,Clock> which represents either a

  • timestamped counter value when template parameter Clock is a TrivialClock or
  • counter value when template parameter Clock is void.
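The split between the two forms can be pictured with a partial specialization on the clock parameter. The following is a minimal sketch under stated assumptions, not perfc's implementation: the namespace sketch and everything in it are hypothetical stand-ins built on std::atomic, and only the default memory orders are modelled.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for perfc::Timestamped: a value plus its time point.
template <class T, class TimePoint>
struct Timestamped {
    T value;
    TimePoint timestamp;
};

// Primary template: Clock is a TrivialClock, so value and time point are
// stored together as one atomic unit.
template <class T, class Clock>
class Counter {
public:
    using TimePoint = typename Clock::time_point;
    using ValueType = Timestamped<T, TimePoint>;

    // Samples the clock and stores value and time point together.
    void Store(T value) noexcept {
        m_value.store(ValueType{value, Clock::now()}, std::memory_order_release);
    }
    ValueType Load() const noexcept { return m_value.load(std::memory_order_acquire); }
    bool IsLockFree() const noexcept { return m_value.is_lock_free(); }

private:
    std::atomic<ValueType> m_value{ValueType{}};
};

// Partial specialization: Clock is void, so only the bare value is stored.
template <class T>
class Counter<T, void> {
public:
    using ValueType = T;

    void Store(T value) noexcept { m_value.store(value, std::memory_order_release); }
    T Load() const noexcept { return m_value.load(std::memory_order_acquire); }
    bool IsLockFree() const noexcept { return m_value.is_lock_free(); }

private:
    std::atomic<T> m_value{T{}};
};

// The value type depends only on the Clock parameter.
static_assert(std::is_same_v<Counter<double, void>::ValueType, double>);
static_assert(std::is_same_v<
    Counter<double, std::chrono::steady_clock>::ValueType,
    Timestamped<double, std::chrono::steady_clock::time_point>>);

}  // namespace sketch
```

In this sketch the timestamped form stores a 16-byte struct, which is usually not lock-free; with GCC, using it typically requires linking against libatomic.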

A small example where a timestamped counter is written to from a worker thread and read from by a monitoring thread:

/**
 * Performs some work and produces a value for the performance counter.
 */
void WorkerThread(perfc::Counter<double>& cntr) {
    while (!done) {
        // Do work...
        double value = SomeWork();
        // As we are only providing the value, Counter will query the current time from the
        // specified clock type and update the counter with both.
        cntr.Store(value);
    }
}
/**
 * Monitors and logs the value of a counter until asked to stop.
 */
void MonitoringThread(perfc::Counter<double> const& cntr) {
    while (!done) {
        // Load value
        perfc::Timestamped<double> sample = cntr.Load();
        // You can also use structured bindings:
        auto [value, timestamp] = cntr.Load();
        // Note: Formatting time_point requires C++20.
        std::cout << "Time: " << timestamp << ", value=" << value << "\n";
        std::this_thread::sleep_for(2s);
    }
}

All templates provide the following basic operations shown above:

Storing

perfc::Counter<T>::Store(ValueType, MemoryOrder) -> void

// Timestamped counter
// Defaults to perfc::MemoryOrder::Release
ts_counter.Store(Timestamped{12.34, std::chrono::steady_clock::now()});
ts_counter.Store({12.34, std::chrono::steady_clock::now()}, perfc::MemoryOrder::Relaxed);
// Non-timestamped counter
counter.Store(12.34, perfc::MemoryOrder::Relaxed);

Loading

perfc::Counter<T>::Load(MemoryOrder) -> ValueType

// Timestamped counter
Timestamped value = ts_counter.Load(); // Defaults to perfc::MemoryOrder::Acquire
Timestamped value = ts_counter.Load(perfc::MemoryOrder::Relaxed);
// Non-timestamped counter
double value = counter.Load();
double value = counter.Load(perfc::MemoryOrder::Relaxed);

Where ValueType is simply T if Clock=void, or perfc::Timestamped<T,TimePoint> otherwise.

If Clock is a TrivialClock the partial specialization perfc::Counter<T,Clock> provides the additional operation where the current time is automatically sampled from the clock:

perfc::Counter<T,Clock>::Store(T, MemoryOrder) -> void

// Timestamped counter
ts_counter.Store(12.34, perfc::MemoryOrder::Relaxed);
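
The default Release/Acquire pairing matters when a reader must also see data written before the counter was stored. The sketch below illustrates this with std::atomic and std::memory_order directly. It assumes, without confirmation from the perfc sources, that perfc::MemoryOrder maps onto the standard memory orders; the function PublishAndObserve is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Returns the payload value the reader observed after seeing the counter update.
inline int PublishAndObserve() {
    int payload = 0;                        // plain, non-atomic data
    std::atomic<std::uint64_t> counter{0};  // stands in for a perfc counter
    int observed = -1;

    std::thread writer([&] {
        payload = 42;                                 // (1) write the data
        counter.store(1, std::memory_order_release);  // (2) publish with release
    });
    std::thread reader([&] {
        // The acquire load that reads 1 synchronizes with the release store,
        // so the write to payload in (1) is visible afterwards.
        while (counter.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    writer.join();
    reader.join();
    return observed;
}
```

With relaxed order on either side the counter value itself is still updated atomically, but the visibility of payload is no longer guaranteed. That is acceptable when only the counter value is of interest.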

Aliases

There are aliases defined for common counter types:

Alias                     Type

Counters without timestamp
perfc::CounterDouble      perfc::Counter<double, void>
perfc::CounterU64         perfc::Counter<std::uint64_t, void>
perfc::CounterI64         perfc::Counter<std::int64_t, void>

Counters with timestamp
perfc::CounterDoubleTs    perfc::Counter<double, std::chrono::steady_clock>
perfc::CounterU64Ts       perfc::Counter<std::uint64_t, std::chrono::steady_clock>
perfc::CounterI64Ts       perfc::Counter<std::int64_t, std::chrono::steady_clock>

Register

See group Counter Register and perfc::Register for interface details.

perfc::CounterUint64>;
using Id = std::string;
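int main() {
    // counter_register is an instance of the register type declared above
    // (the name `register` itself is a reserved keyword in C++).
    perfc::CounterDouble double_counter;
    auto reg1 = counter_register.Add(&double_counter, "counter:1");
    assert(counter_register.Lock()->size() == 1);
    {
        perfc::CounterU64 u64_counter;
        auto reg2 = counter_register.Add(&u64_counter, "counter:2");
        assert(counter_register.Lock()->size() == 2);
    }
    // reg2 went out of scope, which removed "counter:2" from the register.
    assert(counter_register.Lock()->size() == 1);
}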
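
The example relies on scoped registration: the object returned by Add keeps the counter registered and removes it again when destroyed. A minimal sketch of that idea follows. It is not perfc::Register; the namespace sketch, the class names, the Size() accessor and the use of std::variant over plain std::atomic types are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <variant>

namespace sketch {

using CounterDouble = std::atomic<double>;      // stand-in for perfc::CounterDouble
using CounterU64 = std::atomic<std::uint64_t>;  // stand-in for perfc::CounterU64
using CounterPtr = std::variant<CounterDouble const*, CounterU64 const*>;

class Register {
public:
    // RAII handle: removes the entry from the register when destroyed.
    class Registration {
    public:
        Registration(Register& reg, std::string id) : m_reg(&reg), m_id(std::move(id)) {}
        Registration(Registration&& other) noexcept
            : m_reg(other.m_reg), m_id(std::move(other.m_id)) {
            other.m_reg = nullptr;  // the moved-from handle no longer owns the entry
        }
        Registration(Registration const&) = delete;
        Registration& operator=(Registration const&) = delete;
        Registration& operator=(Registration&&) = delete;
        ~Registration() {
            if (m_reg != nullptr) {
                m_reg->Remove(m_id);
            }
        }

    private:
        Register* m_reg;
        std::string m_id;
    };

    // Registers a counter under an id. The counter must outlive the handle.
    template <class C>
    [[nodiscard]] Registration Add(C const* counter, std::string id) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_counters.emplace(id, CounterPtr{counter});
        }
        return Registration(*this, std::move(id));
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_counters.size();
    }

private:
    void Remove(std::string const& id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_counters.erase(id);
    }

    mutable std::mutex m_mutex;
    std::map<std::string, CounterPtr> m_counters;
};

}  // namespace sketch
```

Declaring the handle after the counter in the same scope ensures the handle is destroyed first, so the register never holds a pointer to a destroyed counter.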

Performance Considerations

A lock-free counter type has very little performance impact. On x86-64 a relaxed counter store results in a single mov instruction to memory. A counter with an internal timestamp, however, incurs the additional cost of querying the clock, and its load/store operations are likely not lock-free. The perfc project contains benchmarks (see section Benchmarks) that can be executed to observe actual results.

Recommendations

For the absolute highest performance, lock-free counters should be used. In addition, counters updated at the same time (e.g. at the end of a loop) should be laid out in contiguous memory, the idea being that fitting multiple counters on the same cache line leads to higher throughput. This has been observed in the benchmarks, for example without contention on an Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz with benchmark ReleaseAcquireBatchedContention (tip: repeat the test yourself with the command perfBench --benchmark_filter=CounterFixture/ReleaseAcquireBatchedContention/.*threads:1/*):

Number of counters    Mean Store Rate
1                     680.183M/s
8                     1.37381G/s
16                    1.45461G/s
1 with timestamp      15.6979M/s (1)

(1): Included for comparison. This result uses non-lock-free operations and queries the clock for every store operation.
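
One way to follow the layout recommendation is to group counters that are updated together into a single cache-line-aligned struct. The sketch below uses std::atomic<std::uint64_t> as a stand-in for perfc::CounterU64 and assumes a 64-byte cache line, which is typical for x86-64 but not guaranteed. LoopCounters and RecordIteration are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; adjust for the target platform.
constexpr std::size_t kCacheLineSize = 64;

// Counters that are updated together at the end of each loop iteration.
// alignas places the group at the start of a cache line.
struct alignas(kCacheLineSize) LoopCounters {
    std::atomic<std::uint64_t> iterations{0};
    std::atomic<std::uint64_t> bytes_processed{0};
    std::atomic<std::uint64_t> errors{0};
};

// The whole group occupies exactly one (assumed) cache line.
static_assert(alignof(LoopCounters) == kCacheLineSize);
static_assert(sizeof(LoopCounters) == kCacheLineSize);

// Updates all counters of the group in one go using relaxed operations.
inline void RecordIteration(LoopCounters& counters, std::uint64_t bytes, bool failed) {
    counters.iterations.fetch_add(1, std::memory_order_relaxed);
    counters.bytes_processed.fetch_add(bytes, std::memory_order_relaxed);
    if (failed) {
        counters.errors.fetch_add(1, std::memory_order_relaxed);
    }
}
```

The alignment also keeps unrelated data off the line, so a sampling thread reading these counters does not disturb neighbouring data that the worker thread uses.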

Benchmarks

Note
Benchmarks for perfc are provided with the roadrunner/extras project.

The test names have the following pattern:

<Fixture>/<Test Name>/<Counter#>/.../threads:<Thread#>

or

<Benchmark>_<Stats>

when reporting statistics over multiple runs of <Benchmark>.

<Fixture>                    Notes
CounterFixture               64-bit counters without timestamp (lock-free on x86-64).
CounterSteadyClockFixture    64-bit counters with timestamp.

<Test Name>                  Notes
ReleaseAcquireContention

    Explores the performance impact of worst-case counter access contention. The first thread continuously performs stores whereas subsequent threads continuously perform reads.

    Results from the benchmark with 1 thread serve as the baseline with no contention. Results with two (or more) threads show the impact of cache coherency.

ReleaseAcquireBatchedContention

    Similar to the ReleaseAcquireContention test but performs relaxed operations in batches. At the end of each batch the operations are synchronized with the barriers perfc::CounterRelease and perfc::CounterAcquire.

<Stats>                      Notes
mean                         Mean
median                       Median
stddev                       Standard deviation
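
The batching used by ReleaseAcquireBatchedContention can be sketched with standard fences. This assumes that perfc::CounterRelease and perfc::CounterAcquire behave like std::atomic_thread_fence with release and acquire order, which is not confirmed by the text above. StoreBatch, LoadBatchSum and kBatch are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBatch = 8;
using CounterArray = std::array<std::atomic<std::uint64_t>, kBatch>;

// Writer side: relaxed stores for the whole batch, then a single release fence.
// The fence keeps the preceding stores from being reordered past any later
// store made by this thread.
inline void StoreBatch(CounterArray& counters, std::uint64_t value) {
    for (auto& counter : counters) {
        counter.store(value, std::memory_order_relaxed);
    }
    std::atomic_thread_fence(std::memory_order_release);
}

// Reader side: relaxed loads for the whole batch, then a single acquire fence.
// The fence keeps later reads and writes from being reordered before these loads.
inline std::uint64_t LoadBatchSum(CounterArray const& counters) {
    std::uint64_t sum = 0;
    for (auto const& counter : counters) {
        sum += counter.load(std::memory_order_relaxed);
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return sum;
}
```

Compared with a release store per counter, this pays for ordering once per batch, which matters most on architectures where release and acquire are more expensive than relaxed operations.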

Examples:

  • CounterFixture/ReleaseAcquireContention/1/repeats:5/real_time/threads:1

    Use fixture CounterFixture, test ReleaseAcquireContention, each iteration of the test operates on 1 counter and benchmark is executed with 1 thread (i.e. no contention).

  • CounterSteadyClockFixture/ReleaseAcquireContention/8/repeats:5/real_time/threads:2

    Use fixture CounterSteadyClockFixture, test ReleaseAcquireContention, each iteration of the test operates on 8 counters and two threads are competing to access the same counters.

  • CounterSteadyClockFixture/ReleaseAcquireBatchedContention/8/repeats:5/real_time/threads:2_stddev

    Provide statistics of the (repeats:5) separate benchmark runs.

The result of each test has the form:

---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                              Time      CPU  Iterations  UserCounters...
---------------------------------------------------------------------------------------------------------------------------------
CounterFixture/ReleaseAcquireContention/1/repeats:5/real_time/threads:1             1.47 ns  1.47 ns   475951114  Rate=680.187M/s
CounterSteadyClockFixture/ReleaseAcquireContention/1/repeats:5/real_time/threads:2  39.5 ns  78.8 ns    19774556  Rate=12.6636M/s

Each "Time", "CPU" and "Rate" result is the per-iteration average. The user counter "Rate" represents a normalized throughput so that different configurations and benchmarks can be compared. More precisely, it is the average per-thread load/store operation rate on all counters.
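
As a sanity check, the two sample rows above are consistent with Rate being roughly 1 / (Time × threads) for a single counter. This relation is inferred from the printed numbers only, not taken from the benchmark sources, and the helper PerThreadRate is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate per-thread operation rate in operations per second for a
// single-counter benchmark, given the reported "Time" in nanoseconds and the
// number of threads. Inferred from the sample output; treat it as an estimate.
inline double PerThreadRate(double time_ns, int threads) {
    return 1.0e9 / (time_ns * threads);
}

// Relative difference between an estimate and a reported value.
inline double RelativeError(double estimate, double reported) {
    return std::fabs(estimate - reported) / reported;
}
```

For the first row, 1.47 ns with 1 thread gives about 680.3M/s against the reported 680.187M/s. For the second row, 39.5 ns with 2 threads gives about 12.66M/s against the reported 12.6636M/s. The small differences come from the rounding of the printed times.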

Concepts

AtomicCounter

AtomicCounter specifies requirements that are met by all perfc::Counter types.

For a type C:

  • C satisfies DefaultConstructible.
  • C::ValueType satisfies TriviallyCopyable, CopyConstructible and CopyAssignable.
  • C::ClockType satisfies TrivialClock or is void.
  • C::TimePointType is C::ClockType::time_point, or void if C::ClockType is void.
  • C::CounterType is std::atomic<C::ValueType>.
  • C::IS_ALWAYS_LOCK_FREE is true if operations are always lock free, false otherwise.

Given ...

  • c, an lvalue of type C
  • t, an lvalue of type C::ValueType
  • o, an expression of type perfc::MemoryOrder

Expression           Return Value    Semantics
c.Store(t) (1)       void            Stores value atomically using (1) a default memory order or (2) the specified memory order.
c.Store(t, o) (2)    void
c.Load() (1)         C::ValueType    Loads value atomically using (1) a default memory order or (2) the specified memory order.
c.Load(o) (2)        C::ValueType
c.IsLockFree()       bool            Queries whether all operations are lock free (returning true) or not (returning false).
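
Part of these requirements can be checked at compile time in C++17 with the detection idiom. The sketch below covers only the default-memory-order expressions and the ValueType requirements; the overloads taking perfc::MemoryOrder and the remaining member types are left out. IsAtomicCounterLike and MiniCounter are hypothetical and not part of perfc.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Primary template: chosen when any of the required expressions is ill-formed.
template <class C, class = void>
struct IsAtomicCounterLike : std::false_type {};

// Chosen when C::ValueType, c.Store(t), c.Load() and c.IsLockFree() are all valid.
template <class C>
struct IsAtomicCounterLike<
    C,
    std::void_t<typename C::ValueType,
                decltype(std::declval<C&>().Store(std::declval<typename C::ValueType&>())),
                decltype(std::declval<C&>().Load()),
                decltype(std::declval<C&>().IsLockFree())>>
    : std::integral_constant<
          bool,
          std::is_default_constructible_v<C> &&
              std::is_trivially_copyable_v<typename C::ValueType> &&
              std::is_same_v<decltype(std::declval<C&>().Load()), typename C::ValueType> &&
              std::is_same_v<decltype(std::declval<C&>().IsLockFree()), bool>> {};

// Minimal type used to exercise the trait.
class MiniCounter {
public:
    using ValueType = std::uint64_t;

    void Store(ValueType value) noexcept { m_value.store(value, std::memory_order_release); }
    ValueType Load() const noexcept { return m_value.load(std::memory_order_acquire); }
    bool IsLockFree() const noexcept { return m_value.is_lock_free(); }

private:
    std::atomic<ValueType> m_value{0};
};

static_assert(IsAtomicCounterLike<MiniCounter>::value);
static_assert(!IsAtomicCounterLike<int>::value);
```

A trait like this can guard generic sampling code so that passing an unsuitable type fails with a readable static_assert instead of a long template error.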

Contracts

Thread Safety

Terminology

Note
The classification applies to the class, its member functions and other non-member functions, not their arguments.

Thread-safe - means that member or non-member functions are safe to call in parallel.

Thread-compatible - applies to classes as a whole and means that no external synchronization is required for parallel accesses to separate class instances or for const-only accesses on the same instance. Mixed const/non-const access must be externally synchronized.

Thread-hostile - means that parallel access is never safe. This may occur if e.g. a free function (or const member function) accesses a global shared state in a non-const fashion.

This table summarizes the operations that are safe in parallel for the different classifications:

Parallel operation                 Thread-safe    Thread-compatible    Thread-hostile
Const-only                         x              x
Non-const w/ separate instances    x              x
Non-const w/ same instance         x
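
The last row of the table can be illustrated with two small functions: a thread-safe type (std::atomic) accepts parallel non-const access to the same instance, whereas a thread-compatible type needs external synchronization for it. This is a sketch using only the standard library; PlainTally, CountWithAtomic and CountWithMutex are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Thread-safe: two threads modify the same std::atomic without extra locking.
inline std::uint64_t CountWithAtomic(std::uint64_t n) {
    std::atomic<std::uint64_t> total{0};
    auto work = [&] {
        for (std::uint64_t i = 0; i < n; ++i) {
            total.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return total.load();
}

// Thread-compatible: safe for const-only access or separate instances, but
// non-const access to the same instance must be synchronized externally.
struct PlainTally {
    std::uint64_t total = 0;
    void Add(std::uint64_t value) { total += value; }
};

inline std::uint64_t CountWithMutex(std::uint64_t n) {
    PlainTally tally;
    std::mutex mutex;
    auto work = [&] {
        for (std::uint64_t i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> lock(mutex);  // external synchronization
            tally.Add(1);
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return tally.total;
}
```

Removing the mutex from CountWithMutex would be a data race and therefore undefined behavior, which is exactly the case the table marks as unsafe for thread-compatible types.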

See also related terminology: