Add stress testing framework, with basic metrics example to demonstrate. #3241

Draft
wants to merge 17 commits into
base: main
from

Conversation

lalitb
Member

@lalitb lalitb commented Jan 10, 2025

Changes

This PR adds a basic stress-testing framework to validate scalability and reliability under high-concurrency, long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements for isolated operations, this framework simulates sustained, multi-threaded execution of a given workload. The idea is to complement the existing benchmarks with stress tests that address long-duration and high-concurrency use cases.

This is already implemented for .NET and Rust, and most of the ideas are taken from there. I felt the need for this to test some optimizations I am doing for metrics, but feel free to comment if this doesn't seem helpful.

Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:

$ ./stress_metrics
Starting stress test with 16 threads...
Throughput: 5009490 it/s | Avg: 4885764 | Min: 4734280 | Max: 5132395
 
Test completed:
Total iterations: 203373637
Duration: 42 seconds
Average throughput: 4885764 iterations/sec
$

It’s still in the early stages and will need further enhancements but should be a good starting point. Future improvements could include adding memory and CPU usage information alongside the existing throughput, as well as refining the initial warm-up period to sustain consistent data collection.

Implementation Details:

Worker Threads:
- Worker threads (by default, one per CPU core) are spawned to execute the workload.
- Each worker thread executes the workload function (func) in a loop until a global STOP flag is set (for example, by pressing Ctrl+C).
- Each thread maintains its own iteration count to minimize contention.
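
The worker loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `WorkerCount`, `DefaultThreadCount`, and `RunWorkers` are hypothetical names, the stop flag is passed in as a parameter instead of being a global, and the 64-byte padding assumes a typical cache-line size.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread counter. The padding keeps consecutive counters
// 64 bytes apart (an assumed cache-line size), so two workers never update
// the same cache line.
struct WorkerCount
{
  std::atomic<uint64_t> iterations{0};
  char padding[64 - sizeof(std::atomic<uint64_t>)];
};

// Default worker count: one per hardware thread, falling back to 1 when the
// value cannot be determined (hardware_concurrency() may return 0).
inline std::size_t DefaultThreadCount()
{
  const unsigned int n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : static_cast<std::size_t>(n);
}

// Runs `func` on `num_threads` workers until `stop` becomes true, then
// returns the total number of completed iterations across all workers.
inline uint64_t RunWorkers(std::size_t num_threads,
                           const std::function<void()> &func,
                           std::atomic<bool> &stop)
{
  std::vector<WorkerCount> counts(num_threads);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);

  for (std::size_t i = 0; i < num_threads; ++i)
  {
    threads.emplace_back([&counts, &func, &stop, i] {
      while (!stop.load(std::memory_order_relaxed))
      {
        func();
        // Each worker only ever touches its own counter.
        counts[i].iterations.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }

  for (auto &t : threads)
  {
    t.join();
  }

  // join() synchronizes with each worker, so the counters are final here.
  uint64_t total = 0;
  for (const auto &c : counts)
  {
    total += c.iterations.load(std::memory_order_relaxed);
  }
  return total;
}
```

Because `RunWorkers` blocks until the workers exit, something else has to set `stop`, for example a signal handler or a controller thread.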

Throughput Monitoring:
- A separate controller thread monitors throughput by periodically summing up iteration counts across threads.
- Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.
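
As a rough sketch of the sliding-window idea, assuming the controller pushes one throughput sample per reporting interval: the `SlidingWindow` class and `ThroughputPerSecond` helper below are hypothetical and are not taken from `stress.cc`. Whether the Min and Max values printed by the real tool cover the window or the whole run is not stated here, so this example simply reports them over the window.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <numeric>

// Converts two snapshots of the summed iteration count into iterations/sec.
inline uint64_t ThroughputPerSecond(uint64_t current_total,
                                    uint64_t last_total,
                                    double elapsed_seconds)
{
  if (elapsed_seconds <= 0.0)
  {
    return 0;
  }
  return static_cast<uint64_t>(
      static_cast<double>(current_total - last_total) / elapsed_seconds);
}

// Keeps only the most recent `capacity` samples.
class SlidingWindow
{
public:
  explicit SlidingWindow(std::size_t capacity)
      : capacity_(std::max<std::size_t>(capacity, 1))
  {}

  void Add(uint64_t sample)
  {
    if (samples_.size() == capacity_)
    {
      samples_.pop_front();  // evict the oldest sample
    }
    samples_.push_back(sample);
  }

  uint64_t Average() const
  {
    if (samples_.empty())
    {
      return 0;
    }
    const uint64_t sum =
        std::accumulate(samples_.begin(), samples_.end(), uint64_t{0});
    return sum / samples_.size();
  }

  uint64_t Min() const
  {
    return samples_.empty()
               ? 0
               : *std::min_element(samples_.begin(), samples_.end());
  }

  uint64_t Max() const
  {
    return samples_.empty()
               ? 0
               : *std::max_element(samples_.begin(), samples_.end());
  }

private:
  std::size_t capacity_;
  std::deque<uint64_t> samples_;
};
```

A controller loop would call `ThroughputPerSecond` once per interval, feed the result to `Add`, and print `Average()`, `Min()`, and `Max()`.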

Final Summary:
- At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@lalitb lalitb requested a review from a team as a code owner January 10, 2025 19:30
@lalitb lalitb marked this pull request as draft January 10, 2025 19:30

netlify bot commented Jan 10, 2025

Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name Link
🔨 Latest commit 4bfadb5
🔍 Latest deploy log https://app.netlify.com/projects/opentelemetry-cpp-api-docs/deploys/6864c6a5fc903800086d096c

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.95%. Comparing base (f4897b2) to head (4bfadb5).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #3241   +/-   ##
=======================================
  Coverage   89.95%   89.95%           
=======================================
  Files         219      219           
  Lines        7051     7051           
=======================================
  Hits         6342     6342           
  Misses        709      709           

@github-actions github-actions bot added Stale and removed Stale labels Mar 15, 2025
@lalitb lalitb requested a review from Copilot July 2, 2025 05:41

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a multi-threaded stress testing framework and provides a basic OpenTelemetry metrics example to demonstrate its usage under high-concurrency workloads.

  • Adds a reusable C++ stress test library with throughput monitoring
  • Implements a metrics stress test example leveraging OpenTelemetry
  • Integrates the stress tests into the CMake build

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| stress/common/stress.h | Declares the Stress class and supporting data structures |
| stress/common/stress.cc | Implements thread spawning, monitoring, and graceful shutdown |
| stress/metrics/metrics.cc | Adds a sample metrics stress test using the new framework |
| stress/common/CMakeLists.txt | Builds stress as a static library |
| stress/metrics/CMakeLists.txt | Builds stress_metrics executable and links necessary targets |
| CMakeLists.txt | Hooks the stress directory into the main project build |
Comments suppressed due to low confidence (1)

stress/common/stress.h:72

  • The new Stress framework lacks associated unit tests to validate its behavior. Consider adding tests to cover key functionality.
class Stress

std::vector<std::thread> threads_; // Vector to hold worker threads
std::vector<WorkerStats> stats_; // Vector to hold statistics for each thread
const size_t numThreads_; // Number of threads to run
std::atomic<bool> stopFlag_{false}; // signal to stop the test

Copilot AI Jul 2, 2025

The member variable stopFlag_ is never used, as the global STOP flag is used instead. Remove stopFlag_ or integrate it into the control flow.

Suggested change
- std::atomic<bool> stopFlag_{false}; // signal to stop the test
+ // Removed unused stopFlag_ member variable

Copilot uses AI. Check for mistakes.

// Global flags
std::atomic<bool> STOP(
false); // Global flag to stop the stress test when signaled (e.g., via Ctrl+C)
std::atomic<bool> READY(false); // Global flag to synchronize thread start

Copilot AI Jul 2, 2025

The READY flag is declared but never used. Either remove it or use it to synchronize thread start.
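
If the flag is kept, one way to use it as a start gate looks roughly like this. The sketch is hypothetical: `RunWithStartGate` is an illustrative name, and the flag is a local variable here, not the global `READY`. Workers park on the flag until every thread has been spawned, so none of them starts measuring ahead of the others.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Spawns `num_threads` workers that all wait on a shared `ready` flag and
// only call `func(index)` once the flag has been released.
inline void RunWithStartGate(std::size_t num_threads,
                             const std::function<void(std::size_t)> &func)
{
  std::atomic<bool> ready{false};
  std::atomic<std::size_t> arrived{0};
  std::vector<std::thread> threads;
  threads.reserve(num_threads);

  for (std::size_t i = 0; i < num_threads; ++i)
  {
    threads.emplace_back([&ready, &arrived, &func, i] {
      arrived.fetch_add(1, std::memory_order_relaxed);
      // Park until the controller opens the gate.
      while (!ready.load(std::memory_order_acquire))
      {
        std::this_thread::yield();
      }
      func(i);
    });
  }

  // Wait until every worker is parked at the gate, then release them all.
  while (arrived.load(std::memory_order_relaxed) < num_threads)
  {
    std::this_thread::yield();
  }
  ready.store(true, std::memory_order_release);

  for (auto &t : threads)
  {
    t.join();
  }
}
```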

void Stress::monitorThroughput()
{
uint64_t lastTotalCount = 0;
auto lastTime = std::chrono::steady_clock::now();

Copilot AI Jul 2, 2025

throughputHistory grows without bound in long-running tests, potentially causing high memory usage. Consider capping its size or computing rolling statistics without storing all entries.

Suggested change
- auto lastTime = std::chrono::steady_clock::now();
+ auto lastTime = std::chrono::steady_clock::now();
+ const size_t MAX_HISTORY_SIZE = 100; // Maximum number of entries in throughputHistory
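
The other option mentioned in this comment, computing statistics without storing every entry, could look something like the sketch below. `RunningStats` is a hypothetical helper, not part of the PR. It keeps a constant amount of state no matter how long the test runs, at the cost of no longer being able to report windowed values.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Constant-memory summary of all throughput samples seen so far.
struct RunningStats
{
  uint64_t count   = 0;
  uint64_t sum     = 0;
  uint64_t minimum = std::numeric_limits<uint64_t>::max();
  uint64_t maximum = 0;

  void Add(uint64_t sample)
  {
    ++count;
    sum += sample;
    minimum = std::min(minimum, sample);
    maximum = std::max(maximum, sample);
  }

  // Integer mean of all samples; 0 when nothing has been recorded yet.
  uint64_t Average() const { return count == 0 ? 0 : sum / count; }
};
```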

return attributes_set;
}

void InitMetrics(const std::string /*&name*/)

Copilot AI Jul 2, 2025

The parameter 'name' is unused in InitMetrics. Either remove it or use it to parameterize the meter provider.

Suggested change
- void InitMetrics(const std::string /*&name*/)
+ void InitMetrics(const std::string &name)

{
std::srand(static_cast<unsigned int>(std::time(nullptr))); // Seed the random number generator
// Pre-generate a set of random attributes
size_t attribute_count = 1000; // Number of attribute sets to pre-generate

Copilot AI Jul 2, 2025

[nitpick] Magic number 1000 used for attribute_count; consider making it a configurable constant or command-line parameter.

Suggested change
- size_t attribute_count = 1000; // Number of attribute sets to pre-generate
+ size_t attribute_count = kDefaultAttributeCount; // Number of attribute sets to pre-generate
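
For the command-line variant, a minimal sketch might be the following. `kDefaultAttributeCount` is the name from the suggestion above, with the value 1000 taken from the existing code. `ParseAttributeCount` and the convention that the count is the first positional argument are assumptions made for illustration.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Default number of attribute sets to pre-generate.
constexpr std::size_t kDefaultAttributeCount = 1000;

// Reads the attribute count from argv[1] (assumed position), falling back to
// the default when the argument is missing, non-numeric, zero, or too large.
inline std::size_t ParseAttributeCount(int argc, const char *const argv[])
{
  if (argc < 2)
  {
    return kDefaultAttributeCount;
  }
  const char *text = argv[1];
  // Reject signs and empty strings up front; strtoull would accept "-5".
  if (!std::isdigit(static_cast<unsigned char>(text[0])))
  {
    return kDefaultAttributeCount;
  }
  char *end = nullptr;
  errno     = 0;
  const unsigned long long value = std::strtoull(text, &end, 10);
  if (errno != 0 || *end != '\0' || value == 0)
  {
    return kDefaultAttributeCount;
  }
  return static_cast<std::size_t>(value);
}
```

In `main(int argc, char **argv)` this would be called as `ParseAttributeCount(argc, argv)`.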
