Add stress testing framework, with basic metrics example to demonstrate. #3241
base: main
[pull] main from open-telemetry:main
✅ Deploy Preview for opentelemetry-cpp-api-docs canceled.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files (Coverage Diff, main vs #3241):

| | main | #3241 |
|---|---|---|
| Coverage | 89.95% | 89.95% |
| Files | 219 | 219 |
| Lines | 7051 | 7051 |
| Hits | 6342 | 6342 |
| Misses | 709 | 709 |
Pull Request Overview
This PR introduces a multi-threaded stress testing framework and provides a basic OpenTelemetry metrics example to demonstrate its usage under high-concurrency workloads.
- Adds a reusable C++ stress test library with throughput monitoring
- Implements a metrics stress test example leveraging OpenTelemetry
- Integrates the stress tests into the CMake build
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| stress/common/stress.h | Declares the Stress class and supporting data structures |
| stress/common/stress.cc | Implements thread spawning, monitoring, and graceful shutdown |
| stress/metrics/metrics.cc | Adds a sample metrics stress test using the new framework |
| stress/common/CMakeLists.txt | Builds stress as a static library |
| stress/metrics/CMakeLists.txt | Builds the stress_metrics executable and links the necessary targets |
| CMakeLists.txt | Hooks the stress directory into the main project build |
Comments suppressed due to low confidence (1)
stress/common/stress.h:72
- The new Stress framework lacks associated unit tests to validate its behavior. Consider adding tests to cover key functionality.
```cpp
class Stress
```

stress/common/stress.h:

```cpp
  std::vector<std::thread> threads_;   // Vector to hold worker threads
  std::vector<WorkerStats> stats_;     // Vector to hold statistics for each thread
  const size_t numThreads_;            // Number of threads to run
  std::atomic<bool> stopFlag_{false};  // signal to stop the test
```
The member variable stopFlag_ is never used, as the global STOP flag is used instead. Remove stopFlag_ or integrate it into the control flow.
Suggested change:

```diff
-  std::atomic<bool> stopFlag_{false};  // signal to stop the test
+  // Removed unused stopFlag_ member variable
```
Copilot uses AI. Check for mistakes.
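For the second option ("integrate it into the control flow"), a minimal sketch could look like the following. It is not the PR's code: `StressRunner`, `Start()`, `Stop()` and `TotalIterations()` are hypothetical names, and the workers poll the instance-owned `stopFlag_` instead of the global `STOP`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: the stop signal is owned by the instance (stopFlag_),
// so two runners in one process can be stopped independently.
class StressRunner
{
public:
  StressRunner(std::function<void()> func, size_t numThreads)
      : func_(std::move(func)), numThreads_(numThreads), counts_(numThreads)
  {
    assert(numThreads > 0);
  }

  ~StressRunner() { Stop(); }  // never destroy joinable threads

  void Start()
  {
    for (size_t i = 0; i < numThreads_; ++i)
    {
      threads_.emplace_back([this, i] {
        while (!stopFlag_.load(std::memory_order_relaxed))
        {
          func_();
          counts_[i].fetch_add(1, std::memory_order_relaxed);
        }
      });
    }
  }

  // Signals workers through the member flag, then waits for them to exit.
  void Stop()
  {
    stopFlag_.store(true, std::memory_order_relaxed);
    for (auto &t : threads_)
      t.join();
    threads_.clear();
  }

  uint64_t TotalIterations() const
  {
    uint64_t total = 0;
    for (const auto &c : counts_)
      total += c.load(std::memory_order_relaxed);
    return total;
  }

private:
  std::function<void()> func_;
  const size_t numThreads_;
  std::vector<std::atomic<uint64_t>> counts_;  // one counter per worker
  std::vector<std::thread> threads_;
  std::atomic<bool> stopFlag_{false};
};
```

One trade-off: a Ctrl+C handler cannot reach a member directly, so it would still need some global hook (for example a pointer to the runner) to set the flag. That is presumably why the global `STOP` exists in the first place.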
```cpp
// Global flags
std::atomic<bool> STOP(false);   // Global flag to stop the stress test when signaled (e.g., via Ctrl+C)
std::atomic<bool> READY(false);  // Global flag to synchronize thread start
```
The READY flag is declared but never used. Either remove it or use it to synchronize thread start.
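If READY is kept, one way to use it is as a start barrier, so that early-spawned workers do not run (and inflate the first throughput sample) while later ones are still being created. The sketch below is illustrative only: `RunWithStartBarrier` is a hypothetical helper, and the run duration is passed in rather than ended by Ctrl+C.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Global flags, as in the original snippet.
std::atomic<bool> STOP(false);   // signals workers to exit
std::atomic<bool> READY(false);  // released once every worker has been spawned

// Hypothetical helper: spawns numThreads workers that park on READY, releases
// them together, runs for `duration`, then sets STOP and joins.
// Returns the number of workers that were parked when READY was released.
size_t RunWithStartBarrier(size_t numThreads, void (*workload)(),
                           std::chrono::milliseconds duration)
{
  assert(numThreads > 0);
  std::atomic<size_t> parked{0};
  std::vector<std::thread> threads;
  for (size_t i = 0; i < numThreads; ++i)
  {
    threads.emplace_back([&parked, workload] {
      parked.fetch_add(1);
      while (!READY.load(std::memory_order_acquire))
        std::this_thread::yield();  // start barrier: nobody runs early
      while (!STOP.load(std::memory_order_relaxed))
        workload();
    });
  }
  while (parked.load() < numThreads)
    std::this_thread::yield();  // wait until every worker is at the barrier
  const size_t parkedAtRelease = parked.load();
  READY.store(true, std::memory_order_release);
  std::this_thread::sleep_for(duration);
  STOP.store(true);
  for (auto &t : threads)
    t.join();
  return parkedAtRelease;
}
```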
```cpp
void Stress::monitorThroughput()
{
  uint64_t lastTotalCount = 0;
  auto lastTime = std::chrono::steady_clock::now();
```
throughputHistory grows without bound in long-running tests, potentially causing high memory usage. Consider capping its size or computing rolling statistics without storing all entries.
Suggested change:

```diff
   auto lastTime = std::chrono::steady_clock::now();
+  const size_t MAX_HISTORY_SIZE = 100;  // Maximum number of entries in throughputHistory
```
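A capped history plus a running sum addresses both halves of the comment: memory stays bounded and the average costs O(1) per sample. This is a sketch under assumptions. `BoundedThroughputHistory` is a hypothetical class, not the PR's `throughputHistory`, and the cap would be `MAX_HISTORY_SIZE` (100) from the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: keeps at most maxSize throughput samples and a running
// sum, so memory stays bounded and the average is O(1) to compute.
class BoundedThroughputHistory
{
public:
  explicit BoundedThroughputHistory(size_t maxSize) : maxSize_(maxSize)
  {
    assert(maxSize > 0);
  }

  void Add(double sample)
  {
    samples_.push_back(sample);
    sum_ += sample;
    if (samples_.size() > maxSize_)
    {
      sum_ -= samples_.front();  // evict the oldest sample
      samples_.pop_front();
    }
  }

  size_t Size() const { return samples_.size(); }

  double Average() const { return samples_.empty() ? 0.0 : sum_ / samples_.size(); }

private:
  const size_t maxSize_;
  std::deque<double> samples_;
  double sum_ = 0.0;
};
```

Over very long runs the floating-point running sum can drift slightly. Recomputing it from `samples_` occasionally, or simply summing the at most 100 entries on each read, avoids that.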
```cpp
  return attributes_set;
}

void InitMetrics(const std::string /*&name*/)
```
The parameter 'name' is unused in InitMetrics. Either remove it or use it to parameterize the meter provider.
Suggested change:

```diff
-void InitMetrics(const std::string /*&name*/)
+void InitMetrics(const std::string &name)
```
```cpp
{
  std::srand(static_cast<unsigned int>(std::time(nullptr)));  // Seed the random number generator
  // Pre-generate a set of random attributes
  size_t attribute_count = 1000;  // Number of attribute sets to pre-generate
```
[nitpick] Magic number 1000 used for attribute_count; consider making it a configurable constant or command-line parameter.
Suggested change:

```diff
-  size_t attribute_count = 1000;  // Number of attribute sets to pre-generate
+  size_t attribute_count = kDefaultAttributeCount;  // Number of attribute sets to pre-generate
```
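Both options in the nitpick can be combined: a named default plus an optional command-line override. The sketch below assumes a hypothetical CLI in which `argv[1]` is the attribute count. `ParseAttributeCount` is illustrative, and `stress_metrics` does not necessarily accept arguments today.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Named constant replacing the magic number (name taken from the review suggestion).
constexpr size_t kDefaultAttributeCount = 1000;

// Hypothetical CLI: if argv[1] is a positive decimal integer it overrides the
// default number of pre-generated attribute sets; anything else keeps the default.
size_t ParseAttributeCount(int argc, char *argv[])
{
  assert(argc == 0 || argv != nullptr);
  // Reject empty strings, signs and leading whitespace up front, because
  // strtoull would silently accept "-5" and wrap it to a huge value.
  if (argc < 2 || !std::isdigit(static_cast<unsigned char>(argv[1][0])))
    return kDefaultAttributeCount;
  char *end = nullptr;
  errno = 0;
  const unsigned long long value = std::strtoull(argv[1], &end, 10);
  if (errno != 0 || *end != '\0' || value == 0)
    return kDefaultAttributeCount;
  return static_cast<size_t>(value);
}
```

Usage would then be `size_t attribute_count = ParseAttributeCount(argc, argv);` in `main`, with `stress_metrics 250` pre-generating 250 attribute sets.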
Changes
This PR adds a basic stress testing framework to validate the scalability and reliability of the code under high-concurrency, long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements of isolated operations, this framework simulates sustained, multi-threaded execution of a given workload. The idea is to complement the existing benchmarks with stress tests that cover long-duration and high-concurrency use cases.
This is already implemented for .NET and Rust, and most of the ideas are taken from there. I needed this to test some optimizations I am working on for metrics, but feel free to comment if this doesn't seem helpful.
Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:
It’s still in the early stages and will need further enhancements, but it should be a good starting point. Future improvements could include reporting memory and CPU usage alongside the existing throughput, as well as refining the initial warm-up period so that data collection is consistent.
Implementation Details:
Worker Threads:
- Worker threads (by default, one per CPU core) are spawned to execute the workload.
- Each worker thread executes the workload function (func) in a loop until the global STOP flag is set (e.g., on Ctrl+C).
- Each thread maintains its own iteration count to minimize contention.
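The worker model above can be sketched as follows. This is an illustration, not the code in stress.cc: `WorkerCounter`, `DefaultThreadCount` and `RunWorkers` are hypothetical, and `alignas(64)` assumes a 64-byte cache line. C++17's `std::hardware_destructive_interference_size` is the portable alternative where the standard library provides it. Padding each counter to its own cache line is what makes "each thread maintains its own iteration count" actually contention-free, since neighbouring counters would otherwise false-share.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per worker, aligned to an assumed 64-byte cache line so threads
// incrementing their own counter do not false-share with their neighbours.
struct alignas(64) WorkerCounter
{
  std::atomic<uint64_t> count{0};
};

// Default worker count: one per hardware thread. hardware_concurrency() may
// return 0 when the value is unknown, so fall back to 1.
unsigned DefaultThreadCount()
{
  unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : n;
}

// Runs `workload` on numThreads workers until `stop` is set by another thread
// (e.g. a Ctrl+C handler), then returns the per-thread iteration counts.
template <typename Func>
std::vector<uint64_t> RunWorkers(unsigned numThreads, Func workload, std::atomic<bool> &stop)
{
  assert(numThreads > 0);
  std::vector<WorkerCounter> counters(numThreads);  // over-aligned: needs C++17
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < numThreads; ++i)
  {
    threads.emplace_back([&, i] {
      // Each thread touches only its own counter; relaxed ordering suffices
      // because the final totals are read after join().
      while (!stop.load(std::memory_order_relaxed))
      {
        workload();
        counters[i].count.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto &t : threads)
    t.join();
  std::vector<uint64_t> result;
  for (auto &c : counters)
    result.push_back(c.count.load());
  return result;
}
```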
Throughput Monitoring:
- A separate controller thread monitors throughput by periodically summing up iteration counts across threads.
- Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.
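A sliding-window rate can be computed from the summed counts alone: keep the last `SLIDING_WINDOW_SIZE + 1` snapshots and divide the count delta by the time delta between the oldest and newest one. The class below is a hypothetical sketch of that idea, not the PR's monitor loop. In practice the monitor thread would call `AddSnapshot` once per tick with the elapsed `steady_clock` time in seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sliding-window throughput calculator. The monitor thread feeds
// it (elapsed seconds, total iterations across all workers) snapshots; the
// rate is computed between the oldest and newest snapshot in the window.
class SlidingWindowThroughput
{
public:
  explicit SlidingWindowThroughput(size_t windowSize) : windowSize_(windowSize)
  {
    assert(windowSize > 0);
  }

  // Records a snapshot and returns iterations/second over the window
  // (0 until at least two snapshots are available).
  double AddSnapshot(double seconds, uint64_t totalCount)
  {
    snapshots_.push_back({seconds, totalCount});
    // windowSize intervals need windowSize + 1 snapshots.
    if (snapshots_.size() > windowSize_ + 1)
      snapshots_.pop_front();
    if (snapshots_.size() < 2)
      return 0.0;
    const Snapshot &oldest = snapshots_.front();
    const Snapshot &newest = snapshots_.back();
    const double dt = newest.seconds - oldest.seconds;
    if (dt <= 0.0)
      return 0.0;
    return static_cast<double>(newest.count - oldest.count) / dt;
  }

private:
  struct Snapshot
  {
    double seconds;
    uint64_t count;
  };
  const size_t windowSize_;
  std::deque<Snapshot> snapshots_;
};
```

With a window of 2 and snapshots (0 s, 0), (1 s, 100), (2 s, 300), (3 s, 600), the reported rates are 0, 100, 150 and 250 iterations/s. Once the window is full, the oldest interval drops out.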
Final Summary:
- At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.
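The final summary is simple arithmetic: average throughput = total iterations / wall-clock duration. For example, three threads doing 4,000,000 iterations each over 60 s gives 12,000,000 / 60 = 200,000 iterations/s. The sketch below shows that calculation. `Summary`, `Summarize` and the printed format are hypothetical, not the PR's output.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Summary
{
  uint64_t totalIterations;
  double durationSeconds;
  double averageThroughput;  // iterations per second
};

// Sums the per-thread counts and divides by the wall-clock duration of the run.
Summary Summarize(const std::vector<uint64_t> &perThreadCounts,
                  std::chrono::steady_clock::duration elapsed)
{
  uint64_t total = 0;
  for (uint64_t c : perThreadCounts)
    total += c;
  const double seconds = std::chrono::duration<double>(elapsed).count();
  const double average = seconds > 0.0 ? static_cast<double>(total) / seconds : 0.0;
  return {total, seconds, average};
}

void PrintSummary(const Summary &s)
{
  std::printf("Total iterations: %llu\n", static_cast<unsigned long long>(s.totalIterations));
  std::printf("Duration: %.2f s\n", s.durationSeconds);
  std::printf("Average throughput: %.0f iterations/s\n", s.averageThroughput);
}
```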
For significant contributions, please make sure you have completed the following items:
- CHANGELOG.md updated for non-trivial changes