test_perf_profiler fails on aarch64 Fedora Stable Refleaks buildbot #131038
Comments
Unfortunately I cannot reproduce this locally and I don't have any idea of what may be happening in the buildbot. Was this updated recently? |
I cannot reproduce in the buildbot either:
It seems to be related to perf dropping events under high CPU usage. I honestly don't want to skip the test in this case because that is going to be a pain if we have real failures |
@stratakis Can you investigate if something has changed recently in the buildbot? |
I have been testing under high load using this script:

import multiprocessing
import time
import argparse


def cpu_load(duration=None, intensity=1.0):
    """
    Generate CPU load on a single core.

    Args:
        duration (float, optional): Duration in seconds to run the load.
            If None, runs until interrupted.
        intensity (float): Value between 0.0 and 1.0 to control load intensity.
            1.0 means 100% load, 0.5 means approximately 50% load.
    """
    start_time = time.time()
    while True:
        # Busy work
        x = 0
        # Adjust end range to control intensity
        end = int(10000000 * intensity)
        for i in range(end):
            x += i
            x *= 1.0000001
        # Sleep to reduce CPU usage if intensity < 1.0
        if intensity < 1.0:
            time.sleep((1.0 - intensity) * 0.1)
        # Check if duration has been reached
        if duration is not None and time.time() - start_time >= duration:
            break


def run_load_test(cores=None, duration=None, intensity=1.0):
    """
    Run CPU load test on multiple cores.

    Args:
        cores (int, optional): Number of cores to utilize.
            If None, uses all available cores.
        duration (float, optional): Duration in seconds to run the test.
            If None, runs until interrupted.
        intensity (float): Value between 0.0 and 1.0 to control load intensity.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    print(f"Starting CPU load test on {cores} cores with {intensity*100}% intensity")
    if duration:
        print(f"Test will run for {duration} seconds")
    else:
        print("Test will run until interrupted with Ctrl+C")
    # Create and start processes
    processes = []
    try:
        for _ in range(cores):
            p = multiprocessing.Process(target=cpu_load, args=(duration, intensity))
            processes.append(p)
            p.start()
        # Wait for processes to complete if duration is specified
        if duration:
            for p in processes:
                p.join()
            print("CPU load test completed")
        else:
            # Keep the main process running until interrupted
            while True:
                time.sleep(1)
    except KeyboardInterrupt:
        print("Stopping CPU load test...")
        for p in processes:
            if p.is_alive():
                p.terminate()
        print("CPU load test stopped")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate CPU load for testing")
    parser.add_argument("-c", "--cores", type=int, default=None,
                        help="Number of cores to utilize (default: all available)")
    parser.add_argument("-d", "--duration", type=float, default=None,
                        help="Duration in seconds (default: run until interrupted)")
    parser.add_argument("-i", "--intensity", type=float, default=1.0,
                        help="Load intensity from 0.0 to 1.0 (default: 1.0)")
    args = parser.parse_args()
    run_load_test(args.cores, args.duration, args.intensity)

and still can't reproduce even on the buildbot |
I'm testing with the buildbot at 147.75.54.63 |
The failures coincide with the dnf update I did a couple of days ago. More specifically, the kernel (and hence perf as well) got updated from 6.12.11-200 to 6.13.5-200. I'm attaching the full dnf transaction. |
However, last week I was running some Python compilations of my own to test some things on the buildbot, and if that happened at the same time a refleak build was running, that could be the problem. I'd leave this open until a couple more refleak builds occur, and if nothing shows up, we can close the issue. |
That was probably it. The refleak buildbots were green for a while, before starting to fail with an unrelated change. |
Yup. |
I'm re-opening it because we're still seeing failures (and this is flagged by https://buildbot.python.org/#/release_status). At first glance, the failures are also becoming more and more frequent. |
I have tried another reproducer round and still could not reproduce |
This is really strange. I don't know what is happening, and I don't know if we should worry about this. @hugovk is this considered a deal breaker for the next release, since it's a tier 2 bot that is failing? As Petr observed, the strings to find and those that are actually present don't quite match. For instance, in the latest failure, the string to find is
OTOH, we have the following string (quite a lot of times):
So, it seems that the temporary folder is not the correct one. I don't know what is happening nor what's being tested, but the fact that we have almost the correct string could also point to a parallelism issue? Maybe another test is polluting the /tmp dir for some reason? |
Hmm, a flaky test won't necessarily block (the last) alpha release. (I'd prefer it fixed or skipped or otherwise green :) |
Bug report
Since about 4 days ago, test_perf_profiler.test_python_calls_appear_in_the_stack_if_perf_activated usually fails on refleaks buildbots with one of these (see build 416 for both):
AssertionError: 'py::foo:/tmp/test_python_2wtkitdm/tmpv30razor/perftest.py' not found in
<very long string containing frames with a different temp dir instead:
/tmp/test_python_ecqze28x/tmpz61vpvo4/perftest.py+0xb
here>
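To make the temp-dir mismatch in the assertion easier to see, here is a minimal diagnostic sketch. It is not part of CPython's test suite: `summarize_perf_frames` and its regex are hypothetical, and it assumes the captured perf output contains frames that embed a path ending in `perftest.py`, as in the failure message above. It reports whether the expected string is present and which distinct `perftest.py` paths actually appear.

```python
import re

# Hypothetical pattern: any absolute path ending in perftest.py.
# The frame layout is an assumption based on the failure message.
PERFTEST_PATH = re.compile(r"(/\S+?/perftest\.py)")


def summarize_perf_frames(expected, output):
    """Return (expected_found, distinct_paths) for a perf output dump."""
    # Collect every distinct perftest.py path mentioned in the output.
    paths = sorted(set(PERFTEST_PATH.findall(output)))
    return expected in output, paths


if __name__ == "__main__":
    # Strings taken from the failure quoted in this issue.
    expected = "py::foo:/tmp/test_python_2wtkitdm/tmpv30razor/perftest.py"
    output = "py::foo:/tmp/test_python_ecqze28x/tmpz61vpvo4/perftest.py+0xb\n"
    found, paths = summarize_perf_frames(expected, output)
    print("expected string found:", found)
    for path in paths:
        print("perftest.py path seen:", path)
```

If the only paths seen belong to a different `test_python_*` directory, that would be consistent with the output coming from another test run rather than with perf dropping events, though it does not prove either explanation.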