test_perf_profiler fails on aarch64 Fedora Stable Refleaks buildbot #131038

Open
encukou opened this issue Mar 10, 2025 · 15 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@encukou
Member

encukou commented Mar 10, 2025

Bug report

For the past four days or so, test_perf_profiler.test_python_calls_appear_in_the_stack_if_perf_activated has usually been failing on refleaks buildbots with one of these (see build 416 for both):

  • AssertionError: 'py::foo:/tmp/test_python_2wtkitdm/tmpv30razor/perftest.py' not found in <very long string containing frames with a different temp dir instead: /tmp/test_python_ecqze28x/tmpz61vpvo4/perftest.py+0xb here>
  • unexpected stderr:
    Warning:
    Processed 688 events and lost 3 chunks!
    
    Check IO/CPU overload!
    
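For context, a hedged sketch of the machinery under test: the failing assertion searches perf's output for `py::<function>:<file>` labels, which CPython's perf trampoline support (Python 3.12+) writes to `/tmp/perf-<pid>.map` so that perf can symbolize Python frames. The relevant `sys` API, per the CPython docs, looks like this:

```python
import sys

# Hedged sketch of the perf-trampoline API the failing test exercises
# (available since Python 3.12). Activating it makes the interpreter
# emit "py::<function>:<file>" entries to /tmp/perf-<pid>.map -- these
# are exactly the strings the failing assertion searches for.
if hasattr(sys, "is_stack_trampoline_active"):
    active = sys.is_stack_trampoline_active()
    print(f"trampoline active: {active}")  # False in a fresh interpreter
    # sys.activate_stack_trampoline("perf") would turn it on; it raises
    # ValueError on platforms where the trampoline is not supported.
```

The test itself activates the trampoline in a subprocess, runs a script defining `foo`, and then asserts the `py::foo:...` label appears in perf's report.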
@encukou encukou added 3.13 bugs and security fixes performance Performance or resource usage type-bug An unexpected behavior, bug, or error labels Mar 10, 2025
@vstinner
Member

cc @pablogsal @stratakis

@pablogsal
Member

Unfortunately I cannot reproduce this locally, and I have no idea what may be happening on the buildbot. Was it updated recently?

@pablogsal
Member

I cannot reproduce in the buildbot either:

pablo@localhost:~/cpython$ ./python -m test test_perf_profiler -v
== CPython 3.14.0a5+ (heads/main:8431b1fee88, Mar 10 2025, 16:20:11) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)]
== Linux-6.13.5-200.fc41.aarch64-aarch64-with-glibc2.40 little-endian
== Python build: debug
== cwd: /home/pablo/cpython/build/test_python_worker_160904
== CPU count: 40
== encodings: locale=UTF-8 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests

Using random seed: 2682059707
0:00:00 load avg: 16.79 Run 1 test sequentially in a single process
0:00:00 load avg: 16.79 [1/1] test_perf_profiler
test_pre_fork_compile (test.test_perf_profiler.TestPerfProfiler.test_pre_fork_compile) ... ok
test_python_calls_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_appear_in_the_stack_if_perf_activated) ... ok
test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated (test.test_perf_profiler.TestPerfProfiler.test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated) ... ok
test_python_calls_appear_in_the_stack_if_perf_activated (test.test_perf_profiler.TestPerfProfilerWithDwarf.test_python_calls_appear_in_the_stack_if_perf_activated) ... ok
test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated (test.test_perf_profiler.TestPerfProfilerWithDwarf.test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated) ... ok
test_sys_api (test.test_perf_profiler.TestPerfTrampoline.test_sys_api) ... ok
test_sys_api_get_status (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_get_status) ... ok
test_sys_api_with_existing_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_existing_trampoline) ... ok
test_sys_api_with_invalid_trampoline (test.test_perf_profiler.TestPerfTrampoline.test_sys_api_with_invalid_trampoline) ... ok
test_trampoline_works (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works) ... ok
test_trampoline_works_with_forks (test.test_perf_profiler.TestPerfTrampoline.test_trampoline_works_with_forks) ... ok

----------------------------------------------------------------------
Ran 11 tests in 19.774s

OK
0:00:20 load avg: 24.35 [1/1] test_perf_profiler passed

== Tests result: SUCCESS ==

1 test OK.

Total duration: 20.9 sec
Total tests: run=11
Total test files: run=1/1
Result: SUCCESS
pablo@localhost:~/cpython$ uname -a
Linux localhost.localdomain 6.13.5-200.fc41.aarch64 #1 SMP PREEMPT_DYNAMIC Thu Feb 27 15:30:52 UTC 2025 aarch64 GNU/Linux

It seems to be related to perf dropping events under high CPU usage. I honestly don't want to skip the test in this case, because that would be a pain if we ever have real failures.

@pablogsal
Member

@stratakis Can you investigate if something has changed recently in the buildbot?

@pablogsal
Member

pablogsal commented Mar 10, 2025

I have been testing under high load using this script:

import multiprocessing
import time
import argparse

def cpu_load(duration=None, intensity=1.0):
    """
    Generate CPU load on a single core.
    
    Args:
        duration (float, optional): Duration in seconds to run the load.
                                   If None, runs until interrupted.
        intensity (float): Value between 0.0 and 1.0 to control load intensity.
                          1.0 means 100% load, 0.5 means approximately 50% load.
    """
    start_time = time.time()
    
    while True:
        # Busy work
        x = 0
        # Adjust end range to control intensity
        end = int(10000000 * intensity)
        for i in range(end):
            x += i
            x *= 1.0000001
        
        # Sleep to reduce CPU usage if intensity < 1.0
        if intensity < 1.0:
            time.sleep((1.0 - intensity) * 0.1)
            
        # Check if duration has been reached
        if duration is not None and time.time() - start_time >= duration:
            break

def run_load_test(cores=None, duration=None, intensity=1.0):
    """
    Run CPU load test on multiple cores.
    
    Args:
        cores (int, optional): Number of cores to utilize. 
                              If None, uses all available cores.
        duration (float, optional): Duration in seconds to run the test.
                                   If None, runs until interrupted.
        intensity (float): Value between 0.0 and 1.0 to control load intensity.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    
    print(f"Starting CPU load test on {cores} cores with {intensity*100}% intensity")
    if duration:
        print(f"Test will run for {duration} seconds")
    else:
        print("Test will run until interrupted with Ctrl+C")
    
    # Create and start processes
    processes = []
    try:
        for _ in range(cores):
            p = multiprocessing.Process(target=cpu_load, args=(duration, intensity))
            processes.append(p)
            p.start()
        
        # Wait for processes to complete if duration is specified
        if duration:
            for p in processes:
                p.join()
            print("CPU load test completed")
        else:
            # Keep the main process running until interrupted
            while True:
                time.sleep(1)
                
    except KeyboardInterrupt:
        print("Stopping CPU load test...")
        for p in processes:
            if p.is_alive():
                p.terminate()
        print("CPU load test stopped")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate CPU load for testing")
    parser.add_argument("-c", "--cores", type=int, default=None, 
                        help="Number of cores to utilize (default: all available)")
    parser.add_argument("-d", "--duration", type=float, default=None,
                        help="Duration in seconds (default: run until interrupted)")
    parser.add_argument("-i", "--intensity", type=float, default=1.0,
                        help="Load intensity from 0.0 to 1.0 (default: 1.0)")
    
    args = parser.parse_args()
    
    run_load_test(args.cores, args.duration, args.intensity)

and I still can't reproduce it, even on the buildbot.

@pablogsal
Member

I'm testing with the buildbot at 147.75.54.63

@stratakis
Contributor

The failures coincide with the dnf update I did a couple of days ago.

More specifically, the kernel (and hence perf as well) was updated from 6.12.11-200 to 6.13.5-200.

update.txt

I'm attaching the full dnf transaction.

@stratakis
Contributor

However, last week I was running some Python compilations of my own on the buildbot to test something, and if those ran at the same time as a refleak build, that could be the problem. I'd leave this open until a couple more refleak builds occur; if nothing shows up, we can close the issue.

@encukou
Member Author

encukou commented Mar 11, 2025

That was probably it. The refleak buildbots were green for a while, before starting to fail with an unrelated change.
I'll confirm tomorrow.

@encukou
Member Author

encukou commented Mar 13, 2025

Yup.
Sorry for the noise!

@encukou encukou closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 13, 2025
@picnixz
Member

picnixz commented Mar 23, 2025

The buildbots are still failing, even though they aren't really "red": most of the time they're yellow, but the test is indeed failing:

[Image: screenshot of the buildbot build history]

The branch is 3.14 and the rightmost red is 10 days ago. We're seeing more flaky failures than passes, so I'm not really sure there isn't something else going on.

@picnixz
Member

picnixz commented Mar 29, 2025

I'm re-opening this because we're still seeing failures (and it is flagged by https://buildbot.python.org/#/release_status). At first glance, the failures are becoming more and more frequent.

@picnixz picnixz reopened this Mar 29, 2025
@picnixz picnixz changed the title 3.13: test_perf_profiler fails on aarch64 Fedora Stable Refleaks buildbot test_perf_profiler fails on aarch64 Fedora Stable Refleaks buildbot Mar 29, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed 3.13 bugs and security fixes labels Mar 29, 2025
@pablogsal
Member

I have tried another reproducer round and still could not reproduce it.

@picnixz
Member

picnixz commented Mar 29, 2025

This is really strange. I don't know what's happening, and I'm not sure whether we should worry about it. @hugovk: is this considered a deal breaker for the next release, given that it's a tier 2 bot that is failing?

As Petr observed, the strings to find vs those that are actually here mismatch a bit. For instance, in the latest failure, the string to find is

py::foo:/tmp/test_python_gjd_5u20/tmpf3ej7o5w/perftest.py

On the other hand, we see the following string (quite a lot of the time):

py::foo:/tmp/test_python_g13mifgx/tmp4sgs6ndf/perftest.py

So, it seems that the temporary folder is not the correct one. I don't know what's happening nor exactly what's being tested, but the fact that we get an almost-correct string could point to a parallelism issue. Maybe another test is polluting /tmp for some reason?
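As a hedged illustration of why the almost-matching path is suspicious: each run creates its own unique temporary directory, so a `perftest.py` path under a *different* `test_python_*` prefix can only come from another run's samples (stale perf data or a concurrent worker), never from a naming collision within a single run:

```python
import shutil
import tempfile

# Two runs each creating their own temp dir (same prefix style the test
# suite uses). mkdtemp guarantees unique names, so a perf sample whose
# path points at a different test_python_* dir must originate from a
# different process or run, not from this one.
run_a = tempfile.mkdtemp(prefix="test_python_")
run_b = tempfile.mkdtemp(prefix="test_python_")
print(run_a != run_b)  # True: the directories can never collide
shutil.rmtree(run_a)
shutil.rmtree(run_b)
```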

@hugovk
Member

hugovk commented Mar 29, 2025

I don't know what's happening, and I'm not sure whether we should worry about it. @hugovk: is this considered a deal breaker for the next release, given that it's a tier 2 bot that is failing?

Hmm, a flaky test won't necessarily block (the last) alpha release. (I'd prefer it fixed or skipped or otherwise green :)
