gh-91960: skip test_gdb when built with clang #108993

Closed

Conversation

sorcio
Contributor

@sorcio sorcio commented Sep 6, 2023

test_gdb should be skipped on all platforms when CPython is compiled with Clang, not only Darwin.

@sorcio
Contributor Author

sorcio commented Sep 6, 2023

On second thought, #10318 (comment) says that clang itself is not the problem. Will do a bit more research.

@@ -55,7 +55,7 @@ def get_gdb_version():
 if not sysconfig.is_python_build():
     raise unittest.SkipTest("test_gdb only works on source builds at the moment.")

-if 'Clang' in platform.python_compiler() and sys.platform == 'darwin':
+if 'Clang' in platform.python_compiler():
Contributor

Would it not make more sense to skip if Python is not built with GCC? Python can for example be built with icc.

Member

In case of doubt, I prefer to not skip the test. Maybe icc is just fine.
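To make the two options in this review thread concrete, here is a minimal sketch of classifying the compiler banner returned by `platform.python_compiler()`. The helper name `classify_compiler` and the sample banners in the comments are illustrative assumptions, not code from this PR. Clang is checked first on the assumption that some Clang builds report a GCC-compatible banner.

```python
import platform


def classify_compiler(banner=None):
    """Classify a compiler banner as 'clang', 'gcc' or 'other'.

    Hypothetical helper: the PR itself only tests `'Clang' in banner`.
    """
    if banner is None:
        banner = platform.python_compiler()
    # Check Clang first: a banner such as "GCC 4.2.1 Compatible Clang ..."
    # (assumed example) would otherwise be mistaken for GCC.
    if 'Clang' in banner:
        return 'clang'
    if banner.startswith('GCC'):
        return 'gcc'
    # icc, MSVC and anything else end up here.
    return 'other'


if __name__ == '__main__':
    print(platform.python_compiler(), '->', classify_compiler())
```

Skipping whenever the result is not `'gcc'` would also skip on `'other'` compilers such as icc, which is the case the reply above prefers to leave running.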

@vstinner
Member

vstinner commented Sep 6, 2023

Oh. I just realized that test_gdb failures on buildbots are specific to clang builders. There are many RHEL7, RHEL8, Fedora Stable and Fedora Rawhide builders. Half use GCC, half use clang. test_gdb only fails on clang builders.

I dislike skipping a test without digging a little bit :-( How is clang so different from gcc? Does test_gdb fail at every clang optimization level, from -O0 to -O3, including -Os and -Og?

Can we try to catch some gdb complaints in the gdb output to decide whether to skip? I'm thinking of something similar to these existing skips:

        # bpo-40019: Skip the test if gdb failed to read debug information
        # because the Python binary is optimized.
        for pattern in (
            '(frame information optimized out)',
            'Unable to read information on python frame',
        ):
            if pattern in out:
                raise unittest.SkipTest(f"{pattern!r} found in gdb output")
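The snippet above depends on an `out` variable from its surrounding test helper. As a self-contained sketch of the same idea, the pattern scan can be factored into a small function. The names `SKIP_PATTERNS`, `find_skip_pattern` and `skip_if_unreadable` are hypothetical and are not taken from test_gdb.

```python
import unittest

# The two patterns quoted in the comment above.
SKIP_PATTERNS = (
    '(frame information optimized out)',
    'Unable to read information on python frame',
)


def find_skip_pattern(out, patterns=SKIP_PATTERNS):
    """Return the first pattern found in the gdb output, or None."""
    for pattern in patterns:
        if pattern in out:
            return pattern
    return None


def skip_if_unreadable(out):
    """Raise SkipTest if gdb reported that it could not read frame info."""
    pattern = find_skip_pattern(out)
    if pattern is not None:
        raise unittest.SkipTest(f"{pattern!r} found in gdb output")
```

Keeping the scan separate from the `raise` makes it possible to unit-test the detection on captured gdb output without running gdb.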

@vstinner
Member

vstinner commented Sep 6, 2023

I just built Python with clang 16.0.6 (Fedora 16.0.6-2.fc38) on Fedora 38 using ./configure. It generates:

# Compiler options
OPT=            -DNDEBUG -g -O3 -Wall
BASECFLAGS=      -fno-strict-overflow -Wsign-compare -Wunreachable-code

With clang -O3, many tests are skipped as expected:

$ ./configure CC=clang && make && ./python -m test -v test_gdb 
(...)
Total duration: 8.4 sec
Total tests: run=46 skipped=22
Total test files: run=1/1
Result: SUCCESS

test_gdb detects optimizations:

vstinner@mona$ ./python -m test -v test_gdb |grep skipped
(...)
Verify that the "py-bt" command works ... skipped 'Python was compiled with optimizations'
Verify that the "py-bt-full" command works ... skipped 'Python was compiled with optimizations'
Verify that "py-bt" indicates if a thread is garbage-collecting ... skipped 'Python was compiled with optimizations'
(...)
OK (skipped=22)
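One way such a "compiled with optimizations" check could work is to look at the last `-O` flag in the compiler flags. The sketch below is an assumption about the approach, not necessarily what test_gdb does: the function name `is_optimized` is hypothetical, and treating `-Og` as unoptimized is a choice made for this illustration.

```python
def is_optimized(cflags):
    """Guess whether a build is optimized from its compiler flags.

    Assumption: the last -O flag on the command line wins, and
    -O0 / -Og count as "not optimized" for debugging purposes.
    """
    final_opt = ''
    for opt in cflags.split():
        if opt.startswith('-O'):
            final_opt = opt
    return final_opt not in ('', '-O0', '-Og')


if __name__ == '__main__':
    import sysconfig
    # PY_CFLAGS may be missing on some platforms, hence the fallback.
    flags = sysconfig.get_config_var('PY_CFLAGS') or ''
    print('optimized build?', is_optimized(flags))
```

Under that assumption, the `-O3` flags shown earlier in this comment count as optimized, while a `-Og` build does not, so its backtrace tests would run rather than be skipped.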

Do you see? test_gdb just passes. It would be a shame to always skip test_gdb, knowing that 24 tests passed out of a total of 46 (22 tests were skipped).

@vstinner
Member

vstinner commented Sep 6, 2023

There are 4 failing buildbots on the main branch. All of them build Python with ./configure --with-pydebug, using clang -Og. Most use clang version 16.0.6 (on Fedora 38), but the s390x machine uses Fedora 37 and clang version 15.0.7.

@vstinner
Member

vstinner commented Sep 6, 2023

On my Fedora 38 x86-64 with clang version 16.0.6 (Fedora 16.0.6-2.fc38), test_gdb fails if Python is built with ./configure --with-pydebug and clang -Og:

$ ./configure CC=clang --with-pydebug && make && ./python -m test -v test_gdb  
(...)
Total duration: 12.6 sec
Total tests: run=46 failures=4 skipped=16
Total test files: run=1/1 failed=1
Result: FAILURE

4 test_gdb tests are failing:

  • FAIL: test_bt (test.test_gdb.PyBtTests.test_bt)
  • FAIL: test_bt_full (test.test_gdb.PyBtTests.test_bt_full)
  • FAIL: test_pyup_command (test.test_gdb.StackNavigationTests.test_pyup_command)
  • FAIL: test_up_then_down (test.test_gdb.StackNavigationTests.test_up_then_down)

@vstinner
Member

vstinner commented Sep 6, 2023

It seems like gdb is able to get the frame parameter of _PyEval_EvalFrameDefault() when Python is built with gcc -Og, but is unable to retrieve it when Python is built with clang -Og.

I think that we should focus on detecting frame=<optimized out>. If gdb is unable to retrieve the frame parameter, we cannot go very far...
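A minimal sketch of what detecting `frame=<optimized out>` could look like on captured gdb output follows. The regular expression and the function name `frame_is_optimized_out` are assumptions made for illustration; this is not the code that was eventually merged.

```python
import re

# Match a backtrace line for _PyEval_EvalFrameDefault whose "frame"
# argument gdb reports as optimized out. \b avoids matching some other
# argument whose name merely ends in "frame".
FRAME_OPTIMIZED_OUT = re.compile(
    r'_PyEval_EvalFrameDefault \(.*\bframe=<optimized out>'
)


def frame_is_optimized_out(gdb_output):
    """Return True if any line shows frame=<optimized out> for the eval loop."""
    return any(
        FRAME_OPTIMIZED_OUT.search(line)
        for line in gdb_output.splitlines()
    )
```

Anchoring on the function name keeps the check from firing on unrelated arguments that are optimized out, such as `self=<optimized out>` in `builtin_id`.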


gdb on Python built with clang -Og:

$ gdb -args ./python Lib/test/gdb_sample.py 
GNU gdb (GDB) Fedora Linux 13.2-3.fc38

(gdb) source python-gdb.py

(gdb) b builtin_id
Breakpoint 1 at 0x5c2e09: file Python/bltinmodule.c, line 1258.

(gdb) run
(...)

Breakpoint 1, builtin_id (self=<optimized out>, v=42) at Python/bltinmodule.c:1258
1258	    PyObject *id = PyLong_FromVoidPtr(v);

(gdb) py-bt
Traceback (most recent call first):
  <built-in method id of module object at remote 0x7fffea5965d0>
  (unable to read python frame information)

(gdb) frame 4
#4  0x00000000005dbcb8 in _PyEval_EvalFrameDefault (tstate=0xaa0910 <_PyRuntime+508720>, frame=<optimized out>, throwflag=0)
    at Python/generated_cases.c.h:3765
3765	            res = PyObject_Vectorcall(

(gdb) p frame
$1 = <optimized out>

gdb is unable to get the frame argument of _PyEval_EvalFrameDefault() and so cannot retrieve the code object name (co_name).


Comparison with Python built with gcc -Og:

$ ./configure --with-pydebug && make && ./python -m test -v test_gdb 
(...)
Total duration: 11.1 sec
Total tests: run=46 skipped=8
Total test files: run=1/1
Result: SUCCESS

8 tests are skipped with GCC: 7 are skipped because the cpu resource is disabled, and test_print_after_up() is skipped with the message:

'Unable to read information on python frame' found in gdb output

Logs:

Verify the pretty-printing of bytes ... skipped "resource 'cpu' is not enabled"
Verify the pretty-printing of frozensets ... skipped "resource 'cpu' is not enabled"
Verify the pretty-printing of various int values ... skipped "resource 'cpu' is not enabled"
Verify the pretty-printing of sets ... skipped "resource 'cpu' is not enabled"
Verify the pretty-printing of unicode strings ... skipped "resource 'cpu' is not enabled"
Verify that "py-bt" displays invocations of PyCFunction instances ... skipped "resource 'cpu' is not enabled"
Verify that "py-bt" indicates threads that are waiting for the GIL ... skipped "resource 'cpu' is not enabled"
test_print_after_up (test.test_gdb.PyPrintTests.test_print_after_up) ... skipped "'Unable to read information on python frame' found in gdb output"

Manual gdb test:

$ gdb -args ./python Lib/test/gdb_sample.py 
GNU gdb (GDB) Fedora Linux 13.2-3.fc38

(gdb) source python-gdb.py

(gdb) b builtin_id
Breakpoint 1 at 0x589495: file Python/bltinmodule.c, line 1257.

(gdb) run
Breakpoint 1, builtin_id (self=0x7fffea5965d0, v=v@entry=42) at Python/bltinmodule.c:1257
1257	{

(gdb) py-bt
Traceback (most recent call first):
  <built-in method id of module object at remote 0x7fffea5965d0>
  File "/home/vstinner/python/main/Lib/test/gdb_sample.py", line 10, in baz
    id(42)
  File "/home/vstinner/python/main/Lib/test/gdb_sample.py", line 7, in bar
    baz(a, b, c)
  File "/home/vstinner/python/main/Lib/test/gdb_sample.py", line 4, in foo
    bar(a=a, b=b, c=c)
  File "/home/vstinner/python/main/Lib/test/gdb_sample.py", line 12, in <module>
    foo(1, 2, 3)

(gdb) frame 4
#4  0x000000000059e2ea in _PyEval_EvalFrameDefault (tstate=0xa4e4f0 <_PyRuntime+508720>, frame=0x7ffff7fb91a0, throwflag=0)
    at Python/generated_cases.c.h:3765
3765	            res = PyObject_Vectorcall(

(gdb) p ((PyCodeObject*)frame->f_executable)->co_name
$8 = 'baz'

gdb is able to retrieve the frame parameter of _PyEval_EvalFrameDefault() and from that, get the code object name: baz.

@sorcio
Contributor Author

sorcio commented Sep 6, 2023

Oh, I was on a very similar path at the same time :)

I suspect it's a matter of debug information that gdb is able to find, rather than actually being optimized out. I also get gaps with lldb though, so the story might be a bit more complicated.

Suggestion: we add the skip to unblock the FreeBSD CI build, and we open a new issue for the clang+gdb situation.

Member

@vstinner vstinner left a comment

Always skipping is obviously the simplest approach, but I think that we can do better: I wrote PR #108999 to have a more precise test on when gdb is unable to retrieve the frame argument of _PyEval_EvalFrameDefault().

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@vstinner
Member

vstinner commented Sep 6, 2023

Thanks for putting the spotlight on test_gdb :-) Your PR obviously works, but I chose to skip at the test function level, rather than skipping the whole module. See my tests: #108999 (comment) Many tests actually pass when Python is built with clang. Only "a few" tests fail, depending on the clang optimization level, which is not surprising: gcc+gdb has the same symptoms.

I merged my PR #108999 instead.
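The difference between skipping the whole module and skipping per test function can be sketched with plain `unittest`. Everything below is a hypothetical illustration with canned strings standing in for gdb output; the class and method names are not from test_gdb.

```python
import unittest


class GdbLikeTests(unittest.TestCase):
    def check_output(self, out):
        # Skip only the test whose output shows the unreadable frame.
        if 'frame=<optimized out>' in out:
            self.skipTest("gdb cannot read the frame argument")
        return out

    def test_pretty_print(self):
        # Canned output: this test does not need the frame argument.
        out = self.check_output("$1 = 42")
        self.assertIn("42", out)

    def test_bt(self):
        # Canned output: the frame argument is unavailable, so skip.
        out = self.check_output(
            "_PyEval_EvalFrameDefault (frame=<optimized out>)"
        )
        self.assertIn("baz", out)


def run_suite():
    """Run the two tests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GdbLikeTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == '__main__':
    res = run_suite()
    print(f"run={res.testsRun} skipped={len(res.skipped)}")
```

With a module-level `raise unittest.SkipTest(...)`, neither test would run. With the per-test check, `test_pretty_print` still provides coverage while `test_bt` is reported as skipped.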

@vstinner vstinner closed this Sep 6, 2023
@sorcio sorcio deleted the patch-1 branch September 6, 2023 16:32
Labels
awaiting changes tests Tests in the Lib/test dir

4 participants