Skip to content

gh-107265: Fix initialize/remove_tools for ENTER_EXECUTOR case #108482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 27, 2023

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Aug 25, 2023

@corona10
Copy link
Member Author

@gvanrossum
See why this fix is needed: #107265 (comment)
But I am not sure how I can write test code for this case :(

@corona10 corona10 changed the title gh-107265: Fix initialize_tools for ENTER_EXECUTOR case [WIP] gh-107265: Fix initialize_tools for ENTER_EXECUTOR case Aug 25, 2023
@corona10 corona10 changed the title [WIP] gh-107265: Fix initialize_tools for ENTER_EXECUTOR case gh-107265: Fix initialize_tools for ENTER_EXECUTOR case Aug 25, 2023
@corona10 corona10 changed the title gh-107265: Fix initialize_tools for ENTER_EXECUTOR case gh-107265: Fix initialize/remove_tools for ENTER_EXECUTOR case Aug 25, 2023
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this code very well; let's see if @markshannon wants to review it.

@gvanrossum gvanrossum requested a review from markshannon August 25, 2023 16:36
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG except formatting/naming nits.

@corona10 corona10 requested a review from gvanrossum August 26, 2023 07:07
Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Green light! Thanks so much for powering through these.

@corona10 corona10 merged commit 6cb48f0 into python:main Aug 27, 2023
@corona10 corona10 deleted the gh-107265-initialize_tools branch August 27, 2023 03:31
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 Fedora Stable LTO 3.x has failed when building commit 6cb48f0.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/336/builds/3843) and take a look at the build logs.
  4. Check if the failure is related to this commit (6cb48f0) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/336/builds/3843

Failed tests:

  • test.test_concurrent_futures.test_shutdown

Failed subtests:

  • test_interpreter_shutdown - test.test_concurrent_futures.test_shutdown.ProcessPoolForkserverProcessPoolShutdownTest.test_interpreter_shutdown

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

448 tests OK.

10 slowest tests:

  • test_gdb: 4 min 51 sec
  • test.test_multiprocessing_spawn.test_processes: 1 min 14 sec
  • test.test_concurrent_futures.test_wait: 1 min 11 sec
  • test_socket: 1 min 3 sec
  • test_signal: 1 min 2 sec
  • test.test_multiprocessing_forkserver.test_processes: 52.5 sec
  • test_subprocess: 50.8 sec
  • test_unparse: 48.2 sec
  • test_io: 43.5 sec
  • test.test_multiprocessing_spawn.test_misc: 42.6 sec

1 test failed:
test.test_concurrent_futures.test_shutdown

14 tests skipped:
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test_devpoll test_ioctl
test_kqueue test_launcher test_startfile test_tkinter test_ttk
test_winconsoleio test_winreg test_winsound test_wmi
test_zipfile64

1 re-run test:
test.test_concurrent_futures.test_shutdown

Total duration: 4 min 57 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)
AssertionError: b'Exception in thread Thread-1:\nTraceback (most recent call last):\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/threading.py", line 1059, in _bootstrap_inner\n    self.run()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/concurrent/futures/process.py", line 339, in run\n    self.add_call_item_to_queue()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue\n    self.call_queue.put(_CallItem(work_id,\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/queues.py", line 94, in put\n    self._start_thread()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/queues.py", line 177, in _start_thread\n    self._thread.start()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/threading.py", line 978, in start\n    _start_new_thread(self._bootstrap, ())\nRuntimeError: can\'t create new thread at interpreter shutdown\nTraceback (most recent call last):\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/forkserver.py", line 274, in main\n    code = _serve_one(child_r, fds,\n           ^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/forkserver.py", line 313, in _serve_one\n    code = spawn._main(child_r, parent_sentinel)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/spawn.py", line 132, in _main\n    self = reduction.pickle.load(from_parent)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__\n    self._semlock = _multiprocessing.SemLock._rebuild(*state)\n                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFileNotFoundError: [Errno 2] No such file or directory\n' is not false

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the removed opcode != ENTER_EXECUTOR assertions were correct.
There should never be instrumented ENTER_EXECUTOR instructions.

I think the correct fix is to remove all ENTER_EXECUTOR instructions in _Py_Instrument() prior to calling update_instrumentation_data(), some thing like:

if (code->co_executors->size > 0) {
    // Walk code removing ENTER_EXECUTOR
    // Clear all executors
}

@@ -566,7 +566,13 @@ de_instrument(PyCodeObject *code, int i, int event)
_Py_CODEUNIT *instr = &_PyCode_CODE(code)[i];
uint8_t *opcode_ptr = &instr->op.code;
int opcode = *opcode_ptr;
assert(opcode != ENTER_EXECUTOR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is correct. Please don't remove assertions, unless you are really sure that they are incorrect.

ENTER_EXECUTOR should never have associated instrumentation.

Copy link
Member Author

@corona10 corona10 Aug 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was moved to L574, it will be the same effect no?

@@ -711,7 +717,22 @@ remove_tools(PyCodeObject * code, int offset, int event, int tools)
assert(event != PY_MONITORING_EVENT_LINE);
assert(event != PY_MONITORING_EVENT_INSTRUCTION);
assert(PY_MONITORING_IS_INSTRUMENTED_EVENT(event));
assert(opcode_has_event(_Py_GetBaseOpcode(code, offset)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also correct, provided the assertion assert(PY_MONITORING_IS_INSTRUMENTED_EVENT(event)) is true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it will guarantee that the opcode is not ENTER_EXECUTOR?

opcode = code->_co_monitoring->lines[i].original_opcode;
}
assert(opcode != ENTER_EXECUTOR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert should be moved before the original if (opcode == INSTRUMENTED_LINE) {, we shouldn't get here with and ENTER_EXECUTOR present.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to L1310 will be enough?

@markshannon
Copy link
Member

Note that it is OK to have instrumented instructions and ENTER_EXECUTOR in the same code objects, but only if ENTER_EXECUTOR is inserted after the instrumentation happened, and ENTER_EXECUTOR never replaces an instrumented instruction.

@corona10
Copy link
Member Author

I think the correct fix is to remove all ENTER_EXECUTOR instructions in _Py_Instrument() prior to calling update_instrumentation_data(), some thing like:

I will try to apply this approach :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants