-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
py/scheduler: De-inline and fix race with pending exception / scheduler (v2) #8869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Size and performance diffs (arm-none-eabi-gcc & gcc 12.1)
Linux:
pybv11:
Just as an interesting aside into compiler versions... Originally I ran this with gcc 11.2 and got quite different results...
pybv11 (gcc 11.2)
|
The optimisation that allows a single check in the VM for either a pending exception or non-empty scheduler queue doesn't work when threading is enabled, as one thread can clear the sched_state if it has no pending exception, meaning the thread with the pending exception will never see it. This removes that optimisation for threaded builds. Also fixes a race in non-scheduler builds where get-and-clear of the pending exception is not protected by the atomic section. Also removes the bulk of the inlining of pending exceptions and scheduler handling from the VM. This just costs code size and complexity at no performance benefit. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
c05ef43
to
3fea7d7
Compare
} else { | ||
MICROPY_END_ATOMIC_SECTION(atomic_state); | ||
} | ||
static inline void mp_sched_run_pending(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline
seems unnecessary here. The compiler probably ignores it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying the compiler ignores it and doesn't inline? Or that it's best to leave the decision to the compiler?
In this case I definitely want it to inline (to reduce stack usage and code size), I am hoping that -Os
does the right thing here and would be very surprised if the compiler wasn't inlining. The alternative is to manually inline it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some disassembly and it looks like it gets inlined even without the inline
keyword (gcc 10, default stm32 build).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good, thanks for checking. I don't see the harm in leaving the inline
keyword there though (if nothing else than as a form of documentation), and there are other functions in this file that use inline
too.
I wonder if perhaps we should introduce a MP_INLINE
macro (matching the existing MP_NOINLINE
) to use the __attribute__((always_inline))
attribute. (After all, when we put inline
on a function we're doing it on purpose, we may as well force it).
(While we're at it, other functions in this file use static
when they really should be using STATIC
...)
Thanks, this is a good fix and clean up. Merged in 8db99f1 |
Disable i2ctarget on ESP
This is an alternative to #8845 that applies the same fix but also de-inlines this code from the VM.
Justification for de-inlining is that it improves performance (likely by reducing overall size of mp_execute_bytecode), reduces code size (-72 bytes overall), and reduces complexity. There should be no impact on scheduled task execution, but for the case of raising a pending exception the C-stack usage is slightly higher as there's now a function call (and a full NLR jump) rather than the inlined code before. (I think this is worth it as it only applies when there's a pending exception and doesn't impact the common case).
Performance results for pybv11 (non-threading)
For pybv11 (with threading) (note: the previous code was incorrect on non-GIL builds)
The optimisation that allows a single check in the VM for either a pending
exception or non-empty scheduler queue doesn't work when threading is
enabled, as one thread can clear the sched_state if it has no pending
exception, meaning the thread with the pending exception will never see it.
This removes that optimisation for threaded builds.
Also fixes a race in non-scheduler builds where get-and-clear of the
pending exception is not protected by the atomic section.
Also removes the bulk of the inlining of pending exceptions and scheduler
handling from the VM. This just costs code size and complexity at no
performance benefit.