uasyncio: make uasyncio.Event() safe to call from an interrupt (RFC, WIP) #6056
Conversation
This is how CPython does it, using a special "self socket".
So it can be set from a scheduled callback (but not a hard IRQ). TODO: work out if it can be set from another thread, probably only with the GIL enabled.
Here's an example:

```python
import time, machine, micropython, pyb, uasyncio as asyncio
from machine import Pin

class AsyncPin:
    def __init__(self, pin, trigger):
        self.pin = pin
        self.event = asyncio.Event()
        self.timestamp = 0
        self.counter = 0
        self.pin.irq(self._cb, trigger)

    def _cb(self, pin):
        if self.counter == 0:
            self.timestamp = time.ticks_us()
        self.counter += 1
        self.event.set()

    async def wait_edge(self):
        ev = self.event
        await ev.wait()
        micropython.scheduler_lock()
        timestamp = self.timestamp
        count = self.counter
        self.counter = 0
        ev.clear()
        micropython.scheduler_unlock()
        return timestamp, count

async def wait_on_pin(pin, trigger):
    apin = AsyncPin(pin, trigger)
    while True:
        timestamp, count = await apin.wait_edge()
        dt = time.ticks_diff(time.ticks_us(), timestamp)
        print('edge dt={} count={} pin={}'.format(dt, count, pin))

async def led_flash(led, freq):
    while True:
        led.toggle()
        await asyncio.sleep(0.5 / freq)

async def main():
    asyncio.create_task(led_flash(pyb.LED(1), 1))
    asyncio.create_task(wait_on_pin(Pin('Y1', Pin.IN, Pin.PULL_DOWN), Pin.IRQ_RISING))
    asyncio.create_task(wait_on_pin(Pin('Y2', Pin.IN, Pin.PULL_UP), Pin.IRQ_FALLING))
    usr = AsyncPin(Pin('USR'), Pin.IRQ_FALLING)
    await usr.wait_edge()

asyncio.run(main())
```

Edit: removed unused ...
Here's a simpler example that works on the unix port (coverage build, needs ...):

```python
import sys, uasyncio

ev = uasyncio.Event()

if sys.platform == 'linux':
    import signal
    signal.signal(signal.SIGPIPE, lambda sig: ev.set())
else:
    import machine
    machine.Pin('USR').irq(lambda p: ev.set())

async def main():
    print('waiting on event', ev.is_set())
    await ev.wait()
    print('event:', ev.is_set())

uasyncio.run(main())
```

On unix, send the process a SIGPIPE (eg `kill -PIPE <pid>`) to set the event.
In your first sample:

```python
async def wait_edge(self):
    ev = self.event
    await ev.wait()
    micropython.scheduler_lock()
    timestamp = self.timestamp
```

Is there a possible issue if an edge occurs immediately after `await ev.wait()` returns but before the scheduler is locked? Incidentally, the ...
No there shouldn't be. The edge counter will just be incremented by 1. The AsyncPin class is designed here so that all edges will be accounted for; it's just a question of whether the count is included in the current or the next `wait_edge()` result.
Thanks, now fixed.
Sure. My general point is that an IRQ might occur between those instructions. I guess, in cases where this matters, the ISR can detect this case by testing event.is_set().
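For illustration, here's one possible shape of such a callback, building on the AsyncPin class above (a sketch of the idea, not code from this PR): the soft callback consults event.is_set() to decide whether an edge starts a new batch or joins the batch that the waiting task has not yet consumed.

```python
import time

class AsyncPinIsSet(AsyncPin):  # hypothetical variant of the AsyncPin above
    def _cb(self, pin):
        # Runs as a soft (scheduled) callback, so Event methods are safe here.
        if not self.event.is_set():
            # First edge since the waiter last consumed a batch.
            self.timestamp = time.ticks_us()
            self.counter = 1
        else:
            # Event already set: the waiting task hasn't consumed the previous
            # batch yet (it may even be between wait() and scheduler_lock()),
            # so this edge just increments the pending count.
            self.counter += 1
        self.event.set()
```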
Some alternatives with pros/cons relative to the approach in this PR: ...
Interesting PR! I like the fact that the changeset is manageable!

What concerns me is that it appears to layer a simple event mechanism on top of a very heavyweight I/O mechanism. Yes, it's the way CPython does it, but CPython does it the way Unix does it, which has its own decades of constraints. In embedded systems I'm familiar with, the event mechanism is a tiny little thing that is deep in the core and super efficient.

The approach taken in this PR layers events on top of ipoll, which, looking at moduselect, means a busy-polling loop in poll_poll_internal, which calls poll_map_poll, which itself is a loop over all file descriptors being awaited, calling through the stream ioctl machinery for each one. In the case of network connections that turns into a select call into LwIP for each file descriptor. (Am I getting this right?)

So in a way this PR kicks the proverbial can down the road in terms of making the innermost MP loop efficient and event driven. That may in itself not be a bad step as long as there's some discussion on how ipoll can be made event driven and efficient. It looks to me like that will take rewriting moduselect into a sort of rendez-vous mechanism where MP's core loop is on one side and the I/O providers (LwIP, BLE, USB, ...) are on the other, and uselect really decouples the threads churning in those I/O providers from MP.
That is a big constraint, because it's important to try to match the CPython API (although that doesn't really dictate the underlying implementation of events), and, more importantly, it has to work on unix MicroPython.
The problem is it's hard to come up with something that will work across many different systems, eg unix, zephyr/RTOS, esp32, bare-metal. Poll is something which is common, and already exists in MicroPython, so that's why it's used as a base. And I did try to consider efficiency with the approach taken here. The point being that there's only one extra stream to be added to poll (namely socketpair) which now becomes the "event interface": any event at all just needs to write to this stream to signal to poll to wake up.
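As a rough illustration of the socketpair-as-event-interface idea, here's how the same trick looks in plain CPython on unix (a sketch of the general technique only, not the code in this PR):

```python
import select, socket, threading

# One end is registered with poll; the other end is written to by any
# external event in order to wake the poll call.
rsock, wsock = socket.socketpair()

poller = select.poll()
poller.register(rsock, select.POLLIN)
# ... real IO streams would also be registered here ...

def external_event():
    # Called from wherever the event originates (another thread here, a
    # scheduled soft callback in MicroPython): one byte is enough.
    wsock.send(b"\x00")

threading.Timer(1.0, external_event).start()

print('sleeping in poll ...')
events = poller.poll()  # blocks until the socketpair becomes readable
rsock.recv(64)          # drain the wake-up byte(s)
print('woken by external event:', events)
```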
Yes that's about right, although on bare-metal (eg stm32, esp8266) there is no final select call to lwip; readiness is determined directly in the socket object's ioctl.
At this stage I'd rather just get the functionality and user interface correct, instead of trying to also rewrite all the internal event/polling code. And it's not clear how to do that yet on all supported platforms. Maybe you have ideas how to do it for esp32?
As I mention above, an alternative is to abandon select/poll and introduce something like ... In the future, bare-metal ports like stm32 could optimise the ...
It sounds like you've already made up your mind... To make this stuff reasonably efficient, I think what needs to happen is to change the way ioctl is used. I don't know whether the current semantics are dictated by CPython or not. Anyway, currently:

- poll repeatedly calls ioctl(MP_STREAM_POLL_XX) on each registered object to ask whether it is ready, i.e. readiness information is pulled by busy-polling.
This could be augmented by:

- a new ioctl, something like ioctl(MP_STREAM_REGISTER_RD), that registers a callback with the device driver, which the driver invokes when the object becomes readable/writable, i.e. readiness information is pushed.
In order to avoid having to convert all device drivers at once, poll could be made to deal with both types of signaling by first trying the new ioctl and reverting to the old method for devices that don't support it.

With this type of change, whenever poll has to block, if all FDs support the new callback functionality it can WFI or otherwise sleep 'til woken up. If at least one FD doesn't support the new ioctl then it has to do the current busy-wait dance, but only needs to iterate through the ioctl(MP_STREAM_POLL_XX) for those devices.

On the ESP32 the way LwIP would tie into all this is via a separate task (unless there's some new event driven interface to LwIP sockets). That task's sole purpose would be to handle the ioctl(MP_STREAM_REGISTER_RD), perform the actual poll or select call into LwIP, and make the callback to unblock the MP thread.

Thoughts?
Your comments about this are correct, and there are some good ideas there on how to make it better with MP_STREAM_REGISTER_XX. But that's a bit of a tangent to the main issue of this PR, which is to have a way for external events to hook into uasyncio. Yes the issues are related, so it's a good discussion to have, but regardless of how poll works (or even if it's used) we still need to make a decision on how external events like pin change IRQs trigger a uasyncio task to wake up.

The idea with this PR is that, even if there are 100 tasks await'ing on 100 different pins, the corresponding poll call is still O(1), not O(number of pins). If pins were made pollable via an ioctl call then await'ing on 100 pins could lead to a very inefficient poll, even if it did support MP_STREAM_REGISTER_XX (because it'll have to scan and register 100 objects on each poll call, unless there is sophisticated caching of the calls).
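To make the O(1) claim concrete, here's a sketch (pin ids and trigger are illustrative only): no matter how many Events are being waited on, only the single internal socketpair is registered with poll, and the pins never touch poll at all.

```python
import uasyncio
from machine import Pin

# 100 pins, 100 Events -- but poll still only watches the one internal
# socketpair (plus any real IO streams).  Integer pin ids are illustrative.
pins = [Pin(i, Pin.IN, Pin.PULL_UP) for i in range(100)]
events = [uasyncio.Event() for _ in pins]

for pin, ev in zip(pins, events):
    # Default soft IRQ: the callback is delivered via the scheduler,
    # where (with this PR) Event.set() is safe to call.
    pin.irq(lambda p, e=ev: e.set(), Pin.IRQ_FALLING)

async def watch(i):
    await events[i].wait()
    print('pin', i, 'fired')

async def main():
    for i in range(100):
        uasyncio.create_task(watch(i))
    while True:
        await uasyncio.sleep(10)

uasyncio.run(main())
```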
It would be possible (and probably a good idea) for the esp32 to switch away from busy-polling its sockets and instead block in a real poll/select call into LwIP.
OK, just for the record, I didn't suggest making pins (or other event sources) individually pollable objects that get registered with poll.
I would build poll on top of events, not the other way around. But I'll stop beating this dead horse.
That's an interesting idea. My concern would be that this would mean that the socketpair trick to unblock poll when there's an event turns into a round-trip through LwIP on the other processor. If that's the case there's really a trade-off whether to prioritize the efficiency of Pin or other events vs the efficiency of socket I/O polling. There's also the problem of how one writes to the socketpair from an interrupt handler, that probably adds a helper task somewhere into the mix.
Using such a scheme, can you explain how (eg) pin IRQs would wake the uasyncio scheduler? And how it would work on the unix port? That would give a good alternative to this PR.
The write is from a soft callback scheduled via micropython.schedule().
Do I have a week?
That's cheating! The point of the socketpair is to wake up the MP task, and micropython.schedule() runs in the MP task, so to wake up the MP task the MP task has to run?? I assume it works now because the MP task never really sleeps and keeps polling.
Sure.
The point of the socketpair is to wake the uasyncio scheduler. This is currently only done from Python code because that's the only way to make external events at the moment (internal ones like stream IO use ioctl polling). Remember that this PR is just about enabling user external events, not making polling event based.

On stm32 it's an efficient sleep because it does a WFI to wait for the next IRQ (all events in the system must originate from an IRQ), eg a pin IRQ. Such a hard IRQ handler may call micropython.schedule() to queue the soft callback, which then sets the event and writes to the socketpair.

On unix it's an efficient sleep because the poll will wait forever until the socketpair becomes readable (eg written by another thread) or a signal arrives (external event) and that signal runs some Python code which writes to the socketpair.
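Here's a small sketch of that chain (pin name, trigger and the hard=True option are illustrative and port-dependent, not part of this PR): the hard IRQ handler only schedules a soft callback, and it is that soft callback which calls Event.set() and thereby wakes the sleeping poll via the socketpair.

```python
import micropython, uasyncio
from machine import Pin

micropython.alloc_emergency_exception_buf(100)

event = uasyncio.Event()

def _soft_cb(_):
    # Soft (scheduled) context: with this PR, Event.set() is safe here.
    event.set()

def _hard_irq(pin):
    # Hard IRQ context: do the minimum and defer the rest to the scheduler.
    micropython.schedule(_soft_cb, None)

Pin('X1', Pin.IN, Pin.PULL_UP).irq(_hard_irq, Pin.IRQ_FALLING, hard=True)

async def main():
    await event.wait()
    print('pin event received')

uasyncio.run(main())
```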
Well, I still believe you're cheating and let me explain why ;-).
Agreed, but making polling event based is the ultimate goal, and my argument here is that by not stacking everything on top of ipoll we'd arrive at a better end result. (I think... there are so many moving parts it's difficult to be sure until it's implemented...)

As to the cheating... The way I understand it, the core of the asyncio loop services three things (not necessarily in this order):

1. run the tasks that are currently ready to run;
2. wait on file descriptors (via ipoll), with a timeout for any sleeping tasks;
3. service the scheduled callbacks ("sched-units") queued via micropython.schedule().
In your PR, Event.set runs in a sched-unit and it does two things:

1. it moves all the tasks waiting on the event onto the run queue;
2. it writes to the socketpair so that the poll in step 2 wakes up.
So if the asyncio loop is blocked in step 2 (waiting on file descriptors) then the sched-unit cannot run, since that requires the asyncio loop to pop out of step 2 and reach step 3. Chicken and egg problem. Right? But you gave two examples that work, so you must be cheating somehow. Here's how you are cheating. (Sorry, this is a bit long, I tried to lay it out in detail to convince myself I'm not smoking something here, even though in the end I may well be ;-).

On unix, step 2 above means calling the poll system call with a timeout. If there's nothing to do, there should be no timeout. But the call to poll is wrapped in MP_HAL_RETRY_SYSCALL and that includes a call to mp_handle_pending, i.e. it services the sched-units. So the mere fact of sending the process a signal already pops the asyncio loop out of the poll system call. The socketpair is only a convoluted way to tell moduselect and MP_HAL_RETRY_SYSCALL to stop calling poll forever. In more detail, if I understand correctly (big IF!!!) the sequence of events is: ...
On STM32, if I understand things correctly (there's that big IF again!), https://github.com/micropython/micropython/blob/master/ports/stm32/stm32_it.c#L348 clears the SLEEPONEXIT bit, which means that when an interrupt handler returns, if the processor was in sleep mode on a WFI instruction the WFI completes (i.e., the processor does NOT go back to sleep mode). I have not traced through the code, and stm32 has so many options that I have difficulties locating the right code, but, if ipoll were to block on a WFI instruction, the sequence of events would be as follows: ...
So let me rephrase my overall argument as follows: ...
I believe the more difficult cases are the ports with RTOS 'cause they may force the blocking poll model for file descriptors without providing a notion of EINTR.
Overall, what I would be delighted to see is:

- one macro, something like MP_BLOCK, marking the single place where the uasyncio loop blocks until there is something to do; and
- a set of MP_UNBLOCK_xxx macros (eg MP_UNBLOCK_FROM_SCHED_SCHEDULE, MP_UNBLOCK_FROM_OTHER_TASK) that each port implements to wake that loop up.
What this does is say "here's the loop and here's where it blocks" and "here's how you get to unblock it". The current PR has half the loop in asyncio and the other half in poll_poll_internal, and it says you have to go through the file descriptor machinery to unblock MP.

Ugh, long post, I just hope I didn't misunderstand something major that makes it irrelevant.

NB: while re-reading I noticed that on unix, in the presence of threads, there may be another source of wake-up that would map into the MP_UNBLOCK_FROM_OTHER_TASK macro, but it's been a few years since I looked at kernel and user level threads on unix and linux so I don't know what the low-level implications are.
Thanks for the detailed post. Just to give a short answer now: ...
If this scheme were implemented it'd need to support: 1) the possibility of multiple MP_BLOCK call sites blocking at the same time; 2) the possibility that an mp_sched_schedule callback does not unblock MP_BLOCK.

For 1 (simultaneous MP_BLOCK call sites), consider a port with threading and a different asyncio scheduler per thread, or one thread using asyncio and another just using poll on its own to wait for IO. This means that each active MP_BLOCK needs to have a unique entity associated with it (eg blocking_id), and this is used like MP_UNBLOCK_FROM_SCHED_SCHEDULE(blocking_id). (The socketpair essentially provides this unique id/object, and the current PR should work ok on unix with multiple threads, with an mp_sched_schedule callback running in one thread and waking a different thread via the socketpair write.)

For 2 (an mp_sched_schedule callback not leading to an unblocking event), consider any of the following, which should not wake uasyncio: ...
Handling these situations requires mp_sched_schedule callbacks to execute during blocking, because these callbacks must decide if the blocking is finished. So the macro would be something like MP_BLOCK_WHILE_EXECUTING_CALLBACKS. And also the MP_UNBLOCK_xxx triggers must be called from Python code when it decides that it's time to wake up, eg via ...

Note: the above assumes that MP_BLOCK should block precisely until there is something to do, otherwise uasyncio is woken unnecessarily.
Thanks for highlighting all these corner cases ;-) I think what I'll do is back out the socketpair stuff and see whether I can replace it with a simple flag causing the polling to stop at bullet item 8 in my previous comment, trying to avoid any looping inside of poll_poll_internal. I'll see what all I run into...
It seems to me that unnecessarily waking up uasyncio might be an OK trade-off if it makes everything simpler (I'm not saying that it does). After all, the big cost of waking up from sleep mode or of context switching under unix is already paid, and if an "idle" go-around the uasyncio loop is very expensive then that's a different problem in and of itself (which it currently kind'a is given how poll_map_poll queries each awaited device).
On unix (and probably windows, zephyr, other OS/RTOS-based ports) I think the system poll() function (or equivalent) will be the only way to do an efficient sleep. And waking it up requires writing to a file descriptor that it is monitoring, ie the socketpair. Constrained by this, and to make the same uasyncio code run on all ports, that's why I implemented a bare-metal socketpair.

Note that the use of socketpair is really an implementation detail of uasyncio (and has nothing to do with poll itself) and the user shouldn't be exposed to it. So it could be swapped out for something else if that something else is better. That something else needs to provide: 1) a way of creating the entity; 2) the ability to register that entity with poll; 3) the ability to signal that entity to wake up poll; 4) the ability for poll to clear the signal that woke it up.
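Expressed as a small class, those four requirements look something like this (names are illustrative only; the PR itself just keeps a raw socketpair inside uasyncio):

```python
import select, socket

class WakeEntity:
    def __init__(self):
        # 1) a way of creating the entity
        self._rsock, self._wsock = socket.socketpair()

    def register(self, poller):
        # 2) ability to register the entity with poll
        poller.register(self._rsock, select.POLLIN)

    def signal(self):
        # 3) ability to signal the entity, waking up poll
        self._wsock.send(b"\x00")

    def clear(self):
        # 4) ability for poll to clear the signal that woke it up
        self._rsock.recv(64)
```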
The set of commits in this PR adds an interface between soft interrupt handlers (ie micropython.schedule calls) and the uasyncio module. For example, one can use it to build an async-pin which can be awaited on for an edge or level change.

There's a lot to discuss here, and at the moment it's more of a proof-of-concept to show what's necessary to make it work.

The approach taken here has two parts to it:

- allow the uasyncio scheduler (ie select.poll) to be woken asynchronously/externally
- make uasyncio.Event() "thread safe" so it can be set from a soft interrupt handler, ie from something scheduled by micropython.schedule

The solution for the first part was chosen so that it will work on the unix port, and it's also the same way that CPython works: create a special socket.socketpair() (like a unix file pipe), register that with select.poll and then write to it to signal that the asyncio scheduler should wake up. This is thread-safe and race-free (the signal can come in just before poll is called and it'll still work).

The second part (making Event.set() thread safe) is done for efficiency: even if there are hundreds of external events (eg pin irqs, UARTs, ble events, etc) they do not put a burden on poll because only the socketpair (and IO streams) are registered with poll.

The new things in this PR are:

- micropython.schedule_lock() and micropython.schedule_unlock()
- usocket.socketpair(), including a bare-metal version for ports that need it
- a change to MICROPY_EVENT_POLL_HOOK on stm32 to be atomic wrt waiting for events
- uasyncio.Event() made thread/IRQ safe

The way to use this feature is:

- create a uasyncio.Event() corresponding to the external event
- have a uasyncio task wait on this event
- from a soft callback (ie something scheduled via micropython.schedule()) call set() on the Event -- internally the event will schedule all waiting tasks (immediately in the soft callback) and then notify the asyncio poll via the socketpair that it should wake up and run the next task
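Condensed into a minimal sketch (the pin name is a placeholder; machine.Pin.irq with the default soft IRQ already delivers the callback via the scheduler, as in the examples above):

```python
import uasyncio
from machine import Pin

# 1) an Event corresponding to the external event
button = uasyncio.Event()

# 2) a soft (scheduled) IRQ handler that sets the Event; internally this
#    wakes the uasyncio poll via the socketpair.
Pin('X17', Pin.IN, Pin.PULL_UP).irq(lambda p: button.set(), Pin.IRQ_FALLING)

# 3) a task that awaits the Event
async def main():
    while True:
        await button.wait()
        button.clear()
        print('button pressed')

uasyncio.run(main())
```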