Skip to content

Towards consistent stream polling API across baremetal/unix ports #1550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pfalcon opened this issue Oct 29, 2015 · 38 comments
Closed

Towards consistent stream polling API across baremetal/unix ports #1550

pfalcon opened this issue Oct 29, 2015 · 38 comments
Labels
ports Relates to multiple ports, or a new/proposed port

Comments

@pfalcon
Copy link
Contributor

pfalcon commented Oct 29, 2015

What we have now is:

  • unix port lacks builtin polling mechanism, instead select.epoll is implemented in micropython-lib. epoll() is Linux-specific (but the most efficient after POSIX select() and poll()).
  • stmhal has builtin select.poll(), which roughly corresponds to Python3 select.poll() (e.g. upstream poll.poll() takes timeout in milliseconds, which is very convenient for us). One glaring incompatibility is return value of poll.poll() - instead of list of tuples (fd, flags) it returns just list of "fds".

Proposed:

  1. Make stmhal's return value be compliant with upstream API (and read/write ready flags are important on their own). If there's concern of efficiency, return list size can be limited (micropython-lib limits it to 1). Further "dangerous" optimization may be to have static tuple and/or list object and reuse it (user app should not cache any such values, that's fair assumption, and well, nowhere poll.poll() description says it returns fresh objects ;-) ).
  2. As a stopgap measure, micropython-lib to provide select.poll implementation in terms of select.epoll. Next step is to actually implement select.poll to improve portability.
  3. As a step forward, we actually should look into providing upward path to support "selectors" module, which allows to specify "callback" data together with "fd". micropython-lib actually implements such extension already. So, for step 1, I'd suggest to return 3-tuple right away, even if last element is None.
@dpgeorge
Copy link
Member

One glaring incompatibility is return value of poll.poll() - instead of list of tuples (fd, flags) it returns just list of "fds".

? https://github.com/micropython/micropython/blob/master/stmhal/moduselect.c#L250-L251

@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 29, 2015

http://docs.micropython.org/en/latest/library/select.html#select.poll.poll

Wait for at least one of the registered objects to become ready. Returns list of ready objects, or empty list on timeout.

Nowhere it makes clear that it returns list of tuples. I'll take care of updating the docs. Thanks.

@pfalcon pfalcon added the ports Relates to multiple ports, or a new/proposed port label Oct 29, 2015
@pfalcon
Copy link
Contributor Author

pfalcon commented Oct 29, 2015

Ok, docs are updated: 98b6d35 .

Next issue: stmhal doesn't define symbolic constants for events. And then Linux e.g. has: http://lxr.free-electrons.com/source/include/uapi/asm-generic/poll.h#L5 . So, there's POLLPRI stuck in front of POLLOUT, making it have value of 4. At least, as that file says, values are governed by iBCS2, which is still "intel", but it being in asm-generic gives out good hope...

Anyway, what to do about this?

@dpgeorge
Copy link
Member

dpgeorge commented Nov 7, 2015

Anyway, what to do about this?

I'd say add POLL* constants to stmhal's select module, called as per CPython (POLLIN, POLLOUT, etc). And to keep things sane and portable, lets just use the numeric values as per asm-generic/poll.h (then same will go for errno, and others).

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 7, 2015

Thanks, that's the solution I'd be looking for. Any concerns about handling of integer 2 value? We could check for it, and throw an error with instructions on upgrade. If there're no big concerns, I'd rather skip that, as it's carrying around unneeded luggage.

@dpgeorge
Copy link
Member

dpgeorge commented Nov 8, 2015

@pfalcon regarding 98b6d35 : returning a tuple which is not exactly 2 elements in size might lead to compatibility issues with CPython. Eg code that does obj, ev = result[0] will be broken. What do you think?

Any concerns about handling of integer 2 value?

I don't have big concerns, so we can skip providing an upgrade warning. What will happen instead is that polled objects will just never be available for "reading" so that should be a relatively easy error to track down.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 9, 2015

Eg code that does obj, ev = result[0] will be broken. What do you think?

So, the idea behind that stuff is to be able to specify "callback data" in Poll.register() and get that data back in .poll() result. This is required for efficient processing of poll results. Yes, that's extension to CPython, but will let to standardize on simple clean interface Poll provides. The way CPython does it is via "selectors" module. I was ready to implement it for unix port (in micropython-lib), but I think it will be too much bloat for bare-metal. So, given all these reasons, I think it's acceptable if "obj, ev = result[0]" won't work and more specific code is required.

Note that all this is tentative yet and will take time to be fully worked out, and there's possibility to handle the case discussed in compatible way (by returning 2-tuple in case non-standard callback data wasn't passed). But I'm actually thinking about optimizations which will put even more constraints on what can be done with result of .poll() - like mentioned above having a singleton return structure and refilling it on each return, instead of allocating new each time.

@dpgeorge
Copy link
Member

dpgeorge commented Nov 9, 2015

So, the idea behind that stuff is to be able to specify "callback data" in Poll.register() and get that data back in .poll() result. This is required for efficient processing of poll results.

I don't have any experience using selectors so would need to see and understand a use case of returning "callback data" before I could form a better opinion on this. I'd really rather not break CPython compatibility like this if we really don't need to. (Although one could look through CPython lib to see if this would indeed break things. It might not.)

having a singleton return structure and refilling it on each return, instead of allocating new each time.

Could always implement poll.poll_into(list[, timeout])

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 13, 2015

I don't have any experience using selectors so would need to see and understand a use case of returning "callback data" before I could form a better opinion on this.

That's one of pretty big advantages of Linux epoll vs standard POSIX poll - almost always you need more state to handle particular connection than just fd. epoll allows to just set arbitrary data structure, thread pointer, coroutine pointer, and get it back directly (and know all this stuff is handled efficiently, in kernel). With poll, you need to map returned fd into your data/control structure yourself.

This feature is used by uasyncio for example - it associates a coroutine with waited fd, to resume it directly.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 13, 2015

having a singleton return structure and refilling it on each return, instead of allocating new each time.
Could always implement poll.poll_into(list[, timeout])

Yes, but why? The implementation I described would still be compliant with description of .poll() - indeed, it doesn't prohibit such behavior.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 13, 2015

This feature is used by uasyncio for example - it associates a coroutine with waited fd, to resume it directly.

And well, to pronounce it explicitly (though I guessed it was obvious), the drive of this work is to run uasyncio on baremetal ports.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 13, 2015

Ok, with bunch of thinking, and practical hurdles like #583 (comment) , I have following plan for unix implementation of Poll:

  1. Move it into C code.
  2. Base it on POSIX poll() after all. The motivation for using epoll() was the highest promised performance. But:
    2.1. I read suggestion that for less than 1000 monitored fds, there won't be noticeable performance difference in poll() vs epoll(). Also, sources suggest that benefit of epoll() depends of type of workload (because it takes a syscall to add/remove each individual fd to a polled set, so if it's done too often, syscall overhead can eat into performance).
    2.2. Poll is more portable after all.
    2.3. As described above, epoll allows to map arbitrary object to fd, but due to way uPy GC works, we still need to keep collection of such mapped objects, to preclude them from GCing (they are referenced by kernel, but uPy doesn't know that, so if other Python data doesn't keep ref to it, it may be collected). Then, this bookkeeping collection is either memory-hungry, or not so efficient. So, epoll + bookkeep may have about the same performance as poll + simpler bookkeep.
  3. I also would like to have mechanism to poll for non-fd objects (e.g. lwIP sockets). This also calls for C impl.

@dpgeorge
Copy link
Member

Yes, by why? Such implementation would still be compliant with description of .poll() - indeed, it doesn't prohibit such behavior.

True, it does return a list. But the tuples in the list would need to be reallocated each time because a tuple is read only you can expect to keep a reference to it and for it not to change.

And if there is ever multithreading support in uPy, it'd need a separate poll buffer for each thread.

Regarding epoll: it seems that the Python interface to this does not support registering additional data along with the fd.

I also would like to have mechanism to poll for non-fd objects (e.g. lwIP sockets). This also calls for C impl.

This is how stmhal (and cc3200?) work, since there is not really any concept of a file descriptor.

If you can use non-fd objects then that provides a way to attach data to the polling event: just pass a tuple (fd, data) as the entity to register, and then the same tuple is returned when an event occurs. That way the GC refs are automatically stored.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 21, 2015

because a tuple is read only you can expect to keep a reference to it and for it not to change.

Keeping a reference to tuple returned from poll.poll() doesn't make sense.

And if there is ever multithreading support in uPy, it'd need a separate poll buffer for each thread.

Well, all buffers belong to particular poll object instance.

Regarding epoll: it seems that the Python interface to this does not support registering additional data along with the fd.

But C does, and micropython-lib supports that, and usasyncio depends on it.

This is how stmhal (and cc3200?) work, since there is not really any concept of a file descriptor.

Sure, and I'm talking about integrating of both native POSIX fd's and user-space i/o objects.

If you can use non-fd objects then that provides a way to attach data to the polling event: just pass a tuple

Well, tuple definitely doesn't know how to be waited on. And even tuple is pretty expensive object - I'm currently doing optimizations on uasyncio to reduce memory allocation for bookkeeping purposes like argument storage. (You should like this approach of avoiding memory allocation, right? ;-) )

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 21, 2015

Other thoughts of optimizing poll implementation (with uasyncio as client, as usual):

  1. Poll constructor should take an initial capacity.
  2. Also, for robustness, should take maximum capacity.
  3. Need to have operation "register or modify". Not sure how to go about this - add separate method like .regmod() or a flag to .register() or .modify() ?
  4. Need to support one-shot mode (after event fired, it (or all events??) are automatically disabled for fd). This is again how uasyncio works - as it emulates sequential program flow, it's interested only of next event of particular, and once it fired, not interested in anything else on that fd unless explicitly asked for something again (or otherwise a coroutine in .sleep() may be woken up up by data arriving on fd it read some time in the past - but heck, now there's nothing can even read it - coroutine code does something else). epoll allows to register one-shot mode per-fd, by ORing with EPOLLONESHOT. There's no place to store such flag for poll() to emulate the behavior, only if trying to use "unused" POLL* bits. But that's gotta be portability hazard. But as described above, such granularity isn't needed for uasyncio. So, I propose to add flag to poll.poll() to make it work in one-shot mode.

I also consider an optimization, where poll has special knowledge about socket objects, and each socket object stores its last used index in a poll map (which is just an array for POSIX poll()).

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 29, 2015

@dpgeorge : Would be nice to hear your opinion on the matters above, if you have interest/time, otherwise my plan is to add optional arg to .register(), and to positional optional args to poll.poll() to cover usecases above.

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 4, 2015

I changed my mind and already coded .regmod(), then went to read cpython docs about details of .register() behavior to make it compliant. D'oh:

Registering a file descriptor that’s already registered is not an error, and has the same effect as registering the descriptor exactly once.

So, poll.register() already has the desired behavior. Well, depends on the interpretation of "the same effect as registering the descriptor exactly once". This effect, what it is - ignoring any .register() request for existing fd, or just making the last call have effect? Obviously, only the latter behavior is useful.

Another CPython miracle is that 4 different implementation of poll objects have 4 different ways how .register() works for each implementation, namely: explicitly undefined, implicitly undefined, allowed (with ambiguity above), raising exception.

@dpgeorge
Copy link
Member

dpgeorge commented Dec 5, 2015

@pfalcon sorry got lost, will think about it and reply.

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 5, 2015

I merged new implementation of .register() in 082b121. I tested that following testcase work as expected in both CPython and MicroPython (i.e. last .register() call has effect):

try:
    from uselect import *
except ImportError:
    from select import *

p = poll()
print(p)
print(p.register(0, 0))
print(p.register(0, POLLIN))
print(p.poll())

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 5, 2015

And that was last minimal change required to switch uasyncio from FFI-based epoll implementation to uselect.poll . And I did correctness testing to make sure it all works as expected (basic one of course, there should be more advanced tests), and then performance tests. So, for short requests, even with more or less high concurrency (10000 of 100 concurrency), poll() is faster. However, for large requests (>6MB) with moderate concurrency (30), on a overloaded system, poll() is 2 times slower. I rebooted my box to get a clean environment, but even with no user apps running (only X and daemons), I get 100% variation of poll() running times. Switching to epoll(), I get both less variable results, and, taking minimal poll() time I saw, epoll() is still 25% faster.

This pretty well matches what I read about poll() vs epoll(), though I didn't think it would be so visible (and I still don't believe it). Anyway, it's like that - switching to portable poll() means losing good deal of large-request performance :-(.

I also implemented poll.poll() one-shot mode as described above, but I wasn't able to quantify its result on the tests above - due to high variability and low enough concurrency (30, i.e. poll array size to scan is merely 30). Its effect would be more visible with concurrency of 1000. Anyway, it's obvious CPU cycle-saver, so let's have it.

@dpgeorge
Copy link
Member

dpgeorge commented Dec 6, 2015

I grep'd through the stdlib and site-packages that I have on my machine for use of (e)poll. It's difficult to find (maybe because it's not used often? I find only 12 non-test scripts that import select). The two cases I found use poll in the same way:

# site-packages/requests/packages/urllib3/util/connection.py:
for (fno, ev) in p.poll(0.0):

and:

# site-packages/serial/serialposix.py
for fd, event in poll.poll(self._timeout*1000):

Such usage would break if poll() returned a list of tuples that had 3 elements (fd, ev, user_data). At least the breakage will raise an exception and can be fixed manually, instead of failing silently.

epoll allows to register one-shot mode per-fd, by ORing with EPOLLONESHOT. There's no place to store such flag for poll() to emulate the behavior, only if trying to use "unused" POLL* bits. But that's gotta be portability hazard.

I would just use "unused" POLL bits in the constants provided at the Python level. At the C level it can be translated to a separate flag so not to clash with the C bits.

But as described above, such granularity isn't needed for uasyncio. So, I propose to add flag to poll.poll() to make it work in one-shot mode.

But does that mean that poll.poll(oneshot=True) applies to all fds that have been registered? Or just the ones that fire for this particular call? Ie, does it disable everything after firing, or just those that fire and are returned? I guess the latter is the most sensible thing to do.

A shame about the lack of performance of poll vs epoll.

Given all this (CPy compatibility, performance) why not create our own portable poll class (ppoll?) that has the semantics we want and works efficiently on all machines? Behind the scenes on Linux it can use epoll if available. Other platforms can fallback to poll (or something else). The API can be close to poll/epoll so that it requires minimal effort to provide a CPy poll class in terms of "ppoll".

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 8, 2015

It's difficult to find (maybe because it's not used often?

Yes, I guess so. Someone who needs to do something quick, just uses select(). That's why I'm not too concerned to add various optimizations which assume various restrictions on usage.

The two cases I found use poll in the same way:
for (fno, ev) in p.poll(0.0):

Ok, then we can pass additional flags to .poll() for extended usage, as I suggested above:

for (fno, ev, data) in p.poll(0.0, select.ONESHOT | select.DATA):

I would just use "unused" POLL bits in the constants provided at the Python level. At the C level it can be translated to a separate flag so not to clash with the C bits.

But there's no place to store them on C level.

But does that mean that poll.poll(oneshot=True) applies to all fds that have been registered? Or just the ones that fire for this particular call? Ie, does it disable everything after firing, or just those that fire and are returned? I guess the latter is the most sensible thing to do.

Of course, one-shot works as: fd is "charged", then if event is triggered on it, it's "discharged", and no longer triggers until explicitly charged again. The behavior is tied to fds which trigger events, and don't affect other fds. So, the whole semantics can be handled purely on .poll() level, not on .register() (though that makes it incompatible with epoll()).

Given all this (CPy compatibility, performance) why not create our own portable poll class (ppoll?)

Because it's duplication of functionality and code. Why, if we need only fine adjustments to already existing behavior of Poll?

Behind the scenes on Linux it can use epoll if available.

That's again duplication of code. After "revelation" regarding dynamic loading and FFI usage, I gave a good thought what I want to achieve with all this. And that's being able to use MicroPython everywhere (well, run software on any MicroPython install). And way to achieve that is to work on the most performant of portable APIs (and that's poll()), and work on consistent support for baremetal. Supporting Linux-specific solution just complicates the task. epoll() implementation stays available in micropython-lib, and someone interested in it (maybe me later), perhaps later can reintegrate it into uasyncio. But current aim is to make it usable in real-world, not just on development machines.

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 13, 2015

Ok, I switched uasyncio to use uselect.poll().

Next TODO for unix port: accept arbitrary stream objects, not just fd's. But that requires implementing "callback data" support (if we pass in object, we want to get it back, not just its fd, right?)

But that's still in backlog while I'm trying to get a sample webapp running on a static unix port, and there're bunch of related issue to solve first for that...

@pfalcon
Copy link
Contributor Author

pfalcon commented Aug 6, 2016

Next TODO for unix port: accept arbitrary stream objects, not just fd's. But that requires implementing "callback data" support (if we pass in object, we want to get it back, not just its fd, right?)

Can't believe this still wasn't done :-(. I'm adding a stopgap implementation which can automagically (i.e. even without calling .fileno() method) extract fd from file and socket object. This will still return a raw fd from .poll() method, but still useful for number of usecases (like just adding timeout on a socket).

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 30, 2016

Ok, time to have a final pass/decision on how to make (or not make) baremetal and unix uselect modules consistent, and how to optimize (or not optimize) them (further) for uasyncio.

The main client for these changes is, as before, uasyncio. And uasyncio doesn't care about about getting "file descriptor" or "socket object" from poll.poll() call. What it needs is "user data" (in the discussion above called "callback data") as associated with a file-like object. As neither unix nor baremetal uselect provides such functionality, uasyncio does such mapping itself, using a dictionary. For unix port, this dictionary has also another purpose: to keep references to file-like objects, to preclude them from being GCed while OS' poll works on them.

Now, the state of unix uselect module is that it can accept a file/socket in call to poll.register(), just like baremetal, but unlike it, as a result of poll.poll() call, will return an underlying file descriptor, not the original object. With unix port, a file/socket object can also be GCed while it's registered in poll object (because it's actually underlying fd is what's registered).

Suppose we want to make unix implementation be compliant with baremetal one, and return an original object passed to poll.register(). That will take storing this object in the internal internal array of poll object, in addition to already stored array of struct pollfd. One semi-naive thought here would be "but if we store a file object there, we can store any "user data" object instead of it". That's not exactly true due to GC considerations above - to protect a file object from GC, we must store reference to it (unless we can prove that something else will guranteedly store file object reference).

But all in all, if we store a file-like object in poll internal structures, it's just one step ahead to store additional information along with it. This is true for both unix and baremetal uselect's.

Now, what we can do, specifically:

  1. Don't do anything on C side, do everything on Python side, like uasyncio on esp8266 micropython-lib#141 does. This is the least preferrable, CPython-inconsistent, fiasco-style solution.
  2. Make baremetal sockets, etc. have a .fileno() returning something (self, or .id()), then https://github.com/micropython/micropython-lib/blob/master/uasyncio/uasyncio/__init__.py#L19 can just unconditionally do self.objmap[fileobj.fileno()] = (cb, args). This won't work well anyway.
  3. Make unix poll.poll() return file object if that was passed in, making it consistent with baremetal poll (at the expense of storing these file objects in poll object). File-like objects then need to be hashable, so can be used as keys in a dict.
  4. The above, and also add support for storing user data along with file object for both unix and baremetal poll.

The neutral point is apparently p.3, and it also lies on the criticial path to p.4. But @dpgeorge, I wonder if you're ready for p.4, after a year in discussion.

@pfalcon
Copy link
Contributor Author

pfalcon commented Dec 30, 2016

Make unix poll.poll() return file object if that was passed in, making it consistent with baremetal poll (at the expense of storing these file objects in poll object).

Implemented in 093a8f5.

File-like objects then need to be hashable, so can be used as keys in a dict.

Before I go and patch all files and sockets to implement MP_UNARY_OP_HASH, please confirm that you don't want to got for solution 4. It would be much better solution. For extmod/moduselect, We can stop caching ->ioctl and use it for user data, while staying within 16 bytes. One caveat is that it's useful to have 2 words of user data, e.g. a callbacks and arguments (single one or tuple of many). But we can scope-creep it step by step ;-).

@pfalcon
Copy link
Contributor Author

pfalcon commented Jan 5, 2017

@dpgeorge: Ping ^^^

@dpgeorge
Copy link
Member

dpgeorge commented Jan 5, 2017

I wonder if you're ready for p.4, after a year in discussion.

Point 4 is good (it's efficient) but the question for me is about the user-facing API. I don't want to have poll.poll() return a 3-tuple and breaking compatibility with CPython.

The "selectors" module is really what we need here so why not just implement it? It can be done efficiently (in C) using pretty much all the existing code in extmod/moduselect. Just make existing poll_register and poll_modify take an extra optional "data" arg, and set up some aliases DefaultSelector=PollSelector=select.poll.

If that's too much, then just do the modifications to add "data" args to register and modify, and have poll.poll() return a 2-tuple but where the first arg is a tuple of the file-like-object and the user data.

@pfalcon
Copy link
Contributor Author

pfalcon commented Jan 5, 2017

Point 4 is good (it's efficient) but the question for me is about the user-facing API. I don't want to have poll.poll() return a 3-tuple and breaking compatibility with CPython.

My question is mostly about desire to store additional information in poll objects. API-wise, we can do .poll_into(), or maybe even .poll_iter() which would return an iterator.

The "selectors" module is really what we need here so why not just implement it?

That was my original idea, and how it evolves after that is I'm trying to add selectors-like functionality to existing select, to not bloat with adding a separate module (especially that for efficiency, we need another "into"-style API anyway).

@pfalcon
Copy link
Contributor Author

pfalcon commented Jan 26, 2017

Ok, I'd like to move further with this. The key point is whether we can constrain number of returned events for different underlying polling implementations to 1. Answer: we can for baremetal poll (we can do anything), we can for Linux epoll (epoll_wait takes maxevents param). But we can't for standard POSIX poll().

So, then:

Have an poll.ipoll() return an iterator (allocate-once apparently). Then its __next__() is to return once-allocated list (given users a hint that they shouldn't store it) of [fileobj, ev, userdata] (maybe more elements later).

So, the use would be:

for f, ev, data in poller.ipoll(delay):
    ...

Btw, using internal pre-allocated list and sequence unpacking may make a more natural Python syntactic pattern after all, then external pre-allocated list, passed as an argument. Not as explicit about the fact that the list may change between the invocations, though.

@dpgeorge
Copy link
Member

Have an poll.ipoll() return an iterator

Sounds ok, hopefully it indeed doesn't allocate RAM in practice.

So, the use would be: for f, ev, data in poller.ipoll(delay):

If code is going to be written like that then we can't change the number of elements returned by ipoll's next. So should make a decision now that [fileobj, ev, userdata] is good enough.

@pfalcon
Copy link
Contributor Author

pfalcon commented Jan 27, 2017

If code is going to be written like that then we can't change the number of elements returned by ipoll's next.

How we can't, of course we can. I don't see any issue even with adding extra param to .poll() - we have bigger differences anyway. And of course we can state what's implied explicitly in the docs - "this function is provisional for next 5 (or 10) years". But the caveat is that there's no docs for .ipoll(), and before they can be written, docs for other things having existed for some time, need to be written first.

Btw, with the plan like above I keep thinking whether returning a "caller-owned" tuple is better after all (than a list). With a tuple there's more or less big issue - that it would be changing tuple, but at least it's the only one. And it would be hard to abuse it for bad. So, we just document it as a special kind of tuple, voila, problem solved. With a list, there're lot of "smaller", but more severe issues, like users being able to change size of this list, etc. So, a reliable solution will be inventing a special type of list whose size can't be changed and some other operations are disabled. Well, it's better to invent a special kind of tuple, whose values can change by external force. At least that requires 0 lines of code.

@dpgeorge
Copy link
Member

How we can't, of course we can.

But then we break our own code as well as other people's... not a fun situation. The other way of doing it is for (f, ev, data, *rest) in poller.ipoll(delay): That allows for changes in ipoll's return tuple. Except it probably allocates on the heap. So then just a simple for x in poller.ipoll(delay): x[0], x[1], x[2] etc.

Btw, with the plan like above I keep thinking whether returning a "caller-owned" tuple is better after all (than a list).

I think at this stage the best thing to do is make a proof of concept patch based on the discussion so far, then evaluate the code.

@pfalcon
Copy link
Contributor Author

pfalcon commented Feb 6, 2017

Initial ipoll() impl posted: #2841

@larsks
Copy link
Contributor

larsks commented Nov 9, 2017

I'm curious about this:

File-like objects then need to be hashable, so can be used as keys in a dict.

Has there been any progress on this front? I naively tried to set up a dictionary of <object returned by poll>: <handler function>, which of course didn't work. I'm using id(<object returned by poll>) right now, which works but seems hacky.

@pfalcon
Copy link
Contributor Author

pfalcon commented Nov 9, 2017

I'm using id(<object returned by poll>) right now, which works but seems hacky.

Per https://github.com/micropython/micropython/wiki/ContributorGuidelines#django-criteria , that's the correct approach. Anything you can do, please do the way you can. The focus of adding new features is features which can't, or very cumbersome, to do otherwise.

@pfalcon
Copy link
Contributor Author

pfalcon commented Sep 27, 2018

Careful inspection shows that we still have consistency issues:

  1. Unix poll.modify(obj, events) doesn't raise ENOENT if obj doesn't exist, while baremetal does. Given that there's a way to avoid raising exception - use .register() (which will just add object if not yet in poller), then the fix is clear - unix should raise exception too.

  2. The situation with poll.unregister(obj) is more subtle - both modules don't raise ENOENT if obj is not in poller, while CPython docs say they should. The reason for this is that not raising exceptions, i.e. idempotent operation, is "more optimal". Funnily, uasyncio still has try/except bracket for it, apparently from times when uasyncio still ran under CPython. At the same time, discussion at uasyncio: KeyError in remove_reader (MacOS) micropython-lib#248 shows that people (cluelessly) rely on no-raise behavior.

So, we need to decide what to do about unregister(). Either we:

  1. Say that specifically for the uselect module, the semantics of unregister() is that it doesn't raise an exception, or
  2. We add an optional flag to either raise exc in CPython compatible way, or skip raising exception (in this case, probably worth returning a bool telling whether obj was found or not).

It's worth thinking how to do that flag right, as it's probably not the last case when it's useful to select between raising exceptions and not. I believe we already had a discussion that usage of not negated words is preferred, e.g. raise=False is preferred to no_raise=True. But raise is keyword anyway, so we'd use throw, which would be similar terminology as for generator's .throw() method.

And still, "nothrow" is a term on its own, e.g. https://en.cppreference.com/w/cpp/memory/new/nothrow (damn, they now have noexcept too: https://en.cppreference.com/w/cpp/language/noexcept_spec).

But if this flag will ever be a keyword param (not expected to happen), just be able to reuse "throw" qstr calls for it to be named "throw", not "nothrow".

tannewt pushed a commit to tannewt/circuitpython that referenced this issue Feb 19, 2019
@dpgeorge
Copy link
Member

  1. Unix poll.modify(obj, events) doesn't raise ENOENT if obj doesn't exist,

This was fixed in b9bad7f

2. The situation with poll.unregister(obj) is more subtle - both modules don't raise ENOENT if obj is not in poller, while CPython docs say they should.

This is tracked by a TODO in the code.


I'd say this issue can be closed. The select module has been working well for a while now, with uasyncio v3 running well on both unix and bare-metal. #6125 discusses future optimisations of stream polling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ports Relates to multiple ports, or a new/proposed port
Projects
None yet
Development

No branches or pull requests

3 participants