Skip to content

[MNT]: Where to use _api.check_in_list #24828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
oscargus opened this issue Dec 28, 2022 · 3 comments
Open

[MNT]: Where to use _api.check_in_list #24828

oscargus opened this issue Dec 28, 2022 · 3 comments
Labels

Comments

@oscargus
Copy link
Member

oscargus commented Dec 28, 2022

Summary

There's been some discussion how the "best" way to write if-clauses depending on a text parameter in combination with _api.check_in_list, e.g. #24486 (comment)

Proposed fix

There are three primary ways in the code base:

  1. Explicit checking all arguments twice
_api.check_in_list(["foo", "bar"], param=param)
if param == "foo":
     ...
elif param == "bar":
    ...
  1. Skip explicit checking of the last argument
_api.check_in_list(["foo", "bar"], param=param)
if param == "foo":
     ...
else: # "bar"
    ...
  1. Use _api.check_in_list for generating an error message once we know that no option is matching.
if param == "foo":
     ...
elif param == "bar":
    ...
else:
    _api.check_in_list(["foo", "bar"], param=param)

Personally, I'm in favor of 2. Primarily as it skips a redundant check and that it is not possible to get 100% coverage in case 1. Also, it is often the case that the _api.check_in_list is done in another place in the code (like in the constructor), so option 3 may lead to a delayed error.

I know that @timhoffm prefers 1 as that is more likely to lead to errors if a case is missed in the if-part.

(In addition, this should really be called _api.check_in_tuple and tuples being used as this will have some performance benefits when it comes to compiled code. Parts of the code use tuples already though...)

@anntzer
Copy link
Contributor

anntzer commented Dec 28, 2022

(In addition, this should really be called _api.check_in_tuple and tuples being used as this will have some performance benefits when it comes to compiled code. Parts of the code use tuples already though...)

It is true that passing a tuple is a bit faster, but the overhead of the function call is enormous compared to everything else anyways so micro-optimizing lists vs. tuples doesn't seem to make much sense.

In [1]: def check1(vals, **kwargs):
   ...:     for k, v in kwargs.items():
   ...:         if v not in vals: raise ValueError(f"{k}={v}, not in {vals}")
   ...: 
   ...: def check2(vals, name, val):
   ...:     if val not in vals: raise ValueError(f"{name}={val}, not in {vals}")
   ...: 
   ...: x = 2
   ...: %timeit if x not in [1, 2, 3]: raise ValueError(f"x={x}, not in [1, 2, 3]")
   ...: %timeit if x not in (1, 2, 3): raise ValueError(f"x={x}, not in (1, 2, 3)")
   ...: %timeit check1([1, 2, 3], x=x)
   ...: %timeit check1((1, 2, 3), x=x)
   ...: %timeit check2([1, 2, 3], "x", x)
   ...: %timeit check2((1, 2, 3), "x", x)
33.3 ns ± 0.12 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
33.4 ns ± 0.145 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
313 ns ± 0.405 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
267 ns ± 0.584 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
116 ns ± 1.67 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
72.6 ns ± 0.127 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

check1 is basically check_in_list, check2 is a simpler version that instead checks a single value (which covers virtually all calls to check_in_list). Note that this only times the successful case, as that should take precedence over the failing case.
I guess this argues in favor of option 3 (which is also the one I find aesthetically the best, basically matching a switch-case), which avoids the function call overhead in all successful cases.

@oscargus
Copy link
Member Author

A new insight:

For short methods that are often called with the correct argument and have some conditional execution anyway, it is better to use case 3 as error message generation rather than checking early.

@timhoffm
Copy link
Member

The advantage of case 1 and 2 is that the list passed to _api.check_in_list() is a hard gate for the supported values. No other values can pass beyond this point.

With case 3 we could add additional values to the if-else control flow but forget to update the check_in_list. It's values are only used in the error message and have no influence on control flow.

On a more general note: I think the timing discussion is meaningless micro-optimiztation. We're talking about 100-300ns per We don't have tens of thousands of them. Unless you can show you gain a measurable speedup for real applictations, the performance is not an argument for the preferred code structure. OTOH as said before readability and error-safety are better in case 1 and 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants