Skip to content

ENH: Make 'low' optional in randint #7151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

ENH: Make 'low' optional in randint #7151

wants to merge 1 commit into from

Conversation

gfyoung
Copy link
Contributor

@gfyoung gfyoung commented Jan 30, 2016

Added functionality to accept a combination of low and high parameter values such as
(None, None) or (None, <number>). The (None, None) option is different from np.empty because it doesn't fill an array with just Py_None values. Such options are useful for example if you want to generate numbers for various given dtype arguments but don't want to have to keep passing in low every time.

The resulting behavior is then defined as follows:

  1. low == None and high == None

Numbers are generated over the range [lowbnd, highbnd), where lowbnd = np.iinfo(dtype).min, and highbnd = np.iinfo(dtype).max, where dtype is the provided integral type.

  1. low != None and high == None

If low >= 0, numbers are still generated over the range [0, low), but if low < 0, numbers
are generated over the range [low, highbnd), where highbnd is defined as above.

  1. low == None and high != None

Numbers are generated over the range [lowbnd, high), where lowbnd is defined as above.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 4, 2016

Can someone take a look at this?

@@ -34,11 +34,11 @@ class Randint(Benchmark):

def time_randint_fast(self):
"""Compare to uint32 below"""
np.random.randint(0, 2**30, size=10**5)
np.random.randint(low=0, high=2**30, size=10**5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should keep the invocation as it was before to ensure backwards compatibility. The same goes for all the other benchmarks and tests you modified in a similar way, especially the ones that just use the function but don't explicitly test it, such as test_multiarray.py, test_mem_overlap.py, etc. If necessary, add a couple more to use the new functionality instead of changing the old ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow what backwards compatibility you are trying to enforce here. I see no reason why tests should continue to use a "dated" API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API is not dated since I am going to guess that most existing code calls this function as np.random.randint(somenumber) and np.random.randint(somenumber, someothernumber). Just because the new version is a useful convenience does not mean that the old one is invalid or even less desirable. Removing the old calling convention from so many of the tests is just asking for trouble later down the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API would become dated if this PR is merged because low no longer is a positional argument. While it may be worth adding a test to ensure the non-keyword argument call still works, I don't think it should persist in the code-base with this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing non-test code is imo ok, but when changing test code you need to make sure that no coverage of the old still supported way of calling it is lost. It is not obvious if that has been done.
Please add new test cases for the old way of calling the function if you have removed them.

I may have overlooked it but I also see no tests for the new None cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliantaylor : The tests are here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not very enthusuastic about changing all the old unit tests to use the new keywords.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO unit tests should always use the most up-to-date versions of the API for clarity and consistency. That is why I put one test in this PR for backwards compatibility in which I call the function using the "old" way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think guaranteeing that old functionality hasn't broken is more important than checking that new functionality works. I favor virtually never changing old unit tests, unless there is an intentional backwards incompatible change.

I would be OK with you leaving the old unit tests alone, but duplicating some of them and adding keywords in the duplicates, if your new unit tests below don't cover those cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wrote a test here to address that concern. Maybe it should be expanded, but I don't follow why you would want to have all of your other tests testing the old functionality.

@madphysicist
Copy link
Contributor

  1. low != None and high == None

If low >= 0, numbers are still generated over the range [0, low), but if low < 0, numbers
are generated over the range [low, highbnd), where highbnd is defined as above.

One other thing that bothers me that case #2 has non-intuitive behavior (the old behavior) compared to how the rest of the function now works. At the very least there should be a very emphatic note in the docs saying that the previous behavior is a special case for the new function, preserved for backwards compatibility.

Alternatively, it may be better to introduce a wrapper for randint that has completely consistent behavior for all the argument combinations. That is basically what your code is doing anyway. Instead of munging the parameters directly in randint, why not keep the original function as-is and have a consistent wrapper that knows what to pass in to it?

@@ -3680,7 +3680,7 @@ def luf(lamdaexpr, *args, **kwargs):

Examples
--------
>>> x = np.random.randint(9, size=(3, 3))
>>> x = np.random.randint(low=9, size=(3, 3))
>>> x
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure your comment quite applies here because this isn't even documentation for np.randint where you are commenting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Removed.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 4, 2016

I think I will go for the documentation emphasis on backwards incompatibility. I think the wrapper would add unnecessary complexity to already complicated function.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 10, 2016

  1. Added test for old non-keyword behavior
  2. Expanded documentation and examples for randint
  3. Added low = None and high < 0 test for completeness

Travis and Appveyor are both happy, so should be good to merge then.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 17, 2016

Can someone with merging capabilities take a look at this? I think it is ready to go.

@ahaldane
Copy link
Member

This is a good idea, and the behavior looks right.

I agree with some of the comments above though, where I replied.

where `lowbnd` is defined as previously. If only `high` is None,
then if `low` > 0, results are from [0, `low`). This behaviour is
intended to maintain **backwards compatibility**. Otherwise, results
are from [`low`, `highbnd`), where `highbnd` is defined as previously.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @madphysicist here that instead of saying "This behaviour is intended to maintain backwards compatibility", I would prefer to say something like "This is so that randint(n) will give a random integer from 0 to n".

I think most/many uses of randint are still going to use positional arguments, even after this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter explanation is confusing IMO: it seems very out of place when you compare it to the case when low=None and high != None. Backwards compatibility is the sole reason for me leaving the behavior as it is. Otherwise, I would have changed it.

Probably many will still use randint with positional arguments, but nevertheless people should be encouraged to adapt to the new API since it technically isn't breaking. I'm not sure I follow your reasoning as to why function calls in the code-base should continue to follow positional arguments when using = is a lot cleaner and clearer just because of ingrained user habits.

Added functionality to accept a combination
of 'low' and 'high' parameter values such as
(None, None) or (None, <number>). The (None,
None) option is different from np.empty because
it doesn't fill an array with 'Py_None' values.
@madphysicist
Copy link
Contributor

Just a question: have you bounced this off the mailing list? If so, would you mind posting the link here for reference? This is a big change, even if it does nominally maintain backwards compatibility. While I am personally not a huge fan of this change, but I do not think that should stop it from happening if the consensus is that people would like to see it done.

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 17, 2016

@madphysicist : There is an email out now, and the link is here

@madphysicist
Copy link
Contributor

I appreciate that. I think that if you get a lot of positive response, it will help convince the devs to merge your changes.

For future reference, here is the archive link: https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075018.html

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 17, 2016

@madphysicist : Well certainly a positive response from the general mailing list couldn't hurt! :)

@gfyoung
Copy link
Contributor Author

gfyoung commented Feb 17, 2016

Alas, after posting to the email list, there didn't seem to be much support for this, so closing.

@gfyoung gfyoung closed this Feb 17, 2016
@gfyoung gfyoung deleted the randint_no_low branch February 17, 2016 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants