Skip to content

randomkit.h is not included in NumPy 1.17 builds #14517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MSeifert04 opened this issue Sep 15, 2019 · 30 comments
Closed

randomkit.h is not included in NumPy 1.17 builds #14517

MSeifert04 opened this issue Sep 15, 2019 · 30 comments

Comments

@MSeifert04
Copy link
Contributor

MSeifert04 commented Sep 15, 2019

There has been a recent StackOverflow question what happened to the randomkit.h that was included in the numpy/random directory of the binary distributions of NumPy < 1.17 and isn't included in NumPy 1.17+.

The removal of the header file wasn't mentioned in the changelog so I wonder if that was accidental or intentional and if it should have been mentioned in the changes.

@charris charris added this to the 1.17.3 release milestone Sep 15, 2019
@charris
Copy link
Member

charris commented Sep 15, 2019

Hmm, I suspect it got lost. Looks like in 1.16.x it was added in numpy/random/setup.py.

    config.add_data_files(('.', join('mtrand', 'randomkit.h')))

And that is missing in 1.17.x.

@charris
Copy link
Member

charris commented Sep 15, 2019

@MSeifert04 That said, I don't think the random internals were meant to be used downstream, which seems to be the case with the StackOverflow user. Perhaps the best solution to his problem would be to copy the relevant files.

@mattip
Copy link
Member

mattip commented Sep 15, 2019

The recommended use of cython, numba and cffi with numpy.random is now documented, and we even have some examples. The examples should be somehow linked to the documentation, suggestions welcome.

@paul-the-noob
Copy link

(I am that SO user.) numpy.random.distributions.pxd which one of the examples cimports seems to be missing from 17.2 binary dist. common.pxd and bit_generator.pxd are there, legacy_distributions.pxd and bounded_integers.pxd are not. The last one looks relevant to my use case.

@mattip
Copy link
Member

mattip commented Sep 15, 2019

Thanks for the feedback. That seems like an oversight.

@mattip
Copy link
Member

mattip commented Sep 15, 2019

@bashtage should we be including more pxd files?

@bashtage
Copy link
Contributor

I do not believe that randomkit is used any more. C and h should be removed from 1.17.

@bashtage
Copy link
Contributor

The question about including the pxd comes does to whether the low level api side of things is to be supported. These simplify using the numpy random infrastructure in cython.

@paul-the-noob
Copy link

Should you decide not to officially support these interfaces, you probably want to mention this in the docs/examples referenced above.

@rgommers
Copy link
Member

It would be nice to revisit the API here even if we do keep it, there's things like

from numpy.random.common cimport bitgen_t
from numpy.random.distributions  cimport random_gauss_zig

The common looks weird; distributions makes sense as a name but doesn't mirror the Python API, and I'm not sure what the random_gauss_zig name is about.

@charris
Copy link
Member

charris commented Sep 17, 2019

I'm not sure what the random_gauss_zig name is about.

Probably random_normal_ziggurat would be better :) I'm in favor of having an API if we did before, apparently it was used. But the naming and organization should be laid out.

@bashtage
Copy link
Contributor

Certainly could use a clean-up. random_gauss_zig should be random_gauss. It wasn't random_gauss since this name was taking by the BM normal generator at the time. If the legagy generator is rolled back to the 1.16 version, then naming may get a little simpler.

bitgen_t should probably have its own pxd file, but was added common to avoid creating another pxd file. common is a set of functions shared by Generator-like. Should be sort of private, but it useful is someone wants to write a custom Generator since it exposes methods to automagically handle all broadcasting.

I think randomkit, if restored, should probably remain undocumented at the API level.

@rgommers
Copy link
Member

I think randomkit, if restored, should probably remain undocumented at the API level.

Then let's not restore it ....

random_gauss_zig should be random_gauss. It wasn't random_gauss since this name was taking by the BM normal generator at the time

This seems to be the algorithm that implements Generator.standard_normal, so why not standard_normal?

In general, the Python and Cython API should match unless there's a good reason to deviate. And imho that should apply to function/method names and namespaces alike.

@bashtage
Copy link
Contributor

This seems to be the algorithm that implements Generator.standard_normal, so why not standard_normal?

I'm not 100% on how name resolution works in Cython. Probably OK , at least if we use a module, distributions.standard_normal instead of just standard_normal. The random_ prefix is inherited from randomkit. This is probably the right idea and solves most of the bikeshedding.

Then let's not restore it ....

Having though a bit harder I don't think it is possible to go down that path. See #14531 (comment)

@bashtage
Copy link
Contributor

I'm in favor of having an API if we did before, apparently it was used.

There was no API before. There was a long-standing issue asking for an official API that got closed when randomgen was merged.

@rgommers
Copy link
Member

Having though a bit harder I don't think it is possible to go down that path. See #14531 (comment)

I just meant randomkit.h. We can restore all of the pre-randomgen code to fix the backwards compat issues, but that doesn't mean we have to put back private header files.

@bashtage
Copy link
Contributor

We can restore all of the pre-randomgen code to fix the backwards compat issues, but that doesn't mean we have to put back private header files.

I not sure it is that easy. Unless the feature of charing a common source of randomness between the legacy methods and the new methods is abandoned. The code in randomkit would need to be ported to the new structure.

@rgommers
Copy link
Member

Quick cleanup PR till we're ready to add a complete C/Cython API opened: gh-14562

@rgommers
Copy link
Member

Unless the feature of charing a common source of randomness between the legacy methods and the new methods is abandoned. The code in randomkit would need to be ported to the new structure.

Good point. In the community call on Wednesday there was anyway a preference to fix issues as they arise, and not restore all the legacy code to its 1.16.x status.

@mattip
Copy link
Member

mattip commented Oct 10, 2019

Summarizing:

@MSeifert04 said

what happened to the randomkit.h that was included in the numpy/random directory of the binary distributions of NumPy < 1.17 and isn't included in NumPy 1.17+.

conclusion:
Depending on third-party vendored code in your project seems unreliable. If needed that code should be copied to your project. No conclusion was reached about whether, as policy, we should describe removed/added vendored code in the release notes, but I think that is a separate issue.

@paul-the-noob said

numpy.random.distributions.pxd which one of the examples cimports seems to be missing from 17.2 binary dist. common.pxd and bit_generator.pxd are there, legacy_distributions.pxd and bounded_integers.pxd are not. The last one looks relevant to my use case.

Could you describe your use case? It seems we would prefer not to export bounded_integers as a public C-API.

@rgommers
Copy link
Member

Depending on third-party vendored code in your project seems unreliable. If needed that code should be copied to your project. No conclusion was reached about whether, as policy, we should describe removed/added vendored code in the release notes, but I think that is a separate issue.

Vendored code vs self-written code is irrelevant. What matters is: does it look like a public API or not. If not, change it or rip it out any time. If it does (or is ambiguous), it needs to be treated as such.

In this case, the only headers that are public are those you'll find through np.get_include(). I think this is documented well enough. randomkit.h was in numpy/random/ not in numpy/core/include/numpy/ (which is what np.get_include() points to). So randomkit.h was private (admittedly, would have been better not to install it at all though). If it was easy to restore, we probably would (if we get one bug report that someone was relying on it, there may be more users relying on it), but it seems that it's not.

@paul-the-noob
Copy link

@mattip

Could you describe your use case? It seems we would prefer not to export bounded_integers as a public C-API.

I just assumed from the name that it is the "backend" to randint. If that's not the case or randint like functionality is available elsewhere that's fine with me.

@mattip
Copy link
Member

mattip commented Oct 13, 2019

rand_int is now supplied by _bounded_integers, but you need to choose your exact int type. @paul-the-noob could you describe your use case in a little more detail so we can make sure to include it as part of gh-14604? Are you building a BitGenerator to reuse in NumPy's Generator, or sampling uniform integers of a specific C type to be used directly by your code, or creating your own distributions?

@bashtage
Copy link
Contributor

bashtage commented Oct 13, 2019 via email

@paul-the-noob
Copy link

paul-the-noob commented Oct 13, 2019

@mattip

rand_int is now supplied by _bounded_integers, but you need to choose your exact int type. @paul-the-noob could you describe your use case in a little more detail so we can make sure to include it as part of gh-14604? Are you building a BitGenerator to reuse in NumPy's Generator, or sampling uniform integers of a specific C type to be used directly by your code, or creating your own distributions?

I'm essentially creating permutations or shuffling vectors subject to certain constraints. IIRC, one reason I decided to use Cython was that at the time randint didn't support array iimits. But there are other factors that make the code "too sequential for numpy".

To summarize I need to draw random ints of a given C type from continually changing ranges, either one-by-one or small batch-by-small batch.

@mattip
Copy link
Member

mattip commented Oct 27, 2019

The recommended C-API was documented in #14604, this is the documentation from that PR. I think you are looking for int64_t random_positive_int64(bitgen_t *bitgen_state) and friends for a single value?

@charris
Copy link
Member

charris commented Nov 12, 2019

@mattip What was the decision on this for the random C-API?

@mattip
Copy link
Member

mattip commented Nov 12, 2019

I think we can close it, we have an alternative for users who needed this and #14778 widens the use cases for this and other ways to call into the numpy.random C-API

@charris
Copy link
Member

charris commented Nov 12, 2019

OK, will close. This does introduce an discontinuity from 1.16 to 1.18, correct? If so, it would be good to mention it somewhere in the release notes where it will be noticed, highlights maybe. Do we have a way to add highlights?

@charris charris closed this as completed Nov 12, 2019
@mattip
Copy link
Member

mattip commented Nov 12, 2019

According to the doc/release/ucoming_changes/README.rst , we can add a nnn.highlight.rst

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants