API: provide examples of use of numpy random API via user stories #14778

mattip · 2019-10-25T03:50:39Z

Gathered from gh-14604, gh-14517 and the discussions.

"To summarize I need to draw random ints of a given C type from continually changing ranges, either one-by-one or small batch-by-small batch."
Someone asked how to use random in a ufunc.
"An ideal API would allow projects like https://github.com/deepmind/torch-randomkit/tree/master/randomkit or numba to consume the code in NumPy without vendoring it."
"There are c++ applications which use boost::random, would be nice to be able to swap it for numpy.random."
"Using the existing distributions from Cython was a requested feature and an explicit goal, yes. There are users waiting for this." (mattip: but isn't this supported via Generator?)
"Numba would definitely appreciate C functions to access the random distribution implementations, and
have a side-project (numba-scipy) that is making the Cython wrapped functions in SciPy visible to Numba" (mattip: I think we do this after API: rearrange the cython files in numpy.random #14608 from _generator.so via these cdef extern declarations, but that requires the H file.
Someone who wants to write a new BitGenerator, i.e., the numpy/bitgenerator repo without needing to have the numpy code as a git submodule

Some of these are already be handled by gh-14604, but I am putting them here for completeness

Edit: turned into a checklist

rkern · 2019-10-25T04:22:45Z

"Using the existing distributions from Cython was a requested feature and an explicit goal, yes. There are users waiting for this." (mattip: but isn't this supported via Generator?)

No. In the context of the discussion this statement was drawn from, this was referring to the functions in distributions.c: efficient C-level functions for drawing numbers one-by-one without all of the overhead that the Generator methods have for cramming these numbers into numpy arrays. That's what people have been waiting for.

mattip · 2019-11-15T19:55:48Z

@rkern the way the code is currently structured in #14604 we do not ship distributions.c directly, rather we expose all the funcitons in _generator.pyx from the shared object _generator.*.so. If we wish to officially support them from C we should either add a _generator.pxd which will expose them via cimport _generator or ship a _generator.h header file and a link library for MSVC. I think a pxd file would be better, what do you think?

bashtage · 2019-11-21T17:10:41Z

Coming from #14951, I think the C functions represent are a lot of the value add of a low-level interface. While some are pretty trival, many are not.

I don't think it adds much to maintenance since these functions are all used by Generator. IMO the simple thing to do is to always have the set of random_ functions exactly match the set required in Generator. Functions that are not in Generator should probably not be exposed, and if not used in Generator or RandomState, then they should be removed (don't think there are many if any like this).

Did you change the export of the distributions in #14604? I couldn't figure out where I might find them.

mattip · 2019-11-21T23:01:33Z

I removed unused functions and renamed some others more consistently (mainly gauss_zig -> normal) in #14604 which also

moved distributions.h to numpy/core/include/numpy/random/distributions.h
moved (only the needed) function declarations from distributions.pxd to mtrand.pyx and _generator.pyx
removed distributions.pxd

The functions are exported from the _generator c-extension shared object, so the issue is how best to access them from C/Cython/CFFI. xref #14954 (the first commit there shows there is a problem on windows)

bashtage · 2019-11-21T23:06:51Z

The Windows .pyd files only export a single symbol, PyInit_generator. If this is fixable, then the numba examples will work well.

So that leaves Cython. I've always compiled the C source directly into Cython modules. Is there a reasonable alternative? Most examples that I found searching for Cython dll assume that you have the source of the DLL code, and don't actually rely on the DLL when the Cython extension module is built.

mattip · 2019-11-22T15:56:31Z

you have the source of the DLL code

The numpy code is only a click away. This has the added advantage of compiling the code directly, rather than calling into a DLL which not only adds runtime overhead but allows for compile-time optimizations.

rkern · 2019-11-22T17:57:19Z

We need to expose the functions in the DLL/SO so that numba can use them. They have made an unsupported hack into the mtrand DLL/SO, and we need to move them off onto something supported. Copying C source code from the numpy source dist doesn't help them.

mattip · 2019-11-22T18:10:02Z

The functions in distributions.h are now exposed from the _generator DLL/SO. Can you point to a numba issue or PR so we can help them work out how to do this moving forward?

rkern · 2019-11-22T18:41:05Z

I'm only reacting to @bashtage's comment: "The Windows .pyd files only export a single symbol, PyInit_generator. If this is fixable, then the numba examples will work well."

mattip · 2019-11-22T19:12:48Z

The export on windows was a bug that is fixed in gh-14954

bashtage · 2019-11-22T19:27:04Z

I missed that change in the dll export. Seems like a good solution.

…

On Fri, Nov 22, 2019, 19:12 Matti Picus ***@***.***> wrote: The export on windows was a bug that is fixed in gh-14954 <#14954> — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#14778?email_source=notifications&email_token=ABKTSRKMOSHSLYMUNKBLRM3QVAVLDA5CNFSM4JE6F322YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEE6SYPI#issuecomment-557657149>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKTSROCTKNNDURUK7GWAFTQVAVLDANCNFSM4JE6F32Q> .

rgommers · 2019-11-28T21:59:39Z

Since the last open PR was merged, this is probably the best issue for questions/comments. So here are some:

In new-basic-rngs there's some text to explain the bitgen struct. From that it's unclear what the difference is between next_uint64 and next_raw. It just refers to "raw value" but is that a different uint64 number than returned by next_uint64?
Naming in the now public header is still not optimal probably: s_binomial_t, some _zig names, some other cryptic abbreviations (_mvhg, _btpe)
The random C API docs still use private imports: cimport numpy.random._bit_generator and cimport numpy.random._generator
The See extending for examples link on that same C API page is broken
Does it make sense to expose duplicate functions with and without _standard? It only saves the user adding , 0.0, 1.0) to a function call (or outside of the function call) as far as I can tell. It's also unclear if the _standard functions are made available consistently (e.g. normal has with/without _standard, lognormal does not).
Formatting of functions in https://numpy.org/devdocs/reference/random/c-api.html is off, should look like https://numpy.org/devdocs/reference/c-api/array.html
The title is Cython API, still TBD if it should say C API I think?
Not quite sure about the "massaging" in https://numpy.org/devdocs/reference/random/extending.html#cffi. Should that be provided as functionality somewhere, or if not, should CFFI be dropped? It doesn't look very usable as is.
Also, more use of private _generator in that CFFI section.

bashtage · 2019-11-28T23:18:54Z

Next raw is the next number using the natural number of bits the bg uses. It can be 32, 53, 63 or 64 in the range of generators I've considered. It is used to directly test the correctness against a reference implementation.

bashtage · 2019-11-28T23:21:21Z

Most distributions don't have _standard versions. For example, there isn't a standard lognormal, or a standard chisquare, etc. Some of the standard functions are misnomer imo. For example, standard_t should really be just t, or perhaps students t. It isn't actually standard in any sense that is different from the definition of a t. Ditto for standard cauchy.

bashtage · 2019-11-28T23:23:16Z

Cffi is pretty easy to use if you want a since function. The examples extending to numba are pretty simple. Mattis example is pretty complete which requires parsing the header file and expanding or replacing macros.

rgommers · 2019-11-29T01:13:03Z

Most distributions don't have _standard versions

Yes, that's what I was commenting on. Whether something "is standard" is in the eye of the beholder I'd say. This is basically a default loc=0, scale=1 like in the scipy.stats distributions. It could be defined for all distributions here. It just looks odd to me now.

Some of the standard functions are misnomer imo. For example, standard_t should really be just t, or perhaps students t. It isn't actually standard in any sense that is different from the definition of a t. Ditto for standard cauchy.

Now would be the time to rename them. Once we ship 1.18.0 it becomes a lot harder.

rgommers · 2019-11-29T01:18:42Z

Cffi is pretty easy to use if you want a since function. The examples extending to numba are pretty simple. Mattis example is pretty complete which requires parsing the header file and expanding or replacing macros.

I'm not sure I understand this comment. At the moment there's ~50 lines of boilerplate. Why couldn't we simply put that in a function so not every user has to copy-paste that every time she wants to use cffi?

rkern · 2019-11-29T01:38:19Z

Most of the standard_ functions are there because those special forms are used inside of other distributions where the efficiency does matter. The fact that they get used like is what picks them out as "standard". It's not arbitrary.

I think I named standard_t that because I thought t was too short, and since it didn't expose the mean or scale as parameters. Should have called it student_t, but we're way past that. That's the name at the Python level. 1.18.0 is irrelevant.

rkern · 2019-11-29T01:41:43Z

And I have no explanation for standard_cauchy except that I think, having done standard_t, I had mentally established a pattern with the other bell-shaped distributions that if they didn't expose the loc and scale that I'd call it standard_ to avoid misleading people.

rgommers · 2019-11-29T01:52:20Z

Right indeed. For the names that match Python it is what it is, I missed that. 1.18.0 only matter for the new names of functions that don't have a Python equivalent.

mattip · 2019-11-29T05:39:41Z

The random C API docs still use private imports: cimport numpy.random._bit_generator and cimport numpy.random._generator
Also, more use of private _generator in that CFFI section.

For Cython, we could somehow extend the __init__.pxd so the pattern would be cimport numpy.random which would expose all the gory details. I tried to do this and it was messy, but maybe doable. However CFFI can only access the shared object/DLL, which is named _generator.*.so

mattip · 2019-11-29T05:43:54Z

The random C API docs still use private imports
The title is Cython API, still TBD if it should say C API I think

The top half of that document is Cython-specific, the bottom half builds on it to document the available C functions. Perhaps it could be divided into two pages

mattip · 2019-11-29T05:50:14Z

We should decide which of the points @rgommers raised are 1.18 blockers and break them out into separate issues to be resolved in the next couple of days.

mattip · 2019-11-29T05:53:32Z

Not quite sure about the "massaging" ... CFFI

This is standard stuff for CFFI. All the C wrapper libraries have a boilerplate problem: ctypes must define args and retype, Cython need pxd files, SWIG needs interface files, CFFI needs to pre-parse headers.

bashtage · 2019-11-29T10:05:41Z

I think I named standard_t that because I thought t was too short, and since it didn't expose the mean or scale as parameters. Should have called it student_t, but we're way past that. That's the name at the Python level. 1.18.0 is irrelevant.

Should the C functions correct this? They could become random_student_t (or students_t) and random_cauchy.

bashtage · 2019-11-29T10:11:18Z

I'm not sure I understand this comment. At the moment there's ~50 lines of boilerplate. Why couldn't we simply put that in a function so not every user has to copy-paste that every time she wants to use cffi?

Sorry, on the phone.

It should have said: CFFI is pretty easy if you only want to use one or a few functions.

See

https://github.com/numpy/numpy/blob/7548073e58e89b414ca999d666c19189291dc1e5/numpy/random/_examples/numba/extending_distributions.py

which is no longer in master (and isn't quite right, but easy to fix). There is no need for tons of boiler plate. TBH, I'm not sure why @mattip got rid of this simple example which is basic but follows the style pretty much every CFFI example on the internet.

Of course, NumPy could provide a components needed for CFFI interfacing, but this seems to me like committing support something that may not be worth the overhead since downstream might want to do things differently.

bashtage · 2019-11-29T10:18:42Z

As for 1.18, I think only naming if essential, since changing it later is harder. I don't think it really is essential to rename some of the functions, but my vote would be

standard_t -> student_t
standard_cauchy -> cauchy
Remvoe all _zig since there are no other versions of these distributions

The one feature that I would really like, and I think is needed to complete the story, is to have something like npymath.lib for the C functions, perhaps npyrandom.lib. This would allow C/Cython to link to these functions without needing to get the C source form somewhere. For my, this would allow what will be a slimmed-down version of randomgen to become a dependent on the random code in NumPy instead of having an out-of-sync copy. This is a bigger ask though, and can definitely wait.

mattip · 2019-11-29T12:06:43Z

... numba/extending_distributions.py which is no longer in master

Must be a glitch in the way you are looking at the repo. The file is there, it is referenced in the docs but not tested since it is complicated to run.

CFFI is pretty easy if you only want to use one or a few functions

The example I provided shows how to use all the functions available in distributions.h, and is closer to how I have seen CFFI used in industry. I agree that many CFFI examples and tutorials "cheat" by redefining only one or two functions without parsing a header, but I wanted to provide a more complete and powerful example. It seems I missed the mark, the attention was drawn to the boilerplate and not the actual code. I will refactor the parsing into a separate file.

Remove all _zig since there are no other versions of these distributions

Then we would add _legacy to the "bare" versions needed for mtrand.pyx?

mattip · 2019-11-29T12:20:06Z

Then we would add _legacy to the "bare" versions needed for mtrand.pyx?

nvermind, that is wrong. Changing the names as suggested in gh-15007

mattip · 2019-11-29T12:29:38Z

The Generator.standard_exponential method has a zig keyword. This uses the random_standard_exponential_zig_fill and random_standard_exponential_fill functions from distributions.c. I will rename them to random_standard_exponential_fill and random_standard_exponential_inv_fill respectively, and drop random_standard_exponential_inv_fill from the documentation.

mattip · 2019-11-29T13:13:46Z

Done in #15007 (pending review and passing tests)

jamesthomasgriffin · 2020-01-29T00:00:12Z

The reference page: Cython API for random implies that it is possible to use a long list of functions "random_..." from Cython and points to Extending for examples of using these functions. However those examples only use the methods associated to bitgen_t.

I am unable to cimport any of those functions.

from numpy.random._generator cimport random_interval
^

test_cython.pyx:13:0: 'numpy\random_generator.pxd' not found

Perhaps "_generator.pxy" is required but is not present. I am able to generate random numbers from bitgen_t.next_double, following the Extending example above.

Would it be possible to clarify the documentation by including an example? Or have I misunderstood and those functions should be private, in which case could this be clarified?

Numpy/Python version information:

1.18.1 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)]

mattip · 2020-01-29T06:47:57Z

It seems the cython use case was indeed overlooked. There is an example using CFFI, but none for cython.

@jamesthomasgriffin

…API (gh-15463) xref gh-14778 As pointed out in the comment by @jamesthomasgriffin, we did not include a pxd file to expose the distribution functions documented in the random c-api. This PR adds a c_distributions.pxd file that exposes them. Squashed commits: * BUG: add missing c_distributions.pxd to enable cython use of random C-API * ENH, TST: add npyrandom library like npymath, test cython use of it * BUG: actually prefix f-string with f * MAINT: fixes from review, add _bit_generato_bit_generator.pxd * STY: fixes from review * BLD: don't use nprandom library for mtrand legacy build * TST: WindowsPath cannot be used in subprocess's list2cmdline * MAINT, API: move _bit_generator to bit_generator * DOC: add release note about moving bit_generator * DOC, MAINT: fixes from review * MAINT: redo dtype determination from review

mattip added 30 - API component: numpy.random labels Oct 25, 2019

mattip mentioned this issue Oct 27, 2019

API: restructure and document numpy.random C-API #14604

Merged

mattip mentioned this issue Nov 12, 2019

randomkit.h is not included in NumPy 1.17 builds #14517

Closed

mattip mentioned this issue Nov 21, 2019

DOC: Clean up examples of low-level random access #14951

Merged

mattip mentioned this issue Nov 22, 2019

TST. API: test using distributions.h via cffi #14954

Merged

mattip mentioned this issue Nov 29, 2019

DOC, API: improve the C-API/Cython documentation and interfaces for random #15007

Merged

mattip mentioned this issue Jan 29, 2020

ENH: expose bit_generator and random C-API to cython #15463

Merged

madarez mentioned this issue Jun 13, 2020

DOC: Generator is documented as part of the random c-api, but is missing #16597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: provide examples of use of numpy random API via user stories #14778

API: provide examples of use of numpy random API via user stories #14778

mattip commented Oct 25, 2019 •

edited

Loading

rkern commented Oct 25, 2019

mattip commented Nov 15, 2019

bashtage commented Nov 21, 2019

mattip commented Nov 21, 2019

bashtage commented Nov 21, 2019

mattip commented Nov 22, 2019

rkern commented Nov 22, 2019 •

edited

Loading

mattip commented Nov 22, 2019

rkern commented Nov 22, 2019

mattip commented Nov 22, 2019

bashtage commented Nov 22, 2019 via email

rgommers commented Nov 28, 2019

bashtage commented Nov 28, 2019

bashtage commented Nov 28, 2019

bashtage commented Nov 28, 2019

rgommers commented Nov 29, 2019

rgommers commented Nov 29, 2019

rkern commented Nov 29, 2019

rkern commented Nov 29, 2019

rgommers commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

bashtage commented Nov 29, 2019

bashtage commented Nov 29, 2019

bashtage commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

jamesthomasgriffin commented Jan 29, 2020

from numpy.random._generator cimport random_interval
^

mattip commented Jan 29, 2020

API: provide examples of use of numpy random API via user stories #14778

API: provide examples of use of numpy random API via user stories #14778

Comments

mattip commented Oct 25, 2019 • edited Loading

rkern commented Oct 25, 2019

mattip commented Nov 15, 2019

bashtage commented Nov 21, 2019

mattip commented Nov 21, 2019

bashtage commented Nov 21, 2019

mattip commented Nov 22, 2019

rkern commented Nov 22, 2019 • edited Loading

mattip commented Nov 22, 2019

rkern commented Nov 22, 2019

mattip commented Nov 22, 2019

bashtage commented Nov 22, 2019 via email

rgommers commented Nov 28, 2019

bashtage commented Nov 28, 2019

bashtage commented Nov 28, 2019

bashtage commented Nov 28, 2019

rgommers commented Nov 29, 2019

rgommers commented Nov 29, 2019

rkern commented Nov 29, 2019

rkern commented Nov 29, 2019

rgommers commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

bashtage commented Nov 29, 2019

bashtage commented Nov 29, 2019

bashtage commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

mattip commented Nov 29, 2019

jamesthomasgriffin commented Jan 29, 2020

from numpy.random._generator cimport random_interval ^

Numpy/Python version information:

mattip commented Jan 29, 2020

mattip commented Oct 25, 2019 •

edited

Loading

rkern commented Nov 22, 2019 •

edited

Loading

from numpy.random._generator cimport random_interval
^