Skip to content

ENH: expose bit_generator and random C-API to cython #15463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 16, 2020

Conversation

mattip
Copy link
Member

@mattip mattip commented Jan 29, 2020

xref gh-14778

As pointed out in the comment by @jamesthomasgriffin, we did not include a pxd file to expose the distribution functions documented in the random c-api. This PR adds a c_distributions.pxd file that exposes them.

A test is still needed that this all works as advertised. Could someone familiar with typical use please suggest one? Complicated test to build a cython c-extension module added.

@mattip
Copy link
Member Author

mattip commented Jan 29, 2020

Just a pxd file is not enough. We need to ship either the source for these functions or a so/dll+lib that users can link to. Shipping the source is complicated, but so is the other option. @rkern @bashtage ideas?

@bashtage
Copy link
Contributor

Nothing other than what you say. The "first best" would be to ship a lib/so that could be linked to like npymath.lib/so, e.g., npyrandom.lib/so (or possibly to stick them in npymath if possible).

@mattip
Copy link
Member Author

mattip commented Jan 29, 2020

the npymath parallel looks good. I will try that, thanks

@bashtage
Copy link
Contributor

Looks like a good and surprisingly simple solution.

@mattip
Copy link
Member Author

mattip commented Feb 1, 2020

@rkern I don't understand the test failure in test_random.py 1585 in the threaded tests. on windows. Any thoughts?

@@ -52,78 +54,87 @@ def generate_libraries(ext, build_dir):
PCG64_DEFS = []
# One can force emulated 128-bit arithmetic if one wants.
#PCG64_DEFS += [('PCG_FORCE_EMULATED_128BIT_MATH', '1')]
depends = ['__init__.pxd', 'c_distributions.pxd', '_bit_generator.pxd']
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_bit_generator.pxd existed in 1.18, but was not copied into the wheel. The use of depends on line 137 should fix that.

@mattip
Copy link
Member Author

mattip commented Feb 2, 2020

apparently something in this PR broke windows. Maybe the npyrandom.lib is off somehow

# Cast the pointer
rng = <bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name)
randoms = np.empty(n, dtype=_dtype)
if _dtype is np.float32:
Copy link
Member

@eric-wieser eric-wieser Feb 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _dtype is np.float32:
if _dtype.type is np.float32:

or

Suggested change
if _dtype is np.float32:
if _dtype == np.float32:

As written this is always false. The former is more consistent with line 100.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got be careful here, the latter is correct. The former would have to also check _dtype.isnative. Right now dtype=">f8" (on little endian machines) is probably broken!

Copy link
Member

@seberg seberg Feb 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think I actually like the dtype.type is np.float32 and dtype.isnative spelling I think.

EDIT: I guess here the isnative check needs to be further up anyway though, maybe just create an issue for it as a separate thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant as a test-run for something to replace the key = np.dtype(dtype).name in _generator.pyx and mtrand.pyx so it would be good to get it right here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dtype == np.float32 is probably simplest, but make sure to update line 100 too.

Copy link
Member

@seberg seberg Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True dtype == ... is shortest. May be worth to time it in case the type + isnative happens to be much faster (right now).
I timed it, and it:

In [2]: %timeit dt == np.float64                                                                                        
85 ns ± 5.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [3]: %timeit dt.type is np.float64                                                                                   
57.3 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit dt.type is np.float64 and dt.isnative                                                                   
81.6 ns ± 0.697 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

so since the == includes the isnative check there is no speed difference between the spellings , so probably should use the == one. The only thing we could consider with respect to fixing the speed issue of dtype.name check would be to add a fast-path for the very common dtype default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the == form

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seems to be the == form here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing

@mattip
Copy link
Member Author

mattip commented Feb 3, 2020

"Dogfooding" our code is helpful. The use of the npyrandom library internally and for the C-API cython test showed that the RAND_INT_TYPE macro in the header files is confusing (well, it confused me). I moved it out of the header and duplicated a header file with the long alternative. It should now be more likely to get a compilation error rather than runtime crashes.

double r
double q
double fm
int64_t m
long m
double p1
double xm
double xl
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cython doesn't seem to care what int is used in the struct, it is later filled out by the compiler. But cleaning this up anyway for consistency.

#include "ziggurat_constants.h"
#include "logfactorial.h"

#if defined(_MSC_VER) && defined(_WIN64)
#include <intrin.h>
#include <intrin.h>
#endif

/* Inline generators for internal use */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is formatted with 2-space tabs instead of 4-space. Is now a good time to remedy that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy if you do that (just ping Kevin on it before merging?), should be its own pure STY PR though (and if possible cover other files in random if they exist?). Not sure how others feel about such fixups, but I tend to feel that not doing style cleanups is also not a solution...

@mattip mattip requested a review from eric-wieser February 4, 2020 15:30
@mattip mattip added the triage review Issue/PR to be discussed at the next triage meeting label Feb 12, 2020
@rkern
Copy link
Member

rkern commented Feb 19, 2020

+1 to move _bit_generator to bit_generator.

@mattip
Copy link
Member Author

mattip commented Feb 19, 2020

@r-devulap the reciprical strided test is failing here as well

@r-devulap
Copy link
Member

Hmm, did that test failure vanish? @mattip

@mattip
Copy link
Member Author

mattip commented Feb 19, 2020

I reran the failing CI run, and it dissappeared. But see gh-15610

@mattip
Copy link
Member Author

mattip commented Feb 24, 2020

CI is passing after the _bit_generator -> bit_generator rename

@mattip
Copy link
Member Author

mattip commented Feb 25, 2020

Added a release note, CI is still green.

@mattip mattip requested a review from seberg March 15, 2020 21:17
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping Matti. LGTM, the is part should be fixed in the example but everything else we can do later I think, so lets keep things moving.

# Cast the pointer
rng = <bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name)
randoms = np.empty(n, dtype=_dtype)
if _dtype is np.float32:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seems to be the == form here?

install_dir='lib',
build_info={
'include_dirs' : [], # empty list required for creating npyrandom.h
'extra_compiler_args' : (['/GL-'] if is_msvc else []),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to comment on why global optimization is disabled? (and maybe that it is global optimization?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from numpy/core/setup.py

@seberg
Copy link
Member

seberg commented Mar 16, 2020

Sould the release notes in the same or different fragment include that c_distributions.pxd is now exposed, it is, right?

@mattip
Copy link
Member Author

mattip commented Mar 16, 2020

Mentioned the c_distributions.pxd in the release note.

@@ -106,7 +106,7 @@ def uniforms_ex(bit_generator, Py_ssize_t n, dtype=np.float64):
# Cast the pointer
rng = <bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name)
randoms = np.empty(n, dtype=_dtype)
if _dtype is np.float32:
if _dtype == np.float32:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not look at it careful enough again right now. This is still incorrect for non-native byte order, would need to add an error branch and elif _dtype == np.float64. On the up-side, you can just move the above raise TypeError('Unsupported dtype "%r"' % dtype) in the else branch then, so that it is nicer anyway.
I can do that if you are getting tired of this :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@seberg seberg changed the title BUG: add missing c_distributions.pxd, enables cython use of random C-API ENH: expose bit_generator and random C-API to cython Mar 16, 2020
@seberg
Copy link
Member

seberg commented Mar 16, 2020

Thanks, I renamed it the title, hope that fits well enough (if not, can change it in the PR title at least, will squash merge).

@seberg seberg merged commit 4f2b219 into numpy:master Mar 16, 2020
@charris charris modified the milestones: 1.19.0 release, 1.18.3 release Apr 6, 2020
@charris
Copy link
Member

charris commented Apr 6, 2020

@mattip Hmm, I missed this for the 1.18.1 release (wrong milestone) and it may be a bit late to put it in the 1.18.3 release. It doesn't seem right to add new functionality so late in the release cycle.

@bashtage
Copy link
Contributor

bashtage commented Apr 6, 2020

I think there is no need to rush is out since it is an enhancement and not a bug. 1.19 seems like a good choice..

@charris
Copy link
Member

charris commented Apr 6, 2020

@bashtage OK, let's go with that.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Apr 6, 2020
@charris charris removed this from the 1.18.3 release milestone Apr 6, 2020
@mattip mattip deleted the c_distributions branch November 2, 2020 08:27
@rkern
Copy link
Member

rkern commented Aug 4, 2021

@charris @bashtage Can we backport at least the np.random.bit_generator rename to a 1.18.6 bugfix release. This actually is a bug, not an enhancement; we always wanted people to be able to import BitGenerator. It seems to be causing issues with scipy.

@charris
Copy link
Member

charris commented Aug 4, 2021

We could try, the wheels code may have bitrotted, making things more difficult. Want to make a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy.random triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants