-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
API: rearrange the cython files in numpy.random #14608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The first changeset moves things around without any function renaming yet. |
numpy/random/setup.py
Outdated
# config.add_data_files('mtrand.pxd') | ||
# config.add_data_files('pcg64.pxd') | ||
# config.add_data_files('philox.pxd') | ||
# config.add_data_files('sfc64.pxd') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to create these pxd files so users can do from pcg64 cimport PCG64
to use the BitGenerator directly from cython
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep this in the main namespace? PCG64
is in numpy.random
so I'd much prefer it not be in separate pxd
files here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main namespace
This mimics the way we import the BitGenerators now.
Do you mean we should refactor the module so that __init__.py
imports differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's already done remember? gh-14360
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still seeing the BitGenerators in both the c-extension module and as imported by __init__.py
:
>>> np.random.mt19937.MT19937
<class 'numpy.random.mt19937.MT19937'>
>>> np.random.MT19937
<class 'numpy.random.mt19937.MT19937'>
>>>
I am not sure we can flatten the namespace without unifying all the c-extension modules into one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, missing some more underscores.
I am not sure we can flatten the namespace without unifying all the c-extension modules into one
Of course we can, it's just a matter of adding underscores to the .pyx
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made private in 9bdf1749e. mtrand
must remain public as per NEP 19
... the global
RandomState
instance MUST be accessible in this initial release by the namenumpy.random.mtrand._rand
...
Those principles all sound good, thanks @mattip. |
There was no reply yet to your mailing list question on whether these really need to be exposed. Your preference on the mailing list was "no" I believe. I'd like to see a good rationale too before we expose the legacy generator. |
the next commit moves |
I am going back and forth about shipping |
I think that would be better indeed.
I still don't quite understand why
Is there anything else in those two submodules that is public but not imported into the |
082e17e
to
724ec06
Compare
I think this is ready for review now. Note there are no public c-extension modules, all of them (except mtrand, which is private but cannot be renamed as per NEP 19) are private. Once this is merged, we can revisit #14604 to try to develop the correct C-API. |
Awesome, thanks Matti.
sounds like a good plan |
bit_generator.ISeedSequence | ||
bit_generator.ISpawnableSeedSequence | ||
bit_generator.SeedlessSeedSequence | ||
_bit_generator.ISeedSequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong. You should never have something starting with an underscore in a toctree listing (or anywhere in the docs really).
So when I asked if there was anything in bit_generator
besides SeedSequence
that needs to be public, I guess the right answer was not "no" but "ISeedSequence
, ISpawnableSeedSequence
and SeedlessSeedSequence
" (these are needed for writing other bitgenerators outside of NumPy, right?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they are needed to implement new SeedSequence interfaces as kind of an abstract base class. BitGenerators should use bit_generator.SeedSequence
. I think we should remove these then, and maybe document them in the (as yet nonexistant) random C-API documentation, not in the random python reference documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they're not supposed to be used from Python at all, but only from Cython/C for new BitGenerator authors, then that sounds like the right course of action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do with text like this "One may also pass in an implementor of the ISeedSequence interface like SeedSequence"? How about simplifying it to not mention ISeedSequence ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adopted the "don't mention" approach for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds good to me
API and docs part of this PR look quite good to me now, module the few typos I just pointed out. |
I'm confused why |
The public/private distinction is for python access. C and Cython access still need to be worked out, once we have solidified the python side, see #14604. |
Did you mean to commit |
Same comment for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very small points. Only worthwhile ones are the commits of the pyx and pxd that come from templates.
PCG64 <pcg64> | ||
Philox <philox> | ||
SFC64 <sfc64> | ||
MT19937 <mt19937> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does 3 or 4 spaces matter? File seem to use both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unified to 4, would be nice to do this gradually across the documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it only has to be consistent per file to work. Pretty sure the default is 3 spaces everywhere.
__all__ = [] | ||
|
||
np.import_array() | ||
|
||
cdef extern from "include/distributions.h": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be in _bounded_integers.pxd.in
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, then we would have to ship distributions.h
. But I think we should revisit this in terms of documented use-cases in #14604 after this goes in.
Okay this seems good to go now. Let's get this in so we can start looking at the API part. Thanks @mattip! |
This is the code part of #14604 to cleanup the api and allow numpy.random to be used from cython.
Principles:
pxd
files define the cython API, any public except mtrand*.so
should ship a*.pxd
and any*.pxd
should have a*.so
. Any "private"*.so
should begin with_
. Edit: qualify to state "public except mtrand", which cannot be made private as per NEP 19. We only have private c-extension modules now, so there are nopxd
files to ship.numpy/random/include
ship theEdit: there are no pxd or headers to shippxd
files and headersCleanup
common
->_common
bound_integers
->_bound_integers
?*
libc
imports from_common
, let eachpxd
import on its ownbitgen_t
fromcommon
tobit_generator.pxd
distributions.pxd
intogenerator.pxd
,legacy_distributions.pxd
intomtrans.pxd
.Renaming and removing unused functions in
distributions.h
andlegacy_distributions.h
random_gauss_zig
->random_standard_normal
zig
suffixedit: except
random_standard_exponential_zig
_f
suffixrandom_double
->random_random
for consistency withrandom.random
random_standard_uniform
random_geometric_*
do not need to be exposedgauss
namesAfter this cleanup we can create some cython user stories and check that the API is sufficient.