DEP: deprecate np.bool, np.int, np.float, np.long, np.str, np.unicode, and np.complex, which are misleading aliases for builtin types #6103


Closed
wants to merge 3 commits into from

Conversation

njsmith
Member

@njsmith njsmith commented Jul 22, 2015

This isn't ready yet -- it causes hundreds of test failures due to our own internal usage of np.bool etc., plus it is missing new tests of its own -- but posting now as an RFC.


For reasons which are lost in the mists of time, numpy has always
re-exported a bunch of built-in types like int as np.int, bool as
np.bool, etc. We can't possibly remove these exports anytime in the
near future, because they are widely used -- people write things like

  np.asarray(obj, dtype=np.int)

These people are confused, and would be better off writing either

  np.asarray(obj, dtype=int)

or

  np.asarray(obj, dtype=np.int_)

depending on what they actually want, but their code does work (and is
equivalent to the dtype=int version), so we can't break it. But,
thanks to a new feature added in Python 3.5, plus some glue and sealing
wax for previous versions, it is finally at least possible to deprecate
these attributes!
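The equivalence described above is easy to demonstrate; a minimal sketch (note that np.int itself was eventually removed in NumPy 1.24, so only the surviving spellings appear here):

```python
import numpy as np

# On old NumPy, np.int was literally the builtin int, so
# dtype=np.int meant exactly dtype=int -- NOT dtype=np.int_.
a = np.asarray([1.7, 2.3], dtype=int)      # builtin int as a dtype
b = np.asarray([1.7, 2.3], dtype=np.int_)  # the actual NumPy scalar type

# The builtin, used as a dtype, maps to NumPy's default integer type,
# so in practice the two spellings name the same dtype:
assert a.dtype == b.dtype
assert np.dtype(int) == np.dtype(np.int_)
```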

This adds a dependency on the metamodule package, which is a tiny
pure-Python package that supports every version of Python that numpy
supports. This will Just Work for pip and other sensible installation
tools, but the alternative would be to just vendor our own copy of
metamodule.py into the numpy/ directory instead.

Personally I am kinda inclined to try going with the explicit dependency route and see how far we can get, because maintaining vendored libraries is a huge pain, and because c'mon it's 2015, and because sooner or later we are probably going to want to add an external dependency or two (e.g. sempervirens) and so we might as well find out what the pain points are now. But I can see the arguments either way...

njsmith added 2 commits July 21, 2015 23:57
This is necessary since otherwise the new deprecations for np.int and
friends cause an error to be raised while nose is scanning various
modules looking for tests, before it even starts running our test
suite.
@seberg
Member

seberg commented Jul 22, 2015

While you are at it ;P:

CLIP                Sure, you could use this, but is that better than the string?
DataSource       Not sure what this is.
ERR_CALL            These should not be exposed (at least most of them, but I think all).
ERR_DEFAULT
ERR_DEFAULT2
ERR_IGNORE
ERR_LOG
ERR_PRINT
ERR_RAISE
ERR_WARN
FLOATING_POINT_SUPPORT
FPE_DIVIDEBYZERO        These FPE don't look like they should be exposed.
FPE_INVALID
FPE_OVERFLOW
FPE_UNDERFLOW
PackageLoader       main namespace?
RAISE               My guess is, everyone uses the string.
SHIFT_DIVIDEBYZERO      These should not be exposed (are pretty much useless)
SHIFT_INVALID
SHIFT_OVERFLOW
SHIFT_UNDERFLOW
Tester              main namespace?
UFUNC_BUFSIZE_DEFAULT
UFUNC_PYVALS_NAME      not even sure what this is.
WRAP                My guess is, everyone uses the string.
deprecate           why is this in the main namespace?
deprecate_with_doc  why is this in the main namespace?
disp                seems like a matlab "compatibility" layer, deprecate?
extract             compress (boolean indexing) on ravelled array, maybe deprecate?
fastCopyAndTranspose        also C-api, but nothing except a.T.copy('C'), so deprecate?
fix                 should be deprecated in favor of trunc?
format_parser       does this belong in the main namespace?
fromregex           never saw this before ;)
get_numarray_include        does this still make sense?
getbuffer           what is the use in the main namespace?
getbufsize          same as above
geterrobj           should not be exposed; geterr is the public API.
int_asbuffer        is this dark magic? Is it useful (in main namespace)?
msort               deprecate, considering the trivial equivalence to np.sort(arr, axis=0)?
ndfromtxt           equivalent to genfromtxt.
newbuffer           OK, but main namespace?
pkgload             main namespace?
place               Seems equivalent to boolean array assignment, no reason not to deprecate?
put                  Pretty much fancy indexing (but flattened) -- we do have arr.flat though, even if I don't find it ideal.
putmask             Pretty much boolean indexing (but flattened)
round_              Something about not shadowing python I suppose?
set_numeric_ops     Could possibly consider deprecation (but github search finds a few cases)
seterrobj           should not be exposed (seterr is the public api)
sort_complex        this seems pretty useless? This is just sort + upcast to complex
source              Could be deprecated I think. There is inspect and ipython magic for this.
typeNA              NA -- is this a remnant from the masked stuff?

And of course the 8 financial functions ;)

Sorry, I don't mean this seriously, but at least most of that seems like kludge.
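A couple of the "trivial equivalence" claims in the list above can be spot-checked; a small sketch (msort and extract themselves are deprecated or removed in recent NumPy, so only the modern spellings appear):

```python
import numpy as np

arr = np.array([[3, 1], [2, 4]])

# msort(arr) was nothing more than sorting along the first axis:
assert np.array_equal(np.sort(arr, axis=0), [[2, 1], [3, 4]])

# extract(cond, arr) was compress / boolean indexing on the ravelled array:
cond = arr > 2
assert np.array_equal(arr[cond], np.ravel(arr)[np.ravel(cond)])
```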

@seberg
Member

seberg commented Jul 22, 2015

Anyway, I am +0.5. I definitely think it is right, just wondering if it is worth the hassle (or if we should spend our "piss off users" credits on other things ;)).

@rkern
Member

rkern commented Jul 22, 2015

Hopefully-demystifying historical note: In early numpy days, these were aliases to the actual numpy scalar types, e.g. numpy.float was numpy.float64, numpy.int was numpy.int32 (or int64, depending on your architecture). Since these shadowed the builtins when one did from numpy import *, it was decided that this was a bad idea. However, we had already had a few releases, and people had written code relying on these names, so we left them as aliases to the builtin Python types for backwards compatibility (the primary use was to be consumed as a dtype= argument, for which the builtin types work just fine).

@njsmith
Member Author

njsmith commented Jul 22, 2015

@rkern: awesome, thanks for that context. Ironic, then, isn't it, that ~10 years later we've mostly successfully trained people not to do from numpy import *, but there are still new users being taught that they should "correct" their usage of float by replacing it with np.float...

It's true that this deprecation will be very noisy (at least for those rare individuals who actually ask to see deprecation warnings). The tradeoff is that in my experience people who are using np.float are generally shocked and grateful to learn what it actually means, because they were genuinely confused, and no-one likes the idea of going on believing incorrect things indefinitely. So I guess the meta-point here is that there isn't necessarily a one-to-one mapping between noise level and degree of user annoyance, and it's the latter we want to be careful with.

In any case we need to talk about this on the list, I'll make a post when I get a moment.

@rgommers
Member

It's 2015 but the Python packaging system is still terrible. An explicit dependency on a package no one has installed is not a good idea. +10 for vendoring it if needed.

@njsmith
Member Author

njsmith commented Jul 22, 2015

My experience is that the Python packaging system for pure-Python packages with properly put-together wheels is now totally seamless and Just Works. I can definitely see the argument for proceeding with extreme caution, but I would be very interested if you have any examples of where specifically this fails.

@rgommers
Member

This fails the very simplest test of all:

  • Install Python (from python.org for example)
  • Download and unzip a numpy release
  • $ python setup.py install (in the unzipped numpy dir)

Also note that pip isn't even installed by default on all Python versions we support, is by no means universal (buildout, conda, people who don't use any of those), can fail behind corporate firewalls, .... (I can go on). A dependency is quite simply not worth it.

@njsmith
Member Author

njsmith commented Jul 22, 2015

'python setup.py install' is broken anyway, though; it makes clean uninstall or upgrade impossible, and we semi-regularly get bug reports from people who have borked their setup this way. The only correct way to install from a source directory is 'pip install .' :-/

(Also, I don't know much about buildout, but citing the existence of conda as a reason why inter-package dependencies don't work seems... really odd to me?)

@rgommers
Member

I'm sorry, but a statement like "python setup.py install is broken anyway" makes it kind of hard to argue. That's still the canonical way you'll find in almost any install instruction. Anyway, I don't want to argue about this at all, it's a waste of time. Even if pip were perfect and everyone was using it (both very much not the case), I would be against adding a dependency to numpy for one deprecation warning. It really does not make any sense at all.

@njsmith
Member Author

njsmith commented Jul 22, 2015

Fair enough.

It is true that all those install instructions are wrong and need fixing, independently of this :-/

It will be interesting to revisit this discussion later once sempervirens is a real thing, since, assuming everything goes to plan, it will be both extremely compelling and extremely difficult to vendor for technical reasons.

@njsmith
Member Author

njsmith commented Jul 22, 2015

As a note for future reference in case this comes up again: this discussion from 2011, http://stackoverflow.com/questions/8161617/how-can-i-specify-library-versions-in-setup-py, affirms that buildout has known how to handle install_requires= directives for at least that long, i.e. it is AFAICT another situation where pure-Python dependencies will automagically Just Work these days.

@argriffing
Contributor

I don't think I'd be a huge fan of having something like sempervirens attached onto raw numpy. If groups like Continuum or Enthought or Sourceforge or Berkeley's Social Sciences Data Laboratory want to make their own instrumented installers then of course that's allowed but I'd prefer to keep it out of the main numpy github repo.

@njsmith
Member Author

njsmith commented Jul 22, 2015

Let's defer that discussion until sempervirens is an actual thing, and in its own thread...

@njsmith
Member Author

njsmith commented Jul 23, 2015

Vendored metamodule.py.

@njsmith
Member Author

njsmith commented Jul 23, 2015

Hmm, sent email to the list a few hours ago, but it doesn't seem to have gone through (nor have I received several emails from the last few days that are in the archives). Who is it we bug about this again?

@charris
Member

charris commented Jul 23, 2015

@rkern I think Robert usually handles this.

@rkern
Member

rkern commented Jul 23, 2015

I relayed it to @infowolfe.

@jni
Contributor

jni commented Jul 24, 2015

I'm by no means a beginner numpy user and I have definitely used np.float thinking it was somehow nicer than built-in float. So 👍 to this change. I will leave the discussion of vendoring, dependencies, and Python packaging to the experts. ;)

@sebix

sebix commented Jul 24, 2015

I'm in the same position as @jni. I also thought that np.float was a numpy type, since it lives in the numpy namespace, and I taught other users the same thing. I just assumed that np.float defaults to np.float64, np.int defaults to np.int32, and so on.

Also, the docstring does not mention that this is the python float, not the numpy float:

>>> np.float.__doc__
'float(x) -> floating point number\n\nConvert a string or number to a floating point number, if possible.'

I don't understand why this alias was created in the first place. If the alias did not exist, the builtin float would still exist after a from numpy import *, wouldn't it?

EDIT: I also had the misconception that np.int_ was an internally used type, not one that should be used in programs.

@njsmith
Member Author

njsmith commented Jul 24, 2015

Also, the docstring does not mention that this is the python float, not the numpy float:

The docstring is a property of the object, not a property of the attribute, if that makes sense -- so the reason it doesn't say anything about numpy is that it's just the docstring for the regular builtin float, it has no idea that it is being referred to by this weird confusing extra name.
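A quick illustration of that point: aliasing never creates a new object, so the docstring seen through any alias is simply the builtin's own.

```python
# An alias is just another name for the same object, so the docstring
# reached through the alias is the builtin's own docstring -- the alias
# itself carries no documentation of its own.
my_float = float              # the same kind of aliasing np.float used to do
assert my_float is float
assert my_float.__doc__ == float.__doc__
```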

I don't understand why this alias has been created in the beginning. If the alias would not exist, after a import * from numpy the builtin float would still exist?

The alias wasn't created to make from numpy import * work -- in fact the way numpy/__init__.py is written, it historically wouldn't import float and friends (they're in the namespace, but not listed in np.__all__). The issue is that there was code doing both of these things:

np.array(..., dtype=np.float)
from numpy import *
array(..., dtype=float)

and it was important that (a) they both keep working (for backward compatibility), and (b) they should have identical meaning (because anything else would be even more confusing). And then there was also the requirement (c) that in the latter case, float should mean the builtin. Put all these requirements together, and you end up with the only solution being that you keep np.float around, but make it an alias to the builtin.
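Constraint (c) is easy to check even today, since the alias was kept out of np.__all__ (and is gone entirely in NumPy >= 1.24):

```python
import numpy  # ensure numpy is importable before the star-import below

# After a star-import, `float` still means the builtin: the alias was
# never exported via np.__all__, so the builtin is never shadowed.
ns = {}
exec("from numpy import *", ns)
assert ns.get("float", float) is float
```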

EDIT: I also had the misconseption that np.int_ would be an internally used type, not some that should be used in programs.

[NB: edits don't trigger email notifications, so many people may never see them]

I can see why you'd think that, given the similarity to the leading-underscore-means-private convention. But what's going on here is a different convention: a trailing underscore is used to avoid naming conflicts. E.g., when people have a variable that refers to an arbitrary class object in Python, they can't name the variable class, because that would be a syntax error, so a common convention is to use the name class_ instead.
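A tiny illustration of that trailing-underscore convention (describe is a hypothetical helper, purely for show):

```python
# The trailing-underscore convention in action: append "_" to dodge a
# keyword or builtin, nothing to do with privacy. `describe` is a
# hypothetical helper, purely for illustration.
def describe(obj, class_=None):    # "class" here would be a syntax error
    class_ = class_ if class_ is not None else type(obj)
    return class_.__name__

assert describe(3.0) == "float"
assert describe(3.0, class_=int) == "int"
```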

If you can see a way to make this clearer in the docs then we'd certainly appreciate suggestions/contributions!

larsmans added a commit to scipy/scipy that referenced this pull request Jul 24, 2015
use int, float, bool instead of np.int, np.float, np.bool

See numpy/numpy#6103 for rationale.
@rlamy
Contributor

rlamy commented Jul 27, 2015

FWIW, numpy depending on or vendoring an explicitly PyPy-incompatible module is likely to be a bit annoying for us in numpypy, though I guess that the solution will be to get metamodule supported on PyPy.

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2019
maxnoe pushed a commit to maxnoe/astropy that referenced this pull request Jan 6, 2020
As discussed in astropy#6603 (and numpy/numpy#6103) it's
unnecessary and misleading to use the Python types from the NumPy namespace.

With this PR this uses the default NumPy dtype corresponding to these
Python types.
@seberg
Member

seberg commented Jan 29, 2020

I am going to close this; it is superseded by Eric's work in #14882 now (although it could be easier to move forward with a small NEP maybe, which I wanted to write at some point but have not done yet). In any case, I doubt that we will go the metamodule route; instead we'll just go slow and only deprecate on Python 3.7 or higher, where it is simple.
