DEP: deprecate np.bool, np.int, np.float, np.long, np.str, np.unicode, and np.complex, which are misleading aliases for builtin types #6103


Closed
wants to merge 3 commits into from

Conversation

njsmith
Member

@njsmith njsmith commented Jul 22, 2015

This isn't ready yet -- it causes hundreds of test failures due to our own internal usage of np.bool etc., plus it is missing new tests of its own -- but posting now as an RFC.


For reasons which are lost in the mists of time, numpy has always
re-exported a bunch of built-in types like int as np.int, bool as
np.bool, etc. We can't possibly remove these exports anytime in the
near future, because they are widely used -- people write things like

  np.asarray(obj, dtype=np.int)

These people are confused, and would be better off writing either

  np.asarray(obj, dtype=int)

or

  np.asarray(obj, dtype=np.int_)

depending on what they actually want, but their code does work (and is
equivalent to the dtype=int version), so we can't break it. But,
thanks to a new feature added in Python 3.5, plus some glue and sealing
wax for previous versions, it is finally at least possible to deprecate
these attributes!
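The equivalence described above is easy to demonstrate; a minimal sketch (note that np.int itself was eventually removed in NumPy 1.24, so only the surviving spellings appear here):

```python
import numpy as np

# On old NumPy, np.int was literally the builtin int, so
# dtype=np.int meant exactly dtype=int -- NOT dtype=np.int_.
a = np.asarray([1.7, 2.3], dtype=int)      # builtin int as a dtype
b = np.asarray([1.7, 2.3], dtype=np.int_)  # the actual NumPy scalar type

# The builtin, used as a dtype, maps to NumPy's default integer type,
# so in practice the two spellings name the same dtype:
assert a.dtype == b.dtype
assert np.dtype(int) == np.dtype(np.int_)
```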

This adds a dependency on the metamodule package, which is a tiny
pure-Python package that supports every version of Python that numpy
supports. This will Just Work for pip and other sensible installation
tools, but the alternative would be to just vendor our own copy of
metamodule.py into the numpy/ directory instead.

Personally I am kinda inclined to try going with the explicit dependency route and see how far we can get, because maintaining vendored libraries is a huge pain, and because c'mon it's 2015, and because sooner or later we are probably going to want to add an external dependency or two (e.g. sempervirens) and so we might as well find out what the pain points are now. But I can see the arguments either way...

njsmith added 2 commits July 21, 2015 23:57
This is necessary since otherwise the new deprecations for np.int and
friends cause an error to be raised while nose is scanning various
modules looking for tests, before it even starts running our test
suite.
@seberg
Member

seberg commented Jul 22, 2015

While you are at it ;P:

CLIP                Sure, you could use this, but is that better than the string?
DataSource       Not sure what this is.
ERR_CALL            These should not be exposed (at least most of them, but I think all).
ERR_DEFAULT
ERR_DEFAULT2
ERR_IGNORE
ERR_LOG
ERR_PRINT
ERR_RAISE
ERR_WARN
FLOATING_POINT_SUPPORT
FPE_DIVIDEBYZERO        These FPE don't look like they should be exposed.
FPE_INVALID
FPE_OVERFLOW
FPE_UNDERFLOW
PackageLoader       main namespace?
RAISE               My guess is, everyone uses the string.
SHIFT_DIVIDEBYZERO      These should not be exposed (are pretty much useless)
SHIFT_INVALID
SHIFT_OVERFLOW
SHIFT_UNDERFLOW
Tester              main namespace?
UFUNC_BUFSIZE_DEFAULT
UFUNC_PYVALS_NAME      not even sure what this is.
WRAP                My guess is, everyone uses the string.
deprecate           why is this in the main namespace?
deprecate_with_doc  why is this in the main namespace?
disp                seems like a matlab "compatibility" layer, deprecate?
extract             compress (boolean indexing) on ravelled array, maybe deprecate?
fastCopyAndTranspose        also C-api, but nothing except a.T.copy('C'), so deprecate?
fix                 should be deprecated in favor of trunc?
format_parser       does this belong in the main namespace?
fromregex           never saw this before ;)
get_numarray_include        does this still make sense?
getbuffer           what is the use in the main namespace?
getbufsize          same as above
geterrobj           should not be exposed; geterr is the public API.
int_asbuffer        is this dark magic? Is it useful (in main namespace)?
msort               deprecate, considering the trivial equivalence to np.sort(arr, axis=0)?
ndfromtxt           equivalent to genfromtxt.
newbuffer           OK, but main namespace?
pkgload             main namespace?
place               Seems equivalent to boolean array assignment, no reason not to deprecate?
put                  Pretty much fancy indexing (but flattened) -- we do have arr.flat though, even if I don't find it ideal.
putmask             Pretty much boolean indexing (but flattened)
round_              Something about not shadowing python I suppose?
set_numeric_ops     Could possibly consider deprecation (but github search finds a few cases)
seterrobj           should not be exposed (seterr is the public api)
sort_complex        this seems pretty useless? This is just sort + upcast to complex
source              Could be deprecated I think. There is inspect and ipython magic for this.
typeNA              NA -- is this a remnant from the masked stuff?

And of course the 8 financial functions ;)

Sorry, I don't mean this seriously, but at least most of that seems like kludge.
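A couple of the "trivial equivalence" claims in the list above can be spot-checked; a small sketch (msort and extract themselves are deprecated or removed in recent NumPy, so only the modern spellings appear):

```python
import numpy as np

arr = np.array([[3, 1], [2, 4]])

# msort(arr) was nothing more than sorting along the first axis:
assert np.array_equal(np.sort(arr, axis=0), [[2, 1], [3, 4]])

# extract(cond, arr) was compress / boolean indexing on the ravelled array:
cond = arr > 2
assert np.array_equal(arr[cond], np.ravel(arr)[np.ravel(cond)])
```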

@seberg
Member

seberg commented Jul 22, 2015

Anyway, I am +0.5. I definitely think it is right, just wondering if it is worth the hassle (or if we should spend our "piss off users" credits on other things ;)).

@rkern
Member

rkern commented Jul 22, 2015

Hopefully-demystifying historical note: In early numpy days, these were aliases to the actual numpy scalar types, e.g. numpy.float was numpy.float64, numpy.int was numpy.int32 (or int64, depending on your architecture). Since these shadowed the builtins when one did from numpy import *, it was decided that this was a bad idea. However, we had already had a few releases, and people had written code relying on these names, so we left them as aliases to the builtin Python types for backwards compatibility (the primary use was to be consumed as a dtype= argument, for which the builtin types work just fine).

@njsmith
Member Author

njsmith commented Jul 22, 2015

@rkern: awesome, thanks for that context. Ironic, then, isn't it, that ~10 years later we've mostly successfully trained people not to do from numpy import *, but there are still new users being taught that they should "correct" their usage of float by replacing it with np.float...

It's true that this deprecation will be very noisy (at least for those rare individuals who actually ask to see deprecation warnings). The tradeoff is that in my experience people who are using np.float are generally shocked and grateful to learn what it actually means, because they were genuinely confused, and no-one likes the idea of going on believing incorrect things indefinitely. So I guess the meta-point here is that there isn't necessarily a one-to-one mapping between noise level and degree of user annoyance, and it's the latter we want to be careful with.

In any case we need to talk about this on the list, I'll make a post when I get a moment.

@rgommers
Member

It's 2015 but the Python packaging system is still terrible. An explicit dependency on a package no one has installed is not a good idea. +10 for vendoring it if needed.

@njsmith
Member Author

njsmith commented Jul 22, 2015

My experience is that the Python packaging system for pure-Python packages with properly put-together wheels is now totally seamless and Just Works. I can definitely see the argument for proceeding with extreme caution, but I would be very interested if you have any examples of where specifically this fails.

@rgommers
Member

This fails the very simplest test of all:

  • Install Python (from python.org for example)
  • Download and unzip a numpy release
  • $ python setup.py install (in the unzipped numpy dir)

Also note that pip isn't even installed by default on all Python versions we support, is by no means universal (buildout, conda, people who don't use any of those), can fail behind corporate firewalls, .... (I can go on). A dependency is quite simply not worth it.

@njsmith
Member Author

njsmith commented Jul 22, 2015

'python setup.py install' is broken anyway, though; it makes clean uninstall or upgrade impossible, and we semi-regularly get bug reports from people who have borked their setup this way. The only correct way to install from a source directory is 'pip install .' :-/

(Also, I don't know much about buildout, but citing the existence of conda as a reason why inter-package dependencies don't work seems... really odd to me?)

@rgommers
Member

I'm sorry, but a statement like "python setup.py install is broken anyway" makes it kind of hard to argue. That's still the canonical way you'll find in almost any install instruction. Anyway, I don't want to argue about this at all, it's a waste of time. Even if pip were perfect and everyone was using it (both very much not the case), I would be against adding a dependency to numpy for one deprecation warning. It really does not make any sense at all.

@njsmith
Member Author

njsmith commented Jul 22, 2015

Fair enough.

It is true that all those install instructions are wrong and need fixing, independently of this :-/

It will be interesting to revisit this discussion later once sempervirens is a real thing, since, assuming everything goes to plan, it will be both extremely compelling and extremely difficult to vendor for technical reasons.

@njsmith
Member Author

njsmith commented Jul 22, 2015

As a note for future reference in case this comes up again: this discussion from 2011, http://stackoverflow.com/questions/8161617/how-can-i-specify-library-versions-in-setup-py, affirms that buildout has known how to handle install_requires= directives for at least that long, i.e. it is AFAICT another situation where pure-Python dependencies will automagically Just Work these days.

@argriffing
Contributor

I don't think I'd be a huge fan of having something like sempervirens attached onto raw numpy. If groups like Continuum or Enthought or Sourceforge or Berkeley's Social Sciences Data Laboratory want to make their own instrumented installers then of course that's allowed but I'd prefer to keep it out of the main numpy github repo.

@njsmith
Member Author

njsmith commented Jul 22, 2015

Let's defer that discussion until sempervirens is an actual thing, and in its own thread...

@njsmith
Member Author

njsmith commented Jul 23, 2015

Vendored metamodule.py.

@njsmith
Member Author

njsmith commented Jul 23, 2015

Hmm, sent email to the list a few hours ago, but it doesn't seem to have gone through (nor have I received several emails from the last few days that are in the archives). Who is it we bug about this again?

@charris
Member

charris commented Jul 23, 2015

@rkern I think Robert usually handles this.

@rkern
Member

rkern commented Jul 23, 2015

I relayed it to @infowolfe.

@jni
Contributor

jni commented Jul 24, 2015

I'm by no means a beginner numpy user and I have definitely used np.float thinking it was somehow nicer than built-in float. So 👍 to this change. I will leave the discussion of vendoring, dependencies, and Python packaging to the experts. ;)

@sebix

sebix commented Jul 24, 2015

I'm in the same position as @jni. I also thought that np.float was a numpy type, since it lives in the numpy namespace, and I taught other users the same thing. I just assumed that np.float defaults to np.float64, np.int defaults to np.int32, and so on.

Also, the docstring does not mention that this is the python float, not the numpy float:

>>> np.float.__doc__
'float(x) -> floating point number\n\nConvert a string or number to a floating point number, if possible.'

I don't understand why this alias was created in the first place. If the alias did not exist, the builtin float would still exist after a from numpy import *, wouldn't it?

EDIT: I also had the misconception that np.int_ was an internally used type, not one that should be used in programs.

@njsmith
Member Author

njsmith commented Jul 24, 2015

Also, the docstring does not mention that this is the python float, not the numpy float:

The docstring is a property of the object, not a property of the attribute, if that makes sense -- so the reason it doesn't say anything about numpy is that it's just the docstring for the regular builtin float, it has no idea that it is being referred to by this weird confusing extra name.
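A quick illustration of that point: aliasing never creates a new object, so the docstring seen through any alias is simply the builtin's own.

```python
# An alias is just another name for the same object, so the docstring
# reached through the alias is the builtin's own docstring -- the alias
# itself carries no documentation of its own.
my_float = float              # the same kind of aliasing np.float used to do
assert my_float is float
assert my_float.__doc__ == float.__doc__
```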

I don't understand why this alias has been created in the beginning. If the alias would not exist, after a import * from numpy the builtin float would still exist?

The alias wasn't created to make from numpy import * work -- in fact the way numpy/__init__.py is written, it historically wouldn't import float and friends (they're in the namespace, but not listed in np.__all__). The issue is that there was code doing both of these things:

np.array(..., dtype=np.float)
from numpy import *
array(..., dtype=float)

and it was important that (a) they both keep working (for backward compatibility), and (b) they should have identical meaning (because anything else would be even more confusing). And then there was also the requirement (c) that in the latter case, float should mean the builtin. Put all these requirements together, and you end up with the only solution being that you keep np.float around, but make it an alias to the builtin.
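Constraint (c) is easy to check even today, since the alias was kept out of np.__all__ (and is gone entirely in NumPy >= 1.24):

```python
import numpy  # ensure numpy is importable before the star-import below

# After a star-import, `float` still means the builtin: the alias was
# never exported via np.__all__, so the builtin is never shadowed.
ns = {}
exec("from numpy import *", ns)
assert ns.get("float", float) is float
```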

EDIT: I also had the misconseption that np.int_ would be an internally used type, not some that should be used in programs.

[NB: edits don't trigger email notifications, so many people may never see them]

I can see why you'd think that, given the similarity to the leading-underscore-means-private convention. But what's going on here is a different convention: a trailing underscore is used to avoid naming conflicts. E.g., when people have a variable that refers to an arbitrary class object in Python, they can't name the variable class, because that would be a syntax error, so a common convention is to use the name class_ instead.
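A tiny illustration of that trailing-underscore convention (describe is a hypothetical helper, purely for show):

```python
# The trailing-underscore convention in action: append "_" to dodge a
# keyword or builtin, nothing to do with privacy. `describe` is a
# hypothetical helper, purely for illustration.
def describe(obj, class_=None):    # "class" here would be a syntax error
    class_ = class_ if class_ is not None else type(obj)
    return class_.__name__

assert describe(3.0) == "float"
assert describe(3.0, class_=int) == "int"
```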

If you can see a way to make this clearer in the docs then we'd certainly appreciate suggestions/contributions!

larsmans added a commit to scipy/scipy that referenced this pull request Jul 24, 2015
use int, float, bool instead of np.int, np.float, np.bool

See numpy/numpy#6103 for rationale.
@rlamy
Contributor

rlamy commented Jul 27, 2015

FWIW, numpy depending on or vendoring an explicitly PyPy-incompatible module is likely to be a bit annoying for us in numpypy, though I guess that the solution will be to get metamodule supported on PyPy.

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2019
maxnoe pushed a commit to maxnoe/astropy that referenced this pull request Jan 6, 2020
As discussed in astropy#6603 (and numpy/numpy#6103) it's
unnecessary and misleading to use the Python types from the NumPy namespace.

With this PR this uses the default NumPy dtype corresponding to these
Python types.
@seberg
Member

seberg commented Jan 29, 2020

I am going to close this; it is superseded by Eric's work in #14882 now (although it could be easier to move forward with a small NEP maybe, which I wanted to write at some point but have not done yet). In any case, I doubt that we will go the metamodule route; instead we'll just go slow and only deprecate on Python 3.7 or higher, where it is simple.
