MAINT Use np.full when possible #12379
Conversation
Thanks for your feedback @anntzer! I was not aware of the change of behavior in numpy 1.12. I have specified the dtype manually in the few places it was missing -- I think it's more reliable in any case than relying on the dtype of the second argument. This PR is less comprehensive than the one you did earlier, as I only made the changes in places where I thought this had a chance to impact the runtime performance of the library (e.g. excluding examples, and some tests where the array size is very small).
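For context, the NumPy 1.12 behavior change referred to here is that `np.full` started inferring its dtype from the fill value instead of defaulting to float64 (after a FutureWarning period). A minimal sketch of why the explicit dtype matters:

```python
import numpy as np

# Since NumPy 1.12, np.full infers its dtype from the fill value
# when no dtype is given:
a = np.full(3, 2)                    # integer fill -> integer array
b = np.full(3, 2, dtype='float64')   # dtype pinned down explicitly

assert a.dtype.kind == 'i'
assert b.dtype == np.float64
```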
lib/matplotlib/axes/_axes.py
Outdated
@@ -3978,7 +3978,7 @@ def dopatch(xs, ys, **kwargs):
     # maybe draw the fliers
     if showfliers:
         # fliers coords
-        flier_x = np.ones(len(stats['fliers'])) * pos
+        flier_x = np.full(len(stats['fliers']), pos, dtype='float64')
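As a quick sanity check (the values and names below are illustrative, not from the PR), the `np.full` form produces the same array as the old ones-times-constant idiom, just without the intermediate multiply:

```python
import numpy as np

pos, n = 2.5, 4  # illustrative values
old = np.ones(n) * pos                   # allocates ones, then multiplies
new = np.full(n, pos, dtype='float64')   # writes the fill value in one pass

assert np.array_equal(old, new)
assert old.dtype == new.dtype == np.float64
```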
I think dtype=float is generally preferable, unless you have a good reason to force float64 even on 32-bit architectures (which at least we didn't do before).
Same comment applies throughout.
actually, no, dtype=float is not preferable as it is referencing Python floats. dtype='float64' or dtype=np.float_ (if you don't want to specify the width) is generally considered preferable. Also note dtype=np.int_ and dtype=np.bool_ all exist for similar reasons.
but dtype=float will give you a proper np.float_ array... at this place, it's literally adding 4 characters for no change in behavior.
You stated "is generally preferable". Meanwhile: https://stackoverflow.com/questions/6205020/numpy-types-with-underscore-int-float-etc
Travis's NumPy book states reasons for why he created those dtypes. Yes, in many cases, you end up with the same thing, but they do exist for a reason.
Eh... the link you quote literally says "Whenever a data type is required, as an argument, the standard Python types are recognized as well" and I don't think (?) there's a single sentence in this quote saying that the dtypes are preferable in this use-case.
My point was, and still is, that the fact that Python datatypes are accepted is merely a convenience, and is not declared as "generally preferable". I have never seen anyone before state that the Python datatypes are "generally preferable"; instead, I have seen people state the opposite, because "explicit is better than implicit". I think what the submitter has is perfectly fine. And if the submitter had it as dtype=float in the original submission, I probably wouldn't have said anything either. I am only opposing the assertion that your suggestion was "generally preferable".
> I am only opposing the assertion that your suggestion was "generally preferable".
That's fine with me, let's agree to not derail this PR and postpone this discussion to, well, not now.
To address the initial comment,
>>> import numpy as np
>>> np.dtype(float)
dtype('float64')
>>> np.dtype(np.float64)
dtype('float64')
>>> np.dtype(np.float_)
dtype('float64')
so the actual dtype when providing dtype=float will be float64. That will be true for both 32-bit and 64-bit systems (checked in a 32-bit Debian docker container), unlike np.dtype(int), which can be int32 or int64 depending on the system.

I also think it's better to say float64 explicitly even if it's 3 characters longer, because that's the actual numpy dtype we get, and it removes the need for readers to think about the size of float on 32/64-bit architectures and about possible implications of that.
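The platform point above can be checked directly; a small sketch (the itemsize check is the hedged part, since which width `np.dtype(int)` gets depends on the platform's C long):

```python
import numpy as np

# float always maps to a 64-bit dtype, but int follows the platform's
# C long, so its width is not fixed across systems:
assert np.dtype(float) == np.float64
assert np.dtype(int).kind == 'i'
assert np.dtype(int).itemsize in (4, 8)  # e.g. 4 on 64-bit Windows and 32-bit Linux
```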
FWIW, np.float64 is a slightly better spelling than 'float64'
lib/matplotlib/hatch.py
Outdated
@@ -169,7 +169,8 @@ def __init__(self, hatch, density):
     self.num_rows = (hatch.count('*')) * density
     path = Path.unit_regular_star(5)
     self.shape_vertices = path.vertices
-    self.shape_codes = np.ones(len(self.shape_vertices)) * Path.LINETO
+    self.shape_codes = np.full(len(self.shape_vertices), Path.LINETO,
+                               dtype='float64')
that should be some integer type, I think?
lib/matplotlib/legend_handler.py
Outdated
@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
     xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
                                          width, height, fontsize)

-    ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
+    ydata = np.full(xdata.shape, ((height - ydescent) / 2.), float)
may as well drop the dot after "/2" while you're at it; the __future__ (division) is now
lib/matplotlib/path.py
Outdated
@@ -316,7 +316,7 @@ def make_compound_path_from_polys(cls, XY):
     stride = numsides + 1
     nverts = numpolys * stride
     verts = np.zeros((nverts, 2))
-    codes = np.ones(nverts, int) * cls.LINETO
+    codes = np.full(nverts, cls.LINETO, dtype=int)
should perhaps have been cls.code_type?
There's a bunch of FutureWarnings on the numpy==1.10 build (related to the change in dtype) which we probably don't want. You can make the dtype explicit in these calls, or just see whether this is gone in numpy 1.11 and bump the dependency. (If numpy 1.12 is needed, that's allowed next January.)
Thanks @anntzer! I know. Trying to track them down, but so far unsuccessfully, as the warnings are not raised with … I'll run the tests locally to investigate, and will fix it -- it may take a few days though.
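One generic way to track down warnings like these (a sketch of a common technique, not necessarily what was used in this PR) is to escalate FutureWarning to an error, so the offending call fails with a full traceback instead of printing once and moving on:

```python
import warnings

# Turn FutureWarning into an error so the offending call fails loudly
# with a traceback pointing at the exact line.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        # stand-in for the real np.full call that triggers the warning
        warnings.warn("np.full dtype will change", FutureWarning)
    except FutureWarning as exc:
        print("caught:", exc)
```

The same effect is available from the command line via `python -W error::FutureWarning`, which is handy when running a whole test suite.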
would be nice (for the future) for numpy to do that :)

I rebased on the latest Python3-only master with a more recent numpy. CI seems happy now.
lib/matplotlib/hatch.py
Outdated
@@ -169,7 +169,8 @@ def __init__(self, hatch, density):
     self.num_rows = (hatch.count('*')) * density
     path = Path.unit_regular_star(5)
     self.shape_vertices = path.vertices
-    self.shape_codes = np.ones(len(self.shape_vertices)) * Path.LINETO
+    self.shape_codes = np.full(len(self.shape_vertices), Path.LINETO,
+                               dtype=np.int32)
Probably should be Path.code_type.
lib/matplotlib/legend_handler.py
Outdated
@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
     xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
                                          width, height, fontsize)

-    ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
+    ydata = np.full(xdata.shape, ((height - ydescent) / 2), float)
np.full_like(xdata, (height - ydescent) / 2)
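For reference, `np.full_like` takes both the shape and the dtype from its first argument; a quick sketch with made-up values:

```python
import numpy as np

xdata = np.linspace(0.0, 1.0, 5)   # made-up stand-in for the real xdata
ydata = np.full_like(xdata, 0.5)   # shape and dtype both copied from xdata

assert ydata.shape == xdata.shape
assert ydata.dtype == xdata.dtype
assert np.all(ydata == 0.5)
```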
lib/matplotlib/path.py
Outdated
@@ -552,7 +552,8 @@ def interpolated(self, steps):
     vertices = simple_linear_interpolation(self.vertices, steps)
     codes = self.codes
     if codes is not None:
-        new_codes = Path.LINETO * np.ones(((len(codes) - 1) * steps + 1, ))
+        new_codes = np.full(((len(codes) - 1) * steps + 1, ), Path.LINETO,
can pass the shape as scalar instead of 1-tuple
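The scalar-vs-tuple point in NumPy terms: both spellings are accepted and produce the same 1-D array, the scalar form is just less noisy.

```python
import numpy as np

# np.full accepts a plain scalar as the shape for 1-D output;
# the 1-tuple is equivalent.
assert np.array_equal(np.full(4, 7), np.full((4,), 7))
assert np.full(4, 7).shape == (4,)
```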
lib/matplotlib/path.py
Outdated
@@ -864,7 +865,7 @@ def arc(cls, theta1, theta2, n=None, is_wedge=False):
     if is_wedge:
         length = n * 3 + 4
         vertices = np.zeros((length, 2), float)
-        codes = cls.CURVE4 * np.ones((length, ), cls.code_type)
+        codes = np.full((length, ), cls.CURVE4, dtype=cls.code_type)
can pass shape as scalar
lib/matplotlib/path.py
Outdated
@@ -873,7 +874,7 @@ def arc(cls, theta1, theta2, n=None, is_wedge=False):
     else:
         length = n * 3 + 1
         vertices = np.empty((length, 2), float)
-        codes = cls.CURVE4 * np.ones((length, ), cls.code_type)
+        codes = np.full((length, ), cls.CURVE4, dtype=cls.code_type)
can pass shape as scalar
lib/matplotlib/tri/trirefine.py
Outdated
@@ -243,7 +243,7 @@ def _refine_triangulation_once(triangulation, ancestors=None):
                                 np.arange(ntri, dtype=np.int32)]))
     edge_apexes = np.ravel(np.vstack([np.zeros(ntri, dtype=np.int32),
                                       np.ones(ntri, dtype=np.int32),
-                                      np.ones(ntri, dtype=np.int32)*2]))
+                                      np.full(ntri, 2, dtype=np.int32)]))
Perhaps replace the previous two lines to use np.full too? Looks nicer :)
Also I think the whole thing is just

np.concatenate([np.full(ntri, 0, np.uint32),
                np.full(ntri, 1, np.uint32),
                np.full(ntri, 2, np.uint32)])

right?
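The two spellings can be checked for equivalence directly. This sketch uses int32 to match the surrounding trirefine code (the comment above says uint32, which differs from the dtype used elsewhere in the file):

```python
import numpy as np

ntri = 3  # toy value
via_vstack = np.ravel(np.vstack([np.zeros(ntri, dtype=np.int32),
                                 np.ones(ntri, dtype=np.int32),
                                 np.full(ntri, 2, dtype=np.int32)]))
via_concat = np.concatenate([np.full(ntri, k, np.int32) for k in range(3)])

# both give [0, 0, 0, 1, 1, 1, 2, 2, 2]
assert np.array_equal(via_vstack, via_concat)
```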
lib/matplotlib/tri/trirefine.py
Outdated
edge_apexes = np.ravel(np.vstack([np.zeros(ntri, dtype=np.int32), | ||
np.ones(ntri, dtype=np.int32), | ||
np.full(ntri, 2, dtype=np.int32)])) | ||
edge_elems = np.tile(np.arange(3, dtype=np.int32), ntri) |
looks like you got 3 and ntri swapped?
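A small demonstration of why the argument order matters for np.tile, using a toy ntri (this is my reading of the intended layout, not code from the PR):

```python
import numpy as np

ntri = 2  # toy value
# edge_apexes lays apex indices out block-wise: [0, 0, 1, 1, 2, 2]
edge_apexes = np.repeat(np.arange(3, dtype=np.int32), ntri)
# so the matching triangle indices must cycle: [0, 1, 0, 1, 0, 1]
edge_elems = np.tile(np.arange(ntri, dtype=np.int32), 3)
assert edge_elems.tolist() == [0, 1, 0, 1, 0, 1]

# with the arguments swapped, the pattern comes out differently (wrong here):
swapped = np.tile(np.arange(3, dtype=np.int32), ntri)
assert swapped.tolist() == [0, 1, 2, 0, 1, 2]
```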
Thanks for the review @anntzer! I think I addressed all your comments, waiting for CI now.
Yes, thanks! using …
@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
     xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
                                          width, height, fontsize)

-    ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
+    ydata = np.full_like(xdata, ((height - ydescent) / 2))
specify dtype?
Copying xdata's dtype (which full_like does) seems normal.
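The reason an explicit dtype can still be worth asking about: `np.full_like` follows the source array's dtype, which silently truncates a float fill value when that array happens to be integer. A sketch:

```python
import numpy as np

# full_like follows the source array's dtype, so a float fill value is
# silently truncated when the source is an integer array:
assert np.full_like(np.arange(3), 0.5).tolist() == [0, 0, 0]

# passing dtype explicitly avoids the surprise:
assert np.full_like(np.arange(3), 0.5, dtype=np.float64).tolist() == [0.5, 0.5, 0.5]
```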
This uses np.full in a few places instead of cst * np.ones, which avoids a memory copy and should be ~2x faster.
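The claimed speedup can be checked with a rough micro-benchmark (timings vary with array size and memory bandwidth, so ~2x is indicative rather than guaranteed):

```python
import timeit
import numpy as np

n = 1_000_000
t_ones = timeit.timeit(lambda: np.ones(n) * 0.5, number=50)
t_full = timeit.timeit(lambda: np.full(n, 0.5, dtype=np.float64), number=50)
print(f"ones*cst: {t_ones:.4f}s   np.full: {t_full:.4f}s")

# whatever the timings, the results are identical
assert np.array_equal(np.ones(8) * 0.5, np.full(8, 0.5, dtype=np.float64))
```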