boolean vs. scalar comparisons very recently broke in master #7284

Closed
jreback opened this issue Feb 19, 2016 · 26 comments

Comments

@jreback

jreback commented Feb 19, 2016

between these 2 commits:

1.12.0.dev0+f4cc58c (good)

1.12.0.dev0+a2f5392 (bad)

pandas now started failing here:
https://travis-ci.org/pydata/pandas/jobs/110022540 (bad)

xref: pandas-dev/pandas#12390

@jreback
Author

jreback commented Feb 19, 2016

@cpcloud would know more about what is actually tested here. But the point is that somehow np.int64 is now being passed rather than int. Very odd.

@ahaldane
Member

Looks like it is related to #7254, which was just merged. We changed it so np.random.randint now returns numpy scalars rather than Python ints when returning a single value.

Maybe we will have to revert that, but maybe we can investigate a little first.

@ahaldane
Member

Thanks for the report also, it's nice to know pandas catches bugs so quickly!

@jreback
Author

jreback commented Feb 19, 2016

hah, we are using wheels from your master, so thank you!

@ahaldane
Member

So pandas is literally expecting that type(randint(10)) == int in order to determine which case to test, which is exactly what #7254 changed.

Even though it's a rare case, we should still try to make randint 100% backward compatible. I think we can do so while keeping #7254's nice behavior that np.random.randint(True, dtype=bool) returns bool instead of an int.

@gfyoung, what do you think of this: we could make the default value of the dtype keyword be int, and randint would return a Python int in that case when size is None (a single value). That way we are backward compatible, but we can still get the behavior of #7254 if dtype is supplied, e.g. when writing np.random.randint(10, dtype='l').
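A minimal sketch of that dispatch (a hypothetical wrapper for illustration only, not numpy's actual implementation; the inner np.random.randint call stands in for the C-level generator):

```python
import numpy as np

def randint_compat(low, high=None, size=None, dtype=int):
    # Hypothetical wrapper sketching the proposed behavior.
    ret = np.random.randint(low, high, size)  # stand-in for the C generator
    if size is not None:
        return ret.astype(dtype)      # array case: honor the requested dtype
    if dtype is int:
        return int(ret)               # default: plain Python int, backward compatible
    return np.dtype(dtype).type(ret)  # explicit dtype: numpy scalar, as in #7254
```

With dtype left at its default, a single draw comes back as a plain Python int; with an explicit dtype such as 'l', it comes back as a numpy integer scalar.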

@gfyoung
Contributor

gfyoung commented Feb 19, 2016

This is indeed a pity, but backwards compatibility must be respected. Here is what I suggest:

I think it is still correct to return numpy integers, to maintain consistency with the cases where shape specifies an array to be returned. However, in light of these failing tests, I propose that the dtype NOT be enforced for np.int32 and np.int64, since np.dtype(int) or np.dtype('l') will be cast to those values. Instead, I propose adding a FutureWarning so that downstream libraries like pandas have time to adjust to this change.

@ahaldane
Member

I'm not sure it's worth forcing people to adjust to a change, since there is so little benefit (Python integers can represent any numpy integer).

If we make the signature be randint(low, high=None, size=None, dtype=int), and then at the end of randint replace

return randfunc(low, high - 1, size, self.state_address)

by

ret = randfunc(low, high - 1, size, self.state_address)
if dtype in [bool, int, long]:
    return dtype(ret)
return ret

then if the user requests a Python type, they get a Python type, and if they request a numpy type, they get a numpy type, and we get to keep the old default. (Remove long on Python 3, though.)
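The "so little benefit" point rests on Python ints being arbitrary precision, so the dtype(ret) cast above is lossless; a quick check:

```python
import numpy as np

# Python ints are arbitrary precision, so casting any numpy integer
# scalar to int cannot lose information, even at the type's extremes.
assert int(np.int64(np.iinfo(np.int64).max)) == 2**63 - 1
assert int(np.int64(np.iinfo(np.int64).min)) == -2**63
assert int(np.uint64(np.iinfo(np.uint64).max)) == 2**64 - 1
```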

@gfyoung
Contributor

gfyoung commented Feb 19, 2016

@ahaldane: Fair enough. np.long as the default would be a little more accurate, because that goes to int in Python 3 and long in Python 2. I'll put up a PR for this shortly.

@jreback
Author

jreback commented Feb 21, 2016

OK, this looks good in pandas master.

Have some new breakages though: pandas-dev/pandas#12406

@seberg
Member

seberg commented Feb 21, 2016

@jreback it is due to gh-7215. Does this bother you or create incompatibilities with older pandas versions, or is it just a note seen in tests? Numpy used to patch the error to be an IndexError in this case, but Nathaniel removed that to simplify the code. It would be nice not to have to replace the error, but if it creates compat problems we have to think about it. Can't say I really expected problems there ;).

@jreback
Author

jreback commented Feb 21, 2016

@seberg no, the IndexError -> TypeError change looks reasonable to me. We will just update the tests for that, no biggie. It's the second one, where something is being returned as inf rather than NaN when doing a floordiv, that is more troubling. But I only had a quick look.

@seberg
Member

seberg commented Feb 21, 2016

Ah, OK, I did not realize it was two things when glancing over. Chuck is still busy with some floordiv stuff; he might have a quick idea what is going on.

@charris
Member

charris commented Feb 21, 2016

@jreback Could you specify what the arguments were?

@jreback
Author

jreback commented Feb 21, 2016

I'll have to see if I can give you a self-contained example in a bit

@charris
Member

charris commented Feb 21, 2016

@jreback I think it is 3. // 0. Note two things:

  • in Python this raises ZeroDivisionError, so compatibility does not compel numpy behavior
  • // is different from the floor function, which I suspect is used in sparse.

Currently numpy computes the %, // pair based on the result of fmod, which in this case is nan, and that is returned for both operators. However, in this case it would be possible to return the result of floor(a/b) for the integer part when b is zero, if that would help backwards compatibility.
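The difference between the two candidate results can be seen directly (illustrative; which one floordiv actually returns depends on the numpy version, as this thread shows):

```python
import numpy as np

with np.errstate(divide='ignore', invalid='ignore'):
    # fmod with a zero divisor is nan, so the fmod-based %, // pair
    # yields nan for both operators:
    print(np.fmod(3.0, 0.0))                            # nan
    # floor(a/b) instead gives inf, the 1.10-style floordiv result:
    print(np.floor(np.float64(3.0) / np.float64(0.0)))  # inf
```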

@charris
Member

charris commented Feb 21, 2016

Another case that probably differs and is specified by Python is

In [2]: 1. % inf
Out[2]: 1.0

@njsmith
Member

njsmith commented Feb 21, 2016

Wow, those are brain-breaking examples. But I think 3.0 // 0.0 should be nan, on the principle that (1) it isn't well defined in its own right, and (2) it can't be extended by continuity arguments, because lim 3.0 // x can diverge in different directions depending on the choice of the x sequence (in particular, positive versus negative, just like 3 / 0).

The 1.0 % inf == 1.0 case sounds correct, though, because 1.0 % x == 1.0 for all x > 1.0, so the limit is well defined. But surely IEEE 754 and/or Annex F have something more definitive to say about this, specifically in their comments on fmod?
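Both limit arguments are easy to check numerically (plain Python, for illustration):

```python
import math

# 3.0 // x diverges in opposite directions as x -> 0 from either side,
# so 3.0 // 0.0 has no limit to extend by continuity:
print(3.0 // 1e-300)    # large positive
print(3.0 // -1e-300)   # large negative

# 1.0 % x == 1.0 for all finite x > 1.0, and the limit carries over to inf:
print(1.0 % 1e300)               # 1.0
print(1.0 % math.inf)            # 1.0
print(math.fmod(1.0, math.inf))  # 1.0
```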

@charris
Member

charris commented Feb 21, 2016

@njsmith I think it probably is specified, as MSVC 2008 got it wrong and it is corrected in later versions.

@pv
Member

pv commented Feb 21, 2016

Annex F:

fmod(±0, y) returns ±0 for y not zero.
fmod(x, y) returns a NaN and raises the ‘‘invalid’’ floating-point exception for x infinite or y zero.
fmod(x, ±∞) returns x for x not infinite.
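Those three rules can be checked against np.fmod, which wraps the C library's fmod (results assume a conforming libm):

```python
import math
import numpy as np

# fmod(±0, y) returns ±0 for y not zero (the sign of x is preserved):
assert math.copysign(1.0, np.fmod(-0.0, 5.0)) == -1.0
assert math.copysign(1.0, np.fmod(0.0, 5.0)) == 1.0

# fmod(x, y) is nan for x infinite or y zero:
with np.errstate(invalid='ignore'):
    assert np.isnan(np.fmod(np.inf, 2.0))
    assert np.isnan(np.fmod(3.0, 0.0))

# fmod(x, ±inf) returns x for x not infinite:
assert np.fmod(3.0, np.inf) == 3.0
assert np.fmod(-3.0, -np.inf) == -3.0
```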

@charris
Member

charris commented Feb 21, 2016

Which makes me think I should explicitly return nan rather than the result of fmod in the case of zero division so that things work properly on MSVC 2008 also.

@jreback
Author

jreback commented Feb 21, 2016

So here's a simple repro.

In [1]: np.__version__
Out[1]: '1.12.0.dev0+7d4d26a'

In [3]: np.array([3])//np.array([0],dtype='float64')
Out[3]: array([ nan])

In [2]: np.float64(3)//np.float64(0)
Out[2]: nan

In [3]: np.__version__
Out[3]: '1.10.4'

In [4]: np.float64(3)//np.float64(0)
Out[4]: inf

In [7]: np.array([3])//np.array([0],dtype='float64')
Out[7]: array([ inf])

@jreback
Author

jreback commented Feb 21, 2016

So my test fails because we are expecting the infs (and the result is now NaN in numpy >= 1.11).

This is 1.10.4

In [15]: b = array([ nan,  nan,  nan,   0.,   1.,   2.,   3.,   4.,   5.,   6.])

In [16]: a = array([[ nan,  nan,  nan,   0.,   1.,   2.,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,  nan,  nan,  nan,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [  0.,   1.,   2.,   3.,   4.,   5.,  nan,  nan,  nan,  nan]])

In [17]: a//b
Out[17]: 
array([[ nan,  nan,  nan,  nan,   1.,   1.,   1.,   1.,   1.,   1.],
       [ nan,  nan,  nan,  nan,  nan,  nan,   1.,   1.,   1.,   1.],
       [ nan,  nan,  nan,  inf,   4.,   2.,   2.,   1.,   1.,   1.],
       [ nan,  nan,  nan,  inf,   4.,   2.,  nan,  nan,  nan,  nan]])

@charris
Member

charris commented Feb 21, 2016

@jreback Yes, but what should we do about it? NaN is probably the most correct.

@jreback
Author

jreback commented Feb 21, 2016

@charris yes, agreed, NaN is the most logical here. OK, as long as you announce this as a bug fix I think you are fine. I will adjust my tests.

@charris
Member

charris commented Feb 21, 2016

@jreback OK, I will make sure the release notes mention it.

@charris
Member

charris commented Feb 21, 2016

Actually, MSVC 2008 gets the 1.0 % inf case wrong, not the zero division.

jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016
The 'pandas' library expects Python integers to be
returned, so this commit changes the API so that
the default is 'np.int' which converts to native
Python integer types when a singleton is being
generated with this function.

Closes numpygh-7284.