-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Use fused type in inplace normalize #6539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Use fused type in inplace normalize #6539
Conversation
Thanks for the PR. I'll review this in a while :) |
"""Inplace row normalize using the l2 norm""" | ||
cdef unsigned int n_samples = X.shape[0] | ||
cdef unsigned int n_features = X.shape[1] | ||
_inplace_csr_row_normalize_l2(X.data, X.shape, X.indices, X.indptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why I put a wrapper here is because inplace_csr_row_normalize_l1
and inplace_csr_row_normalize_l2
only accepts one argument X
, it is not allowed to directly replace
cdef np.ndarray[DOUBLE, ndim=1] X_data = X.data
with
cdef np.ndarray[floating, ndim=1] X_data = X.data
in the original function.
However, above solution is available in case like #6430 where function also accepts some arguments that has type involves floating
. e.g. np.ndarray[floating, ndim=2]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Can you also check that the dtype of X is not changed in the normalize function in data.py when X is sparse and norm is l1 or l2? |
LGTM pending |
84decaf
to
8ad1688
Compare
@MechCoder No problem! |
1ce2f0d
to
cbe01a2
Compare
@@ -1202,6 +1207,7 @@ def test_normalizer_l2(): | |||
X = init(X_dense) | |||
X_norm = normalizer = Normalizer(norm='l2', copy=False).transform(X) | |||
|
|||
assert_equal(X_norm.dtype, X.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtype of X here is float64
anyhow. I would add a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean add
assert_equal(X_norm.dtype, np.float64)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I meant that this test will pass in master as well. Because the dype of X
is np.float64
. We want to check that the dtype of X is preserved. Hence I would add a separate test, testing that normalize
preserves dtype, for X of dtype npfloat32 as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being clear. There are various other places that Sure, the sparse matrix is being checked separately in |
cbe01a2
to
23e1bdf
Compare
Hello @MechCoder , I've updated the code and check |
assert_array_almost_equal(row_sums, ones) | ||
|
||
|
||
def test_normalize_l2(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, sorry for the noise but you could probably add another for loop rather than another function since they vary only in the value of parameter norm
. Please ignore this if you had some reason for doing this way which I might be missing here !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @maniteja123 , thanks for your review.
there's also a difference here
But yeah, I can use a if
to do that.
I'll wait for reviewer's opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops my bad. Sorry !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't be. Still thanks for your review, it's helpful. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather use a if
and merge the tests
LGTM. @ogrisel quick second review? |
23e1bdf
to
16829dc
Compare
@jnothman quick review? |
@@ -364,7 +370,7 @@ def assign_rows_csr(X, | |||
"""Densify selected rows of a CSR matrix into a preallocated array. | |||
|
|||
Like out[out_rows] = X[X_rows].toarray() but without copying. | |||
Only supported for dtype=np.float64. | |||
No-copy supported for both dtype=np.float32 and dtype=np.float64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's not clear enough.
Original code will copy np.float32
data into np.float64
.
Now it won't copy the data no matter it's np.float32
or np.float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in assign_rows_csr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you are right.
Thanks for pointing out my big mistake!
That should be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why it is good to have 2 reviewers ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha yeah now I know 😄
16829dc
to
43b54e2
Compare
Hello @TomDLT , |
for inplace_csr_row_normalize in (inplace_csr_row_normalize_l1, | ||
inplace_csr_row_normalize_l2): | ||
for dtype in (np.float64, np.float32): | ||
X = rs.rand(10, 5).astype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should use randn
to have negative values
That will show that you forgot the absolute value before the sum
43b54e2
to
08e1db4
Compare
Hello @TomDLT , |
LGTM |
Thanks @yenchenlin1994 |
This is a PR related to #5776 and to replace #5932 .
May @MechCoder @rvraghav93 @ogrisel @GaelVaroquaux please have a look? 🙏