Skip to content

[MRG+1] Use fused type in inplace normalize #6539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

yenchenlin
Copy link
Contributor

This is a PR related to #5776 and to replace #5932 .

May @MechCoder @rvraghav93 @ogrisel @GaelVaroquaux please have a look? 🙏

@raghavrv
Copy link
Member

Thanks for the PR. I'll review this in a while :)

"""Inplace row normalize using the l2 norm"""
cdef unsigned int n_samples = X.shape[0]
cdef unsigned int n_features = X.shape[1]
_inplace_csr_row_normalize_l2(X.data, X.shape, X.indices, X.indptr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why I put a wrapper here is because inplace_csr_row_normalize_l1 and inplace_csr_row_normalize_l2 only accepts one argument X, it is not allowed to directly replace

cdef np.ndarray[DOUBLE, ndim=1] X_data = X.data

with

cdef np.ndarray[floating, ndim=1] X_data = X.data

in the original function.

However, above solution is available in case like #6430 where function also accepts some arguments that has type involves floating. e.g. np.ndarray[floating, ndim=2].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@MechCoder
Copy link
Member

Can you also check that the dtype of X is not changed in the normalize function in data.py when X is sparse and norm is l1 or l2?

@MechCoder
Copy link
Member

LGTM pending

@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 84decaf to 8ad1688 Compare March 16, 2016 03:47
@yenchenlin
Copy link
Contributor Author

@MechCoder No problem!
Code updated, please have a look again.

@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch 3 times, most recently from 1ce2f0d to cbe01a2 Compare March 16, 2016 07:02
@@ -1202,6 +1207,7 @@ def test_normalizer_l2():
X = init(X_dense)
X_norm = normalizer = Normalizer(norm='l2', copy=False).transform(X)

assert_equal(X_norm.dtype, X.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dtype of X here is float64 anyhow. I would add a separate test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean add

assert_equal(X_norm.dtype, np.float64)

here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I meant that this test will pass in master as well. Because the dype of X is np.float64. We want to check that the dtype of X is preserved. Hence I would add a separate test, testing that normalize preserves dtype, for X of dtype npfloat32 as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change X to npfloat.32 here, it will result in

AssertionError:
Arrays are not almost equal to 7 decimals
 ACTUAL: 0.99999994
 DESIRED: 1.0

caused by this line.

But it will pass

assert_equal(X_norm.dtype, X.dtype) 

Maybe we can leave dtype test here which also checks dtype of X is preserved?

@MechCoder
Copy link
Member

Sorry for not being clear. There are various other places that normalize is being used, other than Normalizer. Hence I would prefer to add it as a separate test.

Sure, the sparse matrix is being checked separately in test_inplace_normalize_l1 and test_inplace_normalize_l2, but normalize handles the dense case as well and it may be better to have a sanity check.

@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from cbe01a2 to 23e1bdf Compare March 22, 2016 09:31
@yenchenlin
Copy link
Contributor Author

Hello @MechCoder ,
ah sorry for my misunderstanding before!

I've updated the code and check normalize function which is defined in data.py.
Is this what you mean?

assert_array_almost_equal(row_sums, ones)


def test_normalize_l2():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for the noise but you could probably add another for loop rather than another function since they vary only in the value of parameter norm. Please ignore this if you had some reason for doing this way which I might be missing here !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @maniteja123 , thanks for your review.
there's also a difference here

But yeah, I can use a if to do that.
I'll wait for reviewer's opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops my bad. Sorry !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't be. Still thanks for your review, it's helpful. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use a if and merge the tests

@MechCoder
Copy link
Member

LGTM. @ogrisel quick second review?

@MechCoder MechCoder changed the title [MRG] Use fused type in inplace normalize [MRG+1] Use fused type in inplace normalize Mar 23, 2016
@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 23e1bdf to 16829dc Compare March 23, 2016 02:58
@MechCoder
Copy link
Member

@jnothman quick review?

@@ -364,7 +370,7 @@ def assign_rows_csr(X,
"""Densify selected rows of a CSR matrix into a preallocated array.

Like out[out_rows] = X[X_rows].toarray() but without copying.
Only supported for dtype=np.float64.
No-copy supported for both dtype=np.float32 and dtype=np.float64.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's not clear enough.
Original code will copy np.float32 data into np.float64.
Now it won't copy the data no matter it's np.float32 or np.float64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in assign_rows_csr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you are right.
Thanks for pointing out my big mistake!
That should be in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why it is good to have 2 reviewers ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha yeah now I know 😄

@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 16829dc to 43b54e2 Compare March 25, 2016 17:44
@yenchenlin
Copy link
Contributor Author

Hello @TomDLT ,
I've merged the tests and address the comment issue, please have a look thanks!

for inplace_csr_row_normalize in (inplace_csr_row_normalize_l1,
inplace_csr_row_normalize_l2):
for dtype in (np.float64, np.float32):
X = rs.rand(10, 5).astype(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should use randn to have negative values
That will show that you forgot the absolute value before the sum

@yenchenlin yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 43b54e2 to 08e1db4 Compare March 26, 2016 02:38
@yenchenlin
Copy link
Contributor Author

Hello @TomDLT ,
I've addressed the problems you mentioned, code updated.

@jnothman
Copy link
Member

LGTM

@jnothman
Copy link
Member

Thanks @yenchenlin1994

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants