[MRG+1] Use fused type in inplace normalize #6539

yenchenlin · 2016-03-14T12:42:02Z

This PR is related to #5776 and replaces #5932.

Could @MechCoder @rvraghav93 @ogrisel @GaelVaroquaux please have a look? 🙏

raghavrv · 2016-03-14T12:45:17Z

Thanks for the PR. I'll review this in a while :)

yenchenlin · 2016-03-14T12:47:12Z

sklearn/utils/sparsefuncs_fast.pyx

-    """Inplace row normalize using the l2 norm"""
-    cdef unsigned int n_samples = X.shape[0]
-    cdef unsigned int n_features = X.shape[1]
+    _inplace_csr_row_normalize_l2(X.data, X.shape, X.indices, X.indptr)


I put a wrapper here because inplace_csr_row_normalize_l1 and inplace_csr_row_normalize_l2 accept only one argument, X, which has no fused type in its signature, so it is not possible to directly replace

cdef np.ndarray[DOUBLE, ndim=1] X_data = X.data

with

cdef np.ndarray[floating, ndim=1] X_data = X.data

in the original function.

However, that direct replacement does work in cases like #6430, where the function also accepts arguments whose type involves floating, e.g. np.ndarray[floating, ndim=2].
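The wrapper pattern described above can be sketched in pure Python. This is only an illustration under stated assumptions: the names `CSR`, `_row_normalize_l2` and `row_normalize_l2` are hypothetical, and the standard-library `array` module stands in for NumPy dtypes (typecode 'f' for float32, 'd' for float64). In the actual Cython change, the helper's buffer argument would be typed with `floating`.

```python
from array import array
from collections import namedtuple
from math import sqrt

# Hypothetical stand-in for a scipy.sparse CSR matrix: only the raw buffers.
CSR = namedtuple("CSR", ["data", "shape", "indices", "indptr"])


def _row_normalize_l2(data, shape, indices, indptr):
    """Helper working on the raw buffers, whatever their float width.

    `indices` is unused here; it is kept only to mirror the signature
    shown in the diff above.
    """
    n_samples = shape[0]
    for i in range(n_samples):
        start, end = indptr[i], indptr[i + 1]
        norm = sqrt(sum(data[j] * data[j] for j in range(start, end)))
        if norm == 0.0:
            continue  # leave all-zero rows untouched
        for j in range(start, end):
            data[j] /= norm  # written back in place, typecode unchanged


def row_normalize_l2(X):
    """Public one-argument entry point that only unpacks X."""
    _row_normalize_l2(X.data, X.shape, X.indices, X.indptr)


# 2 x 3 matrix [[3, 0, 4], [0, 0, 0]] stored as float32 ('f').
X = CSR(array("f", [3.0, 4.0]), (2, 3), array("i", [0, 2]), array("i", [0, 2, 2]))
row_normalize_l2(X)
```

Because the helper writes into the buffer it was given, the element type of `X.data` is the same before and after the call, which is the property the dtype checks in this PR are meant to guard.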

MechCoder · 2016-03-15T22:17:34Z

Can you also check that the dtype of X is not changed in the normalize function in data.py when X is sparse and norm is l1 or l2?

MechCoder · 2016-03-15T22:18:10Z

LGTM pending

yenchenlin · 2016-03-16T03:51:41Z

@MechCoder No problem!
Code updated, please have a look again.

MechCoder · 2016-03-16T19:56:41Z

sklearn/preprocessing/tests/test_data.py

@@ -1202,6 +1207,7 @@ def test_normalizer_l2():
        X = init(X_dense)
        X_norm = normalizer = Normalizer(norm='l2', copy=False).transform(X)

+        assert_equal(X_norm.dtype, X.dtype)


The dtype of X here is float64 anyhow. I would add a separate test.

You mean add

assert_equal(X_norm.dtype, np.float64)

here?

Oh, I meant that this test will pass in master as well, because the dtype of X is np.float64. We want to check that the dtype of X is preserved. Hence I would add a separate test, checking that normalize preserves the dtype for X of dtype np.float32 as well.

If I change X to np.float32 here, it will result in

AssertionError:
Arrays are not almost equal to 7 decimals
 ACTUAL: 0.99999994
 DESIRED: 1.0

caused by this line.

But it will pass

assert_equal(X_norm.dtype, X.dtype)

Maybe we can leave the dtype test here, since it also checks that the dtype of X is preserved?

MechCoder · 2016-03-21T19:11:04Z

Sorry for not being clear. There are various other places that normalize is being used, other than Normalizer. Hence I would prefer to add it as a separate test.

Sure, the sparse matrix is being checked separately in test_inplace_normalize_l1 and test_inplace_normalize_l2, but normalize handles the dense case as well and it may be better to have a sanity check.

yenchenlin · 2016-03-22T10:07:46Z

Hello @MechCoder,
ah, sorry for my earlier misunderstanding!

I've updated the code and added a check of the normalize function defined in data.py.
Is this what you meant?

maniteja123 · 2016-03-22T10:15:31Z

sklearn/preprocessing/tests/test_data.py

+            assert_array_almost_equal(row_sums, ones)
+
+
+def test_normalize_l2():


Hi, sorry for the noise, but you could probably add another for loop rather than another function, since the two tests differ only in the value of the norm parameter. Please ignore this if you had a reason for doing it this way that I am missing!

Hello @maniteja123, thanks for your review.
There's also a difference here

But yeah, I can use an if to do that.
I'll wait for the reviewer's opinion.

Oops my bad. Sorry !

Don't be. Still thanks for your review, it's helpful. 😄

I would rather use an if and merge the tests.
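A merged test along the lines suggested above could look roughly like the following. It is a pure-Python sketch, not the test that was committed: `normalize_rows` is a hypothetical dense stand-in for the function under test, and the data is made up for illustration.

```python
from math import sqrt


def normalize_rows(rows, norm):
    """Hypothetical dense stand-in for the function under test."""
    out = []
    for row in rows:
        if norm == "l1":
            scale = sum(abs(v) for v in row)
        else:  # "l2"
            scale = sqrt(sum(v * v for v in row))
        # All-zero rows have scale 0 and are returned unchanged.
        out.append([v / scale for v in row] if scale else list(row))
    return out


def test_normalize():
    rows = [[1.0, -2.0, 2.0], [0.0, 0.0, 0.0], [-3.0, 4.0, 0.0]]
    for norm in ("l1", "l2"):
        for row, row_norm in zip(rows, normalize_rows(rows, norm)):
            # The only norm-specific part of the test is this branch.
            if norm == "l1":
                size = sum(abs(v) for v in row_norm)
            else:
                size = sqrt(sum(v * v for v in row_norm))
            expected = 1.0 if any(row) else 0.0
            assert abs(size - expected) < 1e-12


test_normalize()
```

The loop keeps the shared setup in one place, and the single `if` isolates the one step where the l1 and l2 checks differ.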

MechCoder · 2016-03-23T02:52:48Z

LGTM. @ogrisel quick second review?

MechCoder · 2016-03-24T18:03:55Z

@jnothman quick review?

TomDLT · 2016-03-25T16:55:55Z

sklearn/utils/sparsefuncs_fast.pyx

@@ -364,7 +370,7 @@ def assign_rows_csr(X,
    """Densify selected rows of a CSR matrix into a preallocated array.

    Like out[out_rows] = X[X_rows].toarray() but without copying.
-    Only supported for dtype=np.float64.
+    No-copy supported for both dtype=np.float32 and dtype=np.float64.


why did you change this?

Maybe it's not clear enough.
The original code copied np.float32 data into np.float64.
Now the data is not copied, whether it is np.float32 or np.float64.

in assign_rows_csr?

Ah you are right.
Thanks for pointing out my big mistake!
That should be in this PR.

That's why it is good to have 2 reviewers ;)

haha yeah now I know 😄

yenchenlin · 2016-03-25T17:53:03Z

Hello @TomDLT ,
I've merged the tests and addressed the comment issue. Please have a look, thanks!

TomDLT · 2016-03-25T18:27:34Z

sklearn/utils/tests/test_sparsefuncs.py

+    for inplace_csr_row_normalize in (inplace_csr_row_normalize_l1,
+                                      inplace_csr_row_normalize_l2):
+        for dtype in (np.float64, np.float32):
+            X = rs.rand(10, 5).astype(dtype)


You should use randn to have negative values.
That will show that you forgot the absolute value before the sum.
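To see why non-negative test data hides this kind of bug, here is a small pure-Python sketch. The function names are hypothetical and the rows are fixed by hand instead of being drawn with rand or randn, so the arithmetic is easy to follow.

```python
def l1_normalize_buggy(row):
    """Divides by the plain sum: the absolute value is missing."""
    total = sum(row)
    return [v / total for v in row]


def l1_normalize(row):
    """Divides by the sum of absolute values, i.e. the l1 norm."""
    total = sum(abs(v) for v in row)
    return [v / total for v in row]


def l1_norm(row):
    """l1 norm of a row, used to check the result."""
    return sum(abs(v) for v in row)


# Non-negative row, like samples from rand: both versions agree.
positive = [1.0, 2.0, 1.0]
# Mixed-sign row, like samples from randn: only the correct version works.
mixed = [-1.0, 2.0, 1.0]

same_on_positive = l1_normalize_buggy(positive) == l1_normalize(positive)
buggy_norm_on_mixed = l1_norm(l1_normalize_buggy(mixed))
correct_norm_on_mixed = l1_norm(l1_normalize(mixed))
```

On the non-negative row the two versions are indistinguishable, while on the mixed-sign row the buggy version yields a row whose l1 norm is 2 instead of 1, so a test built on such data would fail as intended.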

yenchenlin · 2016-03-26T03:22:03Z

Hello @TomDLT ,
I've addressed the problems you mentioned, code updated.

jnothman · 2016-03-26T12:42:14Z

LGTM

jnothman · 2016-03-26T12:42:25Z

Thanks @yenchenlin1994

yenchenlin reviewed Mar 14, 2016

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 84decaf to 8ad1688 Compare March 16, 2016 03:47

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch 3 times, most recently from 1ce2f0d to cbe01a2 Compare March 16, 2016 07:02

MechCoder reviewed Mar 16, 2016

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from cbe01a2 to 23e1bdf Compare March 22, 2016 09:31

maniteja123 reviewed Mar 22, 2016

MechCoder changed the title from "[MRG] Use fused type in inplace normalize" to "[MRG+1] Use fused type in inplace normalize" Mar 23, 2016

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 23e1bdf to 16829dc Compare March 23, 2016 02:58

TomDLT reviewed Mar 25, 2016

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 16829dc to 43b54e2 Compare March 25, 2016 17:44

TomDLT reviewed Mar 25, 2016

yenchenlin added 2 commits March 26, 2016 10:38

Use fused type in inplace normalize

e34bbc1

Test normalize function in data.py

08e1db4

yenchenlin force-pushed the use-fused-type-in-inplace-normalize branch from 43b54e2 to 08e1db4 Compare March 26, 2016 02:38

jnothman merged commit ea9896e into scikit-learn:master Mar 26, 2016

This was referenced Apr 14, 2016

[MRG+2] Use fused types in sparse mean variance functions #6593

Merged

[MRG] Make subfunctions of sparsefuncs private #6659

Closed
