
[MRG+1] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types #6430


Closed
wants to merge 1 commit into from

Conversation

@ssaeger ssaeger commented Feb 23, 2016

As mentioned in #5973, #5776, or #5464, we can use Cython fused types to work with float32 input without wasting memory by converting it internally to float64.

This PR implements fused types for KMeans/MiniBatchKMeans so that float32 input will result in using only float32 internally. Additionally it adds tests to ensure the desired data types are used.

Memory usage for float32 input of shape 500,000 × 20:
[memory profile plot before the code changes of this commit]
[memory profile plot after the code changes]

Unfortunately I was not able to add support for handling sparse float32 input as float32 internally, so sparse input is still converted to float64. Supporting it would require many changes in sparsefuncs_fast.pyx.
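To put the saving in perspective, plain NumPy shows the footprint of the old internal upcast for an input of this shape (a minimal sketch, not the profiling script used for the plots above):

```python
import numpy as np

# Same shape as the profiled input: 500,000 samples, 20 features.
X32 = np.zeros((500000, 20), dtype=np.float32)
X64 = X32.astype(np.float64)  # the internal conversion this PR avoids

assert X32.nbytes == 40_000_000      # ~40 MB kept as float32
assert X64.nbytes == 2 * X32.nbytes  # ~80 MB after upcasting to float64
```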

@MechCoder MechCoder changed the title Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types Feb 29, 2016
@ogrisel (Member) commented Mar 14, 2016

Based on the Cython docs, floating can only be float or double, so the else: raise ValueError blocks can never be reached.

Please replace those if floating is float / elif floating is double / else constructs with if floating is float / else constructs instead.

@ogrisel (Member) commented Mar 14, 2016

Please also feel free to squash those commits.

```cython
elif floating is double:
    centers = np.zeros((n_clusters, n_features), dtype=np.float64)
else:
    raise ValueError("Unknown floating type.")
```
Member:

The following should work, no?

```cython
centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
```
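The suggestion works because np.zeros accepts the input array's dtype object directly, so one allocation covers both floating cases (a minimal NumPy sketch with made-up shapes, not the scikit-learn code itself):

```python
import numpy as np

n_clusters, n_features = 8, 20
for dtype in (np.float32, np.float64):
    X = np.empty((100, n_features), dtype=dtype)
    # No if/else on the fused type needed: the allocation follows X.dtype.
    centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
    assert centers.dtype == X.dtype
```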

Member:

Similar remark for the allocation of the center_squared_norms and centers arrays in other functions of this file.

Author:

@ogrisel This change works on my machine with Ubuntu, but leads to failing AppVeyor builds.

@ogrisel (Member) commented Mar 14, 2016

Other than that, LGTM, +1 for merge with a new entry in doc/whats_new.rst.

@ogrisel ogrisel changed the title [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types [MRG+1] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types Mar 14, 2016
@ssaeger ssaeger force-pushed the fused_types branch 2 times, most recently from ec80989 to cebb687 Compare March 15, 2016 18:53
@ssaeger (Author) commented Mar 15, 2016

Thanks for your review and the comments.
I squashed the old commits and addressed your comments.

@ssaeger ssaeger force-pushed the fused_types branch 2 times, most recently from 8ae6d1f to 07dcec5 Compare March 16, 2016 09:02
@ogrisel (Member) commented Mar 16, 2016

Please squash the new commit as well. Commits with a message such as "Adresses several comments" are not interesting when reviewing the history of a file 6 months from now :)

@ssaeger (Author) commented Mar 16, 2016

Ok, thanks, I squashed them. :)

```cython
if floating is float:
    centers = np.zeros((n_clusters, n_features), dtype=np.float32)
else:
    centers = np.zeros((n_clusters, n_features), dtype=np.float64)
```
Member:

This could be further simplified as:

```cython
centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
```

no?

Author:

I tried that and it works on my machine with Ubuntu, but leads to failing AppVeyor builds.
I don't know exactly why this happens.

Member:

That is really weird. It would be worth investigating with a debugger under windows. This might be a bug in Cython / MSVC or even numpy. But we can leave it as it is for now in this PR.

@yenchenlin (Contributor):

Hello @ssaeger,
would you please elaborate on

> Unfortunately I was not able to add support to handle sparse float32 input data as float32 internally, so that this is still converted to float64. This would require many changes in sparsefuncs_fast.pyx.

What do you mean by internally?
Thanks

@ssaeger (Author) commented Mar 19, 2016

Hi @yenchenlin1994,
at the moment sparsefuncs_fast.pyx only accepts np.float64, and I did not want to change that in this PR: it would affect other parts of the code base as well, so a separate PR would be simpler to review.

By "internally" I mean that if the user passes sparse data with dtype np.float32, it is converted to np.float64 in order to use the functionality of sparsefuncs_fast.pyx. The conversion happens internally and is not visible to the user.

I hope this makes it a bit clearer. :)
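The hidden conversion described here can be sketched with SciPy directly (an illustration of the upcast only, not the actual scikit-learn code path):

```python
import numpy as np
import scipy.sparse as sp

# The user passes sparse float32 data ...
X = sp.csr_matrix(np.eye(4, dtype=np.float32))
assert X.dtype == np.float32

# ... but the float64-only sparsefuncs_fast.pyx routines see a float64 copy.
X64 = X.astype(np.float64)
assert X64.dtype == np.float64
assert (X64 != X).nnz == 0  # same values, twice the memory per stored entry
```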

@yenchenlin (Contributor):

@ssaeger thanks a lot for your explanation.
Super clear :) 👍

@MechCoder (Member):

It would be great to wait for #6593

@MechCoder (Member):

Seems like we still need add_row_csr to support both dtypes.

@yenchenlin (Contributor):

Yeah ... I'll do that.

@yenchenlin (Contributor) commented Apr 17, 2016

Hello @MechCoder,
It seems like add_row_csr is a cdef function right now.
And based on the discussion here, we can only make a function support fused types by turning it into a def function.

Am I on the right track?
Or do we need a benchmark to evaluate this memory/speed trade-off?

@jnothman (Member):

> It seems like add_row_csr is a cdef function right now. And based on the discussion here, we can only make a function support fused types by turning it into a def function.

It's actually possible that we could have used a cdef there (though the benefit is not as great), but I'm fairly sure we can here if we use typed memoryviews instead of np.ndarrays. For example, this compiles:

```cython
from cython cimport floating

cdef floating a1(floating[:] x):
    return x[0]

def b(X, floating y):
    cdef floating[:] X_data = X.data
    return a1(X_data)
```

@jnothman (Member):

(Also, should we be doing the fused typing of CSR/CSC sparse matrix indices and indptr at the same time as we make these changes? Should we be adding nogil where appropriate?)

@yenchenlin (Contributor) commented Apr 17, 2016

Hello @jnothman, I see the code you pasted compiles.

> though the benefit is not as great

But I wonder why the benefit there is not as great as here?
Isn't cdef faster than def?

@jnothman (Member):

Because you're going from Cython to Cython here, not from Python to Cython as there; and because here it is called repeatedly, while calculating means is not so often repeated.

@yenchenlin (Contributor) commented Apr 17, 2016

> Because you're going from Cython to Cython here, not from Python to Cython as there; and because here it is called repeatedly, while calculating means is not so often repeated.

Thanks a lot! Now I understand 😄

So, do you mean I can simply replace np.ndarray[np.float64_t, ndim=1] data with floating[:] data in add_row_csr?
However, that didn't compile; I think I'm probably missing something.

(I just learned about memoryviews, sorry if this is obvious.)

@MechCoder (Member):

I actually removed the function here (#6676)

@ogrisel (Member) commented Apr 19, 2016

Note about memoryviews: they tend not to work correctly with read-only memory buffers:

https://mail.python.org/pipermail/cython-devel/2013-February/003384.html

See the discussion in pandas-dev/pandas#10043 for an example of how this can cause problems when working with read-only memory-mapped data.

If we can, I think we should stick to the ndarray Cython type whenever the array is user-provided, to keep memory-mapped data (and thus joblib parallelism) working.
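The read-only buffers mentioned here can be reproduced in pure NumPy. The Cython-level failure itself (a typed memoryview refusing to acquire such a buffer) needs compiled code, but the flag that triggers it is visible in a minimal sketch:

```python
import numpy as np

X = np.arange(6, dtype=np.float64)
X.setflags(write=False)  # mimic a read-only memmap as handed out by joblib

assert not X.flags.writeable
try:
    X[0] = 1.0  # any write raises ValueError on a read-only buffer
    raised = False
except ValueError:
    raised = True
assert raised
```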

@MechCoder (Member):

@yenchenlin1994 Since Olivier just said that using memoryviews would break joblib parallelism, you have two options.

  1. Make the cdef in sparsefuncs_fast.pyx a def. Intuitively this should cause speed regressions, so you should benchmark thoroughly to see the differences in speed, if any.
  2. Continue my work in [MRG+1] Disable cython checks in _centers_sparse #6677 and [MRG+2] MAINT: Remove add_row_csr #6676 to remove the function altogether. In that case Joel's comments on those PRs would be more than helpful.

@yenchenlin (Contributor):

@MechCoder thanks for your help!
I'll try option 1 first and adopt option 2 if speed declines a lot.
What do you think?

@MechCoder (Member):

The first option seems to be the easier one to try out.

@MechCoder (Member):

@ssaeger It seems we have fused types support for all sparsefuncs. Would you be able to rebase, or would you prefer @yenchenlin1994 to do the rebase and testing for the sparse case as part of GSoC?

@ssaeger (Author) commented May 26, 2016

@MechCoder At the moment I'm very busy. Therefore I would prefer @yenchenlin1994 to continue the work on this.

@yenchenlin (Contributor):

Sure!

@ssaeger thanks for your hard work.

@yenchenlin (Contributor):

Hello @ssaeger, really sorry to bother you.
Could you provide the memory profiling script that generates the figures above?

@ssaeger (Author) commented May 30, 2016

hey, no problem 😃

```python
import numpy as np
from sklearn.cluster import KMeans

@profile  # injected into builtins by memory_profiler's `mprof run`
def fit_est():
    estimator.fit(X)

np.random.seed(5)
X = np.random.rand(500000, 20)
X = np.float32(X)

estimator = KMeans()
fit_est()
```

I used the following memory profiler: https://pypi.python.org/pypi/memory_profiler
Just install it and use it with:

```shell
mprof run <executable>
mprof plot
```

@jnothman (Member):

memory_profiler is covered at http://scikit-learn.org/dev/developers/performance.html. Perhaps you should familiarise yourself with the rest of the tips there (including some for Cython, though perhaps outdated).

@yenchenlin (Contributor):

Thanks @ssaeger and @jnothman for the input.
I will go through those tips!

@yenchenlin (Contributor) commented May 31, 2016

Hello guys, I've created #6846 to replace this PR, maybe we can close this one?

@TomDLT TomDLT closed this May 31, 2016