
[MRG+1] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types #6430


Closed
wants to merge 1 commit into from

Conversation

@ssaeger ssaeger commented Feb 23, 2016

As mentioned in #5973, #5776, or #5464, we can use Cython fused types to work with float32 input without wasting memory by converting it internally to float64.

This PR implements fused types for KMeans/MiniBatchKMeans so that float32 input will result in using only float32 internally. Additionally it adds tests to ensure the desired data types are used.

Memory usage for float32 input of shape 500,000 × 20:
[memory profile plot before the code changes of this commit]
[memory profile plot after the code changes]

Unfortunately I was not able to add support for handling sparse float32 input as float32 internally, so sparse input is still converted to float64. Supporting it would require many changes in sparsefuncs_fast.pyx.
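To put the saving in perspective, plain NumPy shows the footprint of the old internal upcast for an input of this shape (a minimal sketch, not the profiling script used for the plots above):

```python
import numpy as np

# Same shape as the profiled input: 500,000 samples, 20 features.
X32 = np.zeros((500000, 20), dtype=np.float32)
X64 = X32.astype(np.float64)  # the internal conversion this PR avoids

assert X32.nbytes == 40_000_000      # ~40 MB kept as float32
assert X64.nbytes == 2 * X32.nbytes  # ~80 MB after upcasting to float64
```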

@MechCoder MechCoder changed the title Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types Feb 29, 2016
@ogrisel (Member) commented Mar 14, 2016

Based on the Cython docs, floating can only be float or double, so the else: raise ValueError blocks can never be reached.

Please replace those if floating is float / elif floating is double / else constructs with if floating is float / else constructs instead.

@ogrisel (Member) commented Mar 14, 2016

Please also feel free to squash those commits.

```cython
elif floating is double:
    centers = np.zeros((n_clusters, n_features), dtype=np.float64)
else:
    raise ValueError("Unknown floating type.")
```
Member:

The following should work, no?

```cython
centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
```
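The suggestion works because np.zeros accepts the input array's dtype object directly, so one allocation covers both floating cases (a minimal NumPy sketch with made-up shapes, not the scikit-learn code itself):

```python
import numpy as np

n_clusters, n_features = 8, 20
for dtype in (np.float32, np.float64):
    X = np.empty((100, n_features), dtype=dtype)
    # No if/else on the fused type needed: the allocation follows X.dtype.
    centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
    assert centers.dtype == X.dtype
```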

Member:

Similar remark for the allocation of the center_squared_norms and centers arrays in other functions of this file.

Author:

@ogrisel This change works on my machine with Ubuntu, but leads to failing AppVeyor builds.

@ogrisel (Member) commented Mar 14, 2016

Other than that, LGTM, +1 for merge with a new entry in doc/whats_new.rst.

@ogrisel ogrisel changed the title [MRG] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types [MRG+1] Allows KMeans/MiniBatchKMeans to use float32 internally by using cython fused types Mar 14, 2016
@ssaeger ssaeger force-pushed the fused_types branch 2 times, most recently from ec80989 to cebb687 Compare March 15, 2016 18:53
@ssaeger (Author) commented Mar 15, 2016

Thanks for your review and the comments.
I squashed the old commits and addressed your comments.

@ssaeger ssaeger force-pushed the fused_types branch 2 times, most recently from 8ae6d1f to 07dcec5 Compare March 16, 2016 09:02
@ogrisel (Member) commented Mar 16, 2016

Please squash the new commit as well. Commits with a message such as "Adresses several comments" are not interesting when reviewing the history of a file 6 months from now :)

@ssaeger (Author) commented Mar 16, 2016

Ok, thanks, I squashed them. :)

```cython
if floating is float:
    centers = np.zeros((n_clusters, n_features), dtype=np.float32)
else:
    centers = np.zeros((n_clusters, n_features), dtype=np.float64)
```
Member:

This could be further simplified as:

```cython
centers = np.zeros((n_clusters, n_features), dtype=X.dtype)
```

no?

Author:

I tried that and it works on my machine with Ubuntu, but leads to failing AppVeyor builds.
I don't know exactly why this happens.

Member:

That is really weird. It would be worth investigating with a debugger under windows. This might be a bug in Cython / MSVC or even numpy. But we can leave it as it is for now in this PR.

@yenchenlin (Contributor):

Hello @ssaeger,
would you please elaborate on

> Unfortunately I was not able to add support to handle sparse float32 input data as float32 internally, so that this is still converted to float64. This would require many changes in sparsefuncs_fast.pyx.

What do you mean by internally?
Thanks

@ssaeger (Author) commented Mar 19, 2016

Hi @yenchenlin1994,
at the moment sparsefuncs_fast.pyx only accepts np.float64, and I did not want to change that in this PR: it would affect other parts of the code base as well, so a separate PR would be simpler to review.

By "internally" I mean that if the user passes sparse data with dtype np.float32, it is converted to np.float64 in order to use the functionality of sparsefuncs_fast.pyx. The conversion happens internally and is not visible to the user.

I hope this makes it a bit clearer. :)
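The hidden conversion described here can be sketched with SciPy directly (an illustration of the upcast only, not the actual scikit-learn code path):

```python
import numpy as np
import scipy.sparse as sp

# The user passes sparse float32 data ...
X = sp.csr_matrix(np.eye(4, dtype=np.float32))
assert X.dtype == np.float32

# ... but the float64-only sparsefuncs_fast.pyx routines see a float64 copy.
X64 = X.astype(np.float64)
assert X64.dtype == np.float64
assert (X64 != X).nnz == 0  # same values, twice the memory per stored entry
```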

@yenchenlin (Contributor):

@ssaeger thanks a lot for your explanation.
Super clear :) 👍

@MechCoder (Member):

It would be great to wait for #6593

@MechCoder (Member):

Seems like we still need add_row_csr to support both dtypes.

@yenchenlin (Contributor):

Yeah ... I'll do that.

@yenchenlin (Contributor) commented Apr 17, 2016

Hello @MechCoder,
It seems like add_row_csr is a cdef function right now.
And based on the discussion here, we can only make a function support fused types by turning it into a def function.

Am I on the right track?
Or do we need a benchmark to evaluate this memory/speed trade-off?

@jnothman (Member):

> It seems like add_row_csr is a cdef function right now. And based on the discussion here, we can only make a function support fused types by turning it into a def function.

It's actually possible that we could have used a cdef there (though the benefit is not as great), but I'm fairly sure we can here if we use typed memoryviews instead of np.ndarrays. For example, this compiles:

```cython
from cython cimport floating

cdef floating a1(floating[:] x):
    return x[0]

def b(X, floating y):
    cdef floating[:] X_data = X.data
    return a1(X_data)
```

@jnothman (Member):

(Also, should we be doing the fused typing of CSR/CSC sparse matrix indices and indptr at the same time as we make these changes? Should we be adding nogil where appropriate?)

@yenchenlin (Contributor) commented Apr 17, 2016

Hello @jnothman, I see the code you pasted compiles.

> though the benefit is not as great

But I wonder why the benefit there is not as great as here?
Isn't cdef faster than def?

@jnothman (Member):

Because you're going from Cython to Cython here, not from Python to Cython as there; and because here it is called repeatedly, while calculating means is not so often repeated.

@yenchenlin (Contributor) commented Apr 17, 2016

> Because you're going from Cython to Cython here, not from Python to Cython as there; and because here it is called repeatedly, while calculating means is not so often repeated.

Thanks a lot! Now I understand 😄

So, do you mean I can simply replace np.ndarray[np.float64_t, ndim=1] data with floating[:] data in add_row_csr?
However, that didn't compile; I think I'm probably missing something.

(I just learned about memoryviews, sorry if this is obvious.)

@MechCoder (Member):

I actually removed the function here (#6676)

@ogrisel (Member) commented Apr 19, 2016

Note about memoryviews: they tend not to work correctly with read-only memory buffers:

https://mail.python.org/pipermail/cython-devel/2013-February/003384.html

See the discussion in pandas-dev/pandas#10043 for an example of how this can cause problems when working with read-only memory-mapped data.

If we can, I think we should stick to the ndarray Cython type whenever the array is user-provided, to keep memory-mapped data (and thus joblib parallelism) working.
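The read-only buffers mentioned here can be reproduced in pure NumPy. The Cython-level failure itself (a typed memoryview refusing to acquire such a buffer) needs compiled code, but the flag that triggers it is visible in a minimal sketch:

```python
import numpy as np

X = np.arange(6, dtype=np.float64)
X.setflags(write=False)  # mimic a read-only memmap as handed out by joblib

assert not X.flags.writeable
try:
    X[0] = 1.0  # any write raises ValueError on a read-only buffer
    raised = False
except ValueError:
    raised = True
assert raised
```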

@MechCoder (Member):

@yenchenlin1994 Since Olivier just said that using memoryviews would break joblib parallelism, you have two options.

  1. Make the cdef in sparsefuncs_fast.pyx a def. Intuitively this should cause speed regressions, so you should benchmark thoroughly to see the differences in speed, if any.
  2. Continue my work in [MRG+1] Disable cython checks in _centers_sparse #6677 and [MRG+2] MAINT: Remove add_row_csr #6676 to remove the function altogether. In that case Joel's comments on those PRs would be more than helpful.

@yenchenlin (Contributor):

@MechCoder thanks for your help!
I'll try option 1 first and adopt option 2 if speed declines a lot.
What do you think?

@MechCoder (Member):

The first option seems to be the easier one to try out.

@MechCoder (Member):

@ssaeger It seems we have fused types support for all sparsefuncs. Would you be able to rebase, or would you prefer @yenchenlin1994 to do the rebase and testing for the sparse case as part of GSoC?

@ssaeger (Author) commented May 26, 2016

@MechCoder At the moment I'm very busy. Therefore I would prefer @yenchenlin1994 to continue the work on this.

@yenchenlin (Contributor):

Sure!

@ssaeger thanks for your hard work.

@yenchenlin (Contributor):

Hello @ssaeger, really sorry to bother you.
Could you provide the memory profiling script that generates the figures above?

@ssaeger (Author) commented May 30, 2016

hey, no problem 😃

```python
import numpy as np
from sklearn.cluster import KMeans

@profile  # injected into builtins by memory_profiler's `mprof run`
def fit_est():
    estimator.fit(X)

np.random.seed(5)
X = np.random.rand(500000, 20)
X = np.float32(X)

estimator = KMeans()
fit_est()
```

I used the following memory profiler: https://pypi.python.org/pypi/memory_profiler
Just install it and use it with:

```shell
mprof run <executable>
mprof plot
```

@jnothman (Member):

memory_profiler is covered at http://scikit-learn.org/dev/developers/performance.html. Perhaps you should familiarise yourself with the rest of the tips there (including some for Cython, though perhaps outdated).

@yenchenlin (Contributor):

Thanks @ssaeger and @jnothman for the input.
I will go through those tips!

@yenchenlin (Contributor) commented May 31, 2016

Hello guys, I've created #6846 to replace this PR, maybe we can close this one?

@TomDLT TomDLT closed this May 31, 2016