-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
MAINT Remove -Wcpp
warnings when compiling sklearn.manifold._utils
#24925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/manifold/_utils.pyx
Outdated
def _binary_search_perplexity( | ||
cnp.float32_t[:, :] sqdistances, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _binary_search_perplexity( | |
cnp.float32_t[:, :] sqdistances, | |
cpdef float[:, :] _binary_search_perplexity( | |
const float[:, :] sqdistances, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function is never used in cython but always in python, it's better to make it a pure def
@@ -63,7 +63,7 @@ cpdef cnp.ndarray[cnp.float32_t, ndim=2] _binary_search_perplexity( | |||
|
|||
# This array is later used as a 32bit array. It has multiple intermediate | |||
# floating point additions that benefit from the extra precision | |||
cdef cnp.ndarray[cnp.float64_t, ndim=2] P = np.zeros( | |||
cdef cnp.float64_t[:, :] P = np.zeros( | |||
(n_samples, n_neighbors), dtype=np.float64) | |||
|
|||
for i in range(n_samples): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to change the return line by:
return P.astype(np.float32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the current behavior. Currently it returns a float64 array (this is because it's using the def part of the function which doesn't care about the return type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the current behavior
Included the suggested changes. Edit: Some tests are still failing as the memoryviewslice has no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @rprkh.
Apart from a few comments regarding dtype preservation, everything LGTM.
sklearn/manifold/_utils.pyx
Outdated
@@ -118,4 +116,4 @@ cpdef cnp.ndarray[cnp.float32_t, ndim=2] _binary_search_perplexity( | |||
if verbose: | |||
print("[t-SNE] Mean sigma: %f" | |||
% np.mean(math.sqrt(n_samples / beta_sum))) | |||
return P | |||
return np.array(P, dtype=np.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think preserving dtype might better be treated in another PR.
For now, I propose just returning the numpy array using the base
attribute of the memoryview.
return np.array(P, dtype=np.float32) | |
return P.base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was discussed somewhere that np.asarray should be preferred because .base
can point to a larger array than expected (we can have a view on only a subset of the array)
70e5412
to
6505d3d
Compare
Included the suggestions. |
sklearn/manifold/_utils.pyx
Outdated
cdef double[:, :] P = np.zeros( | ||
(n_samples, n_neighbors), dtype=np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failing tests are related to the dtype of P over here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @rprkh is right in https://github.com/scikit-learn/scikit-learn/pull/24925/files#r1028804071: P is allocated with float64
but returned (and implicitly cast to float32
).
That's the first time I stumble upon this behavior in Cython.
For now, I would simply cast it explicitly using what @glemaitre proposed earlier and add a TODO
comment indicating that this function both support float32
and float64
and preserves inputs' dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM given that the CI passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @rprkh
This was my first time working with Cython. Thanks for the reviews and suggestions @glemaitre @jjerphan @jeremiedbb. I look forward to contributing more in the future. |
Glad to read that! Feel free to pick other similar issues or to ask if you want to work on something more involved. |
scikit-learn#24925) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
scikit-learn#24925) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Reference Issues/PRs
Towards #24875
What does this implement/fix? Explain your changes.
Using memory views in place of the deprecated cnp.ndarray.
Any other comments?