Skip to content

Conversation

OmarManzoor
Copy link
Contributor

Reference Issues/PRs

Towards #24875

What does this implement/fix? Explain your changes.

  • Fix compilation warnings in sklearn.cluster._dbscan_inner by using memory views in place of the deprecated cnp.ndarray.

Any other comments?

@glemaitre glemaitre self-requested a review November 10, 2022 18:09
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to include sklearn.clusters._dbscan_inner in the setup.py and more precisely in the tuple: USE_NEWEST_NUMPY_C_API

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you, @OmarManzoor.

Was this accessible or quite easy to you?

Feel free to create similar PRs for #24875 or more advanced ones (we might have little Cython-related issues, yet many things can be improved in implementations, e.g. resolving TODOs).

@OmarManzoor
Copy link
Contributor Author

OmarManzoor commented Nov 11, 2022

LGTM! Thank you, @OmarManzoor.

Was this accessible or quite easy to you?

Feel free to create similar PRs for #24875 or more advanced ones (we might have little Cython-related issues, yet many things can be improved in implementations, e.g. resolving TODOs).

@jjerphan I think this was a good issue to start with. Thank you for creating it. Also thank you for sharing the document link. I did review two other pages on Cython along the lines of memory views.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

@@ -5,14 +5,12 @@
from libcpp.vector cimport vector
cimport numpy as cnp

cnp.import_array()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we still use cnp in this file, it is safer to keep cnp.import_array(). This is the recommendation from Cython as well:

https://github.com/cython/cython/blob/363780535e7bfe8adb1b858feb3fccbdf3434d97/docs/examples/tutorial/numpy/convolve2.pyx#L12-L16

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this!

As of Cython>=3.0, it looks like not calling cnp.import_array() won't be a problem because this piece of C-code (which should be injected at runtime takes care of calling import_array from NumPy. Still not calling cnp.import_array() might raise warnings, so let's keep this instruction as you propose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we bring back the cnp.import_array() should we remove sklearn.clusters._dbscan_inner from the USE_NEWEST_NUMPY_C_API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not calling cnp.import_array is legal and doable if we only use cnp to access types (e.g. cnp.float64_t), which we should be able to here.

Moreover, dbscan_inner is still using NumPy array via cnp.ndarray. Is it possible to use memoryview instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjerphan Sorry about that. When I reverted the removal of cnp.import_array I accidentally reverted the other changes too.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually changing my review based on the comment made by @thomasjpfan: to me, the reliance on calling cnp.import_array still needs to be discussed in the last unresolved thread.

@@ -5,14 +5,12 @@
from libcpp.vector cimport vector
cimport numpy as cnp

cnp.import_array()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not calling cnp.import_array is legal and doable if we only use cnp to access types (e.g. cnp.float64_t), which we should be able to here.

Moreover, dbscan_inner is still using NumPy array via cnp.ndarray. Is it possible to use memoryview instead.

@jjerphan
Copy link
Member

I leave @thomasjpfan or @glemaitre merge since they also have reviewed this PR.

@glemaitre glemaitre self-requested a review November 17, 2022 10:47
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit 29242d1 into scikit-learn:main Nov 22, 2022
@OmarManzoor OmarManzoor deleted the cython_dbscan_inner branch November 23, 2022 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants