-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MNT Centralize common cython compiler directives #21512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT Centralize common cython compiler directives #21512
Conversation
@@ -358,8 +341,7 @@ cdef class UnionFind(object): | |||
|
|||
return | |||
|
|||
@cython.boundscheck(False) | |||
@cython.nonecheck(False) | |||
@cython.wraparound(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method does negative indexing
@@ -423,6 +404,7 @@ cpdef np.ndarray[DTYPE_t, ndim=2] _single_linkage_label( | |||
return result_arr | |||
|
|||
|
|||
@cython.wraparound(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does negative indexing
@@ -92,7 +90,7 @@ def transform(raw_X, Py_ssize_t n_features, dtype, | |||
indices_a = np.frombuffer(indices, dtype=np.int32) | |||
indptr_a = np.frombuffer(indptr, dtype=indices_np_dtype) | |||
|
|||
if indptr[-1] > np.iinfo(np.int32).max: # = 2**31 - 1 | |||
if indptr[len(indptr) - 1] > np.iinfo(np.int32).max: # = 2**31 - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function used to do negative indexing but since there's only one and it's for indexing the last element I find it better to replace by a usual indexing instead of adding wraparound for the whole function
@@ -278,7 +275,7 @@ cpdef sample_without_replacement(np.int_t n_population, | |||
|
|||
all_methods = ("auto", "tracking_selection", "reservoir_sampling", "pool") | |||
|
|||
ratio = n_samples / n_population if n_population != 0.0 else 1.0 | |||
ratio = <double> n_samples / n_population if n_population != 0.0 else 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function does a division between 2 integers. Setting cdivision=True would make it an integer division which is not what we want, so we use explicit casting.
I checked each division in each file where we didn't have the cdivision=True directive to make sure that there's no other case like this one.
@@ -76,7 +76,14 @@ def cythonize_extensions(top_path, config): | |||
compile_time_env={ | |||
"SKLEARN_OPENMP_PARALLELISM_ENABLED": sklearn._OPENMP_SUPPORTED | |||
}, | |||
compiler_directives={"language_level": 3}, | |||
compiler_directives={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool thing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trusting the tests. LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When looking at Cython code, I often like to run:
cython -a sklearn/ensemble/_hist_gradient_boosting/_loss.pyx
to get an annotated HTML with Python interactions. With the directives moved to a python file, I see the following warnings:
FutureWarning: Cython directive 'language_level'
not set, using 2 for now (Py2). This will change in a later release! File: /Users/thomasfan/Repos/scikit-learn-3/sklearn/ensemble/_hist_gradi
ent_boosting/_loss.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
warning: sklearn/ensemble/_hist_gradient_boosting/_loss.pyx:32:38: Use boundscheck(False) for faster access
...
Do you see a good way to run cython -a
without getting these warnings?
@thomasjpfan: you can specify directives for Cython using the In scikit-learn case (and using the file you mentioned), one can use: cython -X language_level=3 \
-X boundscheck=False \
-X wraparound=False \
-X initializedcheck=False \
-X nonecheck=False \
-X cdivision=True \
-a sklearn/ensemble/_hist_gradient_boosting/_loss.pyx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clear improvement for maintenance. 👍
There are still a few directives that are set locally, namely:
$ ag "cython:"
# ... Results stripped to only show relevant source files
sklearn/metrics/_pairwise_fast.pyx:1:#cython: boundscheck=False
sklearn/metrics/_pairwise_fast.pyx:2:#cython: cdivision=True
sklearn/metrics/_pairwise_fast.pyx:3:#cython: wraparound=False
sklearn/neighbors/_ball_tree.pyx:2:#cython: boundscheck=False
sklearn/neighbors/_ball_tree.pyx:3:#cython: wraparound=False
sklearn/neighbors/_ball_tree.pyx:4:#cython: cdivision=True
sklearn/neighbors/_ball_tree.pyx:5:#cython: initializedcheck=False
sklearn/neighbors/_kd_tree.pyx:2:#cython: boundscheck=False
sklearn/neighbors/_kd_tree.pyx:3:#cython: wraparound=False
sklearn/neighbors/_kd_tree.pyx:4:#cython: cdivision=True
sklearn/neighbors/_kd_tree.pyx:5:#cython: initializedcheck=False
sklearn/utils/_logistic_sigmoid.pyx:1:#cython: boundscheck=False
sklearn/utils/_logistic_sigmoid.pyx:2:#cython: cdivision=True
sklearn/utils/_logistic_sigmoid.pyx:3:#cython: wraparound=False
sklearn/utils/_weight_vector.pyx.tp:21:# cython: binding=False
good catch. I looked for |
I naively looked at some file and saw this. :) ba20b7c is treating all but |
This is on purpose. In this file I only left the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming the CI is green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you, @jeremiedbb.
There are a bunch of compiler directives that we almost always use and a few that we don't always use but could. This PR avoids repeating these directives in all cython files.
It's still possible to set or override a specific compiler directive at an extension level with the
# cython: <directive>
at the top of the file or at a function level with the@cython.<directive>
decorator.