Skip to content

MNT Centralize common cython compiler directives #21512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 3, 2021

Conversation

jeremiedbb
Copy link
Member

There are a bunch of compiler directives that we almost always use and a few that we don't always use but could. This PR avoids repeating these directives in all cython files.

It's still possible to set or override a specific compiler directive at an extension level with the # cython: <directive> at the top of the file or at a function level with the @cython.<directive> decorator.

@@ -358,8 +341,7 @@ cdef class UnionFind(object):

return

@cython.boundscheck(False)
@cython.nonecheck(False)
@cython.wraparound(True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method does negative indexing

@@ -423,6 +404,7 @@ cpdef np.ndarray[DTYPE_t, ndim=2] _single_linkage_label(
return result_arr


@cython.wraparound(True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does negative indexing

@@ -92,7 +90,7 @@ def transform(raw_X, Py_ssize_t n_features, dtype,
indices_a = np.frombuffer(indices, dtype=np.int32)
indptr_a = np.frombuffer(indptr, dtype=indices_np_dtype)

if indptr[-1] > np.iinfo(np.int32).max: # = 2**31 - 1
if indptr[len(indptr) - 1] > np.iinfo(np.int32).max: # = 2**31 - 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function used to do negative indexing but since there's only one and it's for indexing the last element I find it better to replace by a usual indexing instead of adding wraparound for the whole function

@@ -278,7 +275,7 @@ cpdef sample_without_replacement(np.int_t n_population,

all_methods = ("auto", "tracking_selection", "reservoir_sampling", "pool")

ratio = n_samples / n_population if n_population != 0.0 else 1.0
ratio = <double> n_samples / n_population if n_population != 0.0 else 1.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function does a division between 2 integers. Setting cdivision=True would make it an integer division which is not what we want, so we use explicit casting.

I checked each division in each file where we didn't have the cdivision=True directive to make sure that there's no other case like this one.

@@ -76,7 +76,14 @@ def cythonize_extensions(top_path, config):
compile_time_env={
"SKLEARN_OPENMP_PARALLELISM_ENABLED": sklearn._OPENMP_SUPPORTED
},
compiler_directives={"language_level": 3},
compiler_directives={
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool thing :)

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trusting the tests. LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at Cython code, I often like to run:

cython -a sklearn/ensemble/_hist_gradient_boosting/_loss.pyx 

to get an annotated HTML with Python interactions. With the directives moved to a python file, I see the following warnings:

FutureWarning: Cython directive 'language_level'
 not set, using 2 for now (Py2). This will change in a later release! File: /Users/thomasfan/Repos/scikit-learn-3/sklearn/ensemble/_hist_gradi
ent_boosting/_loss.pyx                                                                                                                        
  tree = Parsing.p_module(s, pxd, full_module_name)                                                                                           
warning: sklearn/ensemble/_hist_gradient_boosting/_loss.pyx:32:38: Use boundscheck(False) for faster access
...

Do you see a good way to run cython -a without getting these warnings?

@jjerphan
Copy link
Member

jjerphan commented Nov 2, 2021

@thomasjpfan: you can specify directives for Cython using the --directive flag or its shorthand: -X.

In scikit-learn case (and using the file you mentioned), one can use:

cython -X language_level=3 \
       -X boundscheck=False \
       -X wraparound=False \
       -X initializedcheck=False \
       -X nonecheck=False \
       -X cdivision=True \
       -a sklearn/ensemble/_hist_gradient_boosting/_loss.pyx

@jjerphan jjerphan self-requested a review November 2, 2021 14:24
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clear improvement for maintenance. 👍

There are still a few directives that are set locally, namely:

$ ag "cython:"
# ... Results stripped to only show relevant source files
sklearn/metrics/_pairwise_fast.pyx:1:#cython: boundscheck=False
sklearn/metrics/_pairwise_fast.pyx:2:#cython: cdivision=True
sklearn/metrics/_pairwise_fast.pyx:3:#cython: wraparound=False
sklearn/neighbors/_ball_tree.pyx:2:#cython: boundscheck=False
sklearn/neighbors/_ball_tree.pyx:3:#cython: wraparound=False
sklearn/neighbors/_ball_tree.pyx:4:#cython: cdivision=True
sklearn/neighbors/_ball_tree.pyx:5:#cython: initializedcheck=False
sklearn/neighbors/_kd_tree.pyx:2:#cython: boundscheck=False
sklearn/neighbors/_kd_tree.pyx:3:#cython: wraparound=False
sklearn/neighbors/_kd_tree.pyx:4:#cython: cdivision=True
sklearn/neighbors/_kd_tree.pyx:5:#cython: initializedcheck=False
sklearn/utils/_logistic_sigmoid.pyx:1:#cython: boundscheck=False
sklearn/utils/_logistic_sigmoid.pyx:2:#cython: cdivision=True
sklearn/utils/_logistic_sigmoid.pyx:3:#cython: wraparound=False
sklearn/utils/_weight_vector.pyx.tp:21:# cython: binding=False

@jeremiedbb
Copy link
Member Author

jeremiedbb commented Nov 3, 2021

There are still a few directives that are set locally, namely:

good catch. I looked for # cython but not for #cython :)

@jjerphan
Copy link
Member

jjerphan commented Nov 3, 2021

There are still a few directives that are set locally, namely:

good catch. I looked for # cython but not for #cython :)

I naively looked at some file and saw this. :)

ba20b7c is treating all but sklearn/utils/_weight_vector.pyx.tp. Apart from that, this looks good to me.

@jeremiedbb
Copy link
Member Author

ba20b7c is treating all but sklearn/utils/_weight_vector.pyx.tp. Apart from that, this looks good to me.

This is on purpose. In this file I only left the binding=False directive which was introduced by you. It's the only place we have it so I did not set it globally.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, assuming the CI is green.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you, @jeremiedbb.

@jeremiedbb jeremiedbb merged commit 1f8825c into scikit-learn:main Nov 3, 2021
lorentzenchr added a commit to lorentzenchr/scikit-learn that referenced this pull request Nov 21, 2021
lorentzenchr added a commit to lorentzenchr/scikit-learn that referenced this pull request Nov 21, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants