Skip to content

Conversation

OmarManzoor
Copy link
Contributor

@OmarManzoor OmarManzoor commented Nov 11, 2022

Reference Issues/PRs

Towards #24875

What does this implement/fix? Explain your changes.

  • Removed -Wsign-compare warnings when compiling sklearn.linear_model._cd_fast

Any other comments?

@OmarManzoor
Copy link
Contributor Author

@jjerphan For the warning https://gist.github.com/jjerphan/8cfbb5349f5680856460ff80152ab69d#file-full-trace-log-L429
it is basically related to this line of code https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_random.pxd#L43. It is mentioned here that we should be careful when casting this. Could you kindly guide on this one? Thank you.

@glemaitre glemaitre self-requested a review November 12, 2022 10:05
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @OmarManzoor for this PR and the observation you made.

I think it is preferable to restrain the scope of this PR to only remove -Wsign-compare warnings for two reasons:

  • this PR addresses two goals at the moment (removal of two kind of warning). For scikit-learn, we prefer having independent and atomic goals (and changes) is more readable and understanble in logs, and is more easily reviewable.
  • all the -Woverflow warnings when compiling the whole code-base are coming from the current inline sklearn.utils._random.our_rand_r function, and their removal might better be addressed in a dedicated PR.

Can you please restrain the scope of this PR to the removal of -Wsign-compare warnings solely? Thank you.

Here is a comment to address it better.

@@ -171,7 +171,7 @@ def enet_coordinate_descent(
# tol *= np.dot(y, y)
tol *= _dot(n_samples, &y[0], 1, &y[0], 1)

for n_iter in range(max_iter):
for n_iter in range(<Py_ssize_t>max_iter):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is preferable to change the type of n_iter to the one of max_iter (i.e. int) instead of casting max_iter to Py_ssize_t here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for SGD, max_iter will be validated by the Python class to be strictly positive.
In terms of semantic, it is more correct to use unsigned and the casting will be safe.
It can be done once for all in the signature of the function.

@OmarManzoor OmarManzoor changed the title MAINT Remove -Wsign-compare and -Woverflow warnings when compiling sklearn.linear_model._cd_fast MAINT Remove -Wsign-compare when compiling sklearn.linear_model._cd_fast Nov 14, 2022
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Nov 14, 2022
This removes the -Woverflow warnings observed
when building scikit-learn.

RAND_R_MAX is the max value for uint8, incrementing
it causes an overflow (hence the warning).

I think this commit fixes the implementation, yet
I comes with a backwards incompatible results and
tests for implementation relying on `our_rand_r`
fails because results are now different.

I see several alternatives to remove the warning while
having tests pass

  - prefered solution: adapt the test suite using the
    new results so that all tests pass and ackowledge
    the change of behavior for impacted user-facing APIs
    in the changelog
  - accept the quirk of this implementation but hardcode
    and rename the effective constant
  - silent the -Woverflow warning by another mean

Relates to:
scikit-learn#13422
scikit-learn#24895
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Nov 14, 2022
This removes the -Woverflow warnings observed
when building scikit-learn.

RAND_R_MAX is the max value for uint8, incrementing
it causes an overflow (hence the warning).

Elements were originaly mentionned in
scikit-learn#13422 (comment)
but left unreviewed, it seems.

I think this commit fixes the implementation, yet
I comes with a backwards incompatible results and
tests for implementation relying on `our_rand_r`
fails because results are now different.

I see several alternatives to remove the warning while
having tests pass

  - prefered solution: adapt the test suite using the
    new results so that all tests pass and ackowledge
    the change of behavior for impacted user-facing APIs
    in the changelog
  - accept the quirk of this implementation but hardcode
    and rename the effective constant
  - silent the -Woverflow warning by another mean

Relates to:
scikit-learn#13422
scikit-learn#24895
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Nov 14, 2022
This removes the -Woverflow warnings observed
when building scikit-learn.

RAND_R_MAX is the max value for uint8, incrementing
it causes an overflow (hence the warning).

Elements were originally mentioned but seem to
have been left unreviewed, see:
scikit-learn#13422 (comment)

I think this commit fixes the implementation, yet
I comes with a backwards incompatible results and
tests for implementation relying on `our_rand_r`
fails because results are now different.

I see several alternatives to remove the warning while
having tests pass

  - prefered solution: adapt the test suite using the
    new results so that all tests pass and ackowledge
    the change of behavior for impacted user-facing APIs
    in the changelog
  - accept the quirk of this implementation but hardcode
    and rename the effective constant
  - silent the -Woverflow warning by another mean

Relates to:
scikit-learn#13422
scikit-learn#24895
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Nov 14, 2022
This removes the -Woverflow warnings observed
when building scikit-learn.

RAND_R_MAX is the max value for uint8, incrementing
it causes an overflow (hence the warning).

Elements were originally mentioned but seem to
have been left unreviewed, see:
scikit-learn#13422 (comment)

I think this commit fixes the implementation, yet
I comes with a backwards incompatible results and
tests for implementation relying on `our_rand_r`
fails because results are now different.

I see several alternatives to remove the warning while
having tests pass

  - preferred solution: adapt the test suite using the
    new results so that all tests pass and acknowledge
    the change of behavior for impacted user-facing APIs
    in the change-log
  - accept the quirk of this implementation but hardcode
    and rename the effective constant
  - silent the -Woverflow warning by another mean

Relates to:
scikit-learn#13422
scikit-learn#24895
@jjerphan
Copy link
Member

#24919 has been created to treat -Woverflow separately.

@OmarManzoor OmarManzoor requested review from jjerphan and removed request for glemaitre November 15, 2022 09:01
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@OmarManzoor
Copy link
Contributor Author

@jjerphan, @glemaitre
Similar to the sgd PR I also changed the max_iter parameter to an unsigned int over here. I also noted that we are returning np.asarray(w) in the functions. I was thinking would it make sense to replace this with w.base?

@jjerphan
Copy link
Member

Thanks for mentioning that. I think we better remove np.asarray calls (and cnp.ndarray) in the PR removing the warning due to the use of the deprecated NumPy API (via Cython) (-Wcpp).

@jjerphan
Copy link
Member

@OmarManzoor: I meant that it's better to integrate changes made in 5e66af3 in another PR (one removing -Wcpp warning).

@OmarManzoor
Copy link
Contributor Author

@OmarManzoor: I meant that it's better to integrate changes made in 5e66af3 in another PR (one removing -Wcpp warning).

Oh I understand. However I don't think we will be able to remove the -Wcpp warning from this module completely because the const fused types complication. I think most of the other applicable instances already use memory views?

@jjerphan
Copy link
Member

We can try to use fused-typed memoryview which aren't const-qualified.

@glemaitre glemaitre self-requested a review November 17, 2022 09:52
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit b9137b4 into scikit-learn:main Nov 17, 2022
@OmarManzoor OmarManzoor deleted the cython_cd_fast branch November 17, 2022 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants