Skip to content

MAINT Replace unsigned char with uint8_t in codebase #29588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
9 tasks
adam2392 opened this issue Jul 30, 2024 · 6 comments · Fixed by #29597
Closed
9 tasks

MAINT Replace unsigned char with uint8_t in codebase #29588

adam2392 opened this issue Jul 30, 2024 · 6 comments · Fixed by #29597
Labels
cython good first issue Easy with clear instructions to resolve help wanted

Comments

@adam2392
Copy link
Member

adam2392 commented Jul 30, 2024

In #25572, we defined typedefs for commonly used Cython types throughout the codebase. Running grep -rl "unsigned char" ./sklearn, I found the following files contain unsigned char, which could be replaced with uint8_t from https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_typedefs.pxd.

  • ./sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.pyx.tp
  • ./sklearn/ensemble/_hist_gradient_boosting/common.pxd
  • ./sklearn/ensemble/_hist_gradient_boosting/_bitset.pyx
  • ./sklearn/ensemble/_hist_gradient_boosting/_predictor.pyx
  • ./sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
  • ./sklearn/ensemble/_hist_gradient_boosting/histogram.pyx
  • ./sklearn/ensemble/_hist_gradient_boosting/_binning.pyx
  • ./sklearn/ensemble/_hist_gradient_boosting/_bitset.pxd
  • ./sklearn/linear_model/_sgd_fast.pyx.tp

Is there any reason to leave these as unsigned char vs replacing them with the uint8_t to consolidate our types?

To address this issue, it would be best to open up a PR one-by-one and implement the said change and cimport statement, and then link to the issue here.

@adam2392 adam2392 added good first issue Easy with clear instructions to resolve help wanted cython labels Jul 30, 2024
@ogrisel
Copy link
Member

ogrisel commented Jul 31, 2024

Is there any reason to leave these as unsigned char vs replacing them with the uint8_t to consolidate our types?

I see no reason and I find uint8_t more explicit and therefore easier to reason about the range of admissible values [0-255]. So +1 for the change on my end.

@norgera
Copy link
Contributor

norgera commented Jul 31, 2024

Hello, I've changed the first file according to: "It would be best to open up a PR one-by-one and implement the said change and cimport statement, and then link to the issue here."

This is my first time contributing, let me know if there are any issues. Thanks

@adam2392
Copy link
Member Author

@norgera you can change it all at once actually. I noticed the diff shouldn't be too big.

@adam2392
Copy link
Member Author

@norgera
Copy link
Contributor

norgera commented Jul 31, 2024

Yes I realized that quickly, thank you for the clarification. I will change the rest of the files now

@norgera
Copy link
Contributor

norgera commented Jul 31, 2024

@adam2392 I believe I have made the necessary changes. Let me know if there's anything I missed

Sorry making 2 pr's and multiple commits . I am new to open source and have learned a lot from this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cython good first issue Easy with clear instructions to resolve help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants