Skip to content

[MRG] FIX index overflow error in sparse matrix polynomial expansion #16831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jianlingzhong
Copy link

Reference Issues/PRs

Fixes #16803

What does this implement/fix? Explain your changes.

Change index type from np.int32_ to a fused type and explicitly cast indices to np.int64 before passed into the polynomial expansion function.

Any other comments?

# use np.int64 for index datatype to prevent overflow
# in case X has a large dimension
Xp_next = _csr_polynomial_expansion(X.data,
X.indices.astype(np.int64),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of error do you get when using the fused type without casting to np.int64?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my case the overflow error still occurs if no casting was used, even when the type is a fused type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A self-contained example code would make it easier to reproduce your issue. May you expand on the code snippet in #16803 (comment) with code to create x as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan I have updated the code snippet in #16803 to be self-contained and can reproduce this issue.

@cmarmo
Copy link
Contributor

cmarmo commented Jun 12, 2020

Hi @jianlingzhong, thanks for your work. Some checks didn't pass and there are conflicts with the upstream version. Do you mind having a look and fixing those issues? Thanks!

Base automatically changed from master to main January 22, 2021 10:52
@wdevazelhes
Copy link
Contributor

Hi @jianlingzhong , are you still working on this ? (Having this PR merged would help resolve one of the tests for sparse inputs I'm working on cf. #13246 (comment)) If not, I could give it a shot, by just resolving conflicts, however I'm not familiar with cython, so I could try to reproduce the overflow error you talk about, but I don't know how hard it is to solve it...

@cmarmo cmarmo added the Stalled label Mar 1, 2021
@jianlingzhong
Copy link
Author

@wdevazelhes Hey sorry I was not able to spend time on this at moment. Feel free to give it a shot.

wdevazelhes added a commit to wdevazelhes/scikit-learn that referenced this pull request Mar 14, 2021
@wdevazelhes
Copy link
Contributor

Thanks @jianlingzhong , I'll give it a try !

@cmarmo cmarmo added the Superseded PR has been replace by a newer PR label Mar 14, 2021
@thomasjpfan
Copy link
Member

Closing because the work is being continued at #19676

@thomasjpfan thomasjpfan closed this Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:preprocessing Stalled Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

index type np.int32_t causes issue in _csr_polynomial_expansion
4 participants