Skip to content

[MRG] FIX index overflow error in sparse matrix polynomial expansion #16831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions sklearn/preprocessing/_csr_polynomial_expansion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@ from scipy.sparse import csr_matrix
from numpy cimport ndarray
cimport numpy as np

ctypedef np.int32_t INDEX_T
ctypedef fused INDEX_T:
np.int32_t
np.int64_t

ctypedef fused DATA_T:
np.float32_t
Expand Down Expand Up @@ -119,7 +121,7 @@ def _csr_polynomial_expansion(ndarray[DATA_T, ndim=1] data,

cdef INDEX_T expanded_index = 0, row_starts, row_ends, i, j, k, \
i_ptr, j_ptr, k_ptr, num_cols_in_row, \
expanded_column
expanded_column, col

with nogil:
expanded_indptr[0] = indptr[0]
Expand Down
8 changes: 6 additions & 2 deletions sklearn/preprocessing/_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -1560,8 +1560,12 @@ def transform(self, X):
to_stack.append(np.ones(shape=(n_samples, 1), dtype=X.dtype))
to_stack.append(X)
for deg in range(2, self.degree+1):
Xp_next = _csr_polynomial_expansion(X.data, X.indices,
X.indptr, X.shape[1],
# use np.int64 for index datatype to prevent overflow
# in case X has a large dimension
Xp_next = _csr_polynomial_expansion(X.data,
X.indices.astype(np.int64),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of error do you get when using the fused type without casting to np.int64?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my case the overflow error still occurs if no casting was used, even when the type is a fused type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A self-contained example code would make it easier to reproduce your issue. May you expand on the code snippet in #16803 (comment) with code to create x as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan I have updated the code snippet in #16803 to be self-contained and can reproduce this issue.

X.indptr.astype(np.int64),
np.int64(X.shape[1]),
self.interaction_only,
deg)
if Xp_next is None:
Expand Down