
Updating PolynomialFeatures.Transform docstring #13755

Merged
merged 3 commits into from
May 1, 2019
24 changes: 14 additions & 10 deletions sklearn/preprocessing/data.py
@@ -1475,17 +1475,21 @@ def transform(self, X):
 
 Parameters
 ----------
-X : array-like or sparse matrix, shape [n_samples, n_features]
+X : array-like or CSR/CSC sparse matrix, shape [n_samples, n_features]
 The data to transform, row by row.
-Sparse input should preferably be in CSR format (for speed),
-but must be in CSC format if the degree is 4 or higher.
-
-If the input matrix is in CSR format and the expansion is of
-degree 2 or 3, the method described in the work "Leveraging
-Sparsity to Speed Up Polynomial Feature Expansions of CSR
-Matrices Using K-Simplex Numbers" by Andrew Nystrom and
-John Hughes is used, which is much faster than the method
-used on CSC input.
+
+Prefer CSR over CSC for sparse input (for speed), but CSC is
+required if the degree is 4 or higher. If the degree is less than
+4 and the input format is CSC, it will be converted to CSR, have
+its polynomial features generated, then converted back to CSC.
+
+If the degree is 2 or 3, the method described in "Leveraging
+Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices
+Using K-Simplex Numbers" by Andrew Nystrom and John Hughes is
+used, which is much faster than the method used on CSC input. For
+this reason, a CSC input will be converted to CSR, and the output
+will be converted back to CSC prior to being returned, hence the
+preference of CSR.
 
 Returns
 -------
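The CSC round trip the new docstring describes (convert to CSR, generate the polynomial features row by row, convert the result back to CSC) can be sketched in plain Python. This is an illustration under stated assumptions, not scikit-learn's implementation: matrices are assumed to be raw `(data, indices, indptr)` lists with sorted indices, the helper names `swap_compressed_axis`, `expand_degree2_csr`, `expand_degree2_csc` and `csc_to_dense` are hypothetical, and only the degree-2 product terms are generated (no bias or linear columns).

```python
from itertools import combinations_with_replacement


def swap_compressed_axis(data, indices, indptr, n_minor):
    """Re-compress a sparse matrix along its other axis.

    CSR arrays in -> CSC arrays of the same matrix out, and vice versa.
    The output indices are sorted within each compressed slice.
    """
    # Count stored entries per minor index to build the new pointer array.
    counts = [0] * n_minor
    for idx in indices:
        counts[idx] += 1
    new_indptr = [0]
    for c in counts:
        new_indptr.append(new_indptr[-1] + c)
    # Scatter each stored value into the next free slot of its new slice.
    next_slot = new_indptr[:-1]  # slicing makes a copy
    new_data = [0] * len(data)
    new_indices = [0] * len(indices)
    for major in range(len(indptr) - 1):
        for p in range(indptr[major], indptr[major + 1]):
            slot = next_slot[indices[p]]
            new_data[slot] = data[p]
            new_indices[slot] = major
            next_slot[indices[p]] += 1
    return new_data, new_indices, new_indptr


def expand_degree2_csr(data, indices, indptr, n_features):
    """Degree-2 products x_i * x_j (i <= j) of every CSR row.

    Assumes sorted column indices within each row. Only pairs of stored
    entries are visited, so implicit zeros are never multiplied.
    """
    # Assumed (hypothetical) column order: lexicographic over (i, j), i <= j.
    pairs = combinations_with_replacement(range(n_features), 2)
    col_of = {pair: k for k, pair in enumerate(pairs)}
    out_data, out_indices, out_indptr = [], [], [0]
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        for a in range(start, end):
            for b in range(a, end):
                out_data.append(data[a] * data[b])
                out_indices.append(col_of[(indices[a], indices[b])])
        out_indptr.append(len(out_data))
    return out_data, out_indices, out_indptr


def expand_degree2_csc(data, indices, indptr, n_samples, n_features):
    """CSC in, CSC out: convert to CSR, expand row by row, convert back."""
    csr = swap_compressed_axis(data, indices, indptr, n_samples)
    out = expand_degree2_csr(*csr, n_features)
    n_out = n_features * (n_features + 1) // 2
    return swap_compressed_axis(*out, n_out)


def csc_to_dense(data, indices, indptr, n_rows):
    """Densify CSC arrays (for checking results only)."""
    dense = [[0] * (len(indptr) - 1) for _ in range(n_rows)]
    for col in range(len(indptr) - 1):
        for p in range(indptr[col], indptr[col + 1]):
            dense[indices[p]][col] = data[p]
    return dense


# X = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] stored column by column (CSC).
out = expand_degree2_csc([1, 4, 5, 2, 3], [0, 2, 2, 0, 1], [0, 2, 3, 5],
                         n_samples=3, n_features=3)
print(csc_to_dense(*out, n_rows=3))
# [[1, 0, 2, 0, 0, 4], [0, 0, 0, 0, 0, 9], [16, 20, 0, 25, 0, 0]]
```

Because CSR and CSC are the same arrays compressed along different axes, one re-compression routine serves both conversion directions. The expansion itself only loops over pairs of stored entries within a row, which is why a row-compressed layout suits it and a CSC input is routed through CSR.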
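The closed-form indexing behind the K-simplex method cited in the docstring can be sketched as follows. The sketch assumes the output columns follow the lexicographic order of `itertools.combinations_with_replacement` over the degree-2 or degree-3 terms alone. For each sparse row, only products of stored entries are formed, and each product's column is computed from triangular (2-simplex) and tetrahedral (3-simplex) numbers instead of being looked up in an enumerated table. The function names are hypothetical; this follows the spirit of the Nystrom and Hughes paper and is not a transcription of it or of scikit-learn's Cython code.

```python
from itertools import combinations_with_replacement
from math import comb


def pair_index(i, j, d):
    """Column of x_i * x_j (0 <= i <= j < d) among the degree-2 terms.

    Pairs whose first index is below i number i*d - T(i - 1), where
    T(n) = n * (n + 1) / 2 is the n-th triangular (2-simplex) number.
    """
    return i * d - i * (i - 1) // 2 + (j - i)


def triple_index(i, j, k, d):
    """Column of x_i * x_j * x_k (0 <= i <= j <= k < d) among degree-3 terms.

    Triples whose first index is at least i are the size-3 multisets of
    the d - i features i..d-1; there are comb(d - i + 2, 3) of them, the
    (d - i)-th tetrahedral (3-simplex) number. Within first index i, the
    remaining (j, k) pair is ranked over those d - i features.
    """
    before_i = comb(d + 2, 3) - comb(d - i + 2, 3)
    return before_i + pair_index(j - i, k - i, d - i)


def expand_row_degree3(row, d):
    """Degree-3 terms of one sparse row, given as (column, value) pairs
    sorted by column. Returns (output_column, value) pairs; only products
    of stored entries are formed, and no lookup table is needed."""
    out = []
    for a in range(len(row)):
        for b in range(a, len(row)):
            for c in range(b, len(row)):
                (i, xi), (j, xj), (k, xk) = row[a], row[b], row[c]
                out.append((triple_index(i, j, k, d), xi * xj * xk))
    return out


# Check the closed forms against brute-force enumeration.
for d in range(1, 8):
    pairs = combinations_with_replacement(range(d), 2)
    assert [pair_index(i, j, d) for i, j in pairs] == list(range(comb(d + 1, 2)))
    triples = combinations_with_replacement(range(d), 3)
    assert [triple_index(*t, d) for t in triples] == list(range(comb(d + 2, 3)))

print(expand_row_degree3([(0, 2), (2, 3)], d=3))
# [(0, 8), (2, 12), (5, 18), (9, 27)]
```

The degree-3 index reuses the degree-2 index over the features at or after `i`, so the work per row grows with the number of stored entries rather than with the total number of output columns.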