-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix MissingIndicator explicit zeros & output shape #13562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fix MissingIndicator explicit zeros & output shape #13562
Conversation
@@ -1144,26 +1144,30 @@ def _get_missing_features_info(self, X): | |||
imputer_mask = sparse_constructor( | |||
(mask, X.indices.copy(), X.indptr.copy()), | |||
shape=X.shape, dtype=bool) | |||
imputer_mask.eliminate_zeros() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix is correct and it's already an improvement, but we're still building this potentially huge imputer_mask
matrix (with a lot of explicit zeros).
Do you think it'd be possible to directly build it correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably you'd just use imputer_mask = sparse_constructor((mask[mask], X.indices[mask], X.indptr[mask]), shape=...)
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not so simple. mask[mask]
and indices[mask]
are correct but X.indpr
needs to be updating in a non trivial way (essentially what eliminate_zeros
does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Since the intermediate imputer_mask
can only be as big as X
, maybe it's not that much of an issue after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise lgtm
@@ -1144,26 +1144,30 @@ def _get_missing_features_info(self, X): | |||
imputer_mask = sparse_constructor( | |||
(mask, X.indices.copy(), X.indptr.copy()), | |||
shape=X.shape, dtype=bool) | |||
imputer_mask.eliminate_zeros() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably you'd just use imputer_mask = sparse_constructor((mask[mask], X.indices[mask], X.indptr[mask]), shape=...)
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still has a typo but LGTM
Thanks @jeremiedbb |
…arn#13562)" This reverts commit ed50381.
…arn#13562)" This reverts commit ed50381.
Fixes 2 bugs in
MissingIndicator
when X is sparse all non-zero non missing values would become explicit False in the transformed mask.
when the are no missing values at all and
features='missing-only'
, the transformed mask would contain all features instead of none.