FIX: normalizer l_inf should take maximum of absolute values #16633
Conversation
Thanks!
Please add a non-regression test (e.g. adapting the example you provided in the issue) to sklearn/preprocessing/tests/test_preprocessing.py.
Also please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
@rth I fixed what you suggested. To be honest, this is my first contribution, so let me know if I did something wrong! Any feedback is more than welcome 😉
One remaining comment otherwise LGTM. Thanks @Maupin1991 !
Should it be max(abs(X)) or abs(max(X))? abs(max(X)), as it is now, makes fewer copies and element operations, I think, and is faster.
By definition, it should be the maximum of the absolute values, not the absolute value of the max! Good catch @jnothman! I'm fixing that.
It's not clear to me that the intention of norm='max' is equivalent to norm='Linf', though I can see why having a negative norm is unhelpful. Do you think it would be better to raise an error if the norm is negative and provide the Linf option separately? Or, if we keep this as proposed, we need to update the parameter documentation to clarify that it's the maximum absolute value being used.
sklearn/preprocessing/_data.py
Outdated
@@ -1728,7 +1728,7 @@ def normalize(X, norm='l2', axis=1, copy=True, return_norm=False):
     elif norm == 'l2':
         norms = row_norms(X)
     elif norm == 'max':
-        norms = np.max(X, axis=1)
+        norms = np.max(np.abs(X), axis=1)
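To illustrate the distinction being discussed, here is a hypothetical toy example (not part of the patch): for a row whose largest-magnitude entry is negative, abs(max(X)) and max(abs(X)) disagree, and only the latter is the L-inf norm.

```python
import numpy as np

# Toy row whose largest-magnitude entry is negative.
X = np.array([[-5.0, 1.0, 2.0]])

# Taking the row maximum first ignores the large negative entry.
abs_of_max = np.abs(np.max(X, axis=1))   # abs(2.0) -> 2.0

# Fixed behaviour: maximum of absolute values, i.e. the L-inf norm.
max_of_abs = np.max(np.abs(X), axis=1)   # max(5.0, 1.0, 2.0) -> 5.0

assert abs_of_max[0] == 2.0
assert max_of_abs[0] == 5.0
```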
Should we instead avoid a copy with something like np.maximum(abs(np.min(X, axis=1)), np.max(X, axis=1))? Or is that too crazy?
As far as I can tell, for vectors, max norm and infinity norm are equal and correspond to taking the maximum of the absolute values: https://en.wikipedia.org/wiki/Norm_(mathematics)#Maximum_norm_(special_case_of:_infinity_norm,_uniform_norm,_or_supremum_norm)
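As a quick sanity check of that equivalence (a sketch, not from the PR itself): NumPy's `np.linalg.norm` with `ord=np.inf` computes exactly the maximum absolute value of a vector.

```python
import numpy as np

v = np.array([3.0, -7.0, 2.0])

# The infinity norm (a.k.a. max norm) is the largest absolute entry.
inf_norm = np.linalg.norm(v, ord=np.inf)

assert inf_norm == np.max(np.abs(v)) == 7.0
```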
My point here is that abs(X) copies X and transforms it. For large data, np.maximum(abs(np.min(X, axis=1)), np.max(X, axis=1)) would be more memory efficient but give the same result.
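The two expressions can be checked for equality on random data (a sketch; the memory-efficiency claim is the reviewer's and is not measured here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))

# Straightforward version: np.abs(X) materializes a full-size temporary.
with_copy = np.max(np.abs(X), axis=1)

# Copy-avoiding version: only two length-n_samples reductions are allocated,
# so no temporary the size of X is created.
without_copy = np.maximum(abs(np.min(X, axis=1)), np.max(X, axis=1))

assert np.array_equal(with_copy, without_copy)
```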
I agree with you that avoiding a copy is worthwhile here; I was just answering your question about norms on top of this one.
But for now we have kept it copying?
Sorry, my bad. Should be good now.
Renaming this parameter to […]. I can't think of a practical application where one would want to have […]. Also, I think a norm can't be negative by definition (https://math.stackexchange.com/a/1901893), while with the current […].
improvement: more efficient computation of the max norm Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>
Can we repurpose the MaxAbsScaler here?
If I am not mistaken, MaxAbsScaler works per feature, while Normalizer works per sample. I would keep these behaviors separate, even though I agree that some code would inevitably be replicated. What is your suggestion, @jnothman? Do you mean we should treat the cases as the same, applying the normalization to the transpose?
I mean you can apply it to the transposed X |
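A sketch of that suggestion (hypothetical usage, assuming a scikit-learn version where norm='max' already divides by the maximum absolute value): applying MaxAbsScaler to the transposed X reproduces the per-sample scaling.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, Normalizer

X = np.array([[-4.0, 2.0],
              [1.0, -0.5]])

# Normalizer(norm='max') scales each row by its maximum absolute value.
per_sample = Normalizer(norm='max').fit_transform(X)

# MaxAbsScaler scales each column; applying it to X.T and transposing back
# therefore reproduces the per-row scaling.
via_scaler = MaxAbsScaler().fit_transform(X.T).T

assert np.allclose(per_sample, via_scaler)
```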
I'm still unsure about whether we should consider making this more efficient. But let's merge as is for now |
Actually this isn't ready for merge yet. Please update the docstring of norm to make the behaviour clearer.
Thanks!
Reference Issues/PRs
Fixes #16632
What does this implement/fix? Explain your changes.
Takes the maximum of the absolute values as the norm before normalizing the vectors.
See the norm definition here.