Fixes #13173, implements faster polynomial features for dense matrices #13290
Conversation
Can you add the benchmark script that you used to the benchmarks folder of the repo, and paste the graph that it outputs here?
The X-axis is the number of observations.
[Benchmark results table omitted; a representative row reports transform times of 0.000299 s and 0.000100 s.]
The benchmark script needs a couple of quick fixes:
Can you please do a benchmark for a larger number of features? The new code is significantly more complex than the old one, so I would like to see a case where the new code is significantly faster than the old code on a workload that lasts more than 10 s.
Arguably it might still be interesting to be significantly faster on a small number of samples at prediction time, to reduce the single-sample prediction latency, but I am not sure that a change from 0.3 ms to 0.1 ms is important enough to justify the increase in code complexity.
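For reference, a minimal sketch of the kind of benchmark being requested could look like the following; the sizes and degree are arbitrary choices, not taken from the PR, and the same script would be run once on master and once on this branch to compare:

```python
# Hypothetical benchmark: time PolynomialFeatures.transform for a growing
# number of input features.
import timeit

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)

for n_features in (20, 50, 100):
    X = rng.randn(5000, n_features)
    poly = PolynomialFeatures(degree=2).fit(X)
    # Best of three runs of a single full transform.
    best = min(timeit.repeat(lambda: poly.transform(X), number=1, repeat=3))
    print("n_features=%3d  transform: %.3f s" % (n_features, best))
```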
Full results: https://github.com/sdpython/_benchmarks/tree/master/scikit-learn/results You are probably right about the speed-up. However, the new version is a better inspiration for anybody who needs to implement a runtime for fast predictions. That was also one of my objectives in proposing this change. I don't know if many people search for numerical recipes inside scikit-learn code; I do that sometimes.
Thank you for the additional benchmark results. So indeed, for a large number of features the difference starts to be significant. Maybe this code is a good candidate for Cythonization, but I am not saying we should wait for Cythonization to merge this PR. I would like to have other people's opinions on the complexity vs. performance tradeoff of this PR.
sklearn/preprocessing/data.py
Outdated
# the matrix first to multiply contiguous memory segments.
transpose = X.size <= 1e7 and self.order == 'C'
n = X.shape[1]
When the size is larger, is transposing slower or equivalent?
And when the matrix is small, does transposing bring a significant part of the speed-up?
My point is that the transpose switch makes the code twice as long, and I'm wondering if we could always do one or the other.
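For context, a rough NumPy-only sketch of the memory-layout effect being discussed (the shapes are made up and this is not the PR's code): the same per-column products are timed on a C-ordered matrix and on a pre-transposed, row-contiguous copy.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
X_c = np.asarray(rng.randn(200, 2000), order='C')
X_t = np.ascontiguousarray(X_c.T)  # shape (2000, 200), rows are contiguous

def column_blocks(X):
    # Each X[:, j] is a strided view of a C-ordered array.
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        np.multiply(X[:, j], X[:, 0], out=out[:, j])
    return out

def row_blocks(Xt):
    # Same products computed on the transposed copy: each Xt[j] is contiguous.
    out = np.empty_like(Xt)
    for j in range(Xt.shape[0]):
        np.multiply(Xt[j], Xt[0], out=out[j])
    return out

for name, fn, arg in (("C order, column slices", column_blocks, X_c),
                      ("transposed, row slices", row_blocks, X_t)):
    best = min(timeit.repeat(lambda: fn(arg), number=20, repeat=3))
    print("%-24s %.4f s" % (name, best))
```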
We are trying to evaluate whether the speedups in this PR are important or not for applications (because they are a constant offset, and not a proportional benefit).
We came up with a potentially useful situation: an online learning setting, where the polynomial features are applied on minibatches (typically of size 100), which are then fed into an SGDClassifier with "partial_fit".
The question is then: what is the relative cost of one call to the partial_fit of SGDClassifier versus the cost of the transform of the polynomial features?
Could you run a benchmark on such an example?
Thanks!
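A minimal sketch of such an experiment, assuming minibatches of 100 samples as suggested and otherwise arbitrary sizes and SGDClassifier settings (none of this is taken from the PR), could look like this:

```python
# Compare, per minibatch of 100 samples, the time spent in
# PolynomialFeatures.transform with the time spent in SGDClassifier.partial_fit.
import timeit

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
n_features, batch_size, n_batches = 20, 100, 200

X = rng.randn(n_batches * batch_size, n_features)
y = (X[:, 0] + 0.1 * rng.randn(len(X)) > 0).astype(int)

poly = PolynomialFeatures(degree=2).fit(X[:batch_size])
clf = SGDClassifier(tol=None)
classes = np.array([0, 1])

transform_time = fit_time = 0.0
for i in range(n_batches):
    sl = slice(i * batch_size, (i + 1) * batch_size)
    t0 = timeit.default_timer()
    Xb = poly.transform(X[sl])
    t1 = timeit.default_timer()
    clf.partial_fit(Xb, y[sl], classes=classes)
    fit_time += timeit.default_timer() - t1
    transform_time += t1 - t0

print("transform: %.4f s   partial_fit: %.4f s" % (transform_time, fit_time))
```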
Hi, I created a benchmark which tests the following four functions:
It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + POLY 0.20.2, SGD-FAST is the new implementation, and SGD-SLOW is the 0.20.2 implementation re-coded inside the benchmark itself, to make sure I am not using my branch for that baseline. Is that what you expected?
This is really confusing. Can you please rephrase?
Some more comments:
I added the second benchmark to the sources:
I hate to say it, but I find the benchmark code very hard to read. It's very factored and generic. I would personally write much simpler code, even if it repeats a bit more.
Both benchmarks I assume? |
I updated the second benchmark and the code. Below is what I measure, written in a different way.
In light of the benchmark results (and the relative simplicity of the code), I am +1 for merging this change, but I don't like the complexity of the benchmark scripts either. So I would suggest just removing the benchmark scripts from this PR.
The last benchmark #13290 (comment) seems to show that the fast version is only really much faster when the current one is already pretty fast. The improvement isn't as clear on the right side of the x-axis. @sdpython, do you have a specific use case for this? I agree with the others that we need to make sure we are addressing an actual problem here, to justify the complexity.
One use case is to decrease the latency of individual predictions in a production deployment (the red dots on the right-hand side figure).
I just made a few comments, mostly cosmetic.
Besides that, LGTM
sklearn/preprocessing/data.py
Outdated
else:
    new_index = []
    end = index[-1]
    for feature_idx in range(0, n_features):
range(n_features) is enough.
sklearn/preprocessing/data.py
Outdated
for feature_idx in range(0, n_features):
    a = index[feature_idx]
    new_index.append(current_col)
    start = a
It seems that you don't need a. Simply write start = index[feature_idx].
sklearn/preprocessing/data.py
Outdated
np.multiply(XP[:, start:end],
            X[:, feature_idx:feature_idx + 1],
            out=XP[:, current_col:next_col],
            where=True, casting='no')
where=True is the default, so it is not necessary.
sklearn/preprocessing/data.py
Outdated
next_col = current_col + end - start
if next_col <= current_col:
    break
np.multiply(XP[:, start:end],
I'm not a fan of breaking out of loops, but OK.
sklearn/preprocessing/data.py
Outdated
@@ -1,10 +1,12 @@
# coding: utf-8
Unrelated to this PR: I just looked at other files in sklearn and it's sometimes there and sometimes not. It seems there's no strong convention about that.
sklearn/preprocessing/data.py
Outdated
# Authors: Alexandre Gramfort <alexandre.gramfort@inria.fr>
#          Mathieu Blondel <mathieu@mblondel.org>
#          Olivier Grisel <olivier.grisel@ensta.org>
#          Andreas Mueller <amueller@ais.uni-bonn.de>
#          Eric Martin <eric@ericmart.in>
#          Giorgio Patrini <giorgio.patrini@anu.edu.au>
#          Eric Chang <ericchang2017@u.northwestern.edu>
#          Xavier Dupré <xadupre@microsoft.com>
I think these lists are not maintained any more :)
I modified the code based on the comments. The build is failing due to unrelated changes (as in other PRs).
Do I need to make any further changes to the PR?
sklearn/preprocessing/data.py
Outdated
current_col = 1 if self.include_bias else 0
for d in range(0, self.degree):
    if d == 0:
put this before the loop?
I think this deserves some small readability improvements. Otherwise LGTM ;)
sklearn/preprocessing/data.py
Outdated
index.append(current_col)

# d >= 1
for d in range(1, self.degree):
Suggested change:
for d in range(1, self.degree):
for _ in range(1, self.degree):
index = list(range(current_col,
                   current_col + n_features))
current_col += n_features
index.append(current_col)
I think somewhere you should describe what the index variable is for.
for i, comb in enumerate(combinations):
    XP[:, i] = X[:, comb].prod(1)

# What follows is a faster implementation of:
At the moment I don't find the algorithm easy to read. I think if you explicitly say "dynamic programming" it might be clearer that your algorithm is building subsequent columns from precomputed partial solutions.
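To make that point concrete, here is a stripped-down re-derivation of the dynamic-programming recurrence in plain NumPy (degree expansion only, no bias column, no interaction_only handling; a sketch of the idea rather than the PR's exact code):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures  # used only to cross-check

def poly_features_dp(X, degree=2):
    """Polynomial features (no bias column) built degree by degree: every
    degree-d column is a degree-(d-1) column multiplied by one feature."""
    n_samples, n_features = X.shape
    blocks = [X]                                  # the degree-1 block
    # index[j]: first column of the previous block whose terms use no feature
    # with an index smaller than j; index[-1] marks the end of the block.
    index = list(range(n_features)) + [n_features]
    for _ in range(1, degree):
        prev, end = blocks[-1], index[-1]
        new_cols, new_index, current_col = [], [], 0
        for j in range(n_features):
            start = index[j]
            new_index.append(current_col)
            # prev[:, start:end] are the degree-(d-1) terms that contain no
            # feature with index < j, so multiplying by feature j creates
            # each degree-d term exactly once.
            new_cols.append(prev[:, start:end] * X[:, j:j + 1])
            current_col += end - start
        new_index.append(current_col)
        blocks.append(np.hstack(new_cols))
        index = new_index
    return np.hstack(blocks)

X = np.random.RandomState(0).randn(5, 3)
reference = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)
assert np.allclose(poly_features_dp(X, degree=3), reference)
```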
new_index = []
end = index[-1]
for feature_idx in range(n_features):
    start = index[feature_idx]
Add a comment like: # XP[start:end] are terms of degree d - 1 that exclude feature #feature_idx
sklearn/preprocessing/data.py
Outdated
@@ -1548,6 +1548,15 @@ def transform(self, X):
# What follows is a faster implementation of:
# for i, comb in enumerate(combinations):
#     XP[:, i] = X[:, comb].prod(1)
# This new implementation uses two optimisations.
I don't think "new" belongs in merged code :)
Suggested change:
# This new implementation uses two optimisations.
# This implementation uses two optimisations.
Looks like there is a +2..+3 for merging on this PR, and most of the comments were addressed (except for the last minor wording improvement)? Is someone among the reviewers willing to merge it then?
Thanks @sdpython!
Reference Issues/PRs
Fixes #13173.
What does this implement/fix? Explain your changes.
This PR implements a faster version of polynomial features for dense matrices. Instead of computing each feature independently by going through all combinations, it broadcasts the multiplication by one feature vector over as many output columns as possible. The speed-up is significant (3 times faster) for matrices with fewer than 10,000 rows.
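As a rough illustration (not the PR's benchmark script; the matrix size is arbitrary and the measured ratio depends on the installed scikit-learn version and hardware), the naive per-combination loop can be timed against PolynomialFeatures.transform:

```python
import timeit
from itertools import combinations_with_replacement

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.randn(5000, 20)
poly = PolynomialFeatures(degree=2, include_bias=False).fit(X)

def naive_transform(X):
    # One column per degree-2 combination, each computed independently.
    combs = list(combinations_with_replacement(range(X.shape[1]), 2))
    XP = np.empty((X.shape[0], X.shape[1] + len(combs)), dtype=X.dtype)
    XP[:, :X.shape[1]] = X
    for i, comb in enumerate(combs):
        XP[:, X.shape[1] + i] = X[:, comb].prod(1)
    return XP

for name, fn in (("per-combination loop", lambda: naive_transform(X)),
                 ("PolynomialFeatures  ", lambda: poly.transform(X))):
    best = min(timeit.repeat(fn, number=5, repeat=3))
    print("%s %.3f s" % (name, best))
```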
Any other comments?
More details about the speed-up can be found here: https://github.com/sdpython/mlinsights/blob/master/_doc/notebooks/sklearn/faster_polynomial_features.ipynb.