
Conversation

sdpython
Contributor

Reference Issues/PRs

Fixes #13173.

What does this implement/fix? Explain your changes.

This PR implements a faster version of polynomial features for dense matrices. Instead of computing each feature independently by iterating over all combinations, it broadcasts the multiplication by one feature vector over whole blocks of columns wherever possible. The speed-up is significant (about 3 times faster) for matrices with fewer than 10,000 rows.
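The idea can be sketched as follows. This is a minimal illustration for degree-2, interaction-only features; the function names are mine, not the PR's, and the merged code is more general:

```python
import numpy as np
from itertools import combinations


def interactions_naive(X):
    # One prod() call per output column: C(n_features, 2) small NumPy ops.
    n_features = X.shape[1]
    combs = list(combinations(range(n_features), 2))
    XP = np.empty((X.shape[0], len(combs)), dtype=X.dtype)
    for i, comb in enumerate(combs):
        XP[:, i] = X[:, comb].prod(1)
    return XP


def interactions_broadcast(X):
    # One vectorized multiply per feature: broadcasting a single column
    # over the remaining columns fills a whole block of outputs at once.
    n_samples, n_features = X.shape
    XP = np.empty((n_samples, n_features * (n_features - 1) // 2),
                  dtype=X.dtype)
    col = 0
    for i in range(n_features - 1):
        block = n_features - i - 1
        np.multiply(X[:, i + 1:], X[:, i:i + 1],
                    out=XP[:, col:col + block])
        col += block
    return XP
```

Both produce the same columns in the same order; the second issues far fewer NumPy calls, which is where the speed-up on small-to-medium matrices comes from.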

Any other comments?

More details about the speed-up can be found here: https://github.com/sdpython/mlinsights/blob/master/_doc/notebooks/sklearn/faster_polynomial_features.ipynb.

@GaelVaroquaux
Member

Can you add the benchmark script that you used to the benchmarks folder of the repo, and paste the graph that it outputs here?

@sdpython
Contributor Author

sdpython commented Feb 27, 2019

[figure: benchmark plot; X-axis is number of observations, Y-axis is time in seconds]

     degree  interaction_only      n  nfeat order  time_0_20_2  time_current
1         2              True      1     10     C     0.000299      0.000100
25        2              True     10     10     C     0.000399      0.000100
49        2              True    100     10     C     0.000399      0.000100
73        2              True   1000     10     C     0.000898      0.000299
97        2              True  10000     10     C     0.007676      0.006383

     degree  interaction_only      n  nfeat order  time_0_20_2  time_current
13        2              True      1     20     C     0.001596      0.000100
37        2              True     10     20     C     0.001396      0.000199
61        2              True    100     20     C     0.001496      0.000100
85        2              True   1000     20     C     0.004189      0.002495
109       2              True  10000     20     C     0.031014      0.031117

@ogrisel
Member

ogrisel commented Feb 27, 2019

The benchmark script needs a couple of quick fixes:

benchmarks/bench_plot_polynomial_features.py:10:1: F401 'sys' imported but unused
import sys
^
benchmarks/bench_plot_polynomial_features.py:11:1: F401 'warnings' imported but unused
import warnings
^
benchmarks/bench_plot_polynomial_features.py:12:1: F401 'numbers' imported but unused
import numbers
^
benchmarks/bench_plot_polynomial_features.py:24:1: F401 'sklearn.utils.testing.ignore_warnings' imported but unused
from sklearn.utils.testing import ignore_warnings
^
benchmarks/bench_plot_polynomial_features.py:87:80: E501 line too long (94 > 79 characters)
                                                              degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:89:80: E501 line too long (87 > 79 characters)
                                                       degree, interaction_only, order)
                                                                               ^
benchmarks/bench_plot_polynomial_features.py:105:38: E261 at least two spaces before inline comment
                                break # stops if longer than a second

@ogrisel
Member

ogrisel commented Feb 27, 2019

Can you please do a benchmark for a larger number of features? The new code is significantly more complex than the old one, so I would like to see a case where the new code is significantly faster than the old one, on a workload that lasts more than 10s.

@ogrisel
Member

ogrisel commented Feb 27, 2019

Arguably it might still be interesting to be significantly faster on a small number of samples at prediction time to reduce the single sample prediction latency but I am not sure that a change from 0.3ms to 0.1ms is important enough to justify the increase in code complexity.

@sdpython
Contributor Author

[figures: benchmark results with a larger number of features]

Full results: https://github.com/sdpython/_benchmarks/tree/master/scikit-learn/results

You are probably right about the speed-up. However, the new version is a better inspiration for anybody who needs to implement a runtime for fast predictions; that was also one of my objectives in proposing this change. I don't know how many people search for numerical recipes inside the scikit-learn code. I do that sometimes.

@ogrisel
Member

ogrisel commented Feb 28, 2019

Thank you for the additional benchmark results. Indeed, for a large number of features the difference starts to be significant. Maybe this code is a good candidate for Cythonization, but I am not saying we should wait for Cythonization to merge this PR.

I would like to have other people's opinions on the complexity vs. performance tradeoff of this PR.

@sdpython
Contributor Author

sdpython commented Feb 28, 2019

I just made a fix that removes one data copy (no copy of the np.multiply result), and the improvement is significant even for 100,000 rows.

[figure: benchmark after removing the extra data copy]

Basically, the implementation is almost twice as fast as the current one for matrices with 100,000 rows.

# the matrix first to multiply contiguous memory segments.
transpose = X.size <= 1e7 and self.order == 'C'
n = X.shape[1]

Member

When the size is larger, is transposing slower or equivalent?
And when the matrix is small, does the transposing bring a significant part of the speed-up?

My point is that the transpose switch makes the code twice as long, and I'm wondering if we could always do one or the other.
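One way to probe this question empirically is a micro-benchmark along these lines (hypothetical, not from the PR; the sizes are illustrative): compare broadcasting one column over a C-ordered matrix versus the same data in Fortran order.

```python
import numpy as np
from timeit import timeit

rng = np.random.RandomState(0)
X_c = rng.rand(10000, 50)            # C-ordered (row-major)
X_f = np.asfortranarray(X_c)         # same values, column-major copy
col = X_c[:, 0:1].copy()             # one feature column, shape (10000, 1)

# Time the broadcast multiply for each memory layout.
t_c = timeit(lambda: np.multiply(X_c, col), number=200)
t_f = timeit(lambda: np.multiply(X_f, col), number=200)
print(f"C order: {t_c:.4f}s, F order: {t_f:.4f}s")
```

The results are the same either way; only the memory-access pattern, and hence the timing, differs, which is what the transpose switch was trying to exploit.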

Contributor Author

[figure: benchmark with the transpose removed]

I removed the transpose and it seemed better, so I'll keep it that way. I had added it because it helped before I removed the unnecessary copy this morning. The transpose is inefficient when order == 'F', so the only option to simplify is to remove it. Updating the PR.

@GaelVaroquaux
Member

GaelVaroquaux commented Feb 28, 2019 via email

@sdpython
Contributor Author

sdpython commented Feb 28, 2019

Hi,

I created a benchmark which tests the four following functions:

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# ExtendedFeatures comes from the benchmark sources


def fcts_model(X, y):
    
    model1 = SGDClassifier()
    model2 = make_pipeline(PolynomialFeatures(), SGDClassifier())
    model3 = make_pipeline(ExtendedFeatures(kind='poly'), SGDClassifier())
    model4 = make_pipeline(ExtendedFeatures(kind='poly-slow'), SGDClassifier())

    model1.fit(PolynomialFeatures().fit_transform(X), y)
    model2.fit(X, y)
    model3.fit(X, y)
    model4.fit(X, y)
    
    def partial_fit_model1(X, y, model=model1):
        return model.partial_fit(X, y)
    
    def partial_fit_model2(X, y, model=model2):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)
    
    def partial_fit_model3(X, y, model=model3):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)
    
    def partial_fit_model4(X, y, model=model4):
        X2 = model.steps[0][1].transform(X)
        return model.steps[1][1].partial_fit(X2, y)

    return partial_fit_model1, partial_fit_model2, partial_fit_model3, partial_fit_model4

It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures from 0.20.2, SGD-FAST is the new implementation, SGD-SLOW is the 0.20.2 implementation re-implemented in the bench to make sure I'm not using my branch for the test. Is that what you expected?

[figure: SGD + polynomial features benchmark]

@ogrisel
Member

ogrisel commented Feb 28, 2019

It tests the combination of PolynomialFeatures + SGDClassifier. On the following graph: SGD = SGDClassifier only, SGD-SKL = SGD + PolynomialFeatures from 0.20.2, SGD-FAST is the new implementation, SGD-SLOW is the 0.20.2 implementation re-implemented in the bench to make sure I'm not using my branch for the test.

This is really confusing. Can you please rephrase?

Member

@ogrisel left a comment

Some more comments:

@sdpython
Contributor Author

sdpython commented Mar 1, 2019

I added the second benchmark to the sources:

  • SGD = SGDClassifier only
  • SGD-SKL = PolynomialFeatures from the installed scikit-learn (whichever version is installed)
  • SGD-FAST = the new implementation, copy-pasted into the benchmark source file
  • SGD-SLOW = the 0.20.2 implementation, copy-pasted into the benchmark source file

@GaelVaroquaux
Member

I hate to say, but I find the benchmark code very hard to read. It's very factored and generic. I would personally write much simpler code, even if it repeats a bit more.

@sdpython
Contributor Author

sdpython commented Mar 2, 2019

Both benchmarks I assume?

@sdpython
Contributor Author

sdpython commented Mar 3, 2019

I updated the second benchmark and the code. Below is what I measure, written in a different way.

from time import time

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# CustomPolynomialFeatures is defined in the benchmark script


def measure_sgd_only(X_train, y_train, Xs, ys):
    model = SGDClassifier()
    model.fit(PolynomialFeatures().fit_transform(X_train), y_train)
    Xs = [PolynomialFeatures().fit_transform(x) for x in Xs]
    st = time()
    for X, y in zip(Xs, ys):
        model.partial_fit(X, y)
    end = time()
    return end - st


def measure_sgd_poly_skl(X_train, y_train, Xs, ys):
    model = make_pipeline(PolynomialFeatures(), SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st


def measure_sgd_poly_fast(X_train, y_train, Xs, ys):
    model = make_pipeline(CustomPolynomialFeatures(kind='poly-fast'),
                          SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st


def measure_sgd_poly_slow(X_train, y_train, Xs, ys):
    model = make_pipeline(CustomPolynomialFeatures(kind='poly-slow'),
                          SGDClassifier())
    model.fit(X_train, y_train)
    st = time()
    for X, y in zip(Xs, ys):
        X2 = model.steps[0][1].transform(X)
        model.steps[1][1].partial_fit(X2, y)
    end = time()
    return end - st

@sdpython
Contributor Author

sdpython commented Mar 3, 2019

I was looking for a way to show how the improvement behaves. The following graph plots the ratio (new execution time / execution time of 0.20.2) across all configurations, which shows how it varies with the number of features and observations.

[figure: ratio of new execution time to 0.20.2 execution time across configurations]

@ogrisel
Member

ogrisel commented Apr 19, 2019

In light of the benchmark results (and the relative simplicity of the code) I am +1 for merging this change, but I don't like the complexity of the benchmark scripts either.

So I would suggest just removing the benchmark script from this PR.

@NicolasHug
Member

The last benchmark #13290 (comment) seems to show that the fast version is only really much faster when the current one is already pretty fast. The improvement isn't as clear on the right side of the x-axis.

@sdpython , do you have a specific use-case for this? I agree with the others that we need to make sure we are addressing an actual problem here, to justify the complexity.

@ogrisel
Member

ogrisel commented May 10, 2019

One use case is to decrease the latency of individual predictions in a production deployment (the red dots on the right hand side figure).

Member

@jeremiedbb left a comment

I just made a few comments, mostly cosmetic.
Besides that, LGTM.

else:
    new_index = []
    end = index[-1]
    for feature_idx in range(0, n_features):
Member

range(n_features) is enough.

for feature_idx in range(0, n_features):
    a = index[feature_idx]
    new_index.append(current_col)
    start = a
Member

it seems that you don't need a. Simply write start = index[feature_idx].

np.multiply(XP[:, start:end],
            X[:, feature_idx:feature_idx + 1],
            out=XP[:, current_col:next_col],
            where=True, casting='no')
Member

where=True is the default, so it is not necessary.

next_col = current_col + end - start
if next_col <= current_col:
    break
np.multiply(XP[:, start:end],
Member

I'm not a fan of breaking loops but ok.

@@ -1,10 +1,12 @@
# coding: utf-8
Member

Unrelated to this PR. I just looked at other files in sklearn and it's sometimes there and sometimes not; there seems to be no strong convention about it.

# Authors: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# Mathieu Blondel <mathieu@mblondel.org>
# Olivier Grisel <olivier.grisel@ensta.org>
# Andreas Mueller <amueller@ais.uni-bonn.de>
# Eric Martin <eric@ericmart.in>
# Giorgio Patrini <giorgio.patrini@anu.edu.au>
# Eric Chang <ericchang2017@u.northwestern.edu>
# Xavier Dupré <xadupre@microsoft.com>
Member

I think these lists are not maintained any more :)

@sdpython
Contributor Author

I modified the code based on the comments. The build is failing due to unrelated changes (as in other PRs).

@sdpython
Contributor Author

sdpython commented Jun 7, 2019

Do I need to do new changes to the PR?


current_col = 1 if self.include_bias else 0
for d in range(0, self.degree):
    if d == 0:
Member

put this before the loop?

Member

@jnothman left a comment

I think this deserves some small readability improvements. Otherwise LGTM ;)

index.append(current_col)

# d >= 1
for d in range(1, self.degree):
Member

Suggested change:
-for d in range(1, self.degree):
+for _ in range(1, self.degree):

index = list(range(current_col,
                   current_col + n_features))
current_col += n_features
index.append(current_col)
Member

I think somewhere you should describe what the index variable is for.

for i, comb in enumerate(combinations):
    XP[:, i] = X[:, comb].prod(1)

# What follows is a faster implementation of:
Member

At the moment I don't find the algorithm easy to read. I think if you explicitly say "dynamic programming" it might be clearer that your algorithm is building subsequent columns from precomputed partial solutions.
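To make the dynamic-programming structure concrete, here is a self-contained sketch reconstructed from the snippets quoted in this review (no bias column, dense input only; illustrative, not the exact merged code): each degree-d block of columns is built by one broadcast multiply per feature over the already-computed degree-(d-1) block.

```python
import numpy as np
from itertools import combinations, combinations_with_replacement


def poly_features(X, degree=2, interaction_only=False):
    n_samples, n_features = X.shape
    comb = combinations if interaction_only else combinations_with_replacement
    n_out = sum(1 for d in range(1, degree + 1)
                for _ in comb(range(n_features), d))
    XP = np.empty((n_samples, n_out), dtype=X.dtype)

    # Degree-1 block is X itself.  index[i] marks where the degree-(d-1)
    # terms whose smallest factor is feature i begin.
    XP[:, :n_features] = X
    index = list(range(n_features))
    current_col = n_features
    index.append(current_col)

    for _ in range(1, degree):  # build degree d from degree d - 1
        new_index = []
        end = index[-1]
        for feature_idx in range(n_features):
            start = index[feature_idx]
            new_index.append(current_col)
            if interaction_only:
                # skip degree-(d-1) terms that already contain feature_idx
                start += index[feature_idx + 1] - index[feature_idx]
            next_col = current_col + end - start
            if next_col <= current_col:
                break
            # One broadcast multiply fills a whole block of new columns:
            # XP[:, start:end] * X[:, feature_idx]
            np.multiply(XP[:, start:end],
                        X[:, feature_idx:feature_idx + 1],
                        out=XP[:, current_col:next_col], casting='no')
            current_col = next_col
        new_index.append(current_col)
        index = new_index
    return XP
```

The columns come out in the same order as iterating over itertools combinations, which is why the transform stays compatible with the naive loop it replaces.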

new_index = []
end = index[-1]
for feature_idx in range(n_features):
    start = index[feature_idx]
Member

Add a comment like: # XP[start:end] are terms of degree d - 1 that exclude feature #feature_idx

@@ -1548,6 +1548,15 @@ def transform(self, X):
# What follows is a faster implementation of:
# for i, comb in enumerate(combinations):
#     XP[:, i] = X[:, comb].prod(1)
# This new implementation uses two optimisations.
Member

I don't think "new" belongs in merged code :)

Member

Suggested change:
-# This new implementation uses two optimisations.
+# This implementation uses two optimisations.

@rth
Member

rth commented Jul 25, 2019

Looks like there is a +2..+3 for merging on this PR, and most of the comments were addressed (except for the last minor wording improvement)?

Is someone among the reviewers willing to merge it then?

@jnothman jnothman merged commit ce869d8 into scikit-learn:master Jul 29, 2019
@jnothman
Member

Thanks @sdpython!

Successfully merging this pull request may close these issues.

Faster PolynomialFeatures