-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Closed
Labels
Description
The current implementation of PolynomialFeatures could be faster. The gain is significant for matrices with less than 10.000 rows, twice faster for less than 100 rows. This is for dense matrices, not measured for sparse.
Description
PolynomialFeatures independently computes every feature but it is possible to broadcast some multiplications to be faster. Code is below. It produces the same results.
Steps/Code to Reproduce
XP = numpy.empty(
(X.shape[0], self.n_output_features_), dtype=X.dtype)
def multiply(A, B):
return numpy.multiply(A, B)
XP[:, 0] = 1
pos = 1
n = X.shape[1]
for d in range(0, self.poly_degree):
if d == 0:
XP[:, pos:pos + n] = X
index = list(range(pos, pos + n))
pos += n
index.append(pos)
else:
new_index = []
end = index[-1]
for i in range(0, n):
a = index[i]
new_index.append(pos)
new_pos = pos + end - a
XP[:, pos:new_pos] = multiply(XP[:, a:end], X[:, i:i + 1])
pos = new_pos
new_index.append(pos)
index = new_index
Full code is here: https://github.com/sdpython/mlinsights/blob/master/src/mlinsights/mlmodel/extended_features.py#L160.
Expected Results
same as before but faster
Versions
0.20.2