Skip to content

Faster PolynomialFeatures #13173

@xadupre

Description

@xadupre

The current implementation of PolynomialFeatures could be faster. The gain is significant for matrices with less than 10.000 rows, twice faster for less than 100 rows. This is for dense matrices, not measured for sparse.

Description

PolynomialFeatures independently computes every feature but it is possible to broadcast some multiplications to be faster. Code is below. It produces the same results.

Steps/Code to Reproduce

XP = numpy.empty(
    (X.shape[0], self.n_output_features_), dtype=X.dtype)

def multiply(A, B):
    return numpy.multiply(A, B)

XP[:, 0] = 1
pos = 1
n = X.shape[1]
for d in range(0, self.poly_degree):
    if d == 0:
        XP[:, pos:pos + n] = X
        index = list(range(pos, pos + n))
        pos += n
        index.append(pos)
    else:
        new_index = []
        end = index[-1]
        for i in range(0, n):
            a = index[i]
            new_index.append(pos)
            new_pos = pos + end - a
            XP[:, pos:new_pos] = multiply(XP[:, a:end], X[:, i:i + 1])
            pos = new_pos

        new_index.append(pos)
        index = new_index

Full code is here: https://github.com/sdpython/mlinsights/blob/master/src/mlinsights/mlmodel/extended_features.py#L160.

Expected Results

same as before but faster

Versions

0.20.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions