
SGD classification unnecessarily slow #6186

@piskvorky

Multiclass prediction using SGD on sparse data is unnecessarily slow. It appears to copy large arrays on every call to predict, predict_proba, etc., which kills performance.

I think the main culprit is safe_sparse_dot, which uses scipy's "CSR * dense" routine:

```python
from scipy.sparse import issparse


def safe_sparse_dot(a, b, dense_output=False):
    if issparse(a) or issparse(b):
        ret = a * b  # <== this line here: `a` is CSR, `b` is dense clf.coef_.T
        if dense_output and hasattr(ret, "toarray"):
            ...
```

Because b is a transpose (a Fortran-ordered view of clf.coef_, not a C-contiguous array) and scipy's multiplication calls b.ravel(), every call makes a full internal copy of clf.coef_, which is very slow.
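A minimal NumPy sketch of the memory-layout point above, assuming a C-ordered coef_ of a hypothetical shape (n_classes, n_features). The transposed view is Fortran-ordered, so ravel(), which defaults to C order, has to copy it; a C-contiguous copy made once up front can be raveled as a cheap view.

```python
import numpy as np

# Hypothetical coef_ of shape (n_classes, n_features); numpy allocates it C-ordered.
coef = np.random.rand(20, 5000)

b = coef.T  # transposed view: no copy, but Fortran-ordered
print(b.flags["C_CONTIGUOUS"], b.flags["F_CONTIGUOUS"])  # False True

# ravel() defaults to C order, so on a Fortran-ordered view it must copy.
flat = b.ravel()
print(np.shares_memory(flat, coef))  # False: a fresh copy of all of coef

# A C-contiguous copy, made once, can be raveled without copying.
b_c = np.ascontiguousarray(b)
print(np.shares_memory(b_c.ravel(), b_c))  # True: ravel returns a view
```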

Keeping clf.coef_.T as a C-contiguous array here improved the prediction performance of our SGD classifier about 7300x (1 s vs. 137 µs per call):

```python
def decision_function(self, X):
    ...
    # before:
    # scores = safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

    # after the quick fix, with self.coef_T = np.ascontiguousarray(self.coef_.T)
    # kept on the estimator instead of re-transposing coef_ on every call:
    scores = safe_sparse_dot(X, self.coef_T, dense_output=True) + self.intercept_
    ...
```
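The same idea can be sketched outside sklearn's class hierarchy. CachedLinearScorer below is a hypothetical helper, not an sklearn API: it stores the C-contiguous transpose once and reuses it for every prediction. It assumes coef has shape (n_classes, n_features) like coef_, and the cache has to be rebuilt whenever the weights change (e.g. after fit or partial_fit), which is what update_weights does.

```python
import numpy as np
import scipy.sparse as sp


class CachedLinearScorer:
    """Hypothetical helper: linear scores using a cached C-contiguous coef_.T."""

    def __init__(self, coef, intercept):
        self.update_weights(coef, intercept)

    def update_weights(self, coef, intercept):
        # Rebuild the cache whenever the weights change (e.g. after partial_fit).
        self.coef_ = np.asarray(coef)
        self.intercept_ = np.asarray(intercept)
        self.coef_T = np.ascontiguousarray(self.coef_.T)

    def decision_function(self, X):
        # CSR * C-contiguous dense: no per-call copy of the weight matrix.
        if sp.issparse(X):
            scores = np.asarray(X.dot(self.coef_T))
        else:
            scores = np.dot(X, self.coef_T)
        return scores + self.intercept_


rng = np.random.RandomState(0)
coef = rng.rand(3, 50)          # hypothetical (n_classes, n_features)
intercept = rng.rand(3)
X = sp.random(10, 50, density=0.1, format="csr", random_state=rng)

scorer = CachedLinearScorer(coef, intercept)
scores = scorer.decision_function(X)
print(scores.shape)  # (10, 3)
```

The trade-off is one extra copy of the weights kept in memory, in exchange for not re-copying them on every predict call.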

The exact speedup will vary with the size of coef_ (roughly, the number of SGD classes times the number of features).
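A rough timing sketch for comparing the two layouts at your own sizes. The sizes below are hypothetical, and the gap you observe will depend on the scipy version and on coef_.size; it only uses X.dot on a CSR matrix, not sklearn itself.

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Hypothetical sizes: one sparse sample, many features, a few dozen classes.
n_samples, n_features, n_classes = 1, 100000, 20
rng = np.random.RandomState(0)
X = sp.random(n_samples, n_features, density=1e-4, format="csr", random_state=rng)
coef = rng.rand(n_classes, n_features)

coef_T_view = coef.T                     # Fortran-ordered view, as in the original code
coef_T_c = np.ascontiguousarray(coef.T)  # C-contiguous copy, made once

# Both layouts must produce the same scores.
same = np.allclose(X.dot(coef_T_view), X.dot(coef_T_c))

n_calls = 20
t_view = timeit.timeit(lambda: X.dot(coef_T_view), number=n_calls) / n_calls
t_c = timeit.timeit(lambda: X.dot(coef_T_c), number=n_calls) / n_calls
print("transposed view: %.2e s/call" % t_view)
print("C-contiguous:    %.2e s/call" % t_c)
```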

This could also be raised as an issue in scipy (I see no good reason for such inefficiency; that ravel() is just too generous), but since the fix seems trivial, maybe it's worth addressing on the sklearn side as well?

This is using scipy 0.16.1 and sklearn 0.17.
