Multiclass prediction using SGD on sparse data is unnecessarily slow. It seems to be copying large arrays on every call to `predict`, `predict_proba`, etc., which kills its performance.

I think the main culprit is `safe_sparse_dot`, which uses scipy's "CSR * dense" routine:
```python
def safe_sparse_dot(a, b, dense_output=False):
    if issparse(a) or issparse(b):
        ret = a * b  # <== this line here: `a` is CSR, `b` is dense clf.coef_.T
        if dense_output and hasattr(ret, "toarray"):
            ...
```
Because `b` is transposed and scipy's multiplication invokes `b.ravel()`, this is very slow (it copies `clf.coef_` internally).
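For illustration, here is a minimal standalone check of the contiguity issue; the shapes and the use of `scipy.sparse.random` are my own, not from the report:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)
# A sparse input matrix, like X in predict(): 1000 samples x 5000 features
X = sp.random(1000, 5000, density=0.001, format="csr", random_state=rng)
# A dense coefficient matrix, like clf.coef_: (n_classes, n_features)
coef = rng.rand(20, 5000)

# The transpose is only a view with Fortran-ordered memory, so scipy's
# "CSR * dense" path cannot use it directly without ravel()/copying:
assert not coef.T.flags["C_CONTIGUOUS"]

# Making a C-contiguous copy once avoids that repeated copy on every call:
coef_T = np.ascontiguousarray(coef.T)
assert coef_T.flags["C_CONTIGUOUS"]

# Both variants give the same result (`*` is matrix multiply for CSR):
assert np.allclose(X * coef.T, X * coef_T)
```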
Keeping `clf.coef_.T` as a C-contiguous array here improved the prediction performance of SGD classifier 7300x for us (1 s vs 137 µs per call):
```python
def decision_function(self, X):
    ...
    # before:
    # scores = safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_
    # after quick fix: self.coef_T = np.ascontiguousarray(self.coef_.T)
    scores = safe_sparse_dot(X, self.coef_T, dense_output=True) + self.intercept_
    ...
```
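As a rough way to observe the effect, one can time the two variants directly; the sizes below are arbitrary choices of mine, and the magnitude of the gap will depend on the scipy version and on the `coef_` shape:

```python
import numpy as np
import scipy.sparse as sp
from timeit import timeit

rng = np.random.RandomState(0)
# One sparse "sample" against a large multiclass coef_, mimicking
# repeated predict() calls: (1, n_features) x (n_features, n_classes)
X = sp.random(1, 20000, density=0.001, format="csr", random_state=rng)
coef = rng.rand(100, 20000)            # shape (n_classes, n_features)
coef_T = np.ascontiguousarray(coef.T)  # the one-time contiguous copy

t_view = timeit(lambda: X * coef.T, number=50)    # non-contiguous operand
t_contig = timeit(lambda: X * coef_T, number=50)  # precomputed contiguous copy
print("view: %.4fs  contiguous: %.4fs" % (t_view, t_contig))
```

On affected scipy versions the first variant pays the `ravel()` copy on every multiplication, while the second pays it only once, up front.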
The exact speedup numbers will vary with the size of `coef_` (roughly the number of SGD classes and features).

This could be raised as an issue in scipy as well (I see no good reason for such inefficiency; that `ravel()` is just too generous), but since the fix seems trivial, maybe it's worth addressing on the sklearn side as well?

This is using scipy 0.16.1 and sklearn 0.17.