ENH save memory with LinearLoss #23090
Conversation
Thanks for the PR!
Are there benchmarks showing the memory improvements?
An alternative to explicitly calling the methods with such temporary arrays, like `def gradient(..., per_sample_gradient_out=None)`, would be to let `def set_temporary_arrays(self, n_samples, n_classes, type)` do `self.per_sample_gradient_out = np.empty(...)` and then use those temporaries implicitly.
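For illustration only, a minimal sketch of what that implicit-temporaries alternative could look like; the class and the `set_temporary_arrays`/`clear_temporary_arrays` names are hypothetical, not an actual scikit-learn API:

```python
import numpy as np


class LossWithTemporaries:
    """Hypothetical sketch of the implicit-temporaries alternative."""

    def set_temporary_arrays(self, n_samples, n_classes, dtype):
        # Allocate the per-sample buffers once, up front.
        self.per_sample_loss_out = np.empty(n_samples, dtype=dtype)
        if n_classes > 2:
            self.per_sample_gradient_out = np.empty(
                (n_samples, n_classes), dtype=dtype, order="C"
            )
        else:
            self.per_sample_gradient_out = np.empty(n_samples, dtype=dtype)

    def gradient(self, coef, X, y):
        # Would write into self.per_sample_gradient_out implicitly instead of
        # taking a per_sample_gradient_out argument on every call.
        ...

    def clear_temporary_arrays(self):
        # Drop references after fitting so the buffers can be freed.
        del self.per_sample_loss_out, self.per_sample_gradient_out
```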
I'm +0 on having that. Also, we would need to be careful about the size here: `scikit-learn/sklearn/linear_model/_glm/glm.py`, line 233 in bd9336d.
The temporary arrays would need to be removed once they are no longer needed, so they do not take up unnecessary memory after fitting.
I just wanted to point out other options. Thanks for your insights on the trade-offs.
Do we have any reason to keep a long-lived [...]?
I am not sure it will have that much of a memory usage impact, as I expect malloc to recycle recently freed memory buffers anyway. However, it could improve speed by avoiding too many calls to malloc. That said, in an LBFGS call I expect there should be ~100 calls to the loss function object, so 100 extra malloc + free calls might be invisible. Could you please run a quick benchmark with `%timeit` and [...]?
I ran the simple script under details.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

alpha = 0.01
n_samples, n_features, n_classes = 100_000, 100, 50
X, y = make_classification(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_features,
    n_redundant=0,
    n_classes=n_classes,
)
clf = LogisticRegression(C=1 / alpha)
clf.fit(X, y)
```
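For completeness, one possible way to pair the timing with a peak-memory measurement is sketched below; the use of `memory_profiler` is an assumption here, since the tool requested above is not named:

```python
# Assumption: memory_profiler is installed (pip install memory_profiler).
from timeit import timeit

from memory_profiler import memory_usage
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

alpha = 0.01
n_samples, n_features, n_classes = 100_000, 100, 50
X, y = make_classification(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_features,
    n_redundant=0,
    n_classes=n_classes,
)
clf = LogisticRegression(C=1 / alpha)

# Peak resident memory (in MiB) sampled by memory_profiler while fitting.
peak = max(memory_usage((clf.fit, (X, y))))
print(f"peak memory during fit: {peak:.0f} MiB")

# Wall-clock time of a single fit (analogous to %timeit in IPython).
print(f"fit time: {timeit(lambda: clf.fit(X, y), number=1):.1f} s")
```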
Is this behavior deterministic?
Yes. It seems so.
Thanks for exploring this, @lorentzenchr!
Here are a few comments that I started months ago. I guess this PR can be closed, as memory performance is currently better on main. Or do you think there might be another pattern that would improve memory usage?
```python
if solver == "lbfgs":
    # To save some memory, we preallocate a ndarray used as per row loss and
    # gradient inside od LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
    # others).
    per_sample_loss_out = np.empty_like(target)
    if linear_loss.base_loss.is_multiclass:
        per_sample_gradient_out = np.empty(
            shape=(X.shape[0], classes.size), dtype=X.dtype, order="C"
        )
    else:
        per_sample_gradient_out = np.empty_like(target, order="C")

    func = functools.partial(
        linear_loss.loss_gradient,
        per_sample_loss_out=per_sample_loss_out,
        per_sample_gradient_out=per_sample_gradient_out,
    )
elif solver == "newton-cg":
    # To save some memory, we preallocate a ndarray used as per row loss and
    # gradient inside od LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
    # others).
    per_sample_loss_out = np.empty_like(target)
    if linear_loss.base_loss.is_multiclass:
        per_sample_gradient_out = np.empty(
            shape=(X.shape[0], classes.size), dtype=X.dtype, order="C"
        )
    else:
        per_sample_gradient_out = np.empty_like(target, order="C")
```
Can this be boiled down to this? Note that I have also specified `dtype=X.dtype` when creating `per_sample_gradient_out` and changed the comment.
```diff
-if solver == "lbfgs":
-    # To save some memory, we preallocate a ndarray used as per row loss and
-    # gradient inside od LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
-    # others).
-    per_sample_loss_out = np.empty_like(target)
-    if linear_loss.base_loss.is_multiclass:
-        per_sample_gradient_out = np.empty(
-            shape=(X.shape[0], classes.size), dtype=X.dtype, order="C"
-        )
-    else:
-        per_sample_gradient_out = np.empty_like(target, order="C")
-    func = functools.partial(
-        linear_loss.loss_gradient,
-        per_sample_loss_out=per_sample_loss_out,
-        per_sample_gradient_out=per_sample_gradient_out,
-    )
-elif solver == "newton-cg":
-    # To save some memory, we preallocate a ndarray used as per row loss and
-    # gradient inside od LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
-    # others).
-    per_sample_loss_out = np.empty_like(target)
-    if linear_loss.base_loss.is_multiclass:
-        per_sample_gradient_out = np.empty(
-            shape=(X.shape[0], classes.size), dtype=X.dtype, order="C"
-        )
-    else:
-        per_sample_gradient_out = np.empty_like(target, order="C")
+# To save some memory, we preallocate two ndarrays used respectively
+# as per row loss, gradient inside of LinearLoss by several methods
+# e.g. by LinearLoss.base_loss.{loss,gradient,gradient_hessian_product}.
+per_sample_loss_out = np.empty_like(target)
+if linear_loss.base_loss.is_multiclass:
+    per_sample_gradient_out = np.empty(
+        shape=(X.shape[0], classes.size), dtype=X.dtype, order="C"
+    )
+else:
+    per_sample_gradient_out = np.empty_like(target, dtype=X.dtype, order="C")
+if solver == "lbfgs":
+    func = functools.partial(
+        linear_loss.loss_gradient,
+        per_sample_loss_out=per_sample_loss_out,
+        per_sample_gradient_out=per_sample_gradient_out,
+    )
+elif solver == "newton-cg":
```
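As a side note, the reuse pattern this suggestion builds on can be illustrated with a tiny self-contained example: a buffer is allocated once, bound to the objective via `functools.partial`, and then reused by every L-BFGS iteration. The quadratic loss below is a toy stand-in, not scikit-learn's `LinearModelLoss`:

```python
import functools

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
b = rng.standard_normal(1000)

# Per-sample buffer allocated once, outside the optimization loop.
residual_out = np.empty_like(b)


def loss_gradient(coef, A, b, residual_out=None):
    """Least-squares loss and gradient; reuses residual_out when provided."""
    if residual_out is None:  # fallback: a fresh allocation on every call
        residual_out = np.empty_like(b)
    np.dot(A, coef, out=residual_out)  # A @ coef written into the buffer
    residual_out -= b
    loss = 0.5 * (residual_out @ residual_out)
    grad = A.T @ residual_out
    return loss, grad


# Bind the preallocated buffer once; every L-BFGS iteration then writes its
# per-sample residuals into the same memory.
func = functools.partial(loss_gradient, A=A, b=b, residual_out=residual_out)
res = minimize(func, x0=np.zeros(A.shape[1]), method="L-BFGS-B", jac=True)
print(res.fun, res.nit)
```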
```python
hess = functools.partial(
    linear_loss.gradient_hessian_product,  # hess = [gradient, hessp]
    per_sample_gradient_out=per_sample_gradient_out,
    per_sample_hessian_out=per_sample_hessian_out,
)
```
```diff
-hess = functools.partial(
-    linear_loss.gradient_hessian_product,  # hess = [gradient, hessp]
-    per_sample_gradient_out=per_sample_gradient_out,
-    per_sample_hessian_out=per_sample_hessian_out,
-)
+# hess = [gradient, hessp]
+hess = functools.partial(
+    linear_loss.gradient_hessian_product,
+    per_sample_gradient_out=per_sample_gradient_out,
+    per_sample_hessian_out=per_sample_hessian_out,
+)
```
```python
# To save some memory, we preallocate a ndarray used as per row loss and
# gradient inside of LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
# others).
```
Is it worth being a bit more explicit?
```diff
-# To save some memory, we preallocate a ndarray used as per row loss and
-# gradient inside of LinearLoss, e.g. by LinearLoss.base_loss.gradient (and
-# others).
+# To save some memory, we preallocate two ndarrays used respectively
+# as per row loss, gradient inside of LinearLoss by several methods
+# e.g. by LinearLoss.base_loss.{loss,gradient,gradient_hessian_product}.
```
Oops, I only wanted to comment but I misclicked.
Reference Issues/PRs
Follow-up of #21808 and #22548.
What does this implement/fix? Explain your changes.
This PR enables allocating ndarrays once and reusing them in `LinearModelLoss`. This improves the memory footprint of:
- `LogisticRegression` with solvers `"lbfgs"` and `"newton-cg"`
- `TweedieRegressor`, `PoissonRegressor`, `GammaRegressor`
Any other comments?
One could also provide pre-allocated arrays for the actual gradient (w.r.t. the coefficients); that one has `shape=coef.shape`. If lbfgs, for instance, does 100 iterations, then the current implementation allocates 2 * 100 temporary arrays for gradient and loss. In particular, for multiclass problems these gradient arrays have `shape=(n_samples, n_classes)`.
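A rough back-of-the-envelope check of those numbers, assuming the benchmark settings from the script above (100_000 samples, 50 classes, float64 data) and ~100 lbfgs iterations:

```python
import numpy as np

n_samples, n_classes, n_iterations = 100_000, 50, 100
itemsize = np.dtype(np.float64).itemsize  # 8 bytes

# One multiclass per-sample gradient array of shape (n_samples, n_classes).
gradient_bytes = n_samples * n_classes * itemsize
print(f"one gradient array: {gradient_bytes / 1e6:.0f} MB")  # ~40 MB

# Without reuse, each iteration allocates a fresh per-sample loss and
# gradient array, i.e. roughly 2 * n_iterations temporaries per fit.
loss_bytes = n_samples * itemsize
total = n_iterations * (gradient_bytes + loss_bytes)
print(f"cumulative temporary allocations: {total / 1e9:.1f} GB")  # ~4.1 GB

# As noted earlier in the discussion, malloc typically recycles these
# buffers, so peak memory stays far below the cumulative total.
```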