EFF Optimize memory usage for sparse matrices in LLE (Hessian, Modified and LTSA) #28096

Merged

Conversation

giorgioangel (Contributor)

What does this implement/fix? Explain your changes.

This PR optimizes memory management with sparse matrices when using Modified Locally Linear Embedding.

Before this PR, a NumPy NxN array was created, filled, and then converted to a sparse matrix. Creating this array can require an enormous amount of memory on large datasets: on the dataset I was working with, the algorithm tried to allocate 400 GB of RAM.

In the current PR, when M_sparse is true, the algorithm directly creates a sparse matrix, greatly reducing the memory requirements.
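
To illustrate the idea, here is a minimal sketch (not the actual scikit-learn code; `neighbors` and `weights` are hypothetical stand-ins for the per-point quantities computed by the LLE variants). Instead of filling a dense N x N array and converting it afterwards, the alignment matrix is accumulated directly in a `scipy.sparse.lil_matrix` and converted to CSR once at the end:

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix


def build_alignment_matrix(n_samples, neighbors, weights, sparse=True):
    """Accumulate the N x N alignment matrix from per-point local blocks.

    `neighbors[i]` holds the neighbor indices of sample i and `weights[i]`
    the corresponding local weight vector (hypothetical inputs).
    """
    if sparse:
        # New approach: only non-zero entries are stored, so memory grows
        # with the number of neighbor pairs instead of n_samples ** 2.
        M = lil_matrix((n_samples, n_samples), dtype=np.float64)
    else:
        # Old approach: a dense N x N allocation, prohibitive for large N.
        M = np.zeros((n_samples, n_samples), dtype=np.float64)

    for nbrs, w in zip(neighbors, weights):
        # Add the rank-one local contribution onto the neighborhood block.
        M[np.ix_(nbrs, nbrs)] += np.outer(w, w)

    return M.tocsr() if sparse else csr_matrix(M)
```

With the sparse path, the peak memory scales roughly with `n_samples * n_neighbors**2` stored entries rather than `n_samples**2`, which avoids the huge allocation described above.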

github-actions bot commented Jan 10, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c2e82b3.

@glemaitre glemaitre changed the title Optimization of memory for sparse matrices in Modified LLE EFF Optimize memory usage for sparse matrices in Modified LLE Jan 11, 2024
@glemaitre (Member)

Could you add an entry in the changelog doc/whats_new/v1.5.rst to acknowledge the change?
Could you also apply the same approach to the other methods, to be consistent? We should also make sure that we test all the combinations in the tests.
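
As a hypothetical sketch of what such a combination test could look like (names and data sizes are illustrative, not the exact test in scikit-learn):

```python
import numpy as np
import pytest

from sklearn.manifold import LocallyLinearEmbedding


@pytest.mark.parametrize("method", ["standard", "hessian", "modified", "ltsa"])
@pytest.mark.parametrize("eigen_solver", ["arpack", "dense"])
def test_lle_method_solver_combinations(method, eigen_solver):
    # Small random dataset; n_neighbors=10 satisfies the constraint of the
    # 'hessian' method (n_neighbors > n_components * (n_components + 3) / 2).
    rng = np.random.RandomState(0)
    X = rng.rand(40, 3)

    lle = LocallyLinearEmbedding(
        n_neighbors=10,
        n_components=2,
        method=method,
        eigen_solver=eigen_solver,
        random_state=0,
    )
    embedding = lle.fit_transform(X)

    assert embedding.shape == (40, 2)
    assert np.all(np.isfinite(embedding))
```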

@giorgioangel giorgioangel changed the title EFF Optimize memory usage for sparse matrices in Modified LLE EFF Optimize memory usage for sparse matrices in LLE (Hessian, Modified and LTSA) Jan 12, 2024
@glemaitre glemaitre self-requested a review January 13, 2024 11:47
@glemaitre (Member) left a comment

I am quite worried that, with a bug present, we did not have any test failing. We should make sure to have something minimal here.

giorgioangel and others added 4 commits January 13, 2024 21:24
removing double loop

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
resolving loop for sparse matrix

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre added this to the 1.6 milestone May 20, 2024
@glemaitre (Member)

Sorry @giorgioangel, I did not have time to follow up before the release. I'll add the milestone for 1.6 and do a review. I'll sort out any conflicts and ping someone else for a second review.

@glemaitre (Member) left a comment

LGTM. Since we have a test that checks all solvers and all methods, it means we did not introduce a regression. So I would advocate that we don't need any additional tests.

@OmarManzoor (Contributor) left a comment

LGTM. Thanks @giorgioangel

@OmarManzoor OmarManzoor enabled auto-merge (squash) June 27, 2024 13:54
@OmarManzoor OmarManzoor merged commit 9590c07 into scikit-learn:main Jun 27, 2024
28 checks passed
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024