EFF Optimize memory usage for sparse matrices in LLE (Hessian, Modified and LTSA) #28096

Merged

Conversation

giorgioangel (Contributor)

What does this implement/fix? Explain your changes.

This PR optimizes memory management with sparse matrices when using Modified Locally Linear Embedding.

Before this PR, a NumPy NxN array was created, filled, and then converted to a sparse matrix. Creating this array can require an enormous amount of memory on large datasets: on the dataset I was working with, the algorithm tried to allocate 400 GB of RAM.

In the current PR, when M_sparse is true, the algorithm directly creates a sparse matrix, greatly reducing the memory requirements.
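
To illustrate the idea, here is a minimal sketch (not the actual scikit-learn code; `neighbors` and `weights` are hypothetical stand-ins for the per-point quantities computed by the LLE variants). Instead of filling a dense N x N array and converting it afterwards, the alignment matrix is accumulated directly in a `scipy.sparse.lil_matrix` and converted to CSR once at the end:

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix


def build_alignment_matrix(n_samples, neighbors, weights, sparse=True):
    """Accumulate the N x N alignment matrix from per-point local blocks.

    `neighbors[i]` holds the neighbor indices of sample i and `weights[i]`
    the corresponding local weight vector (hypothetical inputs).
    """
    if sparse:
        # New approach: only non-zero entries are stored, so memory grows
        # with the number of neighbor pairs instead of n_samples ** 2.
        M = lil_matrix((n_samples, n_samples), dtype=np.float64)
    else:
        # Old approach: a dense N x N allocation, prohibitive for large N.
        M = np.zeros((n_samples, n_samples), dtype=np.float64)

    for nbrs, w in zip(neighbors, weights):
        # Add the rank-one local contribution onto the neighborhood block.
        M[np.ix_(nbrs, nbrs)] += np.outer(w, w)

    return M.tocsr() if sparse else csr_matrix(M)
```

With the sparse path, the peak memory scales roughly with `n_samples * n_neighbors**2` stored entries rather than `n_samples**2`, which avoids the huge allocation described above.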

github-actions bot commented Jan 10, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c2e82b3.

@glemaitre glemaitre changed the title Optimization of memory for sparse matrices in Modified LLE EFF Optimize memory usage for sparse matrices in Modified LLE Jan 11, 2024
@glemaitre (Member)

Could you add an entry in the changelog doc/whats_new/v1.5.rst to acknowledge the change?
Could you also apply the same approach to the other methods, to be consistent? We should also make sure that we test all the combinations in the tests.
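
As a hypothetical sketch of what such a combination test could look like (names and data sizes are illustrative, not the exact test in scikit-learn):

```python
import numpy as np
import pytest

from sklearn.manifold import LocallyLinearEmbedding


@pytest.mark.parametrize("method", ["standard", "hessian", "modified", "ltsa"])
@pytest.mark.parametrize("eigen_solver", ["arpack", "dense"])
def test_lle_method_solver_combinations(method, eigen_solver):
    # Small random dataset; n_neighbors=10 satisfies the constraint of the
    # 'hessian' method (n_neighbors > n_components * (n_components + 3) / 2).
    rng = np.random.RandomState(0)
    X = rng.rand(40, 3)

    lle = LocallyLinearEmbedding(
        n_neighbors=10,
        n_components=2,
        method=method,
        eigen_solver=eigen_solver,
        random_state=0,
    )
    embedding = lle.fit_transform(X)

    assert embedding.shape == (40, 2)
    assert np.all(np.isfinite(embedding))
```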

@giorgioangel giorgioangel changed the title EFF Optimize memory usage for sparse matrices in Modified LLE EFF Optimize memory usage for sparse matrices in LLE (Hessian, Modified and LTSA) Jan 12, 2024
@glemaitre glemaitre self-requested a review January 13, 2024 11:47
@glemaitre (Member) left a comment

I am quite worried that, with a bug present, we did not have any test failing. We should make sure to have something minimal here.

giorgioangel and others added 4 commits January 13, 2024 21:24
removing double loop

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
resolving loop for sparse matrix

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre added this to the 1.6 milestone May 20, 2024
@glemaitre (Member)

Sorry @giorgioangel, I did not have time to follow up before the release. I'll add the milestone for 1.6 and do a review. I'll sort out any conflicts and ping someone else for a second review.

@glemaitre (Member) left a comment

LGTM. Since we have a test that checks all solvers and all methods, it means we did not introduce a regression. So I would advocate that we don't need any additional tests.

@OmarManzoor (Contributor) left a comment

LGTM. Thanks @giorgioangel

@OmarManzoor OmarManzoor enabled auto-merge (squash) June 27, 2024 13:54
@OmarManzoor OmarManzoor merged commit 9590c07 into scikit-learn:main Jun 27, 2024
28 checks passed
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024