EFF Optimize memory usage for sparse matrices in LLE (Hessian, Modified and LTSA) #28096
Could you add an entry in the changelog?
I am quite worried that, even with a bug, no test was failing. We should make sure to have at least a minimal test here.
removing double loop Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
resolving loop for sparse matrix Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sorry @giorgioangel, I did not have time to follow up before the release. I'll add the milestone for 1.6 and do a review. I'll sort out any conflicts and ping someone else for a second review.
LGTM. Since we have a test that checks all solvers and all methods, it means that we did not introduce a regression. So I would argue that we don't need any additional tests.
LGTM. Thanks @giorgioangel
What does this implement/fix? Explain your changes.
This PR optimizes memory management with sparse matrices when using Modified Locally Linear Embedding.
Before this PR, a dense NxN NumPy array was created, filled, and then converted to a sparse matrix. Allocating that array can require a huge amount of memory when dealing with a large dataset.
On the dataset I was working with, the algorithm tried to allocate 400GB of RAM lol...
With this PR, when M_sparse is true, the algorithm directly creates a sparse matrix, greatly reducing the memory requirements.
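
To illustrate the idea, here is a minimal sketch of the two assembly strategies. It is not the code from this PR or from scikit-learn: the function names `assemble_dense` and `assemble_sparse`, and the `neighbors` / `blocks` inputs, are hypothetical and only stand in for the per-neighborhood contributions that get accumulated into M. The sparse variant collects (row, col, value) triplets and lets SciPy sum duplicates on conversion, so the NxN dense array is never allocated.

```python
import numpy as np
from scipy.sparse import coo_matrix


def assemble_dense(neighbors, blocks, n):
    """Old-style approach: allocate a dense (n, n) array, then fill it."""
    M = np.zeros((n, n))
    for nbrs, block in zip(neighbors, blocks):
        # Add the (k, k) block at the rows/columns of this neighborhood.
        M[np.ix_(nbrs, nbrs)] += block
    return M


def assemble_sparse(neighbors, blocks, n):
    """Sketch of direct sparse assembly: never allocates an (n, n) dense array."""
    rows, cols, vals = [], [], []
    for nbrs, block in zip(neighbors, blocks):
        # r[i, j] = nbrs[i], c[i, j] = nbrs[j], matching block[i, j].
        r, c = np.meshgrid(nbrs, nbrs, indexing="ij")
        rows.append(r.ravel())
        cols.append(c.ravel())
        vals.append(block.ravel())
    M = coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n, n),
    )
    # Converting COO to CSR sums entries that share the same (row, col).
    return M.tocsr()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 5  # hypothetical sizes, chosen only for the demo
    neighbors = [rng.choice(n, size=k, replace=False) for _ in range(n)]
    blocks = [rng.standard_normal((k, k)) for _ in range(n)]
    M_sparse = assemble_sparse(neighbors, blocks, n)
    M_dense = assemble_dense(neighbors, blocks, n)
    print(np.allclose(M_sparse.toarray(), M_dense))
```

In this sketch the sparse path stores at most N * k * k triplets, whereas the dense path always needs N * N entries (8 * N * N bytes for float64), which is where the memory saving comes from when k is much smaller than N.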