ENH Preserving dtype for np.float32 in LatentDirichletAllocation #22113
Conversation
This PR tackles the same issue as the quite old PR #13275.
Thank you for the PR @takoika !
Overall LGTM once the review suggestions are handled.
Thank you for the PR @takoika !
I gave this PR a quick pass and left a question regarding fused dtypes.
Just one comment and this LGTM.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
LGTM. Thank you, @takoika.
PS: As @thomasjfox said, let's wait for @ogrisel review.
PPS: 🤦 oops, as @thomasjpfan indicated, let's wait for @ogrisel review.
thomasjfox is especially unrelated to the scikit-learn project ;) -> wrong guy
I did a quick benchmark and noticed that this PR leads to a runtime regression:
```python
from time import perf_counter

from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

X, _ = make_multilabel_classification(
    random_state=0, n_samples=2_000, n_features=20
)
lda = LatentDirichletAllocation(n_components=5, random_state=0)

start = perf_counter()
lda.fit(X)
delta = perf_counter() - start
print(f"{delta} s")
```
With this PR I get 7.3 s and on main I get 4.4 s. We would need to investigate before merging.
Thanks! I have reproduced the same slowdown. I will investigate it.
You might want to try using a profiler such as …
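The specific profiler named in the comment above is truncated in this copy of the conversation, but the standard-library `cProfile` module is one option for locating a regression like this one. A minimal sketch (using a toy function as a stand-in for `lda.fit(X)`; any callable can be profiled the same way):

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Toy workload standing in for lda.fit(X).
    return sum(i * i for i in range(n))


prof = cProfile.Profile()
prof.enable()
result = slow_sum(100_000)
prof.disable()

# Sort by cumulative time to see which call tree dominates the runtime.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

Comparing the top entries of such a report between this branch and `main` would show which function's cumulative time grew.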
I identified and fixed the regression in PR #24528.
Reference Issues/PRs
This PR is part of #11000 .
Closes #13275
What does this implement/fix? Explain your changes.
This PR makes LatentDirichletAllocation preserve numpy.float32 when the input data is numpy.float32, so that the estimator's computations and outputs keep the input data type.
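The dtype-selection pattern this kind of change relies on can be sketched with plain NumPy (a hedged illustration of the general scikit-learn convention, not this PR's actual diff; `select_working_dtype` is a hypothetical helper name):

```python
import numpy as np


def select_working_dtype(X):
    # Preserve float32 inputs; compute everything else (float64,
    # integers, etc.) in float64. This mirrors the common
    # dtype=[np.float64, np.float32] convention passed to
    # scikit-learn's input validation.
    return np.float32 if X.dtype == np.float32 else np.float64


X32 = np.ones((4, 3), dtype=np.float32)
X64 = np.ones((4, 3), dtype=np.float64)
Xint = np.ones((4, 3), dtype=np.int64)

print(select_working_dtype(X32).__name__)  # float32
print(select_working_dtype(X64).__name__)  # float64
print(select_working_dtype(Xint).__name__)  # float64
```

Fitted attributes and transformed outputs then inherit this working dtype, which is what the tests for dtype preservation typically assert.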
Any other comments?
I used #20155 and #13275 as references when implementing this.