Skip to content

CLN _hdbscan/_tree.pyx algorithms refactor #26011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 19, 2023

Conversation

Micky774
Copy link
Contributor

Reference Issues/PRs

Addresses #24686

What does this implement/fix? Explain your changes.

Simplifies algorithms with reorganization of logic, removing/de-nesting extraneous if statements, and pruning unnecessary variables

Any other comments?

This is the final PR in the long series of backend cleanup PRs :)

CC: @thomasjpfan @jjerphan

You two may be interested in taking a look

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for simplifying the logic, @Micky774.

@Micky774
Copy link
Contributor Author

Micky774 commented Apr 6, 2023

@lmcinnes I wonder if you could take a look and offer your thoughts as well, to potentially help reviewers make their decisions with confidence. I tried to avoid altering any necessary logic and instead focus on reducing/simplifying conditions as far as possible.

lambda_val = sorted_lambdas[idx]

if child == current_child:
min_lambda = min(min_lambda, lambda_val)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that child != current_child in this algorithm since each child is unique to a single parent and hence only appears once. I suspect the removal of this line also simplifies the algorithm enough to remove the need for sorting entirely.

@lmcinnes
Copy link
Contributor

LGTM

There are probably a number of things that could also be simplified, but it is probably best to keep things as close to the earlier implementation as possible for now.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @Micky774.

Also thank you @lmcinnes for your approval.

@jjerphan jjerphan merged commit 3041b48 into scikit-learn:hdbscan Apr 19, 2023
@Micky774 Micky774 deleted the hdbscan_tree_algo branch April 19, 2023 15:55
Micky774 added a commit to Micky774/scikit-learn that referenced this pull request May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants