DOC Adds `HDBSCAN.dbscan_clustering` section to `plot_hdbscan.py` #24879

Micky774 · 2022-11-10T00:04:38Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds HDBSCAN.dbscan_clustering section to plot_hdbscan.py. Fixes a small typo.

Any other comments?

thomasjpfan · 2022-12-03T15:48:21Z

I think we need to sync the hdbscan branch with main. The CI in this PR is failing and the fix is in main (#25017). Syncing with main will give us the opportunity to move the setup.py configuration to the root directory:

scikit-learn/setup.py

Line 253 in d52e946

extension_config = {

@Micky774 Can you open a PR to sync up the hdbscan branch with main?

thomasjpfan

I think something went wrong here or in the sync of hdbscan with upstream/main.

Micky774 · 2023-02-04T00:40:27Z

> I think something went wrong here or in the sync of hdbscan with upstream/main.

Opened #25538 to fix!

rusdes and others added 30 commits October 14, 2022 16:05

API Remove sklearn.metrics.manhattan_distances option `sum_over_features` (scikit-learn#24630)

7cf938c

EFF avoid computing inertia in KMeans' predict (scikit-learn#24666)

8610e14

DOC make plot_agglomerative_clustering_metrics.py colorblind friendly (scikit-learn#24655)

935f7e6

DOC use KBinsDiscretizer in lieu of KMeans in vector quantization example (scikit-learn#24374)

5af75b9

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>

TST use global_random_seed in sklearn/cluster/tests/test_dbscan.py (scikit-learn#24598)

625650f

HOTFIX Temporarily disable py38_conda_defaults_openblas build (scikit-learn#24693)

69ca8d5

DOC Fix typo and adjust wording in set_output example (scikit-learn#24689)

98a3fdf

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DOC Improve docstring around set_output (scikit-learn#24672)

d4306ba

* DOC Improve docstring around set_output
* DOC Improve docs around set_output
* DOC Address comments
* DOC Better grammar
* DOC Improve wording
* DOC Improves docstring in set_config

ENH Makes OneToOneFeatureMixin and ClassNamePrefixFeaturesOutMixin public (scikit-learn#24688)

1dc23d7

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DOC Fix typo in plot_set_output.py example (scikit-learn#24704)

6eb4633

MAINT Fix build when SKLEARN_OPENMP_PARALLELISM_ENABLED=False (scikit-learn#24682)

d7af20c

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

MAINT renable Linux + Python 3.8 build with OpenBLAS (scikit-learn#24705)

3e6a39a

CI Add wheel builds for Python 3.11 (scikit-learn#24446)

38c34af

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

CI Remove remaining windows 32 references (scikit-learn#24657)

7c2a58d

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

DOC fix typo inside Pipeline docstring (scikit-learn#24730)

b849ce8

DOC fix title underline too short in Gaussian Process kernel (scikit-learn#24726)

6cd44db

DOC correct bound of sum LinearSVR in formula in svm.rst (scikit-learn#24722)

8654471

DOC fix sphinx directive in function (scikit-learn#24733)

0a48116

DOC fix deprecation warning raised by KMeans and Matplotlib (scikit-learn#24692)

8c384da

Co-authored-by: Tim Head <betatim@gmail.com>

Add sphinx_highlight.js to the search page (needed since sphinx 5.2.0). (scikit-learn#24727)

551ced2

DOC fix a missing final fullstop in docstring (scikit-learn#24739)

303d712

DOC Improve narrative of plot_roc_crossval example (scikit-learn#24710)

2335a8e

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

MAINT force NumPy version for building scikit-learn for CPython 3.10 in Windows (scikit-learn#24742)

a71c535

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

API add named_transformers attribute to FeatureUnion (scikit-learn#20331)

ff33ffb

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

DOC fix deprecated log loss argument in user guide (scikit-learn#24753)

55b55af

FIX check_estimator fails when validating SGDClassifier with log_loss (scikit-learn#24071)

0e4e418

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

DOC Add links to DBSCAN references. (scikit-learn#24758)

27d3899

FIX Fixes common test for requires_positive_X (scikit-learn#24667)

7dc7f8a

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DOC add entries for the 1.1.3 release (scikit-learn#24744)

64876ca

glemaitre and others added 2 commits December 2, 2022 16:58

DOC fix missing return line in docstring (scikit-learn#25099)

07958e6

MNT Require matplotlib in test_input_data_dimension test (scikit-learn#25104)

d52e946

ArturoAmorQ and others added 22 commits December 5, 2022 11:39

DOC Rework k-means assumptions example (scikit-learn#24970)

cbfb6ab

Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

FIX ignore *args and **kwargs in parameter validation of public functions (scikit-learn#25114)

b01f018

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> closes scikit-learn#25113

FIX always expose best_loss_, validation_scores_, and best_validation_score (scikit-learn#24683)

1f9dc71

Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

FEA Adds FeatureUnion.__getitem__ to access transformers (scikit-learn#25093)

f7e6977

Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>

MAINT Allow partial param validation for functions (scikit-learn#25087)

14130f4

update release checklist regarding SECURITY.md (scikit-learn#25122)

981e728

DOC add more details for n_jobs in MeanShift docstring (scikit-learn#25083)

9a98487

* Doc changed n_init to n_jobs in mean_shift.py

MAINT Parameters validation for metrics.roc_curve (scikit-learn#25108)

6643c2c

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

DOC Correcting some small documentation typos (scikit-learn#25125)

d949e7f

FIX pos_label constraint in roc_curve (param validation) (scikit-learn#25131)

929b3dc

FIX Remove spurious UserWarning (scikit-learn#25129)

e99cd11

BLD Reduces size of wheels by stripping symbols (scikit-learn#25123)

9527920

MAINT Remove -Wcpp warnings when compiling _kd_tree and _ball_tree (scikit-learn#24965)

5564541

CI Fixes Azure atlas CI job (scikit-learn#25136)

29b4cca

Fixes scikit-learn#25135

MAINT update SECURITY.md (scikit-learn#25138)

0057118

DOC Add missing step to the "making a release" checklist (scikit-learn#25139)

0266481

MNT parameter validation for covariance.empirical_covariance (scikit-learn#25146)

17b8278

DOC Use notebook style in plot_gpr_on_structured_data.py (scikit-learn#25132)

d8592a6

DOC fix nightly build installation verbatim (scikit-learn#25153)

86fbdb1

DOC Fix spelling mistake (scikit-learn#25156)

54f621a

MAINT validate parameters of Pipeline (scikit-learn#25133)

754bd52

Merge branch 'main' into hdbscan_dbscan_plotting

32244a2

thomasjpfan reviewed Dec 19, 2022

Micky774 mentioned this pull request Feb 4, 2023

DOC Adds HDBSCAN.dbscan_clustering section to plot_hdbscan.py #25538

Merged

Micky774 closed this Feb 4, 2023

Micky774 deleted the hdbscan_dbscan_plotting branch February 4, 2023 00:40