Fix n_jobs for DBSCAN #16384

JohanWork · 2020-02-04T20:12:57Z

Reference Issues/PRs

Fixes #16299

What does this implement/fix? Explain your changes.

This PR updates the documentation for sklearn.cluster.DBSCAN to clarify that n_jobs has no effect when precomputed distances are used.

Any other comments?

This is my first contribution to open source; happy to receive any feedback!

rth · 2020-02-04T20:19:11Z

sklearn/cluster/_dbscan.py

@@ -86,7 +86,9 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
        The number of parallel jobs to run for neighbors search.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
-        for more details.
+        for more details. Parallel execution is implemented for computing the 
+        distances. If precomputed distance are used, parallel execution is not 


Thank you. It is probably better to remove "Parallel execution is implemented for computing the distances.", since that is redundant with the first sentence of this paragraph. Also, "If precomputed distance is used"? Otherwise LGTM.

JohanWork · 2020-02-05

Thanks, will remove the redundancy. The ticket is about parallel execution making no difference when precomputed distances are used, and about clarifying that in the documentation. That's why "If precomputed distance are used" is in there, but perhaps it should be updated?

rth · 2020-02-05

Yes, sounds good.
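A minimal sketch of the behaviour being documented, assuming a recent scikit-learn and a small synthetic two-blob data set (the data, `eps`, `min_samples`, and the helper name `cluster_precomputed` are illustrative choices, not taken from the PR). With `metric="precomputed"` the distance work happens before DBSCAN is called, so that is where parallelism can still be requested, through the `n_jobs` argument of `sklearn.metrics.pairwise_distances`. The snippet only checks that DBSCAN returns identical labels with `n_jobs=1` and `n_jobs=2`; it does not measure run time.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Illustrative data (assumption): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# The expensive step: computing all pairwise distances.
# pairwise_distances accepts its own n_jobs, so parallelism belongs here.
D = pairwise_distances(X, metric="euclidean", n_jobs=2)


def cluster_precomputed(distances, n_jobs):
    """Hypothetical helper: run DBSCAN on a precomputed distance matrix."""
    model = DBSCAN(eps=0.5, min_samples=5, metric="precomputed",
                   n_jobs=n_jobs)
    return model.fit_predict(distances)


# DBSCAN only reads the precomputed matrix here; per the discussion in
# this PR, n_jobs does not parallelize anything in this mode.
labels_serial = cluster_precomputed(D, n_jobs=1)
labels_parallel = cluster_precomputed(D, n_jobs=2)

# Clustering is deterministic, so both runs give the same labels.
print(np.array_equal(labels_serial, labels_parallel))  # -> True
```

In other words, a user who wants parallelism with precomputed distances would set `n_jobs` on the distance computation rather than on DBSCAN.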

…abetes dataset (scikit-learn#16341) * exchanged the boston for diabetes dataset loading * updated the feature names * exchanged remaining feature names * cleaning up * changed features used to age and bmi for easier understanding

… .… (4) (scikit-learn#16320) * Update random_state doc for mutual_info.py, unsupervised.py, testing.py and utils/init.py * Update comments from pr * Update sklearn/metrics/cluster/_unsupervised.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * Update sklearn/utils/__init__.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * Update sklearn/utils/__init__.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

* dropping python-3.5 * fix install guide * fix setup.py * fix circleci * advanced_installation * index.html * readme * pyparsing.py * remove clean_warning_registry * 1.13.1 and 1.19.1 * don't use 16.04 * 18.04 libatlas-dev -> libatlas-base-dev * min pillow version for 3.6 is 4.2.1 * echo commands * fix conflict, and 32 bit * fix conflict for circle * fix scikit-image version dep * mark tests as xfail on 32bit py3.6 * move to 1.13.3 min numpy version, and simplify old code * remove the rest of _object_dtype_isnan usages * Revert "remove the rest of _object_dtype_isnan usages" This reverts commit c6e867e. * fix issues raised by jeremy * minor fix * minor fixes mentioned by Olivier

…kit-learn#14675) * Add context to tutorial example * try to keep line length under 80 characters * make CI pass * MNT Skips unsupervised_learning when scikit-image is not installed * DOC make code comments as text * solve issue plotting * fix * center images Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>

amueller · 2020-02-28T18:40:56Z

Hey @JohanWork. Are you still interested in following up on this? I think it's nearly ready to go!

JohanWork · 2020-03-01T19:12:42Z

@amueller Hi, sorry! I see that @hugolmn approved, but @rth, do you still want the update? If so, I will fix it tomorrow! Sorry for being slow to respond; I will improve!

rth · 2020-03-01T21:52:55Z

do you still want the update if so, I will fixit tomorrow!

Yes, please. Thanks!

rth · 2020-03-01T21:53:44Z

Also, flake8 linting is failing (probably a line that is too long; see the logs).
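The real check is simply running flake8 on the changed file (for example `flake8 sklearn/cluster/_dbscan.py`). As a rough illustration of what it flags in a docstring edit like this one, here is a hypothetical helper, `find_style_issues`, that approximates two pycodestyle checks: E501 (line longer than the default 79 characters) and W291/W293 (trailing whitespace; the added lines in the diff above appear to end with a trailing space). It is a sketch under those assumptions, not part of flake8 or scikit-learn, and it ignores special cases such as `# noqa`.

```python
def find_style_issues(source, max_line_length=79):
    """Return (line_number, code) pairs for two flake8-style checks.

    Hypothetical helper: a rough approximation of pycodestyle's E501
    (line too long) and W291/W293 (trailing whitespace), not flake8.
    """
    issues = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.rstrip()
        if stripped != line:
            # Whitespace-only lines are W293; other trailing spaces are W291.
            issues.append((lineno, "W291" if stripped else "W293"))
        # pycodestyle measures the length without trailing whitespace.
        if len(stripped) > max_line_length:
            issues.append((lineno, "E501"))
    return issues


sample = "x = 1   \n" + "y = '" + "a" * 80 + "'\n"
print(find_style_issues(sample))  # -> [(1, 'W291'), (2, 'E501')]
```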

* Added MLPRegressor and MLPClassifier examples * shortened line due to test failure * changed way to calculate score due to approximation error in tests * removed comments, used predict instead of cross validation * DOC Simplified make_regression and make_classification arguments * DOC fix for linting * DOC Update * DOC Less precision Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>

JohanWork · 2020-03-02T19:42:24Z

I made a mistake while trying to rebase.

Johan Hansson added 2 commits February 4, 2020 16:49

Adding comment about number of jobs not effecting if distances are pr…

06e1350

…ecomputed

updating the text after feedback

ad9176c

rth reviewed Feb 4, 2020

DatenBiene and others added 27 commits February 5, 2020 10:36

DOC Update random_state entry for dummy / random_projection (scikit-l…

32cce97

…earn#16347)

EXA diabetes instead of Boston dataset for feature selection (scikit-…

6324e40

…learn#16305)

DOC More explicit warnings about impurity based feat. imp. (scikit-le…

54c3a1f

…arn#16382)

TST Fixes test for California housing (scikit-learn#16389)

87a5930

MNT Adds filters to jinja template (scikit-learn#16133)

b8768a0

FIX Elkan k-means does not stop if tol=0 (scikit-learn#16075)

91261c2

ENH Perform KNN imputation without O(n^2) memory cost

ae9eaf8

Fixes scikit-learn#15604 This is more computationally expensive than the previous implementation, but should reduce memory costs substantially in common use cases.

Revert "ENH Perform KNN imputation without O(n^2) memory cost"

0c4252c

accidentally pushed to master This reverts commit ae9eaf8.

ENH Add 'if_binary' option to drop argument of OneHotEncoder (scikit-…

7e7e115

…learn#16245)

FIX max_leaf_node and max_depth interaction in GBDT (scikit-learn#16183)

98b3c7c

* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo

ENH Improve error message for sparse multilabel-indicator y in Random…

3f0b6c0

…ForestClassifier (scikit-learn#15971)

DOC: Mark the sentence end in classification_report (scikit-learn#1…

09bd9ee

…6411)

DOC Fix syntax in model_evaluation UG (scikit-learn#16410)

14e597c

DOC promote shallow copy in the docs (scikit-learn#16423)

c79a5b4

* DOC promote shallow copy in the docs * a shorter version

BLD Specify build time dependencies via pyproject.toml (scikit-learn#…

97d49f2

…16244) * add pyproject.toml * do it in CI * cln * no build isolation in the doc * bump dependencies * no-build-isolation everywhere needed

MNT/CI install scikit-image if we test doc on azure (scikit-learn#15065)

933b4cf

DOC add Bunch to public docs and API (scikit-learn#16404)

2821abc

* add Bunch to public docs and API * address Thomas's suggestions * use Thomas's description

DOC Docstring example of classifier should import classifier (scikit-…

32d3335

…learn#16430)

DOC improve random state docstring in model_selection/split (scikit-l…

0904058

…earn#15575)

DOC improve the documentation of OneHotEncoder for if_binary (scikit-…

db85b12

…learn#16428)

DOC follow doc formatting guideline for module gaussian_process (scik…

62ce1ba

…it-learn#16415)

MNT Deprecate public attributes in SGD and PassiveAggresive classes (s…

4913037

…cikit-learn#16261)

FEA Turn on early stopping in histogram GBDT by default (scikit-learn…

ee6b369

…#14516)

NicolasHug and others added 7 commits February 29, 2020 09:05

MNT Introduction of n_features_in_ attr with _validate_data mtd (scik…

d205638

…it-learn#16112)

DOC Rename clf to regr in SVR examples (scikit-learn#16598)

eb540f3

DOC Adds example for RandomTreesEmbedding (scikit-learn#15202)

1c74490

MNT rename _parallel_fit_estimator to _fit_single_estimator to reflec…

68a639e

…t lack of parallelism in the method (scikit-learn#16599)

DOC Adds examples to GradientBoostingClassifier and GradientBoostingR…

d86f8fd

…egressor (scikit-learn#15151)

Fix format of values in confusion matrix plot. (scikit-learn#16159)

94f877b

DOC Add formula for binary balanced accuracy in UG (scikit-learn#16604)

8868ec7

hugolmn approved these changes Mar 1, 2020

VarIr and others added 3 commits March 2, 2020 00:41

MNT Download and test datasets in cron job (scikit-learn#16348)

cd622df

MNT Periodic adds labels based on module (scikit-learn#16596)

0e4f85f

* MNT Users pull request labeler * MNT Fix * labeler fix * MNT Update * MNT Uses single quotes * BUG Uses number * MN Removes Build / CI auto labeling * MNT Updates to labeler v2.4.1 * ENH Updates version to include logs

github-actions bot added the module:cluster label Mar 2, 2020

Johan Hansson added 11 commits March 2, 2020 20:04

updating after feedback

cacffb2

update linting

4512429

update for linting

632f871

removing spaces

9ff12c8

Adding comment about number of jobs not effecting if distances are pr…

cdc75e9

…ecomputed

updating the text after feedback

00e6a8d

updating after feedback

1344138

update linting

37de28f

update for linting

1d80902

removing spaces

7451357

Merge branch 'nJobs' of github.com:JohanWork/scikit-learn into nJobs

91016ff

JohanWork closed this Mar 2, 2020

JohanWork mentioned this pull request Mar 2, 2020

git update n_jobs #16615

Merged
