Fix n_jobs for DBSCAN #16384
sklearn/cluster/_dbscan.py
Outdated
@@ -86,7 +86,9 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
 The number of parallel jobs to run for neighbors search.
 ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
 ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
-for more details.
+for more details. Parallel execution is implemented for computing the
+distances. If precomputed distance are used, parallel execution is not
Thank you. It's probably better to remove "Parallel execution is implemented for computing the distances." since that's redundant with the first sentence in this paragraph. Also, "If precomputed distance is used"? Otherwise LGTM.
Thanks, will remove the redundancy. The ticket is about parallel execution not making a difference when precomputed distances are used, and about clarifying that in the documentation. That's why "If precomputed distance are used" is in there, but perhaps it should be updated?
Yes, sounds good.
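For readers following the thread, here is a minimal sketch of the behaviour being documented. It assumes a small synthetic two-blob dataset and the variable names are illustrative only (none of this comes from the PR itself). With `metric="precomputed"` the distances already exist, so `n_jobs` on `DBSCAN` should not change the labels, and per the linked issue it is not expected to speed anything up either. If parallelism is wanted, one option is to apply it when building the distance matrix, for example via `pairwise_distances`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Illustrative data (an assumption, not from the PR): two well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.1, size=(50, 2)),
])

# The distance computation is the step that can use several workers.
D = pairwise_distances(X, metric="euclidean", n_jobs=2)

# With a precomputed matrix, n_jobs is accepted by DBSCAN, but there are
# no distances left to compute, so the clustering result is the same.
labels_serial = DBSCAN(
    eps=0.5, min_samples=5, metric="precomputed", n_jobs=1
).fit_predict(D)
labels_parallel = DBSCAN(
    eps=0.5, min_samples=5, metric="precomputed", n_jobs=-1
).fit_predict(D)

same_labels = np.array_equal(labels_serial, labels_parallel)
found_clusters = set(labels_serial.tolist()) - {-1}  # drop the noise label
print("identical labels:", same_labels)
print("cluster ids:", sorted(found_clusters))
```

Timing the two `fit_predict` calls on a larger matrix would be the way to confirm the lack of speed-up on a given machine; the sketch only checks that the labels agree.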
…abetes dataset (scikit-learn#16341) * exchanged the boston for diabetes dataset loading * updated the feature names * exchanged remaining feature names * cleaning up * changed features used to age and bmi for easier understanding
Fixes scikit-learn#15604 This is more computationally expensive than the previous implementation, but should reduce memory costs substantially in common use cases.
accidentally pushed to master This reverts commit ae9eaf8.
… .… (4) (scikit-learn#16320) * Update random_state doc for mutual_info.py, unsupervised.py, testing.py and utils/init.py * Update comments from pr * Update sklearn/metrics/cluster/_unsupervised.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * Update sklearn/utils/__init__.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * Update sklearn/utils/__init__.py Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
* fix max_leaf_node max_depth interaction * Added test * comment * what's new * simpler solution * moved and simplified test * typo
* dropping python-3.5 * fix install guide * fix setup.py * fix circleci * advanced_installation * index.html * readme * pyparsing.py * remove clean_warning_registry * 1.13.1 and 1.19.1 * don't use 16.04 * 18.04 libatlas-dev -> libatlas-base-dev * min pillow version for 3.6 is 4.2.1 * echo commands * fix conflict, and 32 bit * fix conflict for circle * fix scikit-image version dep * mark tests as xfail on 32bit py3.6 * move to 1.13.3 min numpy version, and simplify old code * remove the rest of _object_dtype_isnan usages * Revert "remove the rest of _object_dtype_isnan usages" This reverts commit c6e867e. * fix issues raised by jeremy * minor fix * minor fixes mentioned by Olivier
* DOC promote shallow copy in the docs * a shorter version
…16244) * add pyproject.toml * do it in CI * cln * no build isolation in the doc * bump dependencies * no-build-isolation everywhere needed
* add Bunch to public docs and API * address Thomas's suggestions * use Thomas's description
…kit-learn#14675) * Add context to tutorial example * try to keep line length under 80 characters * make CI pass * MNT Skips unsupervised_learning when scikit-image is not installed * DOC make code comments as text * solve issue plotting * fix * center images Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Hey @JohanWork. Are you still interested in following up on this? I think it's nearly ready to go!
…t lack of parallelism in the method (scikit-learn#16599)
Yes, please. Thanks!
Also, flake8 linting is failing (probably a line that is too long; see the logs).
* Added MLPRegressor and MLPClassifier examples * shortened line due to test failure * changed way to calculate score due to approximation error in tests * removed comments, used predict instead of cross validation * DOC Simplified make_regression and make_classification arguments * DOC fix for linting * DOC Update * DOC Less precision Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
* MNT Users pull request labeler * MNT Fix * labeler fix * MNT Update * MNT Uses single quotes * BUG Uses number * MN Removes Build / CI auto labeling * MNT Updates to labeler v2.4.1 * ENH Updates version to include logs
I made a mistake trying to rebase.
Reference Issues/PRs
Fixes #16299
What does this implement/fix? Explain your changes.
This PR updates the documentation for sklearn.cluster.DBSCAN to clarify how n_jobs behaves when precomputed distances are used.
Any other comments?
My first commit to open source, happy for any feedback!