Fix n_jobs for DBSCAN #16384

Closed
wants to merge 101 commits into from

JohanWork
Contributor

@JohanWork JohanWork commented Feb 4, 2020

Reference Issues/PRs

Fixes #16299

What does this implement/fix? Explain your changes.

The implementation updates documentation for sklearn.cluster.DBSCAN regarding n_jobs for precomputed distances

Any other comments?

My first commit to open source, happy for any feedback!

@@ -86,7 +86,9 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
 The number of parallel jobs to run for neighbors search.
 ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
 ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
-for more details.
+for more details. Parallel execution is implemented for computing the
+distances. If precomputed distance are used, parallel execution is not
Member

Thank you. Probably better to remove "Parallel execution is implemented for computing the distances." That's redundant with the first sentence in this paragraph. Also, "If precomputed distance is used"? Otherwise LGTM.

Contributor Author

Thanks, will remove the redundancy. The ticket is about parallel execution not making a difference when precomputed distances are used, and about clarifying that in the documentation. That's why "If precomputed distance are used" is in there, but perhaps it should be updated?

Member

Yes, sounds good.
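
For illustration, the behaviour this thread is documenting can be sketched as follows. This is a minimal example under stated assumptions, not code from the PR: the toy data, `eps=1.0`, and `n_jobs=2` are made up for the example. It only checks that `n_jobs` leaves the clustering result unchanged when a precomputed distance matrix is passed; it does not measure timing.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (assumption for this sketch): two tight, well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.1, size=(20, 2)),
])

# Distances computed up front; DBSCAN then only thresholds this matrix.
D = pairwise_distances(X, metric="euclidean")

# With metric="precomputed" there are no distances left to compute,
# so n_jobs is accepted but is not expected to change anything.
labels_serial = DBSCAN(
    eps=1.0, min_samples=5, metric="precomputed", n_jobs=1
).fit_predict(D)
labels_parallel = DBSCAN(
    eps=1.0, min_samples=5, metric="precomputed", n_jobs=2
).fit_predict(D)

# On raw features the neighbors search itself runs, which is the step
# the n_jobs parameter is documented to parallelize.
labels_raw = DBSCAN(eps=1.0, min_samples=5, n_jobs=2).fit_predict(X)

print(np.array_equal(labels_serial, labels_parallel))
```

Users who want parallelism with a precomputed matrix would have to parallelize the distance computation themselves, for example by passing `n_jobs` to `pairwise_distances`.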

DatenBiene and others added 27 commits February 5, 2020 10:36
…abetes dataset (scikit-learn#16341)

* exchanged the boston for diabetes dataset loading

* updated the feature names

* exchanged remaining feature names

* cleaning up

* changed features used to age and bmi for easier understanding
Fixes scikit-learn#15604

This is more computationally expensive than the previous implementation,
but should reduce memory costs substantially in common use cases.
accidentally pushed to master

This reverts commit ae9eaf8.
… .… (4) (scikit-learn#16320)

* Update random_state doc for mutual_info.py, unsupervised.py, testing.py and utils/init.py

* Update comments from pr

* Update sklearn/metrics/cluster/_unsupervised.py

Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

* Update sklearn/utils/__init__.py

Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

* Update sklearn/utils/__init__.py

Co-Authored-By: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
* fix max_leaf_node max_depth interaction

* Added test

* comment

* what's new

* simpler solution

* moved and simplified test

* typo
* dropping python-3.5

* fix install guide

* fix setup.py

* fix circleci

* advanced_installation

* index.html

* readme

* pyparsing.py

* remove clean_warning_registry

* 1.13.1 and 1.19.1

* don't use 16.04

* 18.04 libatlas-dev -> libatlas-base-dev

* min pillow version for 3.6 is 4.2.1

* echo commands

* fix conflict, and 32 bit

* fix conflict for circle

* fix scikit-image version dep

* mark tests as xfail on 32bit py3.6

* move to 1.13.3 min numpy version, and simplify old code

* remove the rest of _object_dtype_isnan usages

* Revert "remove the rest of _object_dtype_isnan usages"

This reverts commit c6e867e.

* fix issues raised by jeremy

* minor fix

* minor fixes mentioned by Olivier
* DOC promote shallow copy in the docs

* a shorter version
…16244)

* add pyproject.toml

* do it in CI

* cln

* no build isolation in the doc

* bump dependencies

* no-build-isolation everywhere needed
* add Bunch to public docs and API

* address Thomas's suggestions

* use Thomas's description
…kit-learn#14675)

* Add context to tutorial example

*  try to keep line length under 80 characters

* make CI pass

* MNT Skips unsupervised_learning when scikit-image is not installed

* DOC make code comments as text

* solve issue plotting

* fix

* center images

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
@amueller
Member

Hey @JohanWork. Are you still interested in following up on this? I think it's nearly ready to go!

@JohanWork
Contributor Author

@amueller Hi, sorry! @hugolmn approved, but @rth, do you still want the update? If so, I will fix it tomorrow! Sorry for the slow response, will improve!

@rth
Member

rth commented Mar 1, 2020

do you still want the update? If so, I will fix it tomorrow!

Yes, please. Thanks!

@rth
Member

rth commented Mar 1, 2020

Also, flake8 linting is failing (probably a line that is too long, see the logs).

VarIr and others added 3 commits March 2, 2020 00:41
* Added MLPRegressor and MLPClassifier examples

* shortened line due to test failure

* changed way to calculate score due to approximation error in tests

* removed comments, used predict instead of cross validation

* DOC Simplified make_regression and make_classification arguments

* DOC fix for linting

* DOC Update

* DOC Less precision

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
* MNT Users pull request labeler

* MNT Fix

* labeler fix

* MNT Update

* MNT Uses single quotes

* BUG Uses number

* MN Removes Build / CI auto labeling

* MNT Updates to labeler v2.4.1

* ENH Updates version to include logs
@JohanWork
Contributor Author

I made a mistake while trying to rebase.

@JohanWork JohanWork closed this Mar 2, 2020
@JohanWork JohanWork mentioned this pull request Mar 2, 2020
Development

Successfully merging this pull request may close these issues.

n_jobs for DBSCAN