BUG ensure that parallel/sequential give the same permutation importances #15933

glemaitre · 2019-12-19T16:14:59Z

Reference Issues/PRs

closes #15898
closes #15810

What does this implement/fix? Explain your changes.

Make sure that fixing the random_state will give the same results in parallel and sequential processing.

Any other comments?

Edit: now also ensure thread-safety.

Edit: also fixes #15810 as a side effect.

ogrisel

LGTM, there is just one little thingy to check. Also please add a whats_new entry. For 0.22.1.

sklearn/inspection/tests/test_permutation_importance.py

ogrisel · 2019-12-19T17:40:20Z

What's new was already included. But there is a typo. I will fix it.

ogrisel

LGTM.

glemaitre · 2019-12-19T17:41:32Z

thanks

ogrisel · 2019-12-19T18:09:07Z

Actually, reading the code, I believe that this is not thread-safe and should break with the threading backend. Let me give it a try.

sklearn/inspection/_permutation_importance.py

ogrisel · 2019-12-19T18:51:45Z

sklearn/inspection/_permutation_importance.py


+    # Work on a copy of X to to ensure thread-safety in case of threading
+    # based parallelism:
+    X_permuted = X.copy()


Note for the reviewers: the fact that we always make a copy here also fixes the issue with read-only memmaps as reported in #15810.

Just by chance, you are not aware of a way to avoid to re-allocating the full dataframe, and instead make a view for most columns except for the one we want to change inplace?

For instance of one makes a slice of a dataframe, and then tries to modify to a column, pandas will raise a warning about a view being modified, but I'm not sure if it will actually change the original dataframe inplace or not in this case...

It depends on the internal block structure of the dataframe but this is considered private API and is likely to change in future versions of pandas. I would rather stay safe for now.

In the future, I can see this being possible when pandas switches to using a columnar data structure to hold its data as detailed in the pandas roadmap.

ogrisel · 2019-12-19T20:14:06Z

Thanks @thomasjpfan for the typofix.

sklearn/inspection/_permutation_importance.py

ogrisel

Some more comments:

sklearn/inspection/_permutation_importance.py

sklearn/inspection/tests/test_permutation_importance.py

ogrisel · 2019-12-20T13:12:22Z

Generate the indices as I proposed there: #15933 (comment)?

I had not seen this comment, it's worth a try.

glemaitre · 2019-12-20T13:13:59Z

I had not seen this comment, it's worth a try.

It will be as fast as conversion to numpy array with the memory overhead to create the array of indices which should be ok.

sklearn/inspection/_permutation_importance.py

ogrisel · 2019-12-20T15:12:59Z

Alright, I think this PR is ready now. I am still +1 obviously :)

ogrisel · 2019-12-20T15:17:34Z

The [ci skip] thingy is ignored by Azure Pipelines?

thomasjpfan · 2019-12-20T16:47:23Z

The [ci skip] thingy is ignored by Azure Pipelines?

That is strange it should work based on https://docs.microsoft.com/en-us/azure/devops/pipelines/build/triggers?view=azure-devops&tabs=yaml#skipping-ci-for-individual-commits

sklearn/inspection/_permutation_importance.py

thomasjpfan

LGTM!

ogrisel · 2019-12-20T18:15:05Z

Yeah! Thank you everybody!

glemaitre · 2019-12-21T07:12:06Z

sklearn/inspection/_permutation_importance.py

-        feature_score = scorer(estimator, X, y)
+        random_state.shuffle(shuffling_idx)
+        if hasattr(X_permuted, "iloc"):
+            col = X_permuted.iloc[shuffling_idx, col_idx]


Speaking with @jorisvandenbossche, this is the case where one should use values, to discard the index.

X_permuted.iloc[:, col_idx] = X_permuted[shuffle_idx, col_idx].values

But .values would also discard dtype information for ExtendedArray columns, no?

No it will keep the dtype info of the ExtendedArray:

In [6]: s = pd.Series(pd.Categorical(['a', 'b'])) In [7]: s Out[7]: 0 a 1 b dtype: category Categories (2, object): [a, b] In [8]: s.values Out[8]: [a, b] Categories (2, object): [a, b]

That's why you need to use .to_numpy() to explicitly try to convert it to a numpy dtype.

Alright, good point. Feel free to submit a new PR to simplify the code then :)

The .values is a bit of a historical mix. For categorical it indeed already returns the categorical array (so preserving the dtype), but eg for periods or datetime with timezone, it does not (now in practice I suppose sklearn will care less about those dtypes).

To solve this mix, in more recent versions of pandas there is .array to .values for this use case, but so that's only available for recent versions.

glemaitre · 2019-12-21T07:14:28Z

And regarding the very slow shuffling, investigating with @jorisvandenbossche, we found that numpy falls back on pure python implementation (with a for loop) to swap values for container with non-numpy dtype, which explain the super slow-down.

jorisvandenbossche · 2019-12-21T20:25:13Z

we found that numpy falls back on pure python implementation (with a for loop) to swap values for container with non-numpy dtype, which explain the super slow-down.

And if you want the "fast" shuffling of a categorical inplace, you can shuffle the integer codes:

In [4]: a = pd.Categorical(['a', 'b', 'c', 'a']*10000)  

In [5]: a     
Out[5]: 
[a, b, c, a, a, ..., a, a, b, c, a]
Length: 40000
Categories (3, object): [a, b, c]

In [6]: np.random.shuffle(a.codes) 

In [7]: a                 
Out[7]: 
[a, a, b, b, b, ..., b, b, b, a, c]
Length: 40000
Categories (3, object): [a, b, c]

ogrisel · 2019-12-22T10:39:03Z

Interesting but this is not a generic thing for any ExtendedArray column, right?

jorisvandenbossche · 2019-12-23T09:02:49Z

Indeed, that is very specific to Categorical

…nces (scikit-learn#15933)

@nkish

* DOC fixed default values in dbscan (#15753) * DOC fix incorrect branch reference in contributing doc (#15779) * DOC relabel Feature -> Efficiency in change log (#15770) * DOC fixed Birch default value (#15780) * STY Minior change on code padding in website theme (#15768) * DOC Fix yticklabels order in permutation importances example (#15799) * Fix yticklabels order in permutation importances example * STY Update wrapper width (#15793) * DOC Long sentence was hard to parse and ambiguous in _classification.py (#15769) * DOC Removed duplicate 'classes_' attribute in Naive Bayes classifiers (#15811) * BUG Fixes pandas dataframe bug with boolean dtypes (#15797) * BUG Returns only public estimators in all_estimators (#15380) * DOC improve doc for multiclass and types_of_target (#15333) * TST Increases tol for check_pca_float_dtype_preservation assertion (#15775) * update _alpha_grid class in _coordinate_descent.py (#15835) * FIX Explicit conversion of ndarray to object dtype. (#15832) * BLD Parallelize sphinx builds on circle ci (#15745) * DOC correct url for preprocessing (#15853) * MNT avoid generating too many cross links in examples (#15844) * DOC Correct wrong doc in precision_recall_fscore_support (#15833) * DOC add comment in check_pca_float_dtype_preservation (#15819) Documenting the changes in #15775 * DOC correct indents in docstring _split.py (#15843) * DOC fix docstring of KMeans based on sklearn guideline (#15754) * DOC fix docstring of AgglomerativeClustering based on sklearn guideline (#15764) * DOC fix docstring of AffinityPropagation based on sklearn guideline (#15777) * DOC fixed SpectralCoclustering and SpectralBiclustering docstrings following sklearn guideline (#15778) * DOC fix FeatureAgglomeration and MiniBatchKMeans docstring following sklearn guideline (#15809) * TST Specify random_state in test_cv_iterable_wrapper (#15829) * DOC Include LinearSV{C, R} in models that support sample_weights (#15871) * DOC correct some indents (#15875) * DOC Fix documentation of default values in tree classes (#15870) * DOC fix typo in docstring (#15887) * DOC FIX default value for xticks_rotation in plot_confusion_matrix (#15890) * Fix imports in pip3 ubuntu by suffixing affected files (#15891) * MNT Raise erorr when normalize is invalid in confusion_matrix (#15888) * [MRG] DOC Increases search results for API object results (#15574) * MNT Ignores warning in pyamg for deprecated scipy.random (#15914) * DOC Instructions to troubleshoot Windows path length limit (#15916) * DOC add versionadded directive to some estimators (#15849) * DOC clarify doc-string of roc_auc_score and add references (#15293) * MNT Adds skip lint to azure pipeline CI (#15904) * BLD Fixes bug when building with NO_MATHJAX=1 (#15892) * [MRG] BUG Checks to number of axes in passed in ax more generically (#15760) * EXA Minor fixes in plot_sparse_logistic_regression_20newsgroups.py (#15925) * BUG Do not shadow public functions with deprecated modules (#15846) * Import sklearn._distributor_init first (#15929) * DOC Fix typos, via a Levenshtein-style corrector (#15923) * DOC in canned comment, mention that PR title becomes commit me… (#15935) * DOC/EXA Correct spelling of "Classification" (#15938) * BUG fix pip3 ubuntu update by suffixing file (#15928) * [MRG] Ways to compute center_shift_total were different in "full" and "elkan" algorithms. (#15930) * TST Fixes integer test for train and test indices (#15941) * BUG ensure that parallel/sequential give the same permutation importances (#15933) * Formatting fixes in changelog (#15944) * MRG FIX: order of values of self.quantiles_ in QuantileTransformer (#15751) * [MRG] BUG Fixes constrast in plot_confusion_matrix (#15936) * BUG use zero_division argument in classification_report (#15879) * DOC change logreg solver in plot_logistic_path (#15927) * DOC fix whats new ordering (#15961) * COSMIT use np.iinfo to define the max int32 (#15960) * DOC Apply numpydoc validation to VotingRegressor methods (#15969) Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com> * DOC improve naive_bayes.py documentation (#15943) Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com> * DOC Fix default values in Perceptron documentation (#15965) * DOC Improve default values in logistic documentation (#15966) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * DOC Improve documentation of default values for imputers (#15964) * EXA/MAINT Simplify code in manifold learning example (#15949) * DOC Improve default values in SGD documentation (#15967) * DOC Improve defaults in neural network documentation (#15968) * FIX use safe_sparse_dot for callable kernel in LabelSpreading (#15868) * BUG Adds attributes back to check_is_fitted (#15947) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * DOC update check_is_fitted what's new * DOC change python-devel to python3-devel for yum. (#15986) * DOC Correct the default value of values_format in plot_confusion_matrix (#15981) * [MRG] MNT Updates pypy to use 7.2.0 (#15954) * FIX Add missing 'values_format' param to disp.plot() in plot_confusion_matrix (#15937) * FIX support scalar values in fit_params in SearchCV (#15863) * support a scalar fit param * pep8 * TST add test for desired behavior * FIX introduce _check_fit_params to validate parameters * DOC update whats new * TST tests both grid-search and randomize-search * PEP8 * DOC revert unecessary change * TST add test for _check_fit_params * olivier comments * TST fixes * DOC whats new * DOC whats new * TST revert type of error * add olivier suggestions * address olivier comments * address thomas comments * PEP8 * comments olivier * TST fix test by passing X * avoid to call twice tocsr * add case column/row sparse in check_fit_param * provide optional indices * TST check content when indexing params * PEP8 * TST update tests to check identity * stupid fix * use a distribution in RandomizedSearchCV * MNT add lightgbm to one of the CI build * move to another build * do not install dependencies lightgbm * MNT comments on the CI setup * address some comments * Test fit_params compat without dependency on lightgbm Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Remove abstractmethod that silently brake downstream packages (#15996) * FIX restore BaseNB._check_X without abstractmethod decoration (#15997) * Update v0.22 changelog for 0.22.1 (#16002) - set the date - move entry for quantile transformer to the 0.22.1 section - fix alphabetical ordering of modules * STY Removes hidden scroll bar (#15999) * Flake8 fixes * Fix: remove left-over lines that should have been deleted during conflict resolution when rebasing * Fix missing imports * Update version * Fix test_check_is_fitted * Make test_sag_regressor_computed_correctly deterministic (#16003) Fix #15818. Co-authored-by: cgsavard <claire.savard@colorado.edu> Co-authored-by: Joel Nothman <joel.nothman@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Matt Hall <matt@agilegeoscience.com> Co-authored-by: Kathryn Poole <kathryn.poole2@gmail.com> Co-authored-by: lucyleeow <jliu176@gmail.com> Co-authored-by: JJmistry <jayminm22@gmail.com> Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es> Co-authored-by: SylvainLan <sylvain.s.lannuzel@gmail.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Hanmin Qin <qinhanmin2005@sina.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Vachan D A <vachanda@users.noreply.github.com> Co-authored-by: Sambhav Kothari <sambhavs.email@gmail.com> Co-authored-by: wenliwyan <12013376+wenliwyan@users.noreply.github.com> Co-authored-by: shivamgargsya <shivam.gargshya@gmail.com> Co-authored-by: Reshama Shaikh <rs2715@stern.nyu.edu> Co-authored-by: Oliver Urs Lenz <oulenz@users.noreply.github.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Brian Wignall <BrianWignall@gmail.com> Co-authored-by: Ritchie Ng <ritchieng@u.nus.edu> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: inderjeet <43402782+inder128@users.noreply.github.com> Co-authored-by: scibol <scibol@users.noreply.github.com> Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com> Co-authored-by: Bibhash Chandra Mitra <bibhashm220896@gmail.com> Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org> Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com> Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com> Co-authored-by: @nkish <19225359+ankishb@users.noreply.github.com> Co-authored-by: Pulkit Mehta <pulkit_mehta_work@yahoo.com> Co-authored-by: David Breuer <DavidBreuer@users.noreply.github.com> Co-authored-by: Niklas <niklas.sm+github@gmail.com> Co-authored-by: Windber <guolipengyeah@126.com> Co-authored-by: Stephen Blystone <29995339+blynotes@users.noreply.github.com> Co-authored-by: Brigitta Sipőcz <b.sipocz@gmail.com>

…nces (scikit-learn#15933)

@nkish

* DOC fixed default values in dbscan (scikit-learn#15753) * DOC fix incorrect branch reference in contributing doc (scikit-learn#15779) * DOC relabel Feature -> Efficiency in change log (scikit-learn#15770) * DOC fixed Birch default value (scikit-learn#15780) * STY Minior change on code padding in website theme (scikit-learn#15768) * DOC Fix yticklabels order in permutation importances example (scikit-learn#15799) * Fix yticklabels order in permutation importances example * STY Update wrapper width (scikit-learn#15793) * DOC Long sentence was hard to parse and ambiguous in _classification.py (scikit-learn#15769) * DOC Removed duplicate 'classes_' attribute in Naive Bayes classifiers (scikit-learn#15811) * BUG Fixes pandas dataframe bug with boolean dtypes (scikit-learn#15797) * BUG Returns only public estimators in all_estimators (scikit-learn#15380) * DOC improve doc for multiclass and types_of_target (scikit-learn#15333) * TST Increases tol for check_pca_float_dtype_preservation assertion (scikit-learn#15775) * update _alpha_grid class in _coordinate_descent.py (scikit-learn#15835) * FIX Explicit conversion of ndarray to object dtype. (scikit-learn#15832) * BLD Parallelize sphinx builds on circle ci (scikit-learn#15745) * DOC correct url for preprocessing (scikit-learn#15853) * MNT avoid generating too many cross links in examples (scikit-learn#15844) * DOC Correct wrong doc in precision_recall_fscore_support (scikit-learn#15833) * DOC add comment in check_pca_float_dtype_preservation (scikit-learn#15819) Documenting the changes in scikit-learn#15775 * DOC correct indents in docstring _split.py (scikit-learn#15843) * DOC fix docstring of KMeans based on sklearn guideline (scikit-learn#15754) * DOC fix docstring of AgglomerativeClustering based on sklearn guideline (scikit-learn#15764) * DOC fix docstring of AffinityPropagation based on sklearn guideline (scikit-learn#15777) * DOC fixed SpectralCoclustering and SpectralBiclustering docstrings following sklearn guideline (scikit-learn#15778) * DOC fix FeatureAgglomeration and MiniBatchKMeans docstring following sklearn guideline (scikit-learn#15809) * TST Specify random_state in test_cv_iterable_wrapper (scikit-learn#15829) * DOC Include LinearSV{C, R} in models that support sample_weights (scikit-learn#15871) * DOC correct some indents (scikit-learn#15875) * DOC Fix documentation of default values in tree classes (scikit-learn#15870) * DOC fix typo in docstring (scikit-learn#15887) * DOC FIX default value for xticks_rotation in plot_confusion_matrix (scikit-learn#15890) * Fix imports in pip3 ubuntu by suffixing affected files (scikit-learn#15891) * MNT Raise erorr when normalize is invalid in confusion_matrix (scikit-learn#15888) * [MRG] DOC Increases search results for API object results (scikit-learn#15574) * MNT Ignores warning in pyamg for deprecated scipy.random (scikit-learn#15914) * DOC Instructions to troubleshoot Windows path length limit (scikit-learn#15916) * DOC add versionadded directive to some estimators (scikit-learn#15849) * DOC clarify doc-string of roc_auc_score and add references (scikit-learn#15293) * MNT Adds skip lint to azure pipeline CI (scikit-learn#15904) * BLD Fixes bug when building with NO_MATHJAX=1 (scikit-learn#15892) * [MRG] BUG Checks to number of axes in passed in ax more generically (scikit-learn#15760) * EXA Minor fixes in plot_sparse_logistic_regression_20newsgroups.py (scikit-learn#15925) * BUG Do not shadow public functions with deprecated modules (scikit-learn#15846) * Import sklearn._distributor_init first (scikit-learn#15929) * DOC Fix typos, via a Levenshtein-style corrector (scikit-learn#15923) * DOC in canned comment, mention that PR title becomes commit me… (scikit-learn#15935) * DOC/EXA Correct spelling of "Classification" (scikit-learn#15938) * BUG fix pip3 ubuntu update by suffixing file (scikit-learn#15928) * [MRG] Ways to compute center_shift_total were different in "full" and "elkan" algorithms. (scikit-learn#15930) * TST Fixes integer test for train and test indices (scikit-learn#15941) * BUG ensure that parallel/sequential give the same permutation importances (scikit-learn#15933) * Formatting fixes in changelog (scikit-learn#15944) * MRG FIX: order of values of self.quantiles_ in QuantileTransformer (scikit-learn#15751) * [MRG] BUG Fixes constrast in plot_confusion_matrix (scikit-learn#15936) * BUG use zero_division argument in classification_report (scikit-learn#15879) * DOC change logreg solver in plot_logistic_path (scikit-learn#15927) * DOC fix whats new ordering (scikit-learn#15961) * COSMIT use np.iinfo to define the max int32 (scikit-learn#15960) * DOC Apply numpydoc validation to VotingRegressor methods (scikit-learn#15969) Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com> * DOC improve naive_bayes.py documentation (scikit-learn#15943) Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com> * DOC Fix default values in Perceptron documentation (scikit-learn#15965) * DOC Improve default values in logistic documentation (scikit-learn#15966) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * DOC Improve documentation of default values for imputers (scikit-learn#15964) * EXA/MAINT Simplify code in manifold learning example (scikit-learn#15949) * DOC Improve default values in SGD documentation (scikit-learn#15967) * DOC Improve defaults in neural network documentation (scikit-learn#15968) * FIX use safe_sparse_dot for callable kernel in LabelSpreading (scikit-learn#15868) * BUG Adds attributes back to check_is_fitted (scikit-learn#15947) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * DOC update check_is_fitted what's new * DOC change python-devel to python3-devel for yum. (scikit-learn#15986) * DOC Correct the default value of values_format in plot_confusion_matrix (scikit-learn#15981) * [MRG] MNT Updates pypy to use 7.2.0 (scikit-learn#15954) * FIX Add missing 'values_format' param to disp.plot() in plot_confusion_matrix (scikit-learn#15937) * FIX support scalar values in fit_params in SearchCV (scikit-learn#15863) * support a scalar fit param * pep8 * TST add test for desired behavior * FIX introduce _check_fit_params to validate parameters * DOC update whats new * TST tests both grid-search and randomize-search * PEP8 * DOC revert unecessary change * TST add test for _check_fit_params * olivier comments * TST fixes * DOC whats new * DOC whats new * TST revert type of error * add olivier suggestions * address olivier comments * address thomas comments * PEP8 * comments olivier * TST fix test by passing X * avoid to call twice tocsr * add case column/row sparse in check_fit_param * provide optional indices * TST check content when indexing params * PEP8 * TST update tests to check identity * stupid fix * use a distribution in RandomizedSearchCV * MNT add lightgbm to one of the CI build * move to another build * do not install dependencies lightgbm * MNT comments on the CI setup * address some comments * Test fit_params compat without dependency on lightgbm Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * Remove abstractmethod that silently brake downstream packages (scikit-learn#15996) * FIX restore BaseNB._check_X without abstractmethod decoration (scikit-learn#15997) * Update v0.22 changelog for 0.22.1 (scikit-learn#16002) - set the date - move entry for quantile transformer to the 0.22.1 section - fix alphabetical ordering of modules * STY Removes hidden scroll bar (scikit-learn#15999) * Flake8 fixes * Fix: remove left-over lines that should have been deleted during conflict resolution when rebasing * Fix missing imports * Update version * Fix test_check_is_fitted * Make test_sag_regressor_computed_correctly deterministic (scikit-learn#16003) Fix scikit-learn#15818. Co-authored-by: cgsavard <claire.savard@colorado.edu> Co-authored-by: Joel Nothman <joel.nothman@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Matt Hall <matt@agilegeoscience.com> Co-authored-by: Kathryn Poole <kathryn.poole2@gmail.com> Co-authored-by: lucyleeow <jliu176@gmail.com> Co-authored-by: JJmistry <jayminm22@gmail.com> Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es> Co-authored-by: SylvainLan <sylvain.s.lannuzel@gmail.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Hanmin Qin <qinhanmin2005@sina.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Vachan D A <vachanda@users.noreply.github.com> Co-authored-by: Sambhav Kothari <sambhavs.email@gmail.com> Co-authored-by: wenliwyan <12013376+wenliwyan@users.noreply.github.com> Co-authored-by: shivamgargsya <shivam.gargshya@gmail.com> Co-authored-by: Reshama Shaikh <rs2715@stern.nyu.edu> Co-authored-by: Oliver Urs Lenz <oulenz@users.noreply.github.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Brian Wignall <BrianWignall@gmail.com> Co-authored-by: Ritchie Ng <ritchieng@u.nus.edu> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: inderjeet <43402782+inder128@users.noreply.github.com> Co-authored-by: scibol <scibol@users.noreply.github.com> Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com> Co-authored-by: Bibhash Chandra Mitra <bibhashm220896@gmail.com> Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org> Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com> Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com> Co-authored-by: @nkish <19225359+ankishb@users.noreply.github.com> Co-authored-by: Pulkit Mehta <pulkit_mehta_work@yahoo.com> Co-authored-by: David Breuer <DavidBreuer@users.noreply.github.com> Co-authored-by: Niklas <niklas.sm+github@gmail.com> Co-authored-by: Windber <guolipengyeah@126.com> Co-authored-by: Stephen Blystone <29995339+blynotes@users.noreply.github.com> Co-authored-by: Brigitta Sipőcz <b.sipocz@gmail.com>

glemaitre added 3 commits December 19, 2019 17:13

BUG ensure that parallel/sequential provide the same results

a47ebe2

iter

ab95fd5

whats new

d58272d

glemaitre changed the title ~~BUG ensure that parallel/sequential provide the same results~~ BUG ensure that parallel/sequential give the same permutation importances Dec 19, 2019

glemaitre added the Blocker label Dec 19, 2019

glemaitre added this to the 0.22.1 milestone Dec 19, 2019

glemaitre mentioned this pull request Dec 19, 2019

[MRG] BUG avoid memmaping large dataframe in permutation_importance #15898

Closed

ogrisel reviewed Dec 19, 2019

View reviewed changes

sklearn/inspection/tests/test_permutation_importance.py Show resolved Hide resolved

Check that the test is not trivial

7211a53

Typo in PR number

70e1ef4

ogrisel approved these changes Dec 19, 2019

View reviewed changes

ogrisel added 2 commits December 19, 2019 19:26

Fix thread-safety issue

a6909ca

Add non-regression test to check that issue 15810 is fixed.

a24b9d5

ogrisel reviewed Dec 19, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Show resolved Hide resolved

Leaner test

7ad0b26

ogrisel reviewed Dec 19, 2019

View reviewed changes

ogrisel and others added 2 commits December 19, 2019 20:12

Support joblib 0.11

fb69870

BUG Syntax error

f80dc69

ogrisel reviewed Dec 19, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Outdated Show resolved Hide resolved

glemaitre added 5 commits December 19, 2019 21:40

MAX_RAND_SEED should be int32

775e986

cosmetic

0631299

inplace operation

1a21a98

cosmit

023eca2

cosmit

e213236

ogrisel reviewed Dec 20, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Outdated Show resolved Hide resolved

sklearn/inspection/tests/test_permutation_importance.py Show resolved Hide resolved

Better comment explaining the need for X.copy()

be8f1c1

reshuffling by position

bdaffb5

glemaitre commented Dec 20, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Outdated Show resolved Hide resolved

glemaitre and others added 2 commits December 20, 2019 15:10

remove unused import

42e8cb5

[ci skip] typos & better comment

1834fca

ogrisel requested review from rth and thomasjpfan December 20, 2019 15:13

Merge branch 'master' into is/random_state_parallel

74a1c54

thomasjpfan reviewed Dec 20, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Outdated Show resolved Hide resolved

ogrisel added 2 commits December 20, 2019 18:41

TST: check dataframe with a weird index

5cf37f6

FIX make column permutation robust to weird indices

51f7467

thomasjpfan reviewed Dec 20, 2019

View reviewed changes

sklearn/inspection/_permutation_importance.py Show resolved Hide resolved

thomasjpfan approved these changes Dec 20, 2019

View reviewed changes

ogrisel merged commit a68ba97 into scikit-learn:master Dec 20, 2019

glemaitre commented Dec 21, 2019

View reviewed changes

ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Dec 31, 2019

BUG ensure that parallel/sequential give the same permutation importa…

354bcec

…nces (scikit-learn#15933)

ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jan 2, 2020

BUG ensure that parallel/sequential give the same permutation importa…

126b0a4

…nces (scikit-learn#15933)

glemaitre mentioned this pull request Jan 14, 2020

[MRG] Fast PDP for DecisionTreeRegressor and RandomForestRegressor #15864

Merged

panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020

BUG ensure that parallel/sequential give the same permutation importa…

b6d097c

…nces (scikit-learn#15933)

Uh oh!

BUG ensure that parallel/sequential give the same permutation importances #15933

BUG ensure that parallel/sequential give the same permutation importances #15933

Uh oh!

Conversation

glemaitre commented Dec 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ogrisel commented Dec 19, 2019

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

glemaitre commented Dec 19, 2019

Uh oh!

ogrisel commented Dec 19, 2019

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ogrisel commented Dec 19, 2019

Uh oh!

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ogrisel commented Dec 20, 2019

Uh oh!

glemaitre commented Dec 20, 2019

Uh oh!

Uh oh!

ogrisel commented Dec 20, 2019

Uh oh!

ogrisel commented Dec 20, 2019

Uh oh!

thomasjpfan commented Dec 20, 2019

Uh oh!

Uh oh!

Uh oh!

thomasjpfan left a comment

Choose a reason for hiding this comment

Uh oh!

ogrisel commented Dec 20, 2019

Uh oh!

glemaitre Dec 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

glemaitre commented Dec 21, 2019

Uh oh!

jorisvandenbossche commented Dec 21, 2019

Uh oh!

ogrisel commented Dec 22, 2019

Uh oh!

jorisvandenbossche commented Dec 23, 2019

Uh oh!

Uh oh!

glemaitre commented Dec 19, 2019 •

edited

Loading

glemaitre Dec 21, 2019 •

edited

Loading