FIX Fixes pandas extension arrays in check_array #25813

thomasjpfan · 2023-03-10T17:16:55Z

Reference Issues/PRs

Closes #25637
Follow up to #25638

What does this implement/fix? Explain your changes.

The duck typing with iloc does works with pandas Series, but not extension arrays directly:

import pandas as pd

y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype="Int64")
print(type(y_true))
# <class 'pandas.core.series.Series'>
assert hasattr(y_true, "iloc")

y_unique = y_true.unique()
print(type(y_unique))
# <class 'pandas.core.arrays.integer.IntegerArray'>
assert not hasattr(y_unique, "iloc")

This PR uses is_extension_array_dtype directly to detect the second class so it goes down the same code path as pd.Series.

lorentzenchr · 2023-03-11T17:26:33Z

If we start using the pattern

try:
    from pandas import XXX

anyway, should we also use it to identify an array-like as pandas instead of relying on hasattr?

Has anyone tested what happens with a pyarrow table or a polars dataframe? I guess, I better open a separate issue for it.

lorentzenchr

LGTM

lorentzenchr · 2023-03-11T17:27:04Z

doc/whats_new/v1.3.rst

@@ -350,6 +353,9 @@ Changelog
  :pr:`25733` by :user:`Brigitta Sipőcz <bsipocz>` and
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

+- |Fix| :func:`utils.validation.check_array` now suports extension arrays with object


Suggested change

- |Fix| :func:`utils.validation.check_array` now suports extension arrays with object

- |Fix| :func:`utils.validation.check_array` now supports extension arrays with object

What are extension arrays? Maybe a link to pandas docs or just adding „pandas“ in the sentence.

lorentzenchr · 2023-03-11T17:29:33Z

doc/whats_new/v1.3.rst

@@ -350,6 +353,9 @@ Changelog
  :pr:`25733` by :user:`Brigitta Sipőcz <bsipocz>` and
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

+- |Fix| :func:`utils.validation.check_array` now suports extension arrays with object
+  dtypes by return an ndarray with object dtype. :pr:`xxxxx` by `Thomas Fan`_.


Suggested change

dtypes by return an ndarray with object dtype. :pr:`xxxxx` by `Thomas Fan`_.

dtypes. In that case, it returns an ndarray with object dtype. :pr:`xxxxx` by `Thomas Fan`_.

lorentzenchr · 2023-03-11T17:30:32Z

sklearn/preprocessing/tests/test_label.py


    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
+    if unique_first:
+        y_true = y_true.unique()


Maybe add a comment that explains what the returned object / type of unique is.

glemaitre · 2023-03-13T15:28:23Z

Has anyone tested what happens with a pyarrow table or a polars dataframe?

This is really a relevant question because pandas 2.0 will come with Arrow dtype so we will encounter this road quite soon.

glemaitre · 2023-03-13T15:43:49Z

I am wondering if we should be considering refactoring check_array and splitting the conversion from the statistic check. Even in the conversion, we could split into library-type support. I don't know if it would be enough to make the code more readable.

thomasjpfan · 2023-03-13T16:22:02Z

Has anyone tested what happens with a pyarrow table or a polars dataframe?

In terms of check_array, polars dataframes return smoething but give the wrong ndarray:

import polars as pl
dataframe = pl.DataFrame({"integer": [1, 2, None], 
                          "float":[4.0, 5.0, 6.0]})

# This should be transposed.
check_array(dataframe, force_all_finite="allow-nan")
# array([[ 1.,  2., nan],
#        [ 4.,  5.,  6.]])

# This is because `np.asarray` returns the above:
np.asarray(dataframe)
# array([[ 1.,  2., nan],
#       [ 4.,  5.,  6.]])

The same story with PyArrow tables:

import pyarrow as pa
ints = pa.array([2, 4, 5, 100])
floats = pa.array([1.0, 2.0, 3.0, None])
table = pa.Table.from_arrays([ints, floats], names=["ints", "floats"])

# Again, this should be transposed.
check_array(table, force_all_finite="allow-nan")
# array([[  2.,   4.,   5., 100.],
#        [  1.,   2.,   3.,  nan]])

I don't know if it would be enough to make the code more readable.

I think it's possible to refactor check_array into something more readable and also support other DataFrames. In general, I'll want to support the DataFrame Protocol. The DataFrame Protocol is already implemented in many dataframes libraries such as Polars & pandas.

This is really a relevant question because pandas 2.0 will come with Arrow dtype so we will encounter this road quite soon.

The PyArrow types are extension arrays, so they "just work". For example, on main:

import pandas as pd
from sklearn.utils.validation import check_array


X_df = pd.DataFrame({
    "a": pd.Series([-1.5, 0.2, None], dtype="float32[pyarrow]"),
    "b": pd.Series([1, None, 2], dtype="int32[pyarrow]"),
    "c": pd.Series([2, 3, 4], dtype="int64[pyarrow]")
}) 

check_array(X_df, force_all_finite="allow-nan")
# array([[-1.5,  1. ,  2. ],
#       [ 0.2,  nan,  3. ],
#       [ nan,  2. ,  4. ]])

glemaitre · 2023-03-13T17:08:23Z

The PyArrow types are extension arrays, so they "just work"

Nice. I did not yet look at those.

betatim · 2023-03-21T08:48:13Z

If we start using the pattern
try:
    from pandas import XXX
anyway, should we also use it to identify an array-like as pandas instead of relying on hasattr?

I think we should. There are a few "magic" if statements in the code base that deal with recognising inputs of a particular type without using/importing the library that provides that type. I'm not sure what the reasoning was to go down that road. To me having one central is_pandas(X) in utils or so would already be a good step forward. And it seems like a good idea to use import pandas as pd in that function, because if X is indeed a pandas type, we will be able to import pandas.

TL;DR: yes please.

jeremiedbb

The fix looks acceptable for now. Though, I agree with the discussions and a refactoring of check_array to disentangle pandas (and maybe more general) dataframes and numpy arrays would make things a lot easier to maintain/enhance.

doc/whats_new/v1.3.rst

sklearn/preprocessing/tests/test_label.py

* MAINT Clean deprecated losses in (hist) gradient boosting for 1.3 (scikit-learn#25834) * MAINT Clean deprecation of normalize in calibration_curve for 1.3 (scikit-learn#25833) * BLD Clean command removes generated from cython templates (scikit-learn#25839) * PERF Implement `PairwiseDistancesReduction` backend for `KNeighbors.predict_proba` (scikit-learn#24076) Signed-off-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Added Parameter Validation for datasets.make_circles (scikit-learn#25848) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MNT use a single job by default with sphinx build (scikit-learn#25836) * BLD Generate warning automatically for templated cython files (scikit-learn#25842) * MAINT parameter validation for sklearn.datasets.fetch_lfw_people (scikit-learn#25820) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for metrics.fbeta_score (scikit-learn#25841) * TST add global_random_seed fixture to sklearn/covariance/tests/test_robust_covariance.py (scikit-learn#25821) * MAINT Parameter validation for linear_model.orthogonal_mp (scikit-learn#25817) * TST activate common tests for TSNE (scikit-learn#25374) * CI Update lock files (scikit-learn#25849) * MAINT Added Parameter Validation for metrics.mean_gamma_deviance (scikit-learn#25853) * MAINT Parameters validation for feature_selection.mutual_info_regression (scikit-learn#25850) * MAINT parameter validation metrics.class_likelihood_ratios (scikit-learn#25863) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Ensure disjoint interval constraints (scikit-learn#25797) * MAINT Parameters validation for utils.gen_batches (scikit-learn#25864) * TST use global_random_seed in test_dict_vectorizer.py (scikit-learn#24533) * TST use global_random_seed in test_pls.py (scikit-learn#24526) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * TST use global_random_seed in test_gpc.py (scikit-learn#24600) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * DOC Fix overlapping plot axis in bench_sample_without_replacement.py (scikit-learn#25870) * MAINT Use contiguous memoryviews in _random.pyx (scikit-learn#25871) * MAINT parameter validation sklearn.datasets.fetch_lfw_pair (scikit-learn#25857) * MAINT Parameters validation for metrics.classification_report (scikit-learn#25868) * Empty commit * DOC fix docstring dtype parameter in OrdinalEncoder (scikit-learn#25877) * MAINT Clean up depreacted "log" loss of SGDClassifier for 1.3 (scikit-learn#25865) * ENH Adds TargetEncoder (scikit-learn#25334) Co-authored-by: Andreas Mueller <t3kcit@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jovan Stojanovic <62058944+jovan-stojanovic@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * CI make it possible to cancel running Azure jobs (scikit-learn#25876) * MAINT Clean-up deprecated if_delegate_has_method for 1.3 (scikit-learn#25879) * MAINT Parameter validation for tree.export_text (scikit-learn#25867) * DOC impact of `tol` for solvers in RidgeClassifier (scikit-learn#25530) * MAINT Parameters validation for metrics.hinge_loss (scikit-learn#25880) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for metrics.ndcg_score (scikit-learn#25885) * ENH KMeans initialization account for sample weights (scikit-learn#25752) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * TST use global_random_seed in sklearn/tests/test_dummy.py (scikit-learn#25884) * DOC improve calibration user guide (scikit-learn#25687) * ENH Support for sparse matrices added to `sklearn.metrics.silhouette_samples` (scikit-learn#24677) Co-authored-by: Sahil Gupta <sahil@Sahils-MBP.lan> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * MAINT validate_params for plot_tree (scikit-learn#25882) Co-authored-by: Itay <itayvegh@gmail.com> * MAINT add missing space in error message in SVM (scikit-learn#25913) * FIX Adds requires_y tag to TargetEncoder (scikit-learn#25917) * MAINT Consistent cython types continued (scikit-learn#25810) * TST Speed-up common tests of DictionaryLearning (scikit-learn#25892) * TST Speed-up test_dbscan_optics_parity (scikit-learn#25893) * ENH add np.nan option for zero_division in precision/recall/f-score (scikit-learn#25531) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * MAINT Parameters validation for datasets.make_low_rank_matrix (scikit-learn#25901) * MAINT Parameter validation for metrics.cluster.adjusted_mutual_info_score (scikit-learn#25898) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * TST Speed-up test_partial_dependence.test_output_shape (scikit-learn#25895) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * MAINT Parameters validation for datasets.make_regression (scikit-learn#25899) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for metrics.mean_squared_log_error (scikit-learn#25924) * TST Use global_random_seed in tests/test_naive_bayes.py (scikit-learn#25890) * TST add global_random_seed fixture to sklearn/datasets/tests/test_covtype.py (scikit-learn#25904) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for datasets.make_multilabel_classification (scikit-learn#25920) * Fixed feature mapping typo (scikit-learn#25934) * MAINT switch to newer codecov uploader (scikit-learn#25919) Co-authored-by: Loïc Estève <loic.esteve@ymail.com> * TST Speed-up test suite when using pytest-xdist (scikit-learn#25918) * DOC update license year to 2023 (scikit-learn#25936) * FIX Remove spurious feature names warning in IsolationForest (scikit-learn#25931) * TST fix unstable test_newrand_set_seed (scikit-learn#25940) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Clean-up deprecated max_features="auto" in trees/forests/gb (scikit-learn#25941) * MAINT LogisticRegression informative error msg when penaly=elasticnet and l1_ratio is None (scikit-learn#25925) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Clean-up remaining SGDClassifier(loss="log") (scikit-learn#25938) * FIX Fixes pandas extension arrays in check_array (scikit-learn#25813) * FIX Fixes pandas extension arrays with objects in check_array (scikit-learn#25814) * CI Disable pytest-xdist in pylatest_pip_openblas_pandas build (scikit-learn#25943) * MAINT remove deprecated call to resources.content (scikit-learn#25951) * DOC note on calibration impact on ranking (scikit-learn#25900) * Remove loguniform fix, use scipy.stats instead (scikit-learn#24665) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Fix broken links in cluster.dbscan module (scikit-learn#25958) * DOC Fix lars Xy shape (scikit-learn#25952) * ENH Add drop_intermediate parameter to metrics.precision_recall_curve (scikit-learn#24668) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * FIX improve error message when computing NDCG with a single document (scikit-learn#25672) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * MAINT introduce _get_response_values and _check_response_methods (scikit-learn#23073) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Extend message for large sparse matrices support (scikit-learn#25961) Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> * MAINT Parameters validation for datasets.make_gaussian_quantiles (scikit-learn#25959) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.d2_tweedie_score (scikit-learn#25975) * MAINT Parameters validation for datasets.make_hastie_10_2 (scikit-learn#25967) * MAINT Parameters validation for preprocessing.minmax_scale (scikit-learn#25962) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for datasets.make_checkerboard (scikit-learn#25955) * MAINT Parameters validation for datasets.make_biclusters (scikit-learn#25945) * MAINT Parameters validation for datasets.make_moons (scikit-learn#25971) * DOC replace deviance by loss in docstring of GradientBoosting (scikit-learn#25968) * MAINT Fix broken link in feature_selection/_univariate_selection.py (scikit-learn#25984) * DOC Update model_persistence.rst to fix skops example (scikit-learn#25993) Co-authored-by: adrinjalali <adrin.jalali@gmail.com> * DOC Specified meaning for max_patches=None in extract_patches_2d (scikit-learn#25996) * DOC document that last step is never cached in pipeline (scikit-learn#25995) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * FIX SequentialFeatureSelector throws IndexError when cv is a generator (scikit-learn#25973) * ENH Adds infrequent categories support to OrdinalEncoder (scikit-learn#25677) Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Andreas Mueller <t3kcit@gmail.com> * MAINT make plot_digits_denoising deterministic by fixing random state (scikit-learn#26004) * DOC improve example of PatchExtractor (scikit-learn#26002) * MAINT Parameters validation for datasets.make_friedman2 (scikit-learn#25986) * MAINT Parameters validation for datasets.make_friedman3 (scikit-learn#25989) * MAINT Parameters validation for datasets.make_sparse_uncorrelated (scikit-learn#26001) * MAINT Parameters validation for datasets.make_spd_matrix (scikit-learn#26003) * MAINT Parameters validation for datasets.make_sparse_spd_matrix (scikit-learn#26009) * DOC Added the meanings of default=None for PatchExtractor parameters (scikit-learn#26005) * MAINT remove unecessary check covered by parameter validation framework (scikit-learn#26014) * MAINT Consistent cython types from _typedefs (scikit-learn#25942) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> * MAINT Parameters validation for datasets.make_swiss_roll (scikit-learn#26020) * MAINT Parameters validation for datasets.make_s_curve (scikit-learn#26022) * MAINT Parameters validation for datasets.make_blobs (scikit-learn#25983) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * DOC fix SplineTransformer include_bias docstring (scikit-learn#26018) * ENH RocCurveDisplay add option to plot chance level (scikit-learn#25987) * DOC show from_estimator and from_predictions for Displays (scikit-learn#25994) * EXA Fix rst in plot_partial_dependence (scikit-learn#26028) * CI Adds coverage to docker jobs on Azure (scikit-learn#26027) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * API Replace `n_iter` in `Bayesian Ridge` and `ARDRegression` (scikit-learn#25697) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * CLN Make _NumPyAPIWrapper naming consistent to _ArrayAPIWrapper (scikit-learn#26039) * CI disable coverage on Windows to keep CI times reasonable (scikit-learn#26052) * DOC Use Scientific Python Plausible instance for analytics (scikit-learn#25547) * MAINT Parameters validation for sklearn.preprocessing.scale (scikit-learn#26036) * MAINT Parameters validation for sklearn.metrics.pairwise.haversine_distances (scikit-learn#26047) * MAINT Parameters validation for sklearn.metrics.pairwise.laplacian_kernel (scikit-learn#26048) * MAINT Parameters validation for sklearn.metrics.pairwise.linear_kernel (scikit-learn#26049) * MAINT Parameters validation for sklearn.metrics.silhouette_samples (scikit-learn#26053) * MAINT Parameters validation for sklearn.preprocessing.add_dummy_feature (scikit-learn#26058) * Added Parameter Validation for metrics.cluster.normalized_mutual_info_score() (scikit-learn#26060) * DOC Typos in HistGradientBoosting documentation (scikit-learn#26057) * TST add global_random_seed fixture to sklearn/datasets/tests/test_rcv1.py (scikit-learn#26043) * MAINT Parameters validation for sklearn.metrics.pairwise.cosine_similarity (scikit-learn#26006) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ENH Adds isdtype to Array API wrapper (scikit-learn#26029) * MAINT Parameters validation for sklearn.metrics.silhouette_score (scikit-learn#26054) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * FIX fix spelling mistake in _NumPyAPIWrapper (scikit-learn#26064) * CI ignore more non-library Python files in codecov (scikit-learn#26059) * MAINT Parameters validation for sklearn.metrics.pairwise.cosine_distances (scikit-learn#26046) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Introduce BinaryClassifierCurveDisplayMixin (scikit-learn#25969) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ENH Forces shape to be tuple when using Array API's reshape (scikit-learn#26030) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Tim Head <betatim@gmail.com> * MAINT Parameters validation for sklearn.metrics.pairwise.paired_euclidean_distances (scikit-learn#26073) * MAINT Parameters validation for sklearn.metrics.pairwise.paired_manhattan_distances (scikit-learn#26074) * MAINT Parameters validation for sklearn.metrics.pairwise.paired_cosine_distances (scikit-learn#26075) * MAINT Parameters validation for sklearn.preprocessing.binarize (scikit-learn#26076) * MAINT Parameters validation for metrics.explained_variance_score (scikit-learn#26079) * DOC use correct template name for displays (scikit-learn#26081) * MAINT Parameters validation for sklearn.preprocessing.maxabs_scale (scikit-learn#26077) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.preprocessing.label_binarize (scikit-learn#26078) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT parameter validation for d2_absolute_error_score (scikit-learn#26066) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameter validation for roc_auc_score (scikit-learn#26007) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.preprocessing.normalize (scikit-learn#26069) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameter validation for metrics.cluster.fowlkes_mallows_score (scikit-learn#26080) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for compose.make_column_transformer (scikit-learn#25897) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.metrics.pairwise.polynomial_kernel (scikit-learn#26070) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.pairwise.rbf_kernel (scikit-learn#26071) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.pairwise.sigmoid_kernel (scikit-learn#26072) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Param validation: constraint for numeric missing values (scikit-learn#26085) * FIX Adds support for negative values in categorical features in gradient boosting (scikit-learn#25629) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Tim Head <betatim@gmail.com> * MAINT Fix C warning in Cython module splitting.pyx (scikit-learn#26051) * MNT Updates _isotonic.pyx to use memoryviews instead of `cnp.ndarray` (scikit-learn#26068) * FIX Fixes memory regression for inspecting extension arrays (scikit-learn#26106) * PERF set openmp to use only physical cores by default (scikit-learn#26082) * MNT Update black to 23.3.0 (scikit-learn#26110) * MNT Adds black commit to git-blame-ignore-revs (scikit-learn#26111) * MAINT Parameters validation for sklearn.metrics.pair_confusion_matrix (scikit-learn#26107) * MAINT Parameters validation for sklearn.metrics.mean_poisson_deviance (scikit-learn#26104) * DOC Use notebook style in plot_lof_outlier_detection.py (scikit-learn#26017) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * MAINT utils._fast_dict uses types from utils._typedefs (scikit-learn#26025) * DOC remove sparse-matrix for `y` in ElasticNet (scikit-learn#26127) * ENH add exponential loss (scikit-learn#25965) * MAINT Parameters validation for sklearn.preprocessing.robust_scale (scikit-learn#26086) * MAINT Parameters validation for sklearn.datasets.fetch_rcv1 (scikit-learn#26126) * MAINT Parameters validation for sklearn.metrics.adjusted_rand_score (scikit-learn#26134) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.calinski_harabasz_score (scikit-learn#26135) * MAINT Parameters validation for sklearn.metrics.davies_bouldin_score (scikit-learn#26136) * MAINT: remove `from numpy.math cimport` statements (scikit-learn#26143) * MAINT Parameters validation for sklearn.inspection.permutation_importance (scikit-learn#26145) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.cluster.homogeneity_completeness_v_measure (scikit-learn#26137) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.rand_score (scikit-learn#26138) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC update comment in metrics/tests/test_classification.py (scikit-learn#26150) * CI small cleanup of Cirrus CI test script (scikit-learn#26168) * MAINT remove deprecated is_categorical_dtype (scikit-learn#26156) * DOC Add skforecast to related projects page (scikit-learn#26133) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * FIX Keeps namedtuple's class when transform returns a tuple (scikit-learn#26121) * DOC corrected letter case for better readability in sklearn/metrics/_classification.py / (scikit-learn#26169) * MAINT Parameters validation for sklearn.preprocessing.power_transform (scikit-learn#26142) * FIX `roc_auc_score` now uses `y_prob` instead of `y_pred` (scikit-learn#26155) * MAINT Parameters validation for sklearn.datasets.load_iris (scikit-learn#26177) * MAINT Parameters validation for sklearn.datasets.load_diabetes (scikit-learn#26166) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_breast_cancer (scikit-learn#26165) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.cluster.entropy (scikit-learn#26162) * MAINT Parameters validation for sklearn.datasets.fetch_species_distributions (scikit-learn#26161) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ASV Fix tol in SGDRegressorBenchmark (scikit-learn#26146) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT use api.openml.org URLs for fetch_openml (scikit-learn#26171) * MAINT Parameters validation for sklearn.utils.resample (scikit-learn#26139) * MAINT make it explicit that additive_chi2_kernel does not accept sparse matrix (scikit-learn#26178) * MNT fix circleci link in README.rst (scikit-learn#26183) * CI Fix circleci artifact redirector action (scikit-learn#26181) * GOV introduce rights for groups as discussed in SLEP019 (scikit-learn#25753) Co-authored-by: Julien <git@jjerphan.xyz> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * MAINT Parameters validation for sklearn.neighbors.sort_graph_by_row_values (scikit-learn#26173) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * FIX improve convergence criterion for LogisticRegression(penalty="l1", solver='liblinear') (scikit-learn#25214) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Fix several typos in src and doc files (scikit-learn#26187) * PERF fix overhead of _rescale_data in LinearRegression (scikit-learn#26207) * ENH add Huber loss (scikit-learn#25966) * MAINT Refactor GraphicalLasso and graphical_lasso (scikit-learn#26033) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Cython linting (scikit-learn#25861) * DOC Add JupyterLite button in example gallery (scikit-learn#25887) * MAINT Parameters validation for sklearn.covariance.ledoit_wolf_shrinkage (scikit-learn#26200) * MAINT Parameters validation for sklearn.datasets.load_linnerud (scikit-learn#26199) * MAINT Parameters validation for sklearn.datasets.load_wine (scikit-learn#26196) * DOC Added redirect to Provost paper + minor refactor (scikit-learn#26223) * MAINT Parameter Validation for `covariance.graphical_lasso` (scikit-learn#25053) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_digits (scikit-learn#26195) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.preprocessing.quantile_transform (scikit-learn#26144) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.cross_validate (scikit-learn#26129) Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> * DOC Adds TargetEncoder example explaining the internal CV (scikit-learn#26185) Co-authored-by: Tim Head <betatim@gmail.com> * spelling mistake corrected in documentation for script `plot_document_clustering.py` (scikit-learn#26228) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX possible UnboundLocalError in fetch_openml (scikit-learn#26236) * ENH Adds PyTorch support to LinearDiscriminantAnalysis (scikit-learn#25956) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Use fixed version of Pyodide (scikit-learn#26247) * MNT Reset transform_output default in example to fix doc build build (scikit-learn#26269) * DOC Update example plot_nearest_centroid.py (scikit-learn#26263) * MNT reduce JupyterLite build size (scikit-learn#26246) * DOC term -> meth in GradientBoosting (scikit-learn#26225) * MNT speed-up html-noplot build (scikit-learn#26245) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * MNT Use copy=False when creating DataFrames (scikit-learn#26272) * MAINT Parameters validation for sklearn.model_selection.permutation_test_score (scikit-learn#26230) * MAINT Parameters validation for sklearn.datasets.clear_data_home (scikit-learn#26259) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_files (scikit-learn#26203) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.get_data_home (scikit-learn#26260) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Fix y-axis plot labels in permutation test score example (scikit-learn#26240) * MAINT cython-lint ignores asv_benchmarks (scikit-learn#26282) * MAINT Parameter validation for metrics.cluster._supervised (scikit-learn#26258) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Improve docstring for tol in SequentialFeatureSelector (scikit-learn#26271) * MAINT Parameters validation for sklearn.datasets.load_sample_image (scikit-learn#26226) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Consistent param type for pos_label (scikit-learn#26237) * DOC Minor grammar fix to imputation docs (scikit-learn#26283) * MAINT Parameters validation for sklearn.calibration.calibration_curve (scikit-learn#26198) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.inspection.partial_dependence (scikit-learn#26209) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.model_selection.validation_curve (scikit-learn#26229) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.learning_curve (scikit-learn#26227) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT Remove deprecated pandas.api.types.is_sparse (scikit-learn#26287) * CI Use Trusted Publishers for uploading wheels to PyPI (scikit-learn#26249) * MAINT Parameters validation for sklearn.metrics.pairwise.manhattan_distances (scikit-learn#26122) * PERF revert openmp use in csr_row_norms (scikit-learn#26275) * MAINT Parameters validation for metrics.check_scoring (scikit-learn#26041) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MNT Improve error message when checking classification target is of a non-regression type (scikit-learn#26281) Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * DOC fix link to User Guide encoder_infrequent_categories (scikit-learn#26309) * MNT remove unused args in _predict_regression_tree_inplace_fast_dense (scikit-learn#26314) * ENH Adds missing value support for trees (scikit-learn#23595) Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> * CLN Clean up logic in validate_data and cast_to_ndarray (scikit-learn#26300) * MAINT refactor scorer using _get_response_values (scikit-learn#26037) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> * DOC Add HGBDT to "see also" section of random forests (scikit-learn#26319) Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Bump Github Action labeler version to use newer Node (scikit-learn#26302) * FIX thresholds should not exceed 1.0 with probabilities in `roc_curve` (scikit-learn#26194) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * ENH Allow for appropriate dtype us in `preprocessing.PolynomialFeatures` for sparse matrices (scikit-learn#23731) Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * DOC Fix minor typo (scikit-learn#26327) * MAINT bump minimum version for pytest (scikit-learn#26184) Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * DOC fix return type in isotonic_regression (scikit-learn#26332) * FIX fix available_if for MultiOutputRegressor.partial_fit (scikit-learn#26333) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * FIX make pipeline pass check_estimator (scikit-learn#26325) * FEA Add multiclass support to `average_precision_score` (scikit-learn#24769) Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> --------- Signed-off-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: zeeshan lone <56621467+still-learning-ev@users.noreply.github.com> Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Shiva chauhan <103742975+Shivachauhan17@users.noreply.github.com> Co-authored-by: AymericBasset <45051041+AymericBasset@users.noreply.github.com> Co-authored-by: Maren Westermann <maren.westermann@gmail.com> Co-authored-by: Nishu Choudhary <51842539+choudharynishu@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Benedek Harsanyi <80836204+hbenedek@users.noreply.github.com> Co-authored-by: Pooja Subramaniam <poojas2086@gmail.com> Co-authored-by: Rushil Desai <rushildesai01@gmail.com> Co-authored-by: Xiao Yuan <yuanx749@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: 2357juan <29247195+2357juan@users.noreply.github.com> Co-authored-by: Théophile Baranger <39696928+tbaranger@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Andreas Mueller <t3kcit@gmail.com> Co-authored-by: Jovan Stojanovic <62058944+jovan-stojanovic@users.noreply.github.com> Co-authored-by: Rahil Parikh <75483881+rprkh@users.noreply.github.com> Co-authored-by: Bharat Raghunathan <bharatraghunthan9767@gmail.com> Co-authored-by: Sortofamudkip <wishyutp0328@gmail.com> Co-authored-by: Gleb Levitski <36483986+glevv@users.noreply.github.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Ashwin Mathur <97467100+awinml@users.noreply.github.com> Co-authored-by: Sahil Gupta <sahil@Sahils-MBP.lan> Co-authored-by: Veghit <itay.vegh@gmail.com> Co-authored-by: Itay <itayvegh@gmail.com> Co-authored-by: precondition <57645186+precondition@users.noreply.github.com> Co-authored-by: Marc Torrellas Socastro <marc.torsoc@gmail.com> Co-authored-by: Dominic Fox <dominicjfox2@gmail.com> Co-authored-by: futurewarning <36329275+futurewarning@users.noreply.github.com> Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com> Co-authored-by: Joey Ortiz <orangesherbet0@gmail.com> Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Christian Veenhuis <veenhuis@gmail.com> Co-authored-by: adienes <51664769+adienes@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com> Co-authored-by: Lene Preuss <lene.preuss@gmail.com> Co-authored-by: A.H.Mansouri <83764851+A-H-Mansoury@users.noreply.github.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: Carla J <ca.jancik@gmail.com> Co-authored-by: windiana42 <61181806+windiana42@users.noreply.github.com> Co-authored-by: mdarii <dariimaxim@gmail.com> Co-authored-by: murezzda <47388020+murezzda@users.noreply.github.com> Co-authored-by: Peter Piontek <piontek0@gmail.com> Co-authored-by: John Pangas <swiftyxswaggy@outlook.com> Co-authored-by: Dmitry Nesterov <76070534+dmitrylala@users.noreply.github.com> Co-authored-by: Yuchen Zhou <72342196+ROMEEZHOU@users.noreply.github.com> Co-authored-by: Ekaterina Butyugina <102963496+ekaterinabutyugina@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jiawei.zhang@nyu.edu> Co-authored-by: Ansam Zedan <86729068+ansamz@users.noreply.github.com> Co-authored-by: genvalen <genvalen@protonmail.com> Co-authored-by: farhan khan <86480450+BabaYaga1221@users.noreply.github.com> Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jz4721@nyu.edu> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Jessicakk0711 <106110789+Jessicakk0711@users.noreply.github.com> Co-authored-by: Ankur Singh <singankur28@gmail.com> Co-authored-by: Seoeun(Sun☀️) Hong <75988952+seoeunHong@users.noreply.github.com> Co-authored-by: Nightwalkx <74856680+xi-jiajun@users.noreply.github.com> Co-authored-by: VIGNESH D <35656793+dvignesh1995@users.noreply.github.com> Co-authored-by: Vincent-violet <130581473+Vincent-violet@users.noreply.github.com> Co-authored-by: Elabonga Atuo <elabongaatuo@gmail.com> Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org> Co-authored-by: André Pedersen <andrped94@gmail.com> Co-authored-by: Ashish Dutt <ashish.dutt8@gmail.com> Co-authored-by: Phil <philsupertramp@users.noreply.github.com> Co-authored-by: Stanislav (Stanley) Modrak <44023416+smith558@users.noreply.github.com> Co-authored-by: hujiahong726 <52920842+hujiahong726@users.noreply.github.com> Co-authored-by: James Dean <24254612+AcylSilane@users.noreply.github.com> Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: c-git <43485962+c-git@users.noreply.github.com> Co-authored-by: annegnx <64203599+annegnx@users.noreply.github.com> Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com>

FIX Fixes pandas extension arrays in check_array

04c4c7b

github-actions bot added module:preprocessing module:utils labels Mar 10, 2023

thomasjpfan added 2 commits March 10, 2023 12:17

DOC Adds PR number

a8b7c29

DOC Adds PR number

8f2b835

lorentzenchr approved these changes Mar 11, 2023

View reviewed changes

Merge remote-tracking branch 'upstream/main' into fix_unique_extensino

c7f7b37

thomasjpfan added 2 commits March 13, 2023 12:29

REV Remove unneeded changes

457362e

ENH Adds back required changes

04a6e15

lorentzenchr mentioned this pull request Mar 17, 2023

Support other dataframes like polars and pyarrow not just pandas #25896

Closed

jeremiedbb approved these changes Mar 21, 2023

View reviewed changes

doc/whats_new/v1.3.rst Outdated Show resolved Hide resolved

sklearn/preprocessing/tests/test_label.py Show resolved Hide resolved

thomasjpfan added 4 commits March 21, 2023 14:03

Merge remote-tracking branch 'upstream/main' into fix_unique_extensino

182bd07

CLN Address comments

71480ce

TST Adds test for check_array directly

2d32dbd

TST Use issubdtype

0eff901

jeremiedbb merged commit 65dfab0 into scikit-learn:main Mar 22, 2023

Veghit pushed a commit to Veghit/scikit-learn that referenced this pull request Apr 15, 2023

FIX Fixes pandas extension arrays in check_array (scikit-learn#25813)

f3cf046

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

FIX Fixes pandas extension arrays in check_array #25813

FIX Fixes pandas extension arrays in check_array #25813

Uh oh!

thomasjpfan commented Mar 10, 2023

Uh oh!

lorentzenchr commented Mar 11, 2023

Uh oh!

lorentzenchr left a comment

Uh oh!

lorentzenchr Mar 11, 2023

Uh oh!

lorentzenchr Mar 11, 2023

Uh oh!

lorentzenchr Mar 11, 2023

Uh oh!

lorentzenchr Mar 11, 2023

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

thomasjpfan commented Mar 13, 2023 •

edited

Loading

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

betatim commented Mar 21, 2023

Uh oh!

jeremiedbb left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	- \|Fix\| :func:`utils.validation.check_array` now suports extension arrays with object
	- \|Fix\| :func:`utils.validation.check_array` now supports extension arrays with object

	dtypes by return an ndarray with object dtype. :pr:`xxxxx` by `Thomas Fan`_.
	dtypes. In that case, it returns an ndarray with object dtype. :pr:`xxxxx` by `Thomas Fan`_.

Uh oh!

FIX Fixes pandas extension arrays in check_array #25813

FIX Fixes pandas extension arrays in check_array #25813

Uh oh!

Conversation

thomasjpfan commented Mar 10, 2023

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Uh oh!

lorentzenchr commented Mar 11, 2023

Uh oh!

lorentzenchr left a comment

Choose a reason for hiding this comment

Uh oh!

lorentzenchr Mar 11, 2023

Choose a reason for hiding this comment

Uh oh!

lorentzenchr Mar 11, 2023

Choose a reason for hiding this comment

Uh oh!

lorentzenchr Mar 11, 2023

Choose a reason for hiding this comment

Uh oh!

lorentzenchr Mar 11, 2023

Choose a reason for hiding this comment

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

thomasjpfan commented Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glemaitre commented Mar 13, 2023

Uh oh!

betatim commented Mar 21, 2023

Uh oh!

jeremiedbb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thomasjpfan commented Mar 13, 2023 •

edited

Loading