Closed
Labels
Blocker, Bug, High Priority, Regression
Description
Describe the bug
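Evaluating a pipeline that encodes categorical data takes around 8 times longer with v1.1 than with v1.0.2. The slowdown shows up when fitting a DecisionTreeClassifier on the output of a ColumnTransformer that ordinal-encodes the categorical columns (about 3 s instead of about 450 ms in the reproducer below).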
Steps/Code to Reproduce
import numpy as np
import pandas as pd
from time import time
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer, make_column_selector

# Seeded generator, used for every random draw so the data is reproducible
rng = np.random.RandomState(0)
n_samples, n_features = 50_000, 2

# Two numerical columns (named 0 and 1)
X = pd.DataFrame(rng.randn(n_samples, n_features))
# Two categorical columns, stored by pandas with object dtype
X[2] = rng.choice(
    ["male", "female", "other"], size=n_samples, p=[0.49, 0.49, 0.02]
)
X[3] = rng.choice(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"],
    size=n_samples,
)
y = rng.choice([0, 1, 2], size=n_samples, p=[0.01, 0.49, 0.5])

# Ordinal-encode the object columns and pass the numerical columns through
preprocessor = make_column_transformer(
    (OrdinalEncoder(), make_column_selector(dtype_include=object)),
    remainder="passthrough",
)
X_transformed = preprocessor.fit_transform(X)

# Time only the tree fit on the already preprocessed array
t0 = time()
DecisionTreeClassifier().fit(X_transformed, y)
duration = time() - t0
print(f"fit took {duration:.3f} s")
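The reproducer times a single fit with `time()`, which can be noisy. A minimal diagnostic sketch, not part of the original report, is shown here: it keeps the best of several timings measured with `time.perf_counter`, and it reports the dtype and memory layout of the array passed to `fit`. It compares the ColumnTransformer output against an explicitly converted C-contiguous float32 copy, on the assumption that scikit-learn's trees work on float32 data internally, so a difference between the two would hint at an input-conversion cost. The helper names `best_fit_time` and `describe_layout`, the reduced sample size and the shortened month list are illustrative assumptions, and the sketch does not by itself identify the cause of the regression.

```python
import numpy as np
import pandas as pd
from time import perf_counter
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier


def best_fit_time(X, y, n_repeats=3):
    """Best wall-clock time in seconds over n_repeats DecisionTreeClassifier fits."""
    timings = []
    for _ in range(n_repeats):
        t0 = perf_counter()
        DecisionTreeClassifier(random_state=0).fit(X, y)
        timings.append(perf_counter() - t0)
    return min(timings)


def describe_layout(X):
    """Report the dtype and memory layout of the array handed to fit."""
    X = np.asarray(X)
    return {
        "dtype": str(X.dtype),
        "C_CONTIGUOUS": bool(X.flags["C_CONTIGUOUS"]),
        "F_CONTIGUOUS": bool(X.flags["F_CONTIGUOUS"]),
    }


# Reduced version of the report's data (assumption: smaller and fewer months)
rng = np.random.RandomState(0)
n_samples = 5_000
X = pd.DataFrame(rng.randn(n_samples, 2))
X[2] = rng.choice(["male", "female", "other"], size=n_samples, p=[0.49, 0.49, 0.02])
X[3] = rng.choice(["jan", "feb", "mar"], size=n_samples)
y = rng.choice([0, 1, 2], size=n_samples, p=[0.01, 0.49, 0.5])

preprocessor = make_column_transformer(
    (OrdinalEncoder(), make_column_selector(dtype_include=object)),
    remainder="passthrough",
)
X_ct = preprocessor.fit_transform(X)
# Same values, explicitly converted to a C-contiguous float32 array
X_f32 = np.ascontiguousarray(X_ct, dtype=np.float32)

for name, arr in [("ColumnTransformer output", X_ct), ("float32 C-contiguous copy", X_f32)]:
    print(name, describe_layout(arr), f"{best_fit_time(arr, y):.3f} s")
```

Running the same script under v1.0.2 and v1.1 would show whether the gap appears only for the ColumnTransformer output or for both arrays.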
Expected Results
~450 ms (scikit-learn 1.0.2)
Actual Results
~3 s (scikit-learn 1.1.0)
Versions
System:
python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) [GCC 9.3.0]
executable: /home/arturoamor/miniforge3/envs/scikit-learn-course/bin/python
machine: Linux-5.14.0-1036-oem-x86_64-with-glibc2.31
Python dependencies:
sklearn: 1.1.0
pip: 21.1.3
setuptools: 49.6.0.post20210108
numpy: 1.21.0
scipy: 1.7.0
Cython: None
pandas: 1.3.0
matplotlib: 3.4.2
joblib: 1.0.1
threadpoolctl: 2.1.0
Built with OpenMP: True
threadpoolctl info:
filepath: /home/arturoamor/miniforge3/envs/scikit-learn-course/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
prefix: libgomp
user_api: openmp
internal_api: openmp
version: None
num_threads: 8
filepath: /home/arturoamor/miniforge3/envs/scikit-learn-course/lib/libopenblasp-r0.3.15.so
prefix: libopenblas
user_api: blas
internal_api: openblas
version: 0.3.15
num_threads: 8
threading_layer: pthreads
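The Versions block appears to be the output of `sklearn.show_versions()`. As a hedged sketch for collecting the same information when comparing environments, the snippet below prints that report and then lists the native thread pools via `threadpoolctl.threadpool_info()`; the exact fields shown can vary between library versions.

```python
import sklearn
from threadpoolctl import threadpool_info

# Print the System / Python dependencies report used in bug reports
sklearn.show_versions()

# List the native thread pools (OpenMP, BLAS) loaded in this process
for pool in threadpool_info():
    print(pool.get("internal_api"), pool.get("num_threads"))
```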