Hey, I don't know if I should call this a bug, but for me at least it was unexpected behavior. I tried to subclass Pipeline to implement a customization: a simplified configuration that is used to build a sequence of transformations.
This raises an AttributeError, because the instance has no attribute with the same name as a positional argument of the subclass's __init__ (the same is true for a keyword argument). Find a minimal example below.
Is this expected behavior? It does no harm to set instance attributes with the same names, but it is surprising that this is required, and the requirement is very implicit. Also, the error does not show up when you instantiate the object, only when you call a method on it.
If it is absolutely necessary, it may need some documentation.
In addition, I tried to globally skip parameter validation and it did not help in this situation, which might be a real bug?
Thanks for your help, and your good work:)
A simple example:
import sklearn

sklearn.set_config(
    skip_parameter_validation=True,  # disable validation
)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd


class TakeColumn(BaseEstimator, TransformerMixin):
    def __init__(self, column: str):
        self.column = column

    def __str__(self):
        return self.__class__.__name__ + f"[{self.column}]"

    def fit(self, X, y=None):
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return X[[self.column]]


class CategoricalFeature(Pipeline):
    def __init__(self, column: str, encode=True):
        take_column = TakeColumn(column)
        steps = [(str(take_column), take_column)]
        if encode:
            encoder = OneHotEncoder()
            steps.append((str(encoder), encoder))
        # setting instance attributes having the same name, removes the exception
        # self.column = column
        # self.encode = encode
        super().__init__(steps)


df = pd.DataFrame([["a"], ["b"], ["c"]], columns=["column"])
column_feature = CategoricalFeature("column")
some_other_feature = CategoricalFeature("other_column", encode=False)
# this fails, if instance attributes are not set with the same name as the
# corresponding parameter of __init__
result = column_feature.fit_transform(df)
Output:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/kristof/Library/Application Support/JetBrains/IntelliJIdea2024.3/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kristof/Library/Application Support/JetBrains/IntelliJIdea2024.3/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/kristof/Projects/pipeline-issue/default_console.py", line 45, in <module>
    result = column_feature.fit_transform(df)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 1382, in wrapper
    estimator._validate_params()
  File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 438, in _validate_params
    self.get_params(deep=False),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/pipeline.py", line 299, in get_params
    return self._get_params("steps", deep=deep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/utils/metaestimators.py", line 30, in _get_params
    out = super().get_params(deep=deep)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kristof/Projects/pipeline-issue/.venv/lib/python3.12/site-packages/sklearn/base.py", line 248, in get_params
    value = getattr(self, key)
            ^^^^^^^^^^^^^^^^^^
AttributeError: 'CategoricalFeature' object has no attribute 'column'
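The last frame, `value = getattr(self, key)`, suggests the mechanism: the parameter names are taken from the signature of `__init__`, and each one is then looked up as an attribute of the same name. Below is a rough, stdlib-only model of that lookup. It is a simplified sketch, not scikit-learn's actual code, and the class names `MiniEstimator`, `Stores` and `Forgets` are hypothetical.

```python
import inspect


class MiniEstimator:
    """Simplified stand-in for an estimator base class (not scikit-learn code)."""

    def get_params(self):
        # Read the parameter names from the signature of __init__ ...
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        # ... then look each one up as an instance attribute of the same name.
        return {name: getattr(self, name) for name in names}


class Stores(MiniEstimator):
    def __init__(self, column, encode=True):
        self.column = column
        self.encode = encode


class Forgets(MiniEstimator):
    def __init__(self, column, encode=True):
        pass  # the arguments are never stored on the instance


stored_params = Stores("a").get_params()
print(stored_params)

forgets = Forgets("a")  # instantiation succeeds without complaint
try:
    forgets.get_params()  # the lookup only fails here
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

In this model, too, the error only appears once something asks for the parameters, which matches the observation that instantiating the object works and the first method call fails.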
TL;DR: an estimator's __init__ should not perform any work; it should just store all the (keyword) arguments as instance attributes. All other work, including validation, should happen in fit.
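One way to follow that convention is sketched here, under the assumption that wrapping a Pipeline is acceptable in place of subclassing it. This composition-based `CategoricalFeature` is a hypothetical variant, not something proposed in the thread, and the step names `"take"` and `"encode"` and the attribute `pipeline_` are made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class TakeColumn(TransformerMixin, BaseEstimator):
    """Select a single DataFrame column, as in the example from the issue."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[[self.column]]


class CategoricalFeature(TransformerMixin, BaseEstimator):
    """Hypothetical variant that wraps a Pipeline instead of subclassing it."""

    def __init__(self, column, encode=True):
        # Only store the arguments, unchanged, under their own names.
        self.column = column
        self.encode = encode

    def fit(self, X, y=None):
        # Build the inner pipeline here, not in __init__.
        steps = [("take", TakeColumn(self.column))]
        if self.encode:
            steps.append(("encode", OneHotEncoder()))
        # The trailing underscore marks state that is created during fit.
        self.pipeline_ = Pipeline(steps).fit(X, y)
        return self

    def transform(self, X):
        return self.pipeline_.transform(X)


df = pd.DataFrame([["a"], ["b"], ["c"]], columns=["column"])

feature = CategoricalFeature("column")
params = feature.get_params()
result = feature.fit_transform(df)  # one-hot encoded, one column per category

plain = CategoricalFeature("column", encode=False).fit_transform(df)
unfitted_copy = clone(feature)  # cloning relies on get_params as well
```

Because `__init__` only stores its arguments, `get_params` and `clone` work without any extra bookkeeping. The cost of this design is that the inner steps are no longer exposed as steps of the outer object.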