Programmatically pass categorical_features to HGBT

#### Describe the workflow you want to enable
#18394 added native support for categorical features to HGBT. To use it, you have to ordinal-encode your categorical columns, e.g. in a `ColumnTransformer` (potentially as part of a pipeline), and indicate their column positions in the `X` passed to the estimator via the parameter `categorical_features`.

How can we then specify programmatically, i.e. without manually filling in `categorical_features`, the positions of the categorical (ordinal-encoded) columns in the feature matrix `X` that is finally passed to HGBT?

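```python
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

X, y = ...  # a pandas DataFrame with some "category" columns, and a target

# Ordinal-encode the categorical columns; pass all other columns through unchanged.
ct = make_column_transformer(
    (OrdinalEncoder(),
     make_column_selector(dtype_include='category')),
    remainder='passthrough')

hist_native = make_pipeline(
    ct,
    # Positions of the encoded columns in ct's *output*, not in the original X
    HistGradientBoostingRegressor(categorical_features=???)
)
```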
How should `???` be filled in?

#### Possible solutions
1. Set it manually, e.g. place the `OrdinalEncoder` as the first or last part of a `ColumnTransformer`. This is what [this example](https://scikit-learn.org/dev/auto_examples/ensemble/plot_gradient_boosting_categorical.html#sphx-glr-auto-examples-ensemble-plot-gradient-boosting-categorical-py) currently does, but it is not ideal because it relies on the column order produced by the `ColumnTransformer`.
2. Pass a callable, e.g. `HistGradientBoostingRegressor(categorical_features=my_function)`; see https://github.com/scikit-learn/scikit-learn/pull/18394#issuecomment-731568451 for details.
   > Sadly, this doesn't work. It breaks when the pipeline is used in e.g. `cross_val_score` because the estimators will be cloned there, and thus the callable refers to an unfitted CT.
3. Pass feature names once they are available. Even then, you have to know the exact feature names that are created by `OrdinalEncoder`.
4. Pass feature-aligned metadata ("this is a categorical feature"), similar to [SLEP006](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep006/proposal.html) and as proposed in #4196.
5. Use an `OrdinalEncoder` internally within the HGBT estimator so that users don't need to build a pipeline.

#### Further context
This might eventually become relevant for other estimators as well; for linear models, see #18893.

Programmatically pass categorical_features to HGBT #18894