QuantileTransformer is incredibly slow #31901

@antipisa

Describe the workflow you want to enable

This is a feature request to improve the performance of QuantileTransformer. On large dense dataframes (30M+ rows, 500 columns) it takes ~60 minutes to fit and uses a huge amount of memory when transforming. It also does not support sample_weight. Ideally it should be as fast as CatBoost's Pool quantize method, which does many of the same computations in a fraction of the time:
https://catboost.ai/docs/en/concepts/python-reference_pool_quantized
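
For anyone who wants to time this locally, a scaled-down reproduction might look like the sketch below. The array shape, random data, and parameter values are assumptions chosen so the script finishes quickly; they are not the 30M x 500 dataset from the report, and no timings are claimed here.

```python
import time

import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Scaled-down stand-in for the large dense frame described above.
# These sizes are assumptions so the script runs on a laptop.
n_rows, n_cols = 100_000, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((n_rows, n_cols))

qt = QuantileTransformer(
    n_quantiles=1000,
    subsample=10_000,  # number of rows drawn to estimate the quantiles
    output_distribution="uniform",
    random_state=0,
)

# Time fit and transform separately, since the report mentions both.
t0 = time.perf_counter()
qt.fit(X)
fit_seconds = time.perf_counter() - t0

t0 = time.perf_counter()
Xt = qt.transform(X)
transform_seconds = time.perf_counter() - t0

print(f"fit: {fit_seconds:.2f}s, transform: {transform_seconds:.2f}s")
```

Raising `n_rows` and `n_cols` toward the real dataset size should show how fit and transform time scale. Lowering `subsample` or `n_quantiles` reduces the work done in `fit`, at the cost of a coarser quantile estimate.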

Describe your proposed solution

See the source code of CatBoost's Pool quantize method for reference: https://catboost.ai/docs/en/concepts/python-reference_pool_quantized
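
To make the sample_weight part of the request concrete, here is a minimal NumPy sketch of weighted quantile borders for a single column. It is not CatBoost's algorithm and not an existing scikit-learn API; `weighted_quantile_borders` is a hypothetical helper, and it uses a simple "first value whose weighted CDF reaches the target" rule without interpolation.

```python
import numpy as np


def weighted_quantile_borders(x, sample_weight, n_quantiles=10):
    """Hypothetical helper: quantile borders of x under sample_weight."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(sample_weight, dtype=float)

    # Sort values once and carry the weights along.
    order = np.argsort(x, kind="stable")
    x_sorted, w_sorted = x[order], w[order]

    # Weighted empirical CDF evaluated at each sorted value.
    cdf = np.cumsum(w_sorted) / w_sorted.sum()

    # Evenly spaced target probabilities in [0, 1].
    probs = np.linspace(0.0, 1.0, n_quantiles)

    # For each target, pick the first sorted value whose CDF reaches it.
    idx = np.searchsorted(cdf, probs, side="left")
    idx = np.clip(idx, 0, len(x_sorted) - 1)
    return x_sorted[idx]


# Usage: a heavy weight on the largest value pulls the median border up.
uniform = weighted_quantile_borders([1, 2, 3, 4], [1, 1, 1, 1], n_quantiles=3)
skewed = weighted_quantile_borders([1, 2, 3, 4], [1, 1, 1, 5], n_quantiles=3)
```

The cost is dominated by one sort per column, so running it on a row subsample, as QuantileTransformer already does through `subsample`, would keep it cheap.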

Describe alternatives you've considered, if relevant

No response

Additional context

No response
