Expose Seed in FeatureHasher and HashingVectorizer #29748

Open
@FelixLabelle

Description

Describe the workflow you want to enable

Varying the seed of the FeatureHasher lets the user control which inputs collide. This can yield a better feature space, either through experimentation (treating the seed as a hyperparameter) or by explicitly searching for a seed that minimizes "bad" collisions.
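
To make the "explicit search" idea concrete, here is a minimal sketch. It does not call scikit-learn: `bucket`, `bad_collisions`, and `best_seed` are hypothetical helpers, and a salted `hashlib.blake2b` stands in for the MurmurHash3 function that FeatureHasher uses, so the actual column indices will differ from what scikit-learn produces.

```python
import hashlib
from itertools import combinations


def bucket(token: str, n_features: int, seed: int = 0) -> int:
    """Map a token to a column index; the seed is fed to the hash as a salt.

    Assumption: blake2b is only a stand-in for a seeded MurmurHash3.
    """
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4, salt=salt).digest()
    return int.from_bytes(digest, "little") % n_features


def bad_collisions(pairs, n_features: int, seed: int) -> int:
    """Count the pairs we want kept apart that land in the same column."""
    return sum(bucket(a, n_features, seed) == bucket(b, n_features, seed) for a, b in pairs)


def best_seed(pairs, n_features: int, seeds):
    """Return the first candidate seed with the fewest bad collisions."""
    return min(seeds, key=lambda s: bad_collisions(pairs, n_features, s))


# Treat every pair in a small vocabulary as a collision we would like to avoid.
vocab = ["good", "bad", "great", "awful", "fine", "poor"]
pairs = list(combinations(vocab, 2))
seed = best_seed(pairs, n_features=8, seeds=range(32))
print(seed, bad_collisions(pairs, 8, seed))
```

In practice the seed could also be tuned like any other hyperparameter, scoring each candidate by downstream validation performance instead of a hand-written collision count.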

Describe your proposed solution

Add an optional "seed" parameter to the init of FeatureHasher that defaults to 0 (the current behavior, see the underlying hashing function). The seed would be threaded through to _hashing_transform.

Ditto for HashingVectorizer. The only difference here is that the seed would be passed to the FeatureHasher instance.
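
A rough sketch of how the parameter could be threaded through, under stated assumptions: `SeededFeatureHasher` and `SeededHashingVectorizer` are hypothetical pure-Python stand-ins, not the scikit-learn classes, and a salted `hashlib.blake2b` again replaces MurmurHash3. The point is only the plumbing: the hasher stores `seed` and uses it when hashing, and the vectorizer forwards its own `seed` to the hasher it builds.

```python
import hashlib
from typing import Dict, Iterable, List


class SeededFeatureHasher:
    """Hypothetical stand-in for FeatureHasher with an extra `seed` argument."""

    def __init__(self, n_features: int = 16, seed: int = 0):
        self.n_features = n_features
        self.seed = seed  # 0 keeps the default behavior of this sketch

    def _hash(self, token: str) -> int:
        # Assumption: salted blake2b instead of MurmurHash3; signed 32-bit result.
        salt = self.seed.to_bytes(8, "little")
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4, salt=salt).digest()
        return int.from_bytes(digest, "little", signed=True)

    def transform(self, raw_X: Iterable[Dict[str, float]]) -> List[List[float]]:
        rows = []
        for sample in raw_X:
            row = [0.0] * self.n_features
            for name, value in sample.items():
                h = self._hash(name)
                sign = 1.0 if h >= 0 else -1.0  # sign of the hash decides the sign of the update
                row[abs(h) % self.n_features] += sign * value
            rows.append(row)
        return rows


class SeededHashingVectorizer:
    """Hypothetical stand-in for HashingVectorizer that forwards `seed`."""

    def __init__(self, n_features: int = 16, seed: int = 0):
        self.n_features = n_features
        self.seed = seed

    def _get_hasher(self) -> SeededFeatureHasher:
        # The only change on the vectorizer side: pass the seed along.
        return SeededFeatureHasher(n_features=self.n_features, seed=self.seed)

    def transform(self, docs: Iterable[str]) -> List[List[float]]:
        counts = []
        for doc in docs:
            tokens: Dict[str, float] = {}
            for tok in doc.lower().split():
                tokens[tok] = tokens.get(tok, 0.0) + 1.0
            counts.append(tokens)
        return self._get_hasher().transform(counts)


docs = ["the quick brown fox", "the lazy dog"]
print(SeededHashingVectorizer(n_features=8, seed=0).transform(docs))
print(SeededHashingVectorizer(n_features=8, seed=7).transform(docs))
```

Keeping 0 as the default means existing callers that never pass a seed would see unchanged output, which matches the backward-compatibility goal stated above.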

This seems straightforward, so I can implement this solution if it makes sense.

Describe alternatives you've considered, if relevant

No response

Additional context

No response
