Description
Describe the workflow you want to enable
Varying the seed of the FeatureHasher lets the user control which inputs collide. This can yield a better feature space, either by treating the seed as a hyperparameter and tuning it experimentally, or by explicitly searching for a seed that minimizes "bad" collisions.
Describe your proposed solution
Add an optional "seed" parameter to the __init__ of FeatureHasher that defaults to 0 (the current behavior; see the underlying hashing function). The seed would be threaded through to _hashing_transform.
The same applies to HashingVectorizer. The only difference is that the seed would be passed to the FeatureHasher instance.
This seems straightforward, so I can implement it if the proposal makes sense.
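To make the idea concrete, here is a minimal pure-Python sketch of how a seed changes which feature names share a column. It is not scikit-learn code: `murmurhash3_32` below is a small reimplementation of 32-bit MurmurHash3 written for illustration, `hashed_column` and `colliding_features` are hypothetical helper names, and the index/sign rule (`abs(h) % n_features`, sign taken from `h`) is my understanding of what the hashing transform does today with a fixed seed of 0.

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmurhash3_32(data, seed=0):
    """Illustrative pure-Python MurmurHash3 (x86, 32-bit); returns a signed int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = (_rotl32(h ^ k, 13) * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k
    # Finalization: avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    # Reinterpret the unsigned result as a signed 32-bit integer.
    return h - (1 << 32) if h & 0x80000000 else h


def hashed_column(feature, n_features, seed=0):
    """Hypothetical helper: (column index, sign) for one feature name."""
    h = murmurhash3_32(feature.encode("utf-8"), seed)
    return abs(h) % n_features, (1 if h >= 0 else -1)


def colliding_features(features, n_features, seed=0):
    """Hypothetical helper: columns that receive more than one feature name."""
    buckets = defaultdict(list)
    for name in features:
        column, _ = hashed_column(name, n_features, seed)
        buckets[column].append(name)
    return {col: names for col, names in buckets.items() if len(names) > 1}


if __name__ == "__main__":
    tokens = ["cat", "dog", "bird", "fish", "horse", "cow", "pig", "sheep", "goat"]
    # Nine names in eight columns always collide somewhere; the seed decides where.
    for seed in range(4):
        print(seed, colliding_features(tokens, n_features=8, seed=seed))
```

With the proposed parameter, a search like the loop above could be run through FeatureHasher itself, for example by picking the seed whose collisions hurt a downstream metric the least.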
Describe alternatives you've considered, if relevant
No response
Additional context
No response