Provide examples on how to customize the scikit-learn classes #28828
Comments
This can certainly be an example (under our |
Hey @adrinjalali, I'd love to work on this issue. I am new to contributing to this repo. Any sort of advice/help will be greatly appreciated :) |
Great. Would it be worth first compiling a list of classes for which such customization examples would be useful? Or should we just start with this one and take it from there? |
@plon-Susk7 this is more of an advanced issue. Probably working on easier issues for a while would be more fruitful. You can also look at the list of stalled and help wanted existing pull requests since a lot of them need somebody to pick it up and continue the work. |
@adrinjalali pinging in case you missed it. I'll have some more free time from Monday onwards, so could start working on this. |
@miguelcsilva starting with an example for the splitters would be a good start. |
Hey @miguelcsilva , please consider the examples here A concrete from this page is the Given a pandas dataframe with dimension
The
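import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# table2tensor is not defined in this snippet; it is assumed to be imported
# from the library used in the linked example.


class Patch(BaseEstimator, TransformerMixin):
    """Chain a tabular pipeline with an estimator that works on tensors."""

    def __init__(self, pipeline1, tcam_obj):
        self.pipeline1 = pipeline1
        self.tcam_obj = tcam_obj

    def _transform_y(self, y, map1):
        # Align the labels with the tensor rows through the SubjectID-to-row map.
        yt = pd.Series(map1).rename("row").to_frame()
        yt = yt.merge(
            y.reset_index(), left_index=True, right_on="SubjectID"
        ).drop_duplicates()
        yt.index = yt["row"]
        yt = yt["label"].sort_index()
        return yt

    def fit(self, X, y=None):
        # Fit the tabular pipeline first, then reshape its output into a tensor.
        self.pipeline1 = self.pipeline1.fit(X, y=y)
        tensor, map1, map3 = table2tensor(self.pipeline1.transform(X))
        yt = self._transform_y(y, map1) if y is not None else None
        self.tcam_obj = self.tcam_obj.fit(tensor, y=yt)
        return self

    def transform(self, X):
        Xt1 = self.pipeline1.transform(X)
        tensor, map1, map3 = table2tensor(Xt1)
        Xt2 = self.tcam_obj.transform(tensor)
        return Xt2

I hope you will find this example interesting. |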
Hi @adrinjalali and @miguelcsilva, is this issue still open? Providing multiple examples of custom classes that use sklearn as a base seems like a big issue to work on. |
We don't need very elaborate examples with real statistical use cases for this. We need examples of the API that showcase the boilerplate needed to develop such classes. This issue is also originally about non-estimator objects. |
Thanks for clarifying @adrinjalali ! |
Describe the issue linked to the documentation
Recently I had to implement a custom CV Splitter for a project I'm working on. My first instinct was to look in the documentation to see if there were any examples of how this could be done. I could not find anything too concrete, but before long I found the Glossary of Common Terms and API Elements. Although not exactly what I hoped to find, it does have a section on CV Splitters. From there I can read that they are expected to have `split` and `get_n_splits` methods, and by following some other links in the docs I can find what arguments they take and what they should return.
Although all the information is in fact there, I believe that less experienced users may find it harder to piece everything together, and I was wondering whether it would be beneficial for all users to have a section in the documentation with examples of how to customize the scikit-learn classes to suit the user's needs. After all, I understand the library was developed with an API in mind that would allow for this exact flexibility and customization.
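To make the two methods mentioned above concrete, here is a minimal sketch of a duck-typed splitter. It is only an illustration, not the splitter from my project: the class name `BlockSplitter` and its behaviour (each fold holds out one contiguous block of rows) are hypothetical, and the `(X, y=None, groups=None)` signatures follow my reading of the glossary.

```python
import numpy as np


class BlockSplitter:
    """Hypothetical CV splitter: each fold holds out one contiguous block of rows."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        # Number of (train, test) pairs that split() will yield.
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        if self.n_splits > n_samples:
            raise ValueError("n_splits cannot exceed the number of samples")
        indices = np.arange(n_samples)
        # np.array_split also handles n_samples not divisible by n_splits.
        for test_idx in np.array_split(indices, self.n_splits):
            train_idx = np.setdiff1d(indices, test_idx)
            yield train_idx, test_idx


X = np.arange(12).reshape(6, 2)
folds = list(BlockSplitter(n_splits=3).split(X))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

As far as I can tell, an object like this can be passed wherever scikit-learn accepts a `cv` argument, since only the two methods are required.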
I know this is not a small task, and it may add a non-trivial maintenance burden for the team, but I would like to understand how the maintainers would feel about a space in the documentation for these customization examples. Of course, as the person suggesting it, I would be happy to contribute to this.
Suggest a potential alternative/fix
One way I could see this taking shape would be a dedicated page in the documentation where examples of customized classes are demonstrated. I think it's also important to show how the customized class would be used as part of a larger pipeline, and to let the user copy and paste the code into their working environment.
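As a rough sketch of what "used as part of a larger pipeline" could look like on such a page, the snippet below passes a custom splitter as the `cv` argument of `GridSearchCV` wrapped around a `Pipeline`. The `InterleavedSplitter` class, the synthetic dataset and the parameter grid are assumptions made up for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class InterleavedSplitter:
    """Hypothetical splitter: fold k tests on rows k, k + n_splits, ..."""

    def __init__(self, n_splits=4):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        indices = np.arange(len(X))
        for k in range(self.n_splits):
            test_mask = indices % self.n_splits == k
            yield indices[~test_mask], indices[test_mask]


# Synthetic data, used only so the example can be copied and run as is.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=InterleavedSplitter(n_splits=4),  # the custom splitter plugs in here
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The point of the sketch is that the rest of the workflow stays unchanged: only the `cv` argument differs from a standard grid search.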
I'll leave an example of a custom CV Splitter below for discussion. The idea would then be to expand to the most commonly used classes.