Skip to content

Private instances exposed as public attributes #14720

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rth opened this issue Aug 22, 2019 · 5 comments
Open

Private instances exposed as public attributes #14720

rth opened this issue Aug 22, 2019 · 5 comments

Comments

@rth
Copy link
Member

rth commented Aug 22, 2019

Something that came up as part of the GLM PR in #14300 (comment)

@rth

We have other examples of such private but documented behavior, existing for the same reasons:

  • DecisionTreeClassifier.tree_ is a sklearn.tree._tree.Tree object
  • Birch.root_ is a _CFNode

meaning that we don't guarantee backward compatibility on those, but you can use them if you know what you are doing.

@NicolasHug

I'm OK with documenting the Link class as long as the GeneralizedLinearRegressor isn't exposed, but if we ultimately expose is, I don't think we should document the use of private API, especially for init arguments.
Regarding attributes: I would argue that we should not expose tree_ and root_ as public attributes if these are subject to change without deprecation. We claim that the public attributes should be backward-compatible.

opening this issue for further discussion.

@NicolasHug
Copy link
Member

@rth I edited _tree and _root to tree_ and root_

@NicolasHug
Copy link
Member

To sum up my view: I think that tree_ and root_ should be private. I'm still OK with documenting these, but we should make clear that we don't guarantee backward compatibility.

@rth
Copy link
Member Author

rth commented Aug 22, 2019

To sum up my view: I think that tree_ and root_ should be private. I'm still OK with documenting these, but we should make clear that we don't guarantee backward compatibility.

Overall I agree. Though I now that those are public, I wonder if it's really worth changing them.

BTW, GradientBoostingClassifier.loss_, a LossFunction (will be private once the deprecation cycle finished in 0.23) is also affected.

A releated question is then more on how to support custom objects as input arguments and whether this should be mentioned in some way in the docstring.

For instance,

  • we support custom callable metrics. That fairly straightforward.
  • one step up in complexity is support of custom loss functions (putting aside whether we might want to do such thing for now). For instance, lighgbm accepts as loss a function returning tuple of gradient, hessian which is also manageable.
  • the question is what should be done in more complicated cases (such as GLMs where we need to be able to compute 4+ quantities, but not all at the same time). For instance, in plot_tree we sidestep this by passing the whole DecisionTreeClassifier estimator and then get estimator.tree_ returning a private instance. We could accept a function returning a long tuple, but that would unpractical and computationally inefficient. Making the underlying object public could be one way to go, but I'm not sure if it's reason enough to do that. Or mentioning it in the documentation in some way, not sure.

@NicolasHug
Copy link
Member

Another instance that I found is the loss_function_ attribute of BaseSGD and all child classes. I'm not sure it make sense to expose these, even though they're not explicitly private, they are defined in sgd_fast.pyx. They're all undocumented classes.

from #14789 (comment)

1 similar comment
@NicolasHug
Copy link
Member

Another instance that I found is the loss_function_ attribute of BaseSGD and all child classes. I'm not sure it make sense to expose these, even though they're not explicitly private, they are defined in sgd_fast.pyx. They're all undocumented classes.

from #14789 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants