Skip to content

MNT Makes modules in tree and ensemble private #14964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Oct 16, 2019

Conversation

adrinjalali
Copy link
Member

Addresses part of #12927 and part of #9250

This PR renames file.py into _file.py in the sklearn.{tree, ensemble} modules.

In particular, please note that:

  • I rename tree.py to _tree_base.py since changing the _tree.{pyx,pxd}'s names would result in a much larger diff, same for gradient_boosting.py -> _gb.py

Ping @NicolasHug @thomasjpfan @rth @glemaitre.

Again, lint issue can be ignored and are triggered by new files being created.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you make sure no examples import from ensemble.XXX or tree.XXX ?

As a nit I feel like tree_base.py isn't a great name because it actually contains all of the concrete tree classes (DecisionTreeClassifier, etc).
Maybe we can just rename the .pyx file instead (that's gonna be a bit annoying for you because you'll need a make clean and make in again. Also since tree.pyx isn't formally private it would need to be tested too).

LGTM anyway, thanks.

@@ -11,6 +11,30 @@
# appropriate error message.

@pytest.mark.parametrize('deprecated_path, importee', (
('sklearn.tree.tree', 'DecisionTreeClassifier'),
('sklearn.tree.tree', 'DecisionTreeRegressor'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, since we import * you only need to import one thing from the module in this test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be addressed. I do not think we need all these tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@glemaitre glemaitre self-requested a review September 13, 2019 09:31
@glemaitre
Copy link
Member

We might break some people code if they try to import the base class. The following will fail:

from sklearn.ensemble.bagging import BaseBagging

while I would argue that importing the _*** functions is not really our problem since they are private, the base class might still be used by people. We should probably add all base class in the __all__.

@glemaitre glemaitre removed their request for review September 13, 2019 09:52
@NicolasHug
Copy link
Member

good point, I didn't think of that.

Should we simply remove __all__ in _bagging.py?

@glemaitre
Copy link
Member

Should we simply remove all in _bagging.py?

Or add whatever public classes/functions in __all__.

@NicolasHug
Copy link
Member

A small issue I have with that is that we might miss some, and having them in all suggests that they should be publicly exposed somehow, but we are still making them private.

@glemaitre
Copy link
Member

Actually, both approaches work for me. Removing __all__ might be safer then.

@adrinjalali
Copy link
Member Author

Did you make sure no examples import from ensemble.XXX or tree.XXX ?

I think there's none, I checked, but I plan to do another round of warning removal and cleanup on the examples anyway.

As a nit I feel like tree_base.py isn't a great name because it actually contains all of the concrete tree classes (DecisionTreeClassifier, etc).
Maybe we can just rename the .pyx file instead (that's gonna be a bit annoying for you because you'll need a make clean and make in again. Also since tree.pyx isn't formally private it would need to be tested too).

I didn't rename the _tree.pyx cause it's used in so many different places that it would be a bigger change, also for people who use it. I know it's private, but I thought it'd be nice to avoid renaming that file.

@adrinjalali
Copy link
Member Author

Question, which of these do we want to expose as public?

$ git grep "^class " sklearn/ensemble/ | grep -v tests 
sklearn/ensemble/_gb_losses.py:class LossFunction(metaclass=ABCMeta):
sklearn/ensemble/_gb_losses.py:class RegressionLossFunction(LossFunction, metaclass=ABCMeta):
sklearn/ensemble/_gb_losses.py:class LeastSquaresError(RegressionLossFunction):
sklearn/ensemble/_gb_losses.py:class LeastAbsoluteError(RegressionLossFunction):
sklearn/ensemble/_gb_losses.py:class HuberLossFunction(RegressionLossFunction):
sklearn/ensemble/_gb_losses.py:class QuantileLossFunction(RegressionLossFunction):
sklearn/ensemble/_gb_losses.py:class ClassificationLossFunction(LossFunction, metaclass=ABCMeta):
sklearn/ensemble/_gb_losses.py:class BinomialDeviance(ClassificationLossFunction):
sklearn/ensemble/_gb_losses.py:class MultinomialDeviance(ClassificationLossFunction):
sklearn/ensemble/_gb_losses.py:class ExponentialLoss(ClassificationLossFunction):
sklearn/ensemble/_hist_gradient_boosting/binning.py:class _BinMapper(BaseEstimator, TransformerMixin):
sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:class BaseHistGradientBoosting(BaseEstimator, ABC):
sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:class HistGradientBoostingRegressor(BaseHistGradientBoosting, RegressorMixin):
sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:class HistGradientBoostingClassifier(BaseHistGradientBoosting,
sklearn/ensemble/_hist_gradient_boosting/grower.py:class TreeNode:
sklearn/ensemble/_hist_gradient_boosting/grower.py:class TreeGrower:
sklearn/ensemble/_hist_gradient_boosting/loss.py:class BaseLoss(ABC):
sklearn/ensemble/_hist_gradient_boosting/loss.py:class LeastSquares(BaseLoss):
sklearn/ensemble/_hist_gradient_boosting/loss.py:class BinaryCrossEntropy(BaseLoss):
sklearn/ensemble/_hist_gradient_boosting/loss.py:class CategoricalCrossEntropy(BaseLoss):
sklearn/ensemble/_hist_gradient_boosting/predictor.py:class TreePredictor:
sklearn/ensemble/_hist_gradient_boosting/splitting.pyx:class SplitInfo:
sklearn/ensemble/bagging.py:class BaseBagging(BaseEnsemble, metaclass=ABCMeta):
sklearn/ensemble/bagging.py:class BaggingClassifier(BaseBagging, ClassifierMixin):
sklearn/ensemble/bagging.py:class BaggingRegressor(BaseBagging, RegressorMixin):
sklearn/ensemble/base.py:class BaseEnsemble(BaseEstimator, MetaEstimatorMixin, metaclass=ABCMeta):
sklearn/ensemble/forest.py:class BaseForest(BaseEnsemble, MultiOutputMixin, metaclass=ABCMeta):
sklearn/ensemble/forest.py:class ForestClassifier(BaseForest, ClassifierMixin, metaclass=ABCMeta):
sklearn/ensemble/forest.py:class ForestRegressor(BaseForest, RegressorMixin, metaclass=ABCMeta):
sklearn/ensemble/forest.py:class RandomForestClassifier(ForestClassifier):
sklearn/ensemble/forest.py:class RandomForestRegressor(ForestRegressor):
sklearn/ensemble/forest.py:class ExtraTreesClassifier(ForestClassifier):
sklearn/ensemble/forest.py:class ExtraTreesRegressor(ForestRegressor):
sklearn/ensemble/forest.py:class RandomTreesEmbedding(BaseForest):
sklearn/ensemble/gradient_boosting.py:class QuantileEstimator:
sklearn/ensemble/gradient_boosting.py:class MeanEstimator:
sklearn/ensemble/gradient_boosting.py:class LogOddsEstimator:
sklearn/ensemble/gradient_boosting.py:class ScaledLogOddsEstimator(LogOddsEstimator):
sklearn/ensemble/gradient_boosting.py:class PriorProbabilityEstimator:
sklearn/ensemble/gradient_boosting.py:class ZeroEstimator:
sklearn/ensemble/gradient_boosting.py:class LossFunction(metaclass=ABCMeta):
sklearn/ensemble/gradient_boosting.py:class RegressionLossFunction(LossFunction, metaclass=ABCMeta):
sklearn/ensemble/gradient_boosting.py:class LeastSquaresError(RegressionLossFunction):
sklearn/ensemble/gradient_boosting.py:class LeastAbsoluteError(RegressionLossFunction):
sklearn/ensemble/gradient_boosting.py:class HuberLossFunction(RegressionLossFunction):
sklearn/ensemble/gradient_boosting.py:class QuantileLossFunction(RegressionLossFunction):
sklearn/ensemble/gradient_boosting.py:class ClassificationLossFunction(LossFunction, metaclass=ABCMeta):
sklearn/ensemble/gradient_boosting.py:class BinomialDeviance(ClassificationLossFunction):
sklearn/ensemble/gradient_boosting.py:class MultinomialDeviance(ClassificationLossFunction):
sklearn/ensemble/gradient_boosting.py:class ExponentialLoss(ClassificationLossFunction):
sklearn/ensemble/gradient_boosting.py:class VerboseReporter(object):
sklearn/ensemble/gradient_boosting.py:class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
sklearn/ensemble/gradient_boosting.py:class GradientBoostingClassifier(BaseGradientBoosting, ClassifierMixin):
sklearn/ensemble/gradient_boosting.py:class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
sklearn/ensemble/iforest.py:class IsolationForest(BaseBagging, OutlierMixin):
sklearn/ensemble/voting.py:class _BaseVoting(_BaseComposition, TransformerMixin):
sklearn/ensemble/voting.py:class VotingClassifier(_BaseVoting, ClassifierMixin):
sklearn/ensemble/voting.py:class VotingRegressor(_BaseVoting, RegressorMixin):
sklearn/ensemble/weight_boosting.py:class BaseWeightBoosting(BaseEnsemble, metaclass=ABCMeta):
sklearn/ensemble/weight_boosting.py:class AdaBoostClassifier(BaseWeightBoosting, ClassifierMixin):
sklearn/ensemble/weight_boosting.py:class AdaBoostRegressor(BaseWeightBoosting, RegressorMixin):

@NicolasHug
Copy link
Member

Everything that is already in ensemble/__init__.py is public, the rest becomes private.

@adrinjalali
Copy link
Member Author

I still don't know what to call the new tree file. I really rather not rename _tree.pyx and _tree.pxd files, since not only it's engangled with the rest of the codebase, but also users who were importing things from tree._tree will get really odd errors after these renames.

@thomasjpfan
Copy link
Member

What do you think about renaming tree.py into _base.py?

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a merge but this still LGTM

@@ -7,7 +7,16 @@
# (new_module_name, deprecated_path, correct_import_path)
_DEPRECATED_MODULES = {
# TODO: Remove in 0.24
('_mocking', 'sklearn.utils.mocking', 'sklearn.utils')
('_mocking', 'sklearn.utils.mocking', 'sklearn.utils'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: skip line

@thomasjpfan
Copy link
Member

I see that you undid the change to partial_dependence at f3e50d9 What was the reason for this?

@NicolasHug
Copy link
Member

Also I don't mind too much about _base.py but in the svm module this is called _classes.py. Might make more sense since these are not just the base classes.

@adrinjalali
Copy link
Member Author

I see that you undid the change to partial_dependence at f3e50d9 What was the reason for this?

Because we were not exposing anything from inside that file at the sklearn.ensemble level. I thought that requires it's own careful consideration.

@NicolasHug
Copy link
Member

also everything from ensemble/partial_dependence is already deprecated so I agree we don't need to touch it.

@thomasjpfan let's merge?

@thomasjpfan thomasjpfan changed the title underscore files in {tree, ensemble} folder MNT Makes modules in tree and ensemble private Oct 16, 2019
@thomasjpfan thomasjpfan merged commit 437ca05 into scikit-learn:master Oct 16, 2019
@adrinjalali adrinjalali deleted the api/public/tree branch October 16, 2019 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants