Closed
Description
I'm slightly concerned that the common tests currently cover less than I'd like them to; for example, there are no sparse data tests for clustering (#4052).
I think for clustering, regression, classification and transformers we are in relatively good shape, but there are two cases of "odd" estimators that we need to watch out for:
- estimators not returned by `all_estimators` by default
- estimators not belonging to the four mixin classes.
For the second:
estimators = all_estimators(type_filter=['classifier', 'regressor', 'transformer', 'cluster'])
{('CheckingClassifier', sklearn.utils.mocking.CheckingClassifier),
('CountVectorizer', sklearn.feature_extraction.text.CountVectorizer),
('DPGMM', sklearn.mixture.dpgmm.DPGMM),
('EmpiricalCovariance',
sklearn.covariance.empirical_covariance_.EmpiricalCovariance),
('GMM', sklearn.mixture.gmm.GMM),
('GMMHMM', sklearn.hmm.GMMHMM),
('GaussianHMM', sklearn.hmm.GaussianHMM),
('GraphLasso', sklearn.covariance.graph_lasso_.GraphLasso),
('GraphLassoCV', sklearn.covariance.graph_lasso_.GraphLassoCV),
('HashingVectorizer', sklearn.feature_extraction.text.HashingVectorizer),
('KernelDensity', sklearn.neighbors.kde.KernelDensity),
('LSHForest', sklearn.neighbors.approximate.LSHForest),
('LedoitWolf', sklearn.covariance.shrunk_covariance_.LedoitWolf),
('LogOddsEstimator', sklearn.ensemble.gradient_boosting.LogOddsEstimator),
('MDS', sklearn.manifold.mds.MDS),
('MeanEstimator', sklearn.ensemble.gradient_boosting.MeanEstimator),
('MinCovDet', sklearn.covariance.robust_covariance.MinCovDet),
('MultinomialHMM', sklearn.hmm.MultinomialHMM),
('NearestNeighbors', sklearn.neighbors.unsupervised.NearestNeighbors),
('OAS', sklearn.covariance.shrunk_covariance_.OAS),
('OneClassSVM', sklearn.svm.classes.OneClassSVM),
('PatchExtractor', sklearn.feature_extraction.image.PatchExtractor),
('PriorProbabilityEstimator',
sklearn.ensemble.gradient_boosting.PriorProbabilityEstimator),
('QuantileEstimator', sklearn.ensemble.gradient_boosting.QuantileEstimator),
('ScaledLogOddsEstimator',
sklearn.ensemble.gradient_boosting.ScaledLogOddsEstimator),
('ShrunkCovariance', sklearn.covariance.shrunk_covariance_.ShrunkCovariance),
('SpectralBiclustering', sklearn.cluster.bicluster.SpectralBiclustering),
('SpectralCoclustering', sklearn.cluster.bicluster.SpectralCoclustering),
('SpectralEmbedding', sklearn.manifold.spectral_embedding_.SpectralEmbedding),
('TSNE', sklearn.manifold.t_sne.TSNE),
('TfidfVectorizer', sklearn.feature_extraction.text.TfidfVectorizer),
('VBGMM', sklearn.mixture.dpgmm.VBGMM),
('ZeroEstimator', sklearn.ensemble.gradient_boosting.ZeroEstimator),
('_BaseHMM', sklearn.hmm._BaseHMM),
('_BaseRidgeCV', sklearn.linear_model.ridge._BaseRidgeCV),
('_ConstantPredictor', sklearn.multiclass._ConstantPredictor),
('_RidgeGCV', sklearn.linear_model.ridge._RidgeGCV)}
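The listing above reads like the difference between an unfiltered estimator list and the type-filtered one, though the exact call that produced it is not shown. A minimal sketch of that kind of split follows. It uses stand-in classes so it runs without scikit-learn: every `Toy*` name and the `split_by_mixin` helper are hypothetical, and the mixins are empty placeholders for the four real ones.

```python
# Stand-in mixins. In scikit-learn the corresponding roles are played by
# ClassifierMixin, RegressorMixin, TransformerMixin and ClusterMixin.
class ClassifierMixin: pass
class RegressorMixin: pass
class TransformerMixin: pass
class ClusterMixin: pass

STANDARD_MIXINS = (ClassifierMixin, RegressorMixin,
                   TransformerMixin, ClusterMixin)

# Hypothetical estimators, for illustration only.
class ToyClassifier(ClassifierMixin): pass
class ToyScaler(TransformerMixin): pass
class ToyCovariance: pass   # no standard mixin
class ToyDensity: pass      # no standard mixin


def split_by_mixin(estimators, mixins=STANDARD_MIXINS):
    """Split (name, class) pairs into those that inherit from one of
    the given mixins and the "odd" ones that inherit from none."""
    typed = {(name, cls) for name, cls in estimators
             if issubclass(cls, mixins)}
    odd = set(estimators) - typed
    return typed, odd


all_found = [
    ("ToyClassifier", ToyClassifier),
    ("ToyScaler", ToyScaler),
    ("ToyCovariance", ToyCovariance),
    ("ToyDensity", ToyDensity),
]
typed, odd = split_by_mixin(all_found)
odd_names = sorted(name for name, _ in odd)
print(odd_names)  # ['ToyCovariance', 'ToyDensity']
```

Any test that iterates only over `typed` never sees the classes in `odd`, which is the coverage gap described here.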
These are mostly covariance, density and preprocessing models.
It would be great if we could figure out a good way to test them, too, or make more tests applicable to all estimators, without filtering for the four standard kinds.
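One possible shape for such type-agnostic tests is sketched below, assuming only that every estimator exposes `fit(X)`. The two checks, the `run_checks` helper and the `Toy*` classes are all hypothetical and are not taken from the existing common tests.

```python
import copy


def check_fit_returns_self(estimator, X):
    """fit should return the estimator itself so calls can be chained."""
    assert estimator.fit(X) is estimator, "fit did not return self"


def check_fit_does_not_mutate_input(estimator, X):
    """fit should leave the caller's data untouched."""
    X_before = copy.deepcopy(X)
    estimator.fit(X)
    assert X == X_before, "fit modified its input"


# Checks that need no knowledge of the estimator's kind.
TYPE_AGNOSTIC_CHECKS = [check_fit_returns_self,
                        check_fit_does_not_mutate_input]


class ToyMeanEstimator:
    """Hypothetical well-behaved estimator with no standard mixin."""
    def fit(self, X, y=None):
        n_values = sum(len(row) for row in X)
        self.mean_ = sum(sum(row) for row in X) / n_values
        return self


class ToyBrokenEstimator:
    """Hypothetical estimator that violates both conventions."""
    def fit(self, X, y=None):
        X[0][0] = 0.0  # mutates the caller's data
        # implicitly returns None instead of self


def run_checks(estimator_classes, X):
    """Run every check on a fresh instance and a fresh copy of X,
    collecting failures instead of stopping at the first one."""
    failures = []
    for cls in estimator_classes:
        for check in TYPE_AGNOSTIC_CHECKS:
            try:
                check(cls(), copy.deepcopy(X))
            except AssertionError as exc:
                failures.append((cls.__name__, check.__name__, str(exc)))
    return failures


X = [[1.0, 2.0], [3.0, 4.0]]
failures = run_checks([ToyMeanEstimator, ToyBrokenEstimator], X)
for failure in failures:
    print(failure)
```

Because the loop never asks what kind of estimator it has, covariance or density models would be exercised in the same way as classifiers.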