-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Improves class design for AgglomerativeClustering and FeatureAgglomeration #9875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Need help in what needs to be done to pass the tests in |
"Improves the class design" is what sense? what is the design issue you identified? |
I am not sure if I am on right track but I tried using a _BaseClass, from which the two classes Agglomerative Clustering and Feature Extraction are derived and having pooling_func only in FeatureAgglomeration. I am not sure. |
@lesteve I guess this is what you were trying to convey. I have made a base class which is used by both AgglomerativeClustering and FeatureAgglomeration. The later one uses pooling_func, not the former.
The above patch now raises an error:
|
I think this is not quite right. You don't need a new base class. We want to remove I see the current PR's implementation, however, as more wrong than right. |
So, what I can understand from your concern is that, at present, I only need to add a deprecation warning which will raise a warning if a user tries to explicitly call |
Except that there cannot be a warning in FeatureAgglomeration.__init__, so
no, you also need to implement FeatureAgglomeration.__init__ as if
pooling_func in AgglomerativeClustering did not exist.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mark the parameter as deprecated in the docstring.
sklearn/cluster/hierarchical.py
Outdated
@@ -678,6 +678,12 @@ def __init__(self, n_clusters=2, affinity="euclidean", | |||
self.linkage = linkage | |||
self.affinity = affinity | |||
self.pooling_func = pooling_func | |||
if self.pooling_func != np.mean: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be in fit, not in __init__
according to our conventions. And ideally we'd change the default to something like 'deprecated' so that we can warn when the user explicitly passed pooling_func=np.mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the recommended changes, check once
sklearn/cluster/hierarchical.py
Outdated
@@ -642,6 +642,7 @@ class AgglomerativeClustering(BaseEstimator, ClusterMixin): | |||
all observations of the two sets. | |||
|
|||
pooling_func : callable, default=np.mean | |||
deprecated, since AgglomerativeClustering do not use pooling_func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just say "ignored. Deprecate to be removed in ..." And remove the rest as irrelevant
@jnothman Review please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sklearn/cluster/hierarchical.py
Outdated
This combines the values of agglomerated features into a single | ||
value, and should accept an array of shape [M, N] and the keyword | ||
argument ``axis=1``, and reduce it to an array of size [M]. | ||
ignored, Deprecate to be removed in 0.22. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*deprecated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at other places we use the "deprecated" sphinx directive. git grep '\.\. deprecated'
will show you plenty of examples.
sklearn/cluster/hierarchical.py
Outdated
@@ -694,6 +692,12 @@ def fit(self, X, y=None): | |||
------- | |||
self | |||
""" | |||
if self.pooling_func != 'deprecated': | |||
warnings.warn('"pooling_func" as a parameter argument is ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it surprises me that this has test coverage. Please find out if there is a use of AgglomerativeClustering(pooling_func=...)
in the codebase and remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a use of pooling_func
. there's no test covering pooling_func.
sklearn/cluster/hierarchical.py
Outdated
memory=None, | ||
connectivity=None, compute_full_tree='auto', | ||
linkage='ward', pooling_func=np.mean): | ||
self.n_clusters = n_clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And could you please use super(FeatureAgglomeration, self).__init__(n_clusters=..., ...)
instead of setting things individually here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
…into aggcluster
somewhere in tests this warning is raised, says codecov
…On 19 Oct 2017 7:14 pm, "Kumar Ashutosh" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/cluster/hierarchical.py
<#9875 (comment)>
:
> @@ -694,6 +692,12 @@ def fit(self, X, y=None):
-------
self
"""
+ if self.pooling_func != 'deprecated':
+ warnings.warn('"pooling_func" as a parameter argument is '
I don't think there's a use of pooling_func. there's no test covering
pooling_func.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9875 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wYKs6l95l4sbAzbKMzV09mx31AAks5stwTsgaJpZM4PvpxO>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is that with your changes as it is FeatureAgglomeration with default parameter has a warning:
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering, FeatureAgglomeration
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
random_state=0)
model = FeatureAgglomeration()
model.fit(X)
Some other comments too.
sklearn/cluster/hierarchical.py
Outdated
This combines the values of agglomerated features into a single | ||
value, and should accept an array of shape [M, N] and the keyword | ||
argument ``axis=1``, and reduce it to an array of size [M]. | ||
ignored, Deprecate to be removed in 0.22. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at other places we use the "deprecated" sphinx directive. git grep '\.\. deprecated'
will show you plenty of examples.
sklearn/cluster/hierarchical.py
Outdated
connectivity=connectivity, | ||
compute_full_tree=compute_full_tree, | ||
linkage=linkage, affinity=affinity, | ||
pooling_func=pooling_func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not pass pooling_function
to the base class otherwise you'll get a warning in AgglomerativeClustering.__init__
. This should be fine:
super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters,
memory=memory,
connectivity=connectivity,
compute_full_tree=compute_full_tree,
linkage=linkage, affinity=affinity)
self.pooling_func=pooling_func
sklearn/cluster/hierarchical.py
Outdated
warnings.warn('"pooling_func" as a parameter argument is ' | ||
'deprecated in version 0.20 and will be removed ' | ||
'in version 0.22. It is being removed since ' | ||
'AgglomerativeClustering do not directly use ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewording:
Agglomerative 'pooling_func' parameter is not used. It has been deprecated in version 0.20
and will be removed in 0.22.
@lesteve Added the changes. I am still not able to figure out the warnings raised by codecov as pointed out by @jnothman . Also, there's a PEP8 failure "line too long". Here:
Can I modify it in any way? |
sklearn/cluster/hierarchical.py
Outdated
memory=None, | ||
connectivity=None, compute_full_tree='auto', | ||
linkage='ward', pooling_func=np.mean): | ||
super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just put a newline after ( and indent 4 spaces
sklearn/cluster/hierarchical.py
Outdated
@@ -644,7 +644,11 @@ class AgglomerativeClustering(BaseEstimator, ClusterMixin): | |||
pooling_func : callable, default=np.mean | |||
This combines the values of agglomerated features into a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it does not. This should just say ignored. When @lesteve pointed to other uses of .. deprecated
they may be cases where the param is deprecated, but still functional. This one is not.
sklearn/cluster/hierarchical.py
Outdated
This combines the values of agglomerated features into a single | ||
value, and should accept an array of shape [M, N] and the keyword | ||
argument ``axis=1``, and reduce it to an array of size [M]. | ||
ignored, Deprecated to be removed in 0.22. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not saying that you could or should not use .. deprecated
but that leaving in an untrue description of behaviour helped no one.
Is this what you were suggesting? @jnothman |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's alright.
I added a whats_new entry. I'll look at the generated doc and then merge if it looks fine. |
Merging, thanks a lot! |
…cs/donigian-update-contribution-guidelines * 'master' of github.com:scikit-learn/scikit-learn: (23 commits) fixes scikit-learn#10031: fix attribute name and shape in documentation (scikit-learn#10033) [MRG+1] add changelog entry for fixed and merged PR scikit-learn#10005 issue scikit-learn#9633 (scikit-learn#10025) [MRG] Fix LogisticRegression see also should include LogisticRegressionCV(scikit-learn#9995) (scikit-learn#10022) [MRG + 1] Labels of clustering should start at 0 or -1 if noise (scikit-learn#10015) MAINT Remove redundancy in scikit-learn#9552 (scikit-learn#9573) [MRG+1] correct comparison in GaussianNB for 'priors' (scikit-learn#10005) [MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399) [MRG] FIX bug in nested set_params usage (scikit-learn#9999) [MRG+1] Fix LOF and Isolation benchmarks (scikit-learn#9798) [MRG + 1] Fix negative inputs checking in mean_squared_log_error (scikit-learn#9968) DOC Fix typo (scikit-learn#9996) DOC Fix typo: x axis -> y axis (scikit-learn#9985) improve example plot_forest_iris.py (scikit-learn#9989) [MRG+1] Deprecate pooling_func unused parameter in AgglomerativeClustering (scikit-learn#9875) DOC update news DOC Fix three typos in manifold documentation (scikit-learn#9990) DOC add missing dot in docstring DOC Add what's new for 0.19.1 (scikit-learn#9983) Improve readability of outlier detection example. (scikit-learn#9973) DOC: Fixed typo (scikit-learn#9977) ...
Reference Issue
Fixes #9846
What does this implement/fix? Explain your changes.
Improves the class design of AgglomerativeClustering and FeatureAgglomeration
Any other comments?
Changes will be made if not green.