[MRG + 1] Remove unused variables #12230
Conversation
This pull request introduces 2 alerts and fixes 22 when merging 7ad074d into e1c3c22 - view on LGTM.com
Comment posted by LGTM.com
@@ -1556,7 +1556,7 @@ def fit(self, X, y=None, sample_weight=None):
             init_size=init_size)

         # Compute the label assignment on the init dataset
-        batch_inertia, centers_squared_diff = _mini_batch_step(
+        _mini_batch_step(
I mentioned this in the previous issue: is n_iter ever set to 0 for k-means, or is it strictly greater than 0? I could see how this assignment might be useful if the subsequent for loop does not assign them.
It looks contingent on n_samples, self.max_iter and self.batch_size:

scikit-learn/sklearn/cluster/k_means_.py, lines 1524 to 1525 in 2e2e69d:

n_batches = int(np.ceil(float(n_samples) / self.batch_size))
n_iter = int(self.max_iter * n_batches)

So it appears very unlikely that n_iter would ever be set to 0.
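To make that concrete, here is a minimal sketch of the arithmetic (the sample counts are made-up values; max_iter and batch_size both default to positive integers in MiniBatchKMeans):

import numpy as np

# Hypothetical values for illustration only.
n_samples, batch_size, max_iter = 250, 100, 100

# For any non-empty X, ceil(n_samples / batch_size) is at least 1,
# so n_iter is at least max_iter and can only be 0 if max_iter is 0.
n_batches = int(np.ceil(float(n_samples) / batch_size))
n_iter = int(max_iter * n_batches)
assert n_batches == 3 and n_iter == 300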
Even if n_iter=0, the variables batch_inertia and centers_squared_diff don't seem to be used afterwards, so it should indeed be fine.
@@ -309,7 +309,6 @@ def ledoit_wolf(X, assume_centered=False, block_size=1000):
         X = np.reshape(X, (1, -1))
         warnings.warn("Only one sample available. "
                       "You may want to reshape your data array")
-        n_samples = 1
         n_features = X.size
     else:
         n_samples, n_features = X.shape
If n_samples is not needed, then it's also not needed here.
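In other words, a sketch of the suggestion (using a hypothetical helper, and assuming n_samples really is never read later in ledoit_wolf): neither branch needs to bind it.

import numpy as np

def _n_features(X):
    # Hypothetical helper for illustration: mirror the two branches
    # above without ever assigning n_samples.
    if X.ndim == 1:
        X = np.reshape(X, (1, -1))
        return X.size
    return X.shape[1]  # instead of: n_samples, n_features = X.shape

assert _n_features(np.zeros(5)) == 5
assert _n_features(np.zeros((10, 4))) == 4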
@@ -435,7 +434,6 @@ def fast_mcd(X, support_fraction=None,
             # (and less optimal)
             all_best_covariances = np.zeros((n_best_tot, n_features,
                                              n_features))
-            n_best_tot = 10
Doesn't this look odd? The comment says it's trying a smaller matrix, but it sets n_best_tot after retrying the matrix allocation. It seems to me that those lines are swapped somehow.
Good point. I'll try swapping the lines and see if anything breaks.
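For reference, a minimal sketch of the swapped order being suggested (n_features is a made-up value here; in fast_mcd this runs inside a MemoryError fallback that retries with a smaller candidate set):

import numpy as np

n_features = 4  # hypothetical value for illustration

# Shrink n_best_tot first, then retry the allocation, so the retry
# actually benefits from the smaller size.
n_best_tot = 10
all_best_covariances = np.zeros((n_best_tot, n_features, n_features))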
@@ -523,7 +523,7 @@ def check_estimator_sparse_data(name, estimator_orig):
                       "sparse input is not supported if this is not"
                       " the case." % name)
                 raise
-            except Exception as e:
+            except Exception:
Shouldn't it print the exception instead of ignoring it? It kinda feels odd to catch Exception and suppress it completely.
There's a raise statement after the print statement below, so the exception will still be visible. It's still strange semantics that the exception does not wrap the print message though.
There's a raise, sure. But the message says that the estimator is not failing properly on sparse data, i.e. it is being very specific, whereas the handler catches Exception. So if the estimator fails because there's not enough memory, or because there's something wrong with the disk and the estimator reads from the disk while fitting, for whatever reason, we still show the same error message. It's just a bad idea to catch Exception and pretend it's much more specific than Exception.
Agreed that it's bad that except Exception can be catching anything (minus a TypeError or ValueError from above) while a specific print message is being shown (it should probably be a warning).
I am not too familiar with this part of the codebase, but the scope of this PR (just removing unused variables) does not change the prior behavior. This block can be improved in a future PR.
In [5]: try:
...: 1 / 0
...: except Exception as e:
...: print('Something went wrong')
...: raise
...:
...:
Something went wrong
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-5-b91504a88660> in <module>()
1 try:
----> 2 1 / 0
3 except Exception as e:
4 print('Something went wrong')
5 raise
ZeroDivisionError: division by zero
In [6]: try:
...: 1 / 0
...: except Exception:
...: print('Something went wrong')
...: raise
...:
...:
Something went wrong
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-6-ee8d63f9937e> in <module>()
1 try:
----> 2 1 / 0
3 except Exception:
4 print('Something went wrong')
5 raise
ZeroDivisionError: division by zero
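If that future cleanup happens, a sketch of the warning-based variant suggested above might look like this (a hypothetical helper, not the actual check_estimator_sparse_data code; the failing lambda stands in for estimator.fit):

import warnings

def check_sparse_support(fit):
    try:
        fit()
    except (TypeError, ValueError):
        raise  # expected failure modes, handled specifically in the real check
    except Exception:
        # Warn instead of print, then re-raise so the original
        # traceback is preserved.
        warnings.warn("estimator failed on sparse data in an unexpected way")
        raise

try:
    check_sparse_support(lambda: 1 / 0)  # stand-in for estimator.fit(X, y)
except ZeroDivisionError:
    pass  # the original exception type survives the re-raise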
@mroeschke, sorry, I hadn't noticed the raise after the print. It's all good. Thanks!
@@ -334,7 +334,7 @@ def fit(self, X, y=None):
         self : object
             Returns the transformer.
         """
-        X = check_array(X, accept_sparse='csr')
+        check_array(X, accept_sparse='csr')
I was like "this definitely doesn't look right", but surprisingly it actually is. With estimator tags we could entirely remove this line.
Not familiar with estimator tags, but are the tags in place so this line can be removed, or should this be kept for now to still validate X?
It should be kept for now; sorry for being cryptic.
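A small illustration of why the bare call is still useful even though its return value is discarded (the arrays here are made up; check_array is the real sklearn.utils helper):

import numpy as np
from sklearn.utils import check_array

# Validation happens as a side effect: well-formed input passes through...
check_array(np.array([[1.0, 2.0], [3.0, 4.0]]), accept_sparse='csr')

# ...while invalid input still raises, even with the result unused.
try:
    check_array(np.array([[1.0, np.nan]]), accept_sparse='csr')
except ValueError:
    pass  # NaN is rejected by default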
This pull request introduces 2 alerts and fixes 22 when merging 60f5be5 into 59b15c5 - view on LGTM.com
Comment posted by LGTM.com
This pull request introduces 2 alerts and fixes 22 when merging 9352f3f into bfab306 - view on LGTM.com
Comment posted by LGTM.com
Does the codecov check need to pass even though this is just cleaning up some variables?
Probably a false positive, I wouldn't worry about it.
This pull request introduces 2 alerts and fixes 20 when merging 067247b into d4c9e84 - view on LGTM.com
Comment posted by LGTM.com
Please remove the now-unused imports.
This pull request fixes 20 alerts when merging ab1dc98 into 5bcd84b - view on LGTM.com
Comment posted by LGTM.com
Removed those unused imports @jnothman. Thanks!
@@ -281,7 +281,6 @@ def lsqr(A, b, damp=0.0, atol=1e-8, btol=1e-8, conlim=1e8,

     itn = 0
-    istop = 0
     nstop = 0
It's a backport from scipy; it's better to keep it unchanged, which will make comparing with the upstream version easier.
This pull request fixes 19 alerts when merging d9de4c0 into 5bcd84b - view on LGTM.com
Comment posted by LGTM.com
Looks good, thanks @mroeschke!
This reverts commit 294884c.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Removes unused variable assignments as flagged by PyCharm.