Skip to content

Unused Variable Assignment in for k-means #12186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
xhluca opened this issue Sep 28, 2018 · 2 comments · Fixed by #12230
Closed

Unused Variable Assignment in for k-means #12186

xhluca opened this issue Sep 28, 2018 · 2 comments · Fixed by #12230

Comments

@xhluca
Copy link
Contributor

xhluca commented Sep 28, 2018

Description

in sklearn/cluster/k_means_.py, line 1559:

batch_inertia, centers_squared_diff = _mini_batch_step(
X_valid, sample_weight_valid,
x_squared_norms[validation_indices], cluster_centers,
weight_sums, old_center_buffer, False, distances=None,
verbose=self.verbose)

Are variables batch_inertia and centers_squared_diff actually used before being reassigned in line 1589?

for iteration_idx in range(n_iter):
# Sample a minibatch from the full dataset
minibatch_indices = random_state.randint(
0, n_samples, self.batch_size)
# Perform the actual update step on the minibatch data
batch_inertia, centers_squared_diff = _mini_batch_step(
X[minibatch_indices], sample_weight[minibatch_indices],
x_squared_norms[minibatch_indices],
self.cluster_centers_, self.counts_,
old_center_buffer, tol > 0.0, distances=distances,
# Here we randomly choose whether to perform
# random reassignment: the choice is done as a function
# of the iteration index, and the minimum number of
# counts, in order to force this reassignment to happen
# every once in a while
random_reassign=((iteration_idx + 1)
% (10 + int(self.counts_.min())) == 0),
random_state=random_state,
reassignment_ratio=self.reassignment_ratio,
verbose=self.verbose)

I presume that if n_iter is 0, then the for loop at line 1583 will not be run, which makes the assignment useful. But for k-means is n_iter ever set to 0?

Note

I found this through lgtm.com, as recommended by #12167

Versions

0.20

@jeremiedbb
Copy link
Member

jeremiedbb commented Sep 28, 2018

The variables batch_inertia and centers_squared_diff aren't used indeed. However, the function _mini_batch_step performs inplace operations on the centers for example, so it can't be removed.
We could instead do

_, _ = _mini_batch_step(...)

or just

_mini_batch_step(...)

but is it necessary ?

@xhluca
Copy link
Contributor Author

xhluca commented Sep 28, 2018

Yes I think that if we have to modify it we could go with _mini_batch_step(...).

I realized just after I opened this issue that you are working on a new implementation of k-means. In this case this change won't be necessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants