Description
Could this be caused by @jnhansen's pull request #10933? It's the latest commit that changes the lines related to this.
Description
I'm hitting a basic error that's really frustrating me.
I updated scikit-learn to the latest version on GitHub today, and now I get errors along the lines of "expected: 6 values but given 4" or "expected: 4 values but given 3". The code is exactly the same as in the documentation, and it worked before the update. Why could it be throwing this error?
Steps/Code to Reproduce
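from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# list_of_sentences is a list of str documents
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
])
X = pipeline.fit_transform(list_of_sentences)  # sparse tf-idf matrix
km = KMeans(n_clusters=3, random_state=0)
km.fit(X)  # sparse input: raises the TypeError below
# km.fit(X.toarray())  # dense input: also raises an error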
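The snippet above depends on `list_of_sentences`, which is not shown. A self-contained variant might look like the sketch below. It assumes a small placeholder corpus; the sentences are made up for illustration and are not the data that triggered the error. `n_init=10` is set explicitly only to keep behaviour consistent across scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Placeholder corpus (hypothetical; stands in for list_of_sentences).
list_of_sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell on the news",
    "markets rallied after the news",
    "the recipe needs two eggs",
    "bake the cake for an hour",
]

# Token counts followed by tf-idf weighting, as in the original snippet.
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
])
X = pipeline.fit_transform(list_of_sentences)  # scipy sparse matrix

# Fit once on the sparse matrix and once on its dense equivalent,
# matching the two cases described in this report.
km_sparse = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)
km_dense = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X.toarray())

print("sparse labels:", km_sparse.labels_)
print("dense labels: ", km_dense.labels_)
```

On a consistent installation both calls should complete without raising. On the affected installation the first `fit` call should be enough to reproduce the problem.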
Expected Results
No error thrown: KMeans fits the tfidf matrix.
Actual Results
When I try to run KMeans.fit(X), where X is a sparse matrix, I get this error:
kmeans_model.fit(tfidf_matrix)
File "/Users/..../anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py", line 896, in fit
return_n_iter=True)
File "/Users/.../anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py", line 346, in k_means
x_squared_norms=x_squared_norms, random_state=random_state)
File "/Users/.../anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py", line 493, in _kmeans_single_lloyd
distances=distances)
File "/Users/.../anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py", line 616, in _labels_inertia
X, x_squared_norms, centers, labels, distances=distances)
File "sklearn/cluster/_k_means.pyx", line 104, in sklearn.cluster._k_means.__pyx_fuse_1_assign_labels_csr
TypeError: __pyx_fuse_1_assign_labels_csr() takes exactly 6 positional arguments (4 given)
I think it may have something to do with the sample weights. Specifically, the error is thrown at this call:
inertia = _k_means._assign_labels_csr(X, x_squared_norms, centers, labels, distances=distances)
When I try to run KMeans on a dense matrix (or X.toarray()), I get:
1 km = KMeans(n_clusters=3, random_state=0)
----> 2 km.fit(X.toarray())
~/anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in fit(self, X, y)
894 tol=self.tol, random_state=random_state, copy_x=self.copy_x,
895 n_jobs=self.n_jobs, algorithm=self.algorithm,
--> 896 return_n_iter=True)
897 return self
898
~/anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter)
344 X, n_clusters, max_iter=max_iter, init=init, verbose=verbose,
345 precompute_distances=precompute_distances, tol=tol,
--> 346 x_squared_norms=x_squared_norms, random_state=random_state)
347 # determine if these results are the best so far
348 if best_inertia is None or inertia < best_inertia:
~/anaconda3/lib/python3.6/site-packages/sklearn/cluster/k_means_.py in _kmeans_single_elkan(X, n_clusters, max_iter, init, verbose, x_squared_norms, random_state, tol, precompute_distances)
398 print('Initialization complete')
399 centers, labels, n_iter = k_means_elkan(X, n_clusters, centers, tol=tol,
--> 400 max_iter=max_iter, verbose=verbose)
401 inertia = np.sum((X - centers[labels]) ** 2, dtype=np.float64)
402 return labels, inertia, centers, n_iter
sklearn/cluster/_k_means_elkan.pyx in sklearn.cluster._k_means_elkan.k_means_elkan()
Versions
Please run the following snippet and paste the output below.
import platform; print(platform.platform())
Darwin-17.6.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.14.5
import scipy; print("SciPy", scipy.__version__)
SciPy 1.1.0
import sklearn; print("Scikit-Learn", sklearn.__version__)
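Beyond the version numbers, it may help to see which files are actually being imported. The sketch below is one possible check. It assumes, without confirmation, that the argument-count mismatch could come from Python sources and compiled extensions that belong to different builds. The helper name `describe_cluster_install` is hypothetical and not part of scikit-learn.

```python
import os
import time

import sklearn
import sklearn.cluster


def describe_cluster_install():
    """Collect the sklearn version plus the files found in sklearn/cluster."""
    cluster_dir = os.path.dirname(sklearn.cluster.__file__)
    compiled, sources = [], []
    for name in sorted(os.listdir(cluster_dir)):
        path = os.path.join(cluster_dir, name)
        if name.endswith((".so", ".pyd")):
            # Compiled extension modules (built from the .pyx files).
            compiled.append((name, os.path.getmtime(path)))
        elif name.endswith(".py"):
            # Pure-Python modules that call into the extensions.
            sources.append((name, os.path.getmtime(path)))
    return {
        "version": sklearn.__version__,
        "cluster_dir": cluster_dir,
        "compiled": compiled,
        "sources": sources,
    }


info = describe_cluster_install()
print("scikit-learn", info["version"], "imported from", info["cluster_dir"])
for kind in ("compiled", "sources"):
    for name, mtime in info[kind]:
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(f"{kind:9s} {stamp}  {name}")
```

If the compiled files and the `.py` files have very different timestamps, or the directory is not the one the update was installed into, that would point towards a mixed installation. This is only a heuristic and not a definitive diagnosis.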
What could be causing the issue?