Commit cad9bc6

Pushing the docs for revision for branch: master, commit 1054e072da3235d6befcbd0d3cd2625f1b4e6473

1 parent 1f4d00b commit cad9bc6
File tree

1,447 files changed

+5670
-5104
lines changed


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: b90c712f3add20a5f92735ef28d2c523
+config: 91e11510374afa0b84e8436171ed9db9
 tags: 645f666f9bcd5a90fca523b33c5a78b7

1020 KB: Binary file not shown.
770 KB: Binary file not shown.

dev/_downloads/bicluster_newsgroups.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Biclustering documents with the Spectral Co-clustering algorithm\n\n\nThis example demonstrates the Spectral Co-clustering algorithm on the\ntwenty newsgroups dataset. The 'comp.os.ms-windows.misc' category is\nexcluded because it contains many posts containing nothing but data.\n\nThe TF-IDF vectorized posts form a word frequency matrix, which is\nthen biclustered using Dhillon's Spectral Co-Clustering algorithm. The\nresulting document-word biclusters indicate subsets words used more\noften in those subsets documents.\n\nFor a few of the best biclusters, its most common document categories\nand its ten most important words get printed. The best biclusters are\ndetermined by their normalized cut. The best words are determined by\ncomparing their sums inside and outside the bicluster.\n\nFor comparison, the documents are also clustered using\nMiniBatchKMeans. The document clusters derived from the biclusters\nachieve a better V-measure than clusters found by MiniBatchKMeans.\n\nOutput::\n\n Vectorizing...\n Coclustering...\n Done in 9.53s. V-measure: 0.4455\n MiniBatchKMeans...\n Done in 12.00s. V-measure: 0.3309\n\n Best biclusters:\n ----------------\n bicluster 0 : 1951 documents, 4373 words\n categories : 23% talk.politics.guns, 19% talk.politics.misc, 14% sci.med\n words : gun, guns, geb, banks, firearms, drugs, gordon, clinton, cdt, amendment\n\n bicluster 1 : 1165 documents, 3304 words\n categories : 29% talk.politics.mideast, 26% soc.religion.christian, 25% alt.atheism\n words : god, jesus, christians, atheists, kent, sin, morality, belief, resurrection, marriage\n\n bicluster 2 : 2219 documents, 2830 words\n categories : 18% comp.sys.mac.hardware, 16% comp.sys.ibm.pc.hardware, 16% comp.graphics\n words : voltage, dsp, board, receiver, circuit, shipping, packages, stereo, compression, package\n\n bicluster 3 : 1860 documents, 2745 words\n categories : 26% rec.motorcycles, 23% rec.autos, 13% misc.forsale\n words : bike, car, dod, engine, motorcycle, ride, honda, cars, bmw, bikes\n\n bicluster 4 : 12 documents, 155 words\n categories : 100% rec.sport.hockey\n words : scorer, unassisted, reichel, semak, sweeney, kovalenko, ricci, audette, momesso, nedved\n\n"
+"\n# Biclustering documents with the Spectral Co-clustering algorithm\n\n\nThis example demonstrates the Spectral Co-clustering algorithm on the\ntwenty newsgroups dataset. The 'comp.os.ms-windows.misc' category is\nexcluded because it contains many posts containing nothing but data.\n\nThe TF-IDF vectorized posts form a word frequency matrix, which is\nthen biclustered using Dhillon's Spectral Co-Clustering algorithm. The\nresulting document-word biclusters indicate subsets words used more\noften in those subsets documents.\n\nFor a few of the best biclusters, its most common document categories\nand its ten most important words get printed. The best biclusters are\ndetermined by their normalized cut. The best words are determined by\ncomparing their sums inside and outside the bicluster.\n\nFor comparison, the documents are also clustered using\nMiniBatchKMeans. The document clusters derived from the biclusters\nachieve a better V-measure than clusters found by MiniBatchKMeans.\n\nOutput::\n\n Vectorizing...\n Coclustering...\n Done in 9.53s. V-measure: 0.4455\n MiniBatchKMeans...\n Done in 12.00s. V-measure: 0.3309\n\n Best biclusters:\n ----------------\n bicluster 0 : 1951 documents, 4373 words\n categories : 23% talk.politics.guns, 19% talk.politics.misc, 14% sci.med\n words : gun, guns, geb, banks, firearms, drugs, gordon, clinton, cdt, amendment\n\n bicluster 1 : 1165 documents, 3304 words\n categories : 29% talk.politics.mideast, 26% soc.religion.christian, 25% alt.atheism\n words : god, jesus, christians, atheists, kent, sin, morality, belief, resurrection, marriage\n\n bicluster 2 : 2219 documents, 2830 words\n categories : 18% comp.sys.mac.hardware, 16% comp.sys.ibm.pc.hardware, 16% comp.graphics\n words : voltage, dsp, board, receiver, circuit, shipping, packages, stereo, compression, package\n\n bicluster 3 : 1860 documents, 2745 words\n categories : 26% rec.motorcycles, 23% rec.autos, 13% misc.forsale\n words : bike, car, dod, engine, motorcycle, ride, honda, cars, bmw, bikes\n\n bicluster 4 : 12 documents, 155 words\n categories : 100% rec.sport.hockey\n words : scorer, unassisted, reichel, semak, sweeney, kovalenko, ricci, audette, momesso, nedved\n\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/digits_classification_exercise.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Digits Classification Exercise\n\n\nA tutorial exercise regarding the use of classification techniques on\nthe Digits dataset.\n\nThis exercise is used in the :ref:`clf_tut` part of the\n:ref:`supervised_learning_tut` section of the\n:ref:`stat_learn_tut_index`.\n"
+"\n# Digits Classification Exercise\n\n\nA tutorial exercise regarding the use of classification techniques on\nthe Digits dataset.\n\nThis exercise is used in the `clf_tut` part of the\n`supervised_learning_tut` section of the\n`stat_learn_tut_index`.\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/document_classification_20newsgroups.ipynb

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Classification of text documents using sparse features\n\n\nThis is an example showing how scikit-learn can be used to classify documents\nby topics using a bag-of-words approach. This example uses a scipy.sparse\nmatrix to store the features and demonstrates various classifiers that can\nefficiently handle sparse matrices.\n\nThe dataset used in this example is the 20 newsgroups dataset. It will be\nautomatically downloaded, then cached.\n\nThe bar plot indicates the accuracy, training time (normalized) and test time\n(normalized) of each classifier.\n\n"
+"\n# Classification of text documents using sparse features\n\n\nThis is an example showing how scikit-learn can be used to classify documents\nby topics using a bag-of-words approach. This example uses a scipy.sparse\nmatrix to store the features and demonstrates various classifiers that can\nefficiently handle sparse matrices.\n\nThe dataset used in this example is the 20 newsgroups dataset. It will be\nautomatically downloaded, then cached.\n\nThe bar plot indicates the accuracy, training time (normalized) and test time\n(normalized) of each classifier.\n\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -33,7 +33,7 @@
 },
 {
 "source": [
-"Load some categories from the training set\n"
+"Load some categories from the training set\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -51,7 +51,7 @@
 },
 {
 "source": [
-"Benchmark classifiers\n"
+"Benchmark classifiers\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/document_clustering.ipynb

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Clustering text documents using k-means\n\n\nThis is an example showing how the scikit-learn can be used to cluster\ndocuments by topics using a bag-of-words approach. This example uses\na scipy.sparse matrix to store the features instead of standard numpy arrays.\n\nTwo feature extraction methods can be used in this example:\n\n - TfidfVectorizer uses a in-memory vocabulary (a python dict) to map the most\n frequent words to features indices and hence compute a word occurrence\n frequency (sparse) matrix. The word frequencies are then reweighted using\n the Inverse Document Frequency (IDF) vector collected feature-wise over\n the corpus.\n\n - HashingVectorizer hashes word occurrences to a fixed dimensional space,\n possibly with collisions. The word count vectors are then normalized to\n each have l2-norm equal to one (projected to the euclidean unit-ball) which\n seems to be important for k-means to work in high dimensional space.\n\n HashingVectorizer does not provide IDF weighting as this is a stateless\n model (the fit method does nothing). When IDF weighting is needed it can\n be added by pipelining its output to a TfidfTransformer instance.\n\nTwo algorithms are demoed: ordinary k-means and its more scalable cousin\nminibatch k-means.\n\nAdditionally, latent semantic analysis can also be used to reduce dimensionality\nand discover latent patterns in the data. \n\nIt can be noted that k-means (and minibatch k-means) are very sensitive to\nfeature scaling and that in this case the IDF weighting helps improve the\nquality of the clustering by quite a lot as measured against the \"ground truth\"\nprovided by the class label assignments of the 20 newsgroups dataset.\n\nThis improvement is not visible in the Silhouette Coefficient which is small\nfor both as this measure seem to suffer from the phenomenon called\n\"Concentration of Measure\" or \"Curse of Dimensionality\" for high dimensional\ndatasets such as text data. Other measures such as V-measure and Adjusted Rand\nIndex are information theoretic based evaluation scores: as they are only based\non cluster assignments rather than distances, hence not affected by the curse\nof dimensionality.\n\nNote: as k-means is optimizing a non-convex objective function, it will likely\nend up in a local optimum. Several runs with independent random init might be\nnecessary to get a good convergence.\n\n"
+"\n# Clustering text documents using k-means\n\n\nThis is an example showing how the scikit-learn can be used to cluster\ndocuments by topics using a bag-of-words approach. This example uses\na scipy.sparse matrix to store the features instead of standard numpy arrays.\n\nTwo feature extraction methods can be used in this example:\n\n - TfidfVectorizer uses a in-memory vocabulary (a python dict) to map the most\n frequent words to features indices and hence compute a word occurrence\n frequency (sparse) matrix. The word frequencies are then reweighted using\n the Inverse Document Frequency (IDF) vector collected feature-wise over\n the corpus.\n\n - HashingVectorizer hashes word occurrences to a fixed dimensional space,\n possibly with collisions. The word count vectors are then normalized to\n each have l2-norm equal to one (projected to the euclidean unit-ball) which\n seems to be important for k-means to work in high dimensional space.\n\n HashingVectorizer does not provide IDF weighting as this is a stateless\n model (the fit method does nothing). When IDF weighting is needed it can\n be added by pipelining its output to a TfidfTransformer instance.\n\nTwo algorithms are demoed: ordinary k-means and its more scalable cousin\nminibatch k-means.\n\nAdditionally, latent semantic analysis can also be used to reduce dimensionality\nand discover latent patterns in the data. \n\nIt can be noted that k-means (and minibatch k-means) are very sensitive to\nfeature scaling and that in this case the IDF weighting helps improve the\nquality of the clustering by quite a lot as measured against the \"ground truth\"\nprovided by the class label assignments of the 20 newsgroups dataset.\n\nThis improvement is not visible in the Silhouette Coefficient which is small\nfor both as this measure seem to suffer from the phenomenon called\n\"Concentration of Measure\" or \"Curse of Dimensionality\" for high dimensional\ndatasets such as text data. Other measures such as V-measure and Adjusted Rand\nIndex are information theoretic based evaluation scores: as they are only based\non cluster assignments rather than distances, hence not affected by the curse\nof dimensionality.\n\nNote: as k-means is optimizing a non-convex objective function, it will likely\nend up in a local optimum. Several runs with independent random init might be\nnecessary to get a good convergence.\n\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -33,7 +33,7 @@
 },
 {
 "source": [
-"Load some categories from the training set\n"
+"Load some categories from the training set\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -51,7 7,7 @@
 },
 {
 "source": [
-"Do the actual clustering\n"
+"Do the actual clustering\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/face_recognition.ipynb

Lines changed: 7 additions & 7 deletions
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Faces recognition example using eigenfaces and SVMs\n\n\nThe dataset used in this example is a preprocessed excerpt of the\n\"Labeled Faces in the Wild\", aka LFW_:\n\n http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)\n\n.. _LFW: http://vis-www.cs.umass.edu/lfw/\n\nExpected results for the top 5 most represented people in the dataset:\n\n================== ============ ======= ========== =======\n precision recall f1-score support\n================== ============ ======= ========== =======\n Ariel Sharon 0.67 0.92 0.77 13\n Colin Powell 0.75 0.78 0.76 60\n Donald Rumsfeld 0.78 0.67 0.72 27\n George W Bush 0.86 0.86 0.86 146\nGerhard Schroeder 0.76 0.76 0.76 25\n Hugo Chavez 0.67 0.67 0.67 15\n Tony Blair 0.81 0.69 0.75 36\n\n avg / total 0.80 0.80 0.80 322\n================== ============ ======= ========== =======\n\n"
+"\n# Faces recognition example using eigenfaces and SVMs\n\n\nThe dataset used in this example is a preprocessed excerpt of the\n\"Labeled Faces in the Wild\", aka LFW_:\n\n http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)\n\n\nExpected results for the top 5 most represented people in the dataset:\n\n================== ============ ======= ========== =======\n precision recall f1-score support\n================== ============ ======= ========== =======\n Ariel Sharon 0.67 0.92 0.77 13\n Colin Powell 0.75 0.78 0.76 60\n Donald Rumsfeld 0.78 0.67 0.72 27\n George W Bush 0.86 0.86 0.86 146\nGerhard Schroeder 0.76 0.76 0.76 25\n Hugo Chavez 0.67 0.67 0.67 15\n Tony Blair 0.81 0.69 0.75 36\n\n avg / total 0.80 0.80 0.80 322\n================== ============ ======= ========== =======\n\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -33,7 +33,7 @@
 },
 {
 "source": [
-"Download the data, if not already on disk and load it as numpy arrays\n"
+"Download the data, if not already on disk and load it as numpy arrays\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -51,7 +51,7 @@
 },
 {
 "source": [
-"Split into a training set and a test set using a stratified k fold\n"
+"Split into a training set and a test set using a stratified k fold\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -69,7 +69,7 @@
 },
 {
 "source": [
-"Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled\ndataset): unsupervised feature extraction / dimensionality reduction\n"
+"Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled\ndataset): unsupervised feature extraction / dimensionality reduction\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -87,7 +87,7 @@
 },
 {
 "source": [
-"Train a SVM classification model\n"
+"Train a SVM classification model\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -105,7 +105,7 @@
 },
 {
 "source": [
-"Quantitative evaluation of the model quality on the test set\n"
+"Quantitative evaluation of the model quality on the test set\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}
@@ -123,7 +123,7 @@
 },
 {
 "source": [
-"Qualitative evaluation of the predictions using matplotlib\n"
+"Qualitative evaluation of the predictions using matplotlib\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/feature_selection_pipeline.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Pipeline Anova SVM\n\n\nSimple usage of Pipeline that runs successively a univariate\nfeature selection with anova and then a C-SVM of the selected features.\n"
+"\n# Pipeline Anova SVM\n\n\nSimple usage of Pipeline that runs successively a univariate\nfeature selection with anova and then a C-SVM of the selected features.\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/feature_stacker.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Concatenating multiple feature extraction methods\n\n\nIn many real-world examples, there are many ways to extract features from a\ndataset. Often it is beneficial to combine several methods to obtain good\nperformance. This example shows how to use ``FeatureUnion`` to combine\nfeatures obtained by PCA and univariate selection.\n\nCombining features using this transformer has the benefit that it allows\ncross validation and grid searches over the whole process.\n\nThe combination used in this example is not particularly helpful on this\ndataset and is only used to illustrate the usage of FeatureUnion.\n"
+"\n# Concatenating multiple feature extraction methods\n\n\nIn many real-world examples, there are many ways to extract features from a\ndataset. Often it is beneficial to combine several methods to obtain good\nperformance. This example shows how to use ``FeatureUnion`` to combine\nfeatures obtained by PCA and univariate selection.\n\nCombining features using this transformer has the benefit that it allows\ncross validation and grid searches over the whole process.\n\nThe combination used in this example is not particularly helpful on this\ndataset and is only used to illustrate the usage of FeatureUnion.\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/grid_search_digits.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Parameter estimation using grid search with cross-validation\n\n\nThis examples shows how a classifier is optimized by cross-validation,\nwhich is done using the :class:`sklearn.model_selection.GridSearchCV` object\non a development set that comprises only half of the available labeled data.\n\nThe performance of the selected hyper-parameters and trained model is\nthen measured on a dedicated evaluation set that was not used during\nthe model selection step.\n\nMore details on tools available for model selection can be found in the\nsections on :ref:`cross_validation` and :ref:`grid_search`.\n\n"
+"\n# Parameter estimation using grid search with cross-validation\n\n\nThis examples shows how a classifier is optimized by cross-validation,\nwhich is done using the :class:`sklearn.model_selection.GridSearchCV` object\non a development set that comprises only half of the available labeled data.\n\nThe performance of the selected hyper-parameters and trained model is\nthen measured on a dedicated evaluation set that was not used during\nthe model selection step.\n\nMore details on tools available for model selection can be found in the\nsections on `cross_validation` and `grid_search`.\n\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

0 commit comments