
MemoryError from sklearn.metrics.silhouette_samples #10279

Closed

Gitman-code opened this issue Dec 9, 2017 · 15 comments

@Gitman-code

Please note that this is a follow-up issue to #4701 and #4197, not a duplicate.

Calling sklearn.metrics.silhouette_samples with a large data set will cause a MemoryError in numpy.dot. This is not an issue with numpy.dot but with the implementation of sklearn.metrics.silhouette_samples. For a data set of length N it computes the full NxN pairwise distance matrix, which can clearly take a lot of memory. Mathematically, all that needs to be held in memory at once is a list the length of the largest cluster; the rest can be done with appropriate looping.

This issue proposes a new feature: add a boolean parameter to sklearn.metrics.silhouette_samples to choose between the existing method, which is faster but more memory-hungry, and an alternative method that uses far less memory but is likely slower.

A working version of the new method was posted in an answer on Stack Overflow.
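For concreteness, here is a minimal sketch of that looping idea (my paraphrase, not the Stack Overflow code itself), assuming the default euclidean metric; the silhouette_samples_chunked name and chunk_size are illustrative:

import numpy as np
from sklearn.metrics import pairwise_distances

def silhouette_samples_chunked(X, labels, chunk_size=1000):
    X = np.asarray(X)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    counts = np.array([(labels == k).sum() for k in classes])
    n = X.shape[0]
    sil = np.zeros(n)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # one (chunk_size x n) block of distances at a time, never the full n x n matrix
        dist = pairwise_distances(X[start:stop], X)
        # mean distance from each chunk sample to every cluster
        mean_dist = np.column_stack(
            [dist[:, labels == k].mean(axis=1) for k in classes])
        rows = np.arange(stop - start)
        own = np.searchsorted(classes, labels[start:stop])
        own_counts = counts[own]
        # a(i): mean distance to the sample's own cluster, excluding itself
        a = mean_dist[rows, own] * own_counts / np.maximum(own_counts - 1, 1)
        # b(i): mean distance to the nearest other cluster
        mean_dist[rows, own] = np.inf
        b = mean_dist.min(axis=1)
        s = (b - a) / np.maximum(a, b)
        s[own_counts == 1] = 0.0  # silhouette is 0 by convention for singleton clusters
        sil[start:stop] = s
    return sil

Peak memory here is O(chunk_size * N) for the distance block rather than O(N^2).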

@jnothman
Member

I think we need to fix this properly. We've had patches for years, and I think we have shown that using less memory is almost always worthwhile.

@Gitman-code
Author

Gitman-code commented Dec 10, 2017

By "properly" do you mean to imply that there is a potential implementation which would keep the memory below O(N) but be as fast as the current implementation?

@jnothman
Member

jnothman commented Dec 10, 2017 via email

@Gitman-code
Author

In principle, for small N the method with fewer operations should be faster. I was curious where the more memory-conservative algorithm would become faster, so I compared against the Stack Overflow implementation. It turns out it gets faster pretty much right away.

N        t_original    t_new
100      0.000636      0.001687
500      0.003464      0.003987
1000     0.016305      0.010772
5000     0.765970      0.283393
10000    3.594396      1.215139

[figure_1: plot of the timings above]
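For reference, a rough harness along these lines can reproduce such a comparison (the data shape and cluster count are made up, silhouette_samples_chunked refers to the sketch earlier in the thread, and absolute numbers will vary by machine):

import time
import numpy as np
from sklearn.metrics import silhouette_samples

rng = np.random.RandomState(0)
for n in (100, 500, 1000, 5000, 10000):
    X = rng.rand(n, 10)
    labels = rng.randint(0, 5, size=n)
    t0 = time.time()
    silhouette_samples(X, labels)          # current scikit-learn implementation
    t1 = time.time()
    silhouette_samples_chunked(X, labels)  # chunked sketch from above
    t2 = time.time()
    print(n, t1 - t0, t2 - t1)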

@jnothman
Member

As far as I can tell, the SO implementation still has some fairly high memory costs, depending on the sizes of your clusters. But we had similar results when benchmarking #7177, though I was too lazy to plot it. Basically, the memory allocation takes a lot of time.

#7177, and my most recent attempt, #10280, should (usually) limit the memory to constant use (rather than an asymptotic function of n). I very much hope we can see #10280, which will provide memory usage improvements for a few routines, merged in the next release.
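A chunk-and-reduce pattern of this kind is what later scikit-learn releases expose as sklearn.metrics.pairwise_distances_chunked; a minimal sketch with an illustrative reduce function and random data:

import numpy as np
from sklearn.metrics import pairwise_distances_chunked

def row_means(D_chunk, start):
    # reduce each (chunk x n) distance block to per-row means right away,
    # so only one block is ever held in memory
    return D_chunk.mean(axis=1)

X = np.random.RandomState(0).rand(10000, 24)
mean_dists = np.concatenate(
    list(pairwise_distances_chunked(X, reduce_func=row_means)))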

@Gitman-code
Author

Yes, you are correct that the memory issues are not totally mitigated by the SO implementation.

I hope this is not too much of a tangent, but it may be worth looking into at the same time. Once a clusterer has been trained, I use predict() to label a new data set. The silhouettes are used later as features for a classification algorithm. This means that for my new data set I need the silhouettes computed in the space in which the clustering algorithm was trained. I wrote a new function, silhouette_samples_predict(X_train, labels_train, X_test, labels_test), which does this in a similar way to the SO answer. I know that sklearn is developed to be as parsimonious as possible, but this is a useful feature for fraud detection and is very similar to the methods under discussion here. Thank you for your consideration, and sorry for the off-topic comment.
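A hedged sketch of how such a function could look (the signature follows the comment above; the internals are my assumptions, reusing the chunked approach from earlier):

import numpy as np
from sklearn.metrics import pairwise_distances

def silhouette_samples_predict(X_train, labels_train, X_test, labels_test,
                               chunk_size=1000):
    # Silhouette of new samples measured against the training clusters. Since
    # the test samples are not part of the training set, no self-exclusion is
    # needed when computing a(i).
    X_train, X_test = np.asarray(X_train), np.asarray(X_test)
    labels_train = np.asarray(labels_train)
    labels_test = np.asarray(labels_test)
    classes = np.unique(labels_train)
    n = X_test.shape[0]
    sil = np.zeros(n)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        dist = pairwise_distances(X_test[start:stop], X_train)
        mean_dist = np.column_stack(
            [dist[:, labels_train == k].mean(axis=1) for k in classes])
        rows = np.arange(stop - start)
        own = np.searchsorted(classes, labels_test[start:stop])
        a = mean_dist[rows, own]   # mean distance to the assigned training cluster
        mean_dist[rows, own] = np.inf
        b = mean_dist.min(axis=1)  # nearest other training cluster
        sil[start:stop] = (b - a) / np.maximum(a, b)
    return sil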

@jnothman
Member

jnothman commented Dec 11, 2017 via email

@Gitman-code
Author

Yes, I was proposing such a function be considered while this work is being done. Is there an alternative library which would be appropriate?

I looked at #10280 and it seems reasonable to me. Anything beyond a few million samples is unlikely to be clustered on a single machine in the first place.

@jnothman
Member

jnothman commented Dec 12, 2017 via email

@Gitman-code
Author

Yep, I am trying to expand on existing work on the identification of insurance fraud. The silhouette shows promise as a feature in the classification algorithm.

As for the silhouette_samples_predict function: if there is no public library where it would make sense to add it, then it is just one more for my private collection.

@mangecoeur

I ran into this issue and ended up developing a somewhat ad-hoc solution using numba that completely avoids computing the NxN distance matrix by directly computing the sum and mean distances required by the silhouette_samples algorithm. It currently only implements euclidean distance and drops all sorts of checks in order to release the GIL in numba and work across all cores (which it does magnificently).

Posting because it might be helpful for someone as a quick fix. Code here:
https://gist.github.com/mangecoeur/f7c419506bb009d69c20565e2b6f7887

It works with roughly constant memory and allowed me to work with a 30000-sample, 24-dimension dataset (which would otherwise have required on the order of 7 GB of RAM just to hold the distance matrix).
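A minimal sketch of the same idea (not the gist's code), assuming integer labels 0..k-1 and euclidean distance; the helper names are illustrative:

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def _cluster_distance_sums(X, labels, n_clusters):
    # Summed euclidean distance from every sample to every cluster, computed
    # pairwise on the fly so the n x n distance matrix is never materialised.
    n, d = X.shape
    sums = np.zeros((n, n_clusters))
    for i in prange(n):
        for j in range(n):
            acc = 0.0
            for f in range(d):
                diff = X[i, f] - X[j, f]
                acc += diff * diff
            sums[i, labels[j]] += np.sqrt(acc)  # each thread writes only its own row i
    return sums

def silhouette_samples_numba(X, labels):
    X = np.ascontiguousarray(X, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    counts = np.bincount(labels)
    sums = _cluster_distance_sums(X, labels, counts.size)
    rows = np.arange(X.shape[0])
    # a(i): mean distance to the sample's own cluster, excluding the zero self-distance
    a = sums[rows, labels] / np.maximum(counts[labels] - 1, 1)
    means = sums / counts              # mean distance to every cluster
    means[rows, labels] = np.inf
    b = means.min(axis=1)              # nearest other cluster
    s = (b - a) / np.maximum(a, b)
    s[counts[labels] == 1] = 0.0       # singleton clusters get 0 by convention
    return s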

@jnothman
Member

jnothman commented Feb 17, 2018 via email

@luansouzasilva31

Hi everyone. I ran into the same problem. I work with Python, so my answer is based on that. The problem is very common when working with "big" data; "big" in quotes because even a simple 700x600 image is already enough to cause it. One workaround is to compute the score on only part of the data. In Python, silhouette_score(), importable from sklearn.metrics, takes the usual parameters plus one that matters here: sample_size, which is None by default. When set, the score is computed on a random subsample of that many points, so the distance matrix stays small. Look:
from sklearn import metrics

metrics.silhouette_score(imgcopy,                 # flattened image data
                         kmeans_cluster.labels_,  # labels from a fitted KMeans
                         metric='euclidean',
                         sample_size=1000)  # <------------------

Here the score is computed on a random subsample of 1000 points. If you have this problem in another programming language, look up the implementation and adapt it to your needs.

@jnothman
Member

jnothman commented Jan 8, 2019

@luansouzasilva31 are you using scikit-learn version 0.20? This was not fixed before that release.
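A minimal sketch of using the fixed path in 0.20 and later (random data here, just to show the call); the size of the temporary distance blocks can be capped with the global working_memory option, in MiB:

import numpy as np
from sklearn import config_context, metrics

X = np.random.RandomState(0).rand(30000, 24)            # illustrative data
labels = np.random.RandomState(1).randint(0, 8, 30000)

with config_context(working_memory=512):  # cap temporary distance blocks at ~512 MiB
    s = metrics.silhouette_samples(X, labels)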

@luansouzasilva31

@jnothman yeah, my version is 0.19. I'm sorry, I didn't know about this issue. But it would be nice to see the implementation, even if it's an old version, haha; then you could implement your own program. Thank you for your answer!!
