
Add a more useful example to the cluster comparison #2890


Closed · naught101 opened this issue Feb 24, 2014 · 8 comments
Labels: Documentation, Easy, Enhancement, good first issue, help wanted

Comments

@naught101

The clustering comparison at http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html is somewhat misleading in that the data are totally unlike anything that would be seen in 99% of cases. I realise that they're toy examples, but it would also be good to get something more realistic for comparison.

Here is a simple dataset that has one wide Gaussian distribution, with two smaller Gaussian distributions overlapping it to different extents:

[Figure: clustering comparison on the overlapping-Gaussians dataset]

This shows the performance of the various models on more realistic data. It especially shows that DBSCAN isn't perfect ;)

Here's the dataset:

import numpy as np

# One wide Gaussian plus two smaller overlapping Gaussians; the trailing None
# matches the (data, labels) tuple format of the other datasets in the example.
gaussians = np.concatenate([
    np.multiply(np.random.randn(500, 2), 10),
    np.add(np.random.randn(500, 2), (10, 10)),
    np.add(np.random.randn(500, 2), (-5, -5)),
]), None

The parameters are all arbitrary, and the performance of the various algorithms changes a fair bit with different parameters, but since this is just meant to give a rough idea of relative performance, I don't think that matters much.
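
As a rough sketch of how such a dataset could be compared across a couple of the algorithms (KMeans and DBSCAN picked for illustration; the seed, eps, and n_clusters values are arbitrary assumptions, not taken from the comparison example):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Regenerate the dataset above with a fixed seed (seed and offsets are arbitrary).
rng = np.random.RandomState(0)
X = np.concatenate([
    rng.randn(500, 2) * 10,        # one wide Gaussian
    rng.randn(500, 2) + (10, 10),  # small Gaussian overlapping the edge
    rng.randn(500, 2) + (-5, -5),  # small Gaussian inside the wide one
])
X = StandardScaler().fit_transform(X)  # the comparison example also standardises

# Two of the algorithms from the comparison; eps and n_clusters are guesses.
algorithms = [("KMeans", KMeans(n_clusters=3, n_init=10)),
              ("DBSCAN", DBSCAN(eps=0.15))]
fig, axes = plt.subplots(1, len(algorithms), figsize=(8, 4))
for ax, (name, algo) in zip(axes, algorithms):
    labels = algo.fit_predict(X)
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=5)
    ax.set_title(name)
plt.show()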

@naught101
Author

Also, it would be really cool to list the complexity of each algorithm (and perhaps their sub-algorithms where relevant) in Big O notation, for both the number of samples and the number of variables. This would be more informative than the time-taken numbers for the simple toy datasets.
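
As a sketch of the point, an empirical scaling check for a single algorithm (KMeans picked arbitrarily; the sizes and random data are assumptions) shows the kind of information a stated Big O would convey directly:

import time
import numpy as np
from sklearn.cluster import KMeans

# Empirical scaling in n_samples for one algorithm on random 2-D data.
# A documented cost (roughly O(n_samples * n_clusters * n_iter) per KMeans
# run with Lloyd's algorithm) would convey this without benchmarking.
rng = np.random.RandomState(0)
for n_samples in (1_000, 10_000, 100_000):
    X = rng.randn(n_samples, 2)
    start = time.perf_counter()
    KMeans(n_clusters=3, n_init=10).fit(X)
    print(f"n_samples={n_samples:>7}: {time.perf_counter() - start:.2f}s")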

@GaelVaroquaux
Member

Pull request welcomed for both additions.

@vincentpham1991
Contributor

Can I take this up?

@naught101
Author

naught101 commented Jan 27, 2017 via email

@vincentpham1991
Contributor

I have a PR for the plot, but it does not cover the second suggestion (adding the complexities), since I am not confident about the complexity of each algorithm.

@amueller added the Easy, good first issue, and help wanted labels Aug 5, 2019
@andreanr
Contributor

This issue can be closed by MRG #6305, which adds GMM using sklearn.datasets.make_blobs.

@amueller
Member

This issue is about adding another dataset, not a method. I actually quite like the dataset here: https://hdbscan.readthedocs.io/en/latest/advanced_hdbscan.html#getting-more-information-about-a-clustering, but something like the one in the original issue is also good.

@amueller
Member

oh actually fixed, sorry @andreanr
