Add a more useful example to the cluster comparison #2890

Closed
@naught101

The clustering comparison at http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html is somewhat misleading in that the data are totally unlike anything that would be seen in 99% of cases. I realise that they're toy examples, but it would also be good to get something more realistic for comparison.

Here is a simple dataset that has one wide Gaussian distribution, with two smaller Gaussian distributions overlapping it to different extents:

[Image: clustering with gaussians]

This shows the performance of the various models on more realistic data. It especially shows that DBSCAN isn't perfect ;)

Here's the dataset:

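import numpy as np

# The trailing None makes this an (X, y) tuple with no labels.
gaussians = np.concatenate([
    np.random.randn(500, 2) * 10,        # one wide Gaussian
    np.random.randn(500, 2) + (10, 10),  # small Gaussian centred at (10, 10)
    np.random.randn(500, 2) + (-5, -5),  # small Gaussian centred at (-5, -5)
    ]), None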

The parameters are all arbitrary, and the performances for the various algorithms change a fair bit given different parameters, but since this is just to give a rough idea of the relative performances, I don't think that matters that much.
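
To give a rough feel for that parameter sensitivity, here is a minimal sketch that rebuilds the dataset with a fixed seed and runs KMeans and DBSCAN over it. The seed, the `StandardScaler` step, the `eps` values and the `count_clusters` helper are all my own assumptions for illustration, not settings taken from the existing example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Seeded generator so the sketch is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

# Same layout as the dataset above: one wide Gaussian plus two narrow ones.
X = np.concatenate([
    rng.standard_normal((500, 2)) * 10,
    rng.standard_normal((500, 2)) + (10, 10),
    rng.standard_normal((500, 2)) + (-5, -5),
])
# Assumption: standardise the features before clustering.
X = StandardScaler().fit_transform(X)


def count_clusters(labels):
    """Number of distinct clusters, ignoring DBSCAN's noise label (-1)."""
    return len(set(labels.tolist()) - {-1})


kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("KMeans clusters:", count_clusters(kmeans_labels))

# The eps values are illustrative guesses, not tuned settings.
dbscan_results = {}
for eps in (0.1, 0.2, 0.3):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    dbscan_results[eps] = labels
    print(f"DBSCAN eps={eps}: {count_clusters(labels)} clusters, "
          f"{int(np.sum(labels == -1))} noise points")
```

The printed counts depend on the seed and on `eps`, so treat them as a quick sanity check rather than a benchmark.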
