Description
The clustering comparison at http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html is somewhat misleading in that the data are totally unlike anything that would be seen in 99% of cases. I realise that they're toy examples, but it would also be good to get something more realistic for comparison.
Here is a simple dataset that has one wide Gaussian distribution, with two smaller Gaussian distributions overlapping it to different extents.
Running the comparison on it shows the performance of the various models on more realistic data. It especially shows that DBSCAN isn't perfect ;)
Here's the dataset:
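import numpy as np

# One wide Gaussian (std 10) plus two unit-variance Gaussians overlapping it
gaussians = np.concatenate([
    np.random.randn(500, 2) * 10,
    np.random.randn(500, 2) + (10, 10),
    np.random.randn(500, 2) + (-5, -5),
]), None  # (X, y) pair; y is None because there are no labels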
The parameters are all arbitrary, and the performance of each algorithm changes a fair bit with different parameters, but since this is only meant to give a rough idea of relative performance, I don't think that matters much.
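To make the parameter sensitivity concrete, here is a rough sketch that runs KMeans and DBSCAN on the same three-Gaussian data and reports how many clusters DBSCAN finds at a few `eps` values. The fixed seed, the `StandardScaler` step, the `eps` values, `min_samples=5` and `n_clusters=3` are all my own assumptions for illustration, not tuned settings and not necessarily what the example script uses.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Fixed seed (an assumption) so repeated runs give the same points
rng = np.random.RandomState(0)

# Same construction as the dataset above: one wide Gaussian, two narrow ones
X = np.concatenate([
    rng.randn(500, 2) * 10,
    rng.randn(500, 2) + (10, 10),
    rng.randn(500, 2) + (-5, -5),
])

# Standardise features so distance-based parameters such as eps are comparable
X = StandardScaler().fit_transform(X)

# KMeans is told the number of clusters up front (3 is assumed here)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("KMeans cluster sizes:", np.bincount(kmeans_labels))

# DBSCAN infers the number of clusters; try a few arbitrary eps values
dbscan_results = {}
for eps in (0.1, 0.2, 0.3):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    dbscan_results[eps] = (n_clusters, n_noise)
    print(f"DBSCAN eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

The number of clusters and noise points DBSCAN reports will vary with `eps`, which is the kind of sensitivity described above.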