I recommend the following clustering algorithm:
BIRCH is a clustering algorithm that received the SIGMOD 10-year Test of Time Award. It is designed to minimize run time, makes each clustering decision without scanning all the data, and exploits the non-uniformity of the data by treating each dense region as a single unit.
https://code.google.com/p/birch-clustering-algorithm/
https://en.wikipedia.org/wiki/BIRCH_%28data_clustering%29
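To make "treating each dense region as a single unit" concrete: BIRCH summarizes a group of points as a clustering feature (CF), the triple of point count N, linear sum LS and sum of squared norms SS. CFs add component-wise when groups merge, and the centroid and radius can be computed from the triple alone, so a dense region is kept as one small summary instead of its raw points. The minimal sketch below assumes points are sequences of floats; the class name `ClusteringFeature` is hypothetical and not taken from any existing library.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """Summary of a subcluster: point count, linear sum, sum of squared norms."""
    n: int
    ls: list
    ss: float

    @classmethod
    def from_point(cls, p):
        # A single point is a CF with N=1, LS=p, SS=||p||^2
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # CFs are additive: merging two subclusters just sums the triples
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # R = sqrt(SS/N - ||LS/N||^2), the RMS distance of points to the centroid
        c = self.centroid()
        var = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors
```

For example, `ClusteringFeature.from_point((0, 0)).merge(ClusteringFeature.from_point((2, 0))).radius()` returns `1.0`, computed without revisiting either original point.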
I think this can be very useful for large datasets.
The input parameters would be the data and a radius threshold indicating how close points must be to belong to the same subcluster.
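A rough sketch of how the proposed radius parameter could drive clustering: each incoming point is tentatively merged into the nearest subcluster and is absorbed only if the merged radius stays within the threshold; otherwise it starts a new subcluster. This is a flat, single-pass simplification assuming Euclidean distance. Real BIRCH stores the CFs in a height-balanced CF tree (with a branching factor, node splits and threshold-driven rebuilds) and follows with a global clustering phase over the leaf entries. The function name `birch_leaf_pass` is hypothetical.

```python
import math


def _radius(n, ls, ss):
    # Radius from the CF alone: sqrt(SS/N - ||LS/N||^2)
    c2 = sum((x / n) ** 2 for x in ls)
    return math.sqrt(max(ss / n - c2, 0.0))


def birch_leaf_pass(points, threshold):
    """Hypothetical flat simplification of BIRCH leaf insertion.

    Each subcluster is kept as a CF [N, LS, SS]; returns (count, centroid) pairs.
    """
    cfs = []
    for p in points:
        p = [float(x) for x in p]
        p_ss = sum(x * x for x in p)

        # Linear scan for the closest subcluster centroid
        # (the CF tree replaces this with a descent from the root)
        best, best_d = None, math.inf
        for i, (n, ls, _) in enumerate(cfs):
            d = math.dist(p, [x / n for x in ls])
            if d < best_d:
                best, best_d = i, d

        if best is not None:
            n, ls, ss = cfs[best]
            merged = (n + 1, [a + b for a, b in zip(ls, p)], ss + p_ss)
            # Absorb the point only if the subcluster stays within the threshold
            if _radius(*merged) <= threshold:
                cfs[best] = list(merged)
                continue

        # Otherwise the point starts a new subcluster
        cfs.append([1, p, p_ss])

    return [(n, [x / n for x in ls]) for n, ls, _ in cfs]
```

With `birch_leaf_pass([(0, 0), (0.5, 0), (10, 10), (10.5, 10)], threshold=1.0)`, the two nearby pairs collapse into two subclusters of two points each. Memory grows with the number of subclusters rather than the number of points, which is what makes this approach attractive for large datasets.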