
OPTICS: Reconsider whether there are unnecessary parameters #12375

Closed
@qinhanmin2014

Description


In OPTICS (_extract_optics), we include several parameters that do not appear in the original paper (Automatic Extraction of Clusters from Hierarchical Clustering Representations): rejection_ratio, significant_min, and the ratio of points in the child that we check (not exposed to users). I don't understand why we need these parameters. I think we took this code from https://github.com/amyxzhang/OPTICS-Automatic-Clustering, and she noted in her code: "An implementation of the following algorithm, with some minor add-ons." As scikit-learn, we should check whether these add-ons are reasonable and necessary. For example:

  • For rejection_ratio, the original paper says: "We experimented with different ratios and in fact, any value in the range 0.7-0.8 always gives good results", so I guess the exact value won't make much difference?

  • For significant_min, I don't think it is meaningful to users, since the reachability distances (RD) have already been normalized at this point.

  • For the magic number 0.8 (the ratio of points in the child we check) hard-coded inside the code, I think we should remove it, or at least make it a public parameter (though I won't vote +1 for that at this point).

  • I think we should allow users to pass an int to min_maxima_ratio, as they can for min_cluster_size. Also, the relationship between min_cluster_size and min_samples is still unclear.

  • We're using the number of points when checking whether a point needs to be moved; that seems wrong, doesn't it? We should use RD instead.
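To make the rejection_ratio and 0.8 points concrete, here is a minimal sketch of a split-point significance test in the spirit of the paper. The function name is_significant_split, its signature, and the exact rule (a split counts as significant only when the averaged RD of each child, divided by the RD at the split point, is below rejection_ratio) are assumptions for illustration; check_ratio stands in for the hard-coded 0.8. This is a simplification, not the current _extract_optics logic.

```python
import numpy as np


def is_significant_split(reachability, split, left, right,
                         rejection_ratio=0.75, check_ratio=0.8):
    # Hypothetical helper, not scikit-learn API.
    # left/right are half-open (start, end) index ranges of the two children.
    split_rd = reachability[split]

    # Left child: average only the points nearest the split (the tail).
    l_start, l_end = left
    n_left = max(1, int(round(check_ratio * (l_end - l_start))))
    left_avg = np.mean(reachability[l_end - n_left:l_end])

    # Right child: average only the points nearest the split (the head).
    r_start, r_end = right
    n_right = max(1, int(round(check_ratio * (r_end - r_start))))
    right_avg = np.mean(reachability[r_start:r_start + n_right])

    # Significant only if both children sit clearly below the split's RD.
    return bool(left_avg / split_rd < rejection_ratio
                and right_avg / split_rd < rejection_ratio)


rd = np.array([12.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0])
print(is_significant_split(rd, 5, (0, 5), (6, 10)))                    # True
print(is_significant_split(rd, 5, (0, 5), (6, 10), check_ratio=1.0))   # False
```

In this toy example the hidden 0.8 flips the decision: it skips the high-RD point at the far edge of the left child, so the split is accepted, while averaging the whole child rejects it.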
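For the min_maxima_ratio point, a minimal sketch of accepting either an int or a float. The helper name resolve_count and its convention (an int is an absolute number of samples, a float in (0, 1] is a fraction of n_samples, rounded down with a floor of 1) are assumptions for illustration, not existing scikit-learn API.

```python
import numbers


def resolve_count(value, n_samples, name):
    # Hypothetical helper, not scikit-learn API.
    # An int is taken as an absolute number of samples (bool is rejected).
    if isinstance(value, numbers.Integral) and not isinstance(value, bool):
        if value < 1:
            raise ValueError(f"{name} must be >= 1 when given as an int, got {value}")
        return int(value)
    # A float in (0, 1] is a fraction of n_samples; round down, keep at least 1.
    if isinstance(value, numbers.Real) and 0 < value <= 1:
        return max(1, int(value * n_samples))
    raise ValueError(f"{name} must be an int >= 1 or a float in (0, 1], got {value!r}")


print(resolve_count(10, 1000, "min_maxima_ratio"))   # 10
print(resolve_count(0.5, 1000, "min_maxima_ratio"))  # 500
```

Branching on the type (rather than on whether the value is above 1) keeps 1 and 1.0 distinct: the former means one sample, the latter means all of them.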

ping @espg
