Support sample weight in clusterers

Currently no clusterers (or clustering metrics) support weighted datasets (although support for DBSCAN is proposed in #3994).

Weighting can be a compact way of representing repeated samples, and it may affect cluster means and variances, the average linkage between clusters, etc.
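A minimal NumPy illustration of the "repeated samples" view: a weighted mean and a weighted (population) variance match the plain mean and variance of the dataset in which each row is repeated `w` times. The arrays `X` and `w` are arbitrary toy values chosen for this sketch.

```python
import numpy as np

# Toy data: three 2-D samples with integer weights
X = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 4.0]])
w = np.array([2, 1, 3])

# Weighted statistics computed directly from the weights
weighted_mean = np.average(X, axis=0, weights=w)
weighted_var = np.average((X - weighted_mean) ** 2, axis=0, weights=w)

# The same statistics on the dataset with row i repeated w[i] times
X_rep = np.repeat(X, w, axis=0)
repeated_mean = X_rep.mean(axis=0)
repeated_var = X_rep.var(axis=0)  # ddof=0, matching the weighted form above

print(np.allclose(weighted_mean, repeated_mean), np.allclose(weighted_var, repeated_var))
```

Integer weights are used only so that the repeated dataset exists; the weighted formulas themselves also accept non-integer weights.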

Ideally, BIRCH's global clustering stage should be given a weighted dataset, with each subcluster representative weighted by the number of samples it summarises; its current use of unweighted representatives may make its parametrisation more brittle.
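A rough sketch of what a weighted global step could look like if done by hand: build only the CF tree with `Birch(n_clusters=None)`, weight each entry of `subcluster_centers_` by the number of samples nearest to it, and cluster the centers with `KMeans`, which accepts `sample_weight`. Two assumptions to flag: `KMeans` stands in for BIRCH's default global clusterer (agglomerative clustering, which takes no weights), and nearest-center counts only approximate the sample counts held inside the CF tree.

```python
import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)

# Build the CF tree only; n_clusters=None skips BIRCH's own global step
birch = Birch(threshold=0.5, n_clusters=None).fit(X)
centers = birch.subcluster_centers_

# Assumption: approximate each subcluster's size by nearest-center counts
nearest = pairwise_distances_argmin(X, centers)
counts = np.bincount(nearest, minlength=len(centers))

# Weighted global clustering of the subcluster representatives
global_km = KMeans(n_clusters=3, n_init=10, random_state=0)
subcluster_labels = global_km.fit_predict(centers, sample_weight=counts)

# Map each sample to the global label of its nearest subcluster
labels = subcluster_labels[nearest]
```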

This could be checked with an invariance test along the lines of:

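```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# `clusterer` is any clusterer whose fit_predict accepts sample_weight; X is 2-D
sample_weight = np.random.randint(0, 10, size=X.shape[0])
weighted_y = clusterer.fit_predict(X, sample_weight=sample_weight)
# axis=0 repeats rows; without it np.repeat would flatten X
repeated_y = clusterer.fit_predict(np.repeat(X, sample_weight, axis=0))
# Expanded weighted labels should equal the repeated labels up to relabeling
assert adjusted_rand_score(np.repeat(weighted_y, sample_weight), repeated_y) == 1.0
# NB: this is only a useful sufficient test if weighted_y differs from clusterer.fit_predict(X)
```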
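The template above leaves `clusterer` and `X` abstract. One concrete instance uses `DBSCAN` on a hand-made 1-D toy dataset, with weights chosen (an assumption of this sketch) so that weighting actually changes the result, which the NB comment asks for. With `eps=0.6` and `min_samples=3`, no point is core without weights, while a weight of 3 on the first point makes the first pair dense enough to form a cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Toy 1-D data: two tight pairs and an isolated point
X = np.array([[0.0], [0.5], [10.0], [10.5], [20.0]])
sample_weight = np.array([3, 1, 1, 1, 1])

clusterer = DBSCAN(eps=0.6, min_samples=3)
unweighted_y = clusterer.fit_predict(X)
weighted_y = clusterer.fit_predict(X, sample_weight=sample_weight)
# Materialise the weights as repeated rows (axis=0 keeps X two-dimensional)
repeated_y = clusterer.fit_predict(np.repeat(X, sample_weight, axis=0))

# Weighting must change the result, otherwise the check below proves little
assert not np.array_equal(unweighted_y, weighted_y)

# Expanded weighted labels must reproduce the repeated clustering up to relabeling
ari = adjusted_rand_score(np.repeat(weighted_y, sample_weight), repeated_y)
assert ari == 1.0
```

A deterministic clusterer and fixed weights keep the check reproducible; randomly drawn weights, as in the template, would exercise more cases but need a seed.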

(There is also a minor question of whether `sample_weight` should be universally accepted by `ClusterMixin` or whether `WeightedClusterMixin` should be created, etc.)
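To make the second option concrete, here is a sketch of a hypothetical `WeightedClusterMixin` whose `fit_predict` forwards `sample_weight` explicitly to `fit`, plus a hypothetical toy clusterer `WeightedTwoMeans1D` that splits 1-D data at its weighted mean. Neither class exists in scikit-learn; they only illustrate the shape such an API might take.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin


class WeightedClusterMixin(ClusterMixin):
    """Hypothetical mixin for clusterers whose fit() accepts sample_weight."""

    def fit_predict(self, X, y=None, sample_weight=None):
        # Forward the weights explicitly so the signature advertises support
        return self.fit(X, sample_weight=sample_weight).labels_


class WeightedTwoMeans1D(WeightedClusterMixin, BaseEstimator):
    """Hypothetical toy clusterer: splits 1-D data at its weighted mean."""

    def fit(self, X, y=None, sample_weight=None):
        x = np.asarray(X, dtype=float).ravel()
        if sample_weight is None:
            w = np.ones_like(x)
        else:
            w = np.asarray(sample_weight, dtype=float)
        self.threshold_ = np.average(x, weights=w)
        self.labels_ = (x > self.threshold_).astype(int)
        return self


X = np.array([[0.0], [1.0], [2.0], [10.0]])
sample_weight = np.array([9, 1, 1, 1])
clusterer = WeightedTwoMeans1D()
unweighted_y = clusterer.fit_predict(X)  # threshold 3.25 -> [0 0 0 1]
weighted_y = clusterer.fit_predict(X, sample_weight=sample_weight)  # threshold 13/12 -> [0 0 1 1]
repeated_y = clusterer.fit_predict(np.repeat(X, sample_weight, axis=0))
```

The alternative, accepting `sample_weight` on every `ClusterMixin`, would avoid a second mixin but would imply support in clusterers that ignore or reject weights.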

Sample weight support for clusterers:
- [ ] Affinity propagation (I don't know this well enough to know the applicability)
- [ ] BIRCH
- [x] DBSCAN
- [ ] Hierarchical -> Ward link
- [x] Hierarchical -> Complete link (N/A, as far as I can tell)
- [ ] Hierarchical -> Average link
- [x] K Means
- [x] Minibatch K Means
- [ ] Mean shift
- [ ] Spectral
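The checklist can be cross-checked against an installed scikit-learn with `sklearn.utils.validation.has_fit_parameter`, which inspects whether an estimator's `fit` signature includes a given parameter. The answer depends on the installed version, so this is a quick status probe rather than a statement about any particular release; it also only sees the signature, not whether the weights are used correctly.

```python
from sklearn.cluster import (
    DBSCAN,
    AffinityPropagation,
    AgglomerativeClustering,
    Birch,
    KMeans,
    MeanShift,
    MiniBatchKMeans,
    SpectralClustering,
)
from sklearn.utils.validation import has_fit_parameter

clusterers = {
    "Affinity propagation": AffinityPropagation(),
    "BIRCH": Birch(),
    "DBSCAN": DBSCAN(),
    "Hierarchical -> Ward link": AgglomerativeClustering(linkage="ward"),
    "K Means": KMeans(),
    "Minibatch K Means": MiniBatchKMeans(),
    "Mean shift": MeanShift(),
    "Spectral": SpectralClustering(),
}

# True if the estimator's fit() signature has a sample_weight parameter
support = {name: has_fit_parameter(est, "sample_weight") for name, est in clusterers.items()}
for name, ok in support.items():
    print(f"- [{'x' if ok else ' '}] {name}")
```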

Support sample weight in clusterers #3998