AgglomerativeClustering with metric='cosine' broken for all-zero rows

Original title: ***"Cosine" affinity type in FeatureAgglomeration somehow casue memory overflow in a particular dataset***




#### Description



Please carefully test the codes below! Using "cosine" affinity type in FeatureAgglomeration, the codes will cause memory overflow with a particular dataset ([Download here](http://gleason.case.edu/webdata/tpot/)). It is all right with other affinity types. And this issue cannot be reproduced using simulation data (like using make_classification in sci-kit learn). Not sure why it happened.
#### Steps/Code to Reproduce



```
from sklearn.cluster import FeatureAgglomeration
import numpy as np
import time
train_data = np.genfromtxt('fold_2_trainFeatVec.csv', delimiter=',')
train_labels= np.genfromtxt('fold_2_trainLabels.csv', delimiter=',')


fa = FeatureAgglomeration(affinity="cosine", linkage="average") # no matter linage ="average" or "complete"
time_start= time.time()
fa.fit(train_data, train_labels) # memory keeps increasing here
time_end = time.time()
print('Time usage:',time_end-time_start)

```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

AgglomerativeClustering with metric='cosine' broken for all-zero rows #7689

Description

Steps/Code to Reproduce

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

AgglomerativeClustering with metric='cosine' broken for all-zero rows #7689

Description

Description

Steps/Code to Reproduce

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions