Closed
Description
Original title: "Cosine" affinity type in FeatureAgglomeration somehow causes memory overflow on a particular dataset
Please test the code below carefully. With affinity="cosine" in FeatureAgglomeration, fitting on a particular dataset (Download here) causes a memory overflow: memory usage keeps growing during fit. Other affinity types work fine on the same data. I cannot reproduce the issue with simulated data (for example, data generated with make_classification in scikit-learn), and I am not sure why it happens.
Steps/Code to Reproduce
from sklearn.cluster import FeatureAgglomeration
import numpy as np
import time
# Load the feature matrix and labels from the dataset linked above
train_data = np.genfromtxt('fold_2_trainFeatVec.csv', delimiter=',')
train_labels = np.genfromtxt('fold_2_trainLabels.csv', delimiter=',')
# The problem occurs with linkage="average" and with linkage="complete"
fa = FeatureAgglomeration(affinity="cosine", linkage="average")
time_start = time.time()
fa.fit(train_data, train_labels)  # memory keeps increasing here
time_end = time.time()
print('Time usage:', time_end - time_start)
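One thing that may be worth checking, although nothing in this report confirms it as the cause: cosine distance is undefined for an all-zero vector, and np.genfromtxt fills missing fields with NaN. FeatureAgglomeration clusters features, so the vectors being compared are the columns of train_data. The sketch below looks for such columns. The helper name find_degenerate_features and the small synthetic array are illustrative assumptions, not part of the original report; in practice the array would be replaced by train_data.

```python
import numpy as np

def find_degenerate_features(X):
    """Return (all-zero column indices, non-finite column indices).

    Hypothetical diagnostic helper, not part of scikit-learn.
    """
    X = np.asarray(X, dtype=float)
    # Columns containing NaN or +/-inf, e.g. from missing CSV fields
    non_finite = ~np.isfinite(X).all(axis=0)
    # Columns where every entry equals zero (NaN != 0 is True, so
    # NaN columns are reported only by the non-finite check)
    all_zero = ~np.any(X != 0, axis=0)
    return np.flatnonzero(all_zero), np.flatnonzero(non_finite)

# Synthetic stand-in for train_data (assumption: the real data is 2-D,
# with samples in rows and features in columns)
X = np.array([[1.0, 0.0, 2.0, np.nan],
              [3.0, 0.0, 4.0, 1.0]])

zero_cols, bad_cols = find_degenerate_features(X)
print('All-zero feature columns:', zero_cols)
print('Columns with NaN/inf:', bad_cols)
```

If either list is non-empty for the real dataset, dropping or imputing those columns before calling fit would be a simple way to test whether they are related to the memory growth.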