Description
IncrementalPCA consistently raises a convergence error on a DataFrame of shape (18000, 18000).
Reproducing code example:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA, IncrementalPCA

df_data = pd.read_csv("/home/ubuntu/df_data_18000_18000_data1.csv")
df_data.set_index('Unnamed: 0', inplace=True)
df_data = df_data.astype('int8')

ipca = IncrementalPCA(n_components=3600, batch_size=3600)
data_ipca = ipca.fit_transform(df_data)

total_explained_variances_ratio = ipca.explained_variance_ratio_.sum()
print("Total explained variance in IPCA is {}".format(total_explained_variances_ratio))

df = pd.DataFrame(data_ipca, index=list(df_data.index))
print("Size of vector space after IncrementalPCA {}".format(df.shape))
```
[df_data_18000_18000_data2.csv.zip](https://github.com/numpy/numpy/files/4486707/df_data_18000_18000_data2.csv.zip)
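A sanity check worth running before `fit_transform` (not from the report; the CSV is not available here, so synthetic data stands in for `df_data`): SVD routines commonly fail to converge when the input contains NaN/inf, and explicitly casting the int8 frame up to float64 before fitting is a low-cost thing to rule out. A minimal sketch, assuming a smaller stand-in matrix:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import IncrementalPCA

# Synthetic stand-in for the 18000 x 18000 int8 frame (smaller so it runs quickly).
rng = np.random.default_rng(0)
df_data = pd.DataFrame(rng.integers(-128, 128, size=(600, 200), dtype=np.int8))

# SVD non-convergence is often triggered by non-finite entries; verify first,
# and hand the estimator float64 data rather than relying on implicit upcasting.
values = df_data.to_numpy(dtype=np.float64)
assert np.isfinite(values).all(), "input contains NaN/inf"

ipca = IncrementalPCA(n_components=50, batch_size=100)
data_ipca = ipca.fit_transform(values)
print(data_ipca.shape)  # (600, 50)
```

On this synthetic data the fit succeeds, which suggests the failure is tied to the specific values in the attached CSV rather than to the int8 dtype alone.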
Error message:
Traceback (truncated):

```
Traceback (most recent call last):
  File "ipca_script.py", line 8, in <module>
    data_ipca = ipca.fit_transform(df_data)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/base.py", line 553, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/decomposition/incremental_pca.py", line 201, in fit
    self.partial_fit(X[batch], check_input=False)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/sklearn/decomposition/incremental_pca.py", line 279, in partial_fit
    U, S, V = linalg.svd(X, full_matrices=False)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/scipy/linalg/decomp_svd.py", line 132, in svd
```
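For context on where the traceback lands (a sketch, not from the report): `IncrementalPCA.fit` slices the input into `batch_size`-row blocks with `sklearn.utils.gen_batches` and runs one SVD per block via `partial_fit`, so the failure occurs on a single 3600-row slice rather than on the full matrix. The loop can be reproduced manually on small synthetic data:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.utils import gen_batches

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 80))

# Equivalent of ipca.fit(X): one partial_fit (and one internal SVD) per row block.
ipca = IncrementalPCA(n_components=20)
batch_size = 100
for batch in gen_batches(X.shape[0], batch_size):
    ipca.partial_fit(X[batch])

print(ipca.components_.shape)  # (20, 80)
```

Wrapping each `partial_fit` call in a try/except on this loop is one way to identify which row slice of the real CSV triggers the non-convergence.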
Numpy/Python version information:

```
1.17.4 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]
```

print(sklearn.__version__)

```
0.21.2
```
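Not part of the report, but a possible workaround while this is investigated: for a dense matrix that fits in memory, plain `PCA` with `svd_solver='randomized'` computes an approximate truncated SVD and avoids the per-batch LAPACK call that fails here. A sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.standard_normal((400, 300))

# Randomized solver: approximate truncated SVD instead of a full LAPACK SVD.
pca = PCA(n_components=50, svd_solver='randomized', random_state=0)
X_r = pca.fit_transform(X)
print(X_r.shape)  # (400, 50)
```

Whether the randomized solver also fails on the attached CSV would itself be a useful data point for narrowing this down.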