[WIP] PCA NEP-37 adding random pathway and CuPy test #17676

Draft
wants to merge 47 commits into
base: main

viclafargue

@viclafargue viclafargue commented Jun 23, 2020

Reference Issues/PRs

This PR completes the existing experimental attempt to enable NEP-37 for the PCA algorithm.
See #16574

What does this implement/fix? Explain your changes.

  • Implement the pathway that makes use of randomized_svd when svd_solver='randomized'
  • Add a CuPy test

@viclafargue
Author

I experimented with CuPy and Dask arrays and identified two blockers:

  • With PCA parameters svd_solver='randomized' and iterated_power >= 2, linalg.lu is required. Unfortunately, this function is not implemented in CuPy. It seems like linalg.lu_factor cannot be used as an alternative. One solution, suggested by @ogrisel, is to use a QR decomposition when linalg.lu is not available in the module.

  • PCA requires linalg.svd. The implementation of this function in the Dask array module differs, as it does not take the full_matrices parameter, so the necessary output cannot be retrieved. See dask/dask#3576 (Supporting full_matrices argument with Dask's svd).
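To illustrate the QR fallback mentioned in the first blocker, here is a minimal sketch of the power-iteration step of a randomized range finder. It is not the code from this PR: the helper names `pick_normalizer` and `randomized_range_finder` are hypothetical, and the sketch only assumes that the array module's `linalg` namespace may or may not expose `lu`.

```python
import numpy as np


def pick_normalizer(linalg):
    """Return a function that renormalizes a tall matrix between iterations.

    Hypothetical helper: use LU when the linalg module provides it,
    otherwise fall back to a QR decomposition.
    """
    if hasattr(linalg, "lu"):
        # scipy.linalg.lu(A, permute_l=True) returns (P @ L, U); keep P @ L.
        return lambda A: linalg.lu(A, permute_l=True)[0]
    # For example numpy.linalg, which exposes qr but not lu.
    return lambda A: linalg.qr(A, mode="reduced")[0]


def randomized_range_finder(M, size, n_iter, linalg=np.linalg, seed=0):
    """Approximate an orthonormal basis for the range of M (sketch only)."""
    rng = np.random.default_rng(seed)
    normalize = pick_normalizer(linalg)
    # Start from a random Gaussian test matrix.
    Q = rng.standard_normal((M.shape[1], size))
    for _ in range(n_iter):
        # Renormalize after each product to limit round-off error.
        Q = normalize(M @ Q)
        Q = normalize(M.T @ Q)
    # A final QR always yields an orthonormal basis, whichever
    # normalizer was used during the iterations.
    Q, _ = np.linalg.qr(M @ Q, mode="reduced")
    return Q
```

With the default `linalg=np.linalg` the QR branch is taken. Passing `scipy.linalg` instead would select the LU branch, which is cheaper per iteration but does not produce orthonormal intermediates, so the two paths may give slightly different approximations.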

@ogrisel
Member

ogrisel commented Jun 24, 2020

> One solution, suggested by @ogrisel, is to use a QR decomposition when linalg.lu is not available in the module.

Can you please update your PR to implement this solution?

Actually you already did.

@ogrisel
Member

ogrisel commented Jun 24, 2020

For reference I opened cupy/cupy#3483 to document the lack of linalg.lu upstream.

@ogrisel
Member

ogrisel commented Jun 24, 2020

@WXBN did you run some benchmarks to see what are the benefits of this GPU-based implementation of PCA / randomized_svd?

For instance on a dataset like MNIST or bigger with 50 components.

@viclafargue
Author

viclafargue commented Jun 25, 2020

I created a benchmark to compare the performance of the PCA algorithm with and without a GPU. The benefit only appears with a large dataset, here a (10k, 100) dataset. I compared the runtime for different PCA parameters including some that make use of randomized_svd.

Here are the results on an NVIDIA Tesla V100 :

With svd_solver='full' and iterated_power=2:

  • Without GPU : 1.220s
  • With GPU : 0.727s

With svd_solver='full' and iterated_power=10:

  • Without GPU : 1.204s
  • With GPU : 0.013s

With svd_solver='randomized' and iterated_power=2:

  • Without GPU : 0.106s
  • With GPU : 0.641s

With svd_solver='randomized' and iterated_power=10:

  • Without GPU : 0.261s
  • With GPU : 0.276s

@ogrisel
Member

ogrisel commented Jun 25, 2020

Thanks for the benchmarks. The speed-up probably also depends on the number of features and on the number of components to extract.

Also keep in mind that because the GPU version uses QR instead of LU, the results might not have the same explained variance.

@ogrisel
Member

ogrisel commented Jun 25, 2020

It's weird that you see a difference when changing iterated_power with svd_solver="full". iterated_power should only impact the randomized solver.

@viclafargue
Author

> Also keep in mind that because the GPU version uses QR instead of LU, the results might not have the same explained variance.

Yes indeed.

> It's weird that you see a difference when changing iterated_power with svd_solver="full". iterated_power should only impact the randomized solver.

Thank you for noticing this. I forgot to run a warm-up launch. CuPy seems to be loading something on the first call, maybe some JIT CUDA code.

I got better results, still on an NVIDIA Tesla V100 (runtimes in seconds):

With svd_solver='full' and iterated_power=2:

  • Without GPU : 1.195s
  • With GPU : 0.013s

With svd_solver='full' and iterated_power=10:

  • Without GPU : 1.195s
  • With GPU : 0.013s

With svd_solver='randomized' and iterated_power=2:

  • Without GPU : 0.034s
  • With GPU : 0.010s

With svd_solver='randomized' and iterated_power=10:

  • Without GPU : 0.186s
  • With GPU : 0.010s
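The warm-up issue described above can be sketched as a small timing helper. This is only an illustration, not the benchmark script used here: the name `time_call` and its parameters are hypothetical, and the comment about device synchronization is an assumption about how one would time asynchronous GPU work.

```python
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return the best wall-clock time of fn(*args) after warm-up calls.

    Hypothetical helper: the warm-up calls absorb one-off costs such as
    library loading or kernel compilation, so they are not timed.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        # Assumption: for GPU code, fn itself should wait for the device
        # to finish before returning, otherwise only the launch is timed.
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Taking the minimum over several repeats is one common choice; reporting the mean or median would be equally reasonable depending on what the benchmark is meant to show.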

@viclafargue viclafargue force-pushed the pca_nep37_random_pathway branch from 08eacfe to da3135d Compare June 25, 2020 14:43
@ogrisel
Member

ogrisel commented Jun 25, 2020

Thanks for the update, that's interesting :)

@viclafargue viclafargue force-pushed the pca_nep37_random_pathway branch from da3135d to cc3e539 Compare June 25, 2020 14:45
@ogrisel
Member

ogrisel commented Jun 25, 2020

In your benchmark script could you please report pca.explained_variance_.sum() in addition to the timings?

@viclafargue
Author

Done ;)

@ogrisel
Member

ogrisel commented Jun 25, 2020

I don't have my GPU machine handy (it's too warm today, and I want to keep my flat cool today and tomorrow ;). What are the results? Do the GPU variants with QR instead of LU explain approximately the same amount of variance?

@viclafargue
Author

viclafargue commented Jun 26, 2020

> Do the GPU variants with QR instead of LU explain approximately the same amount of variance?

There seems to be a small difference :

With svd_solver='full' and iterated_power=2:

  • Without GPU : explained variance: 5.658
  • With GPU : explained variance: 5.658

With svd_solver='full' and iterated_power=10:

  • Without GPU : explained variance: 5.658
  • With GPU : explained variance: 5.658

With svd_solver='randomized' and iterated_power=2:

  • Without GPU : explained variance: 5.145
  • With GPU : explained variance: 5.144

With svd_solver='randomized' and iterated_power=10:

  • Without GPU : explained variance: 5.624
  • With GPU : explained variance: 5.636
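For readers who want to reproduce this kind of check, the quantity reported above can be computed directly from the singular values of the centered data. The sketch below is an assumption-laden illustration rather than the benchmark code: the function name `explained_variance_sum` is hypothetical, and it uses an exact SVD, so it corresponds to the svd_solver='full' rows only.

```python
import numpy as np


def explained_variance_sum(X, n_components):
    """Sum of the variance explained by the top n_components (exact SVD)."""
    # PCA works on centered data.
    Xc = X - X.mean(axis=0)
    # Singular values are returned in decreasing order.
    S = np.linalg.svd(Xc, compute_uv=False)
    # Each component explains S_i**2 / (n_samples - 1) of the variance.
    return float((S[:n_components] ** 2).sum() / (X.shape[0] - 1))
```

A randomized solver can only approach this value from below, which is consistent with the randomized rows reporting a smaller sum that grows as iterated_power increases.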

@ogrisel
Member

ogrisel commented Jun 26, 2020

@WXBN It would be interesting to compare those results with the cuml.decomposition.PCA implementation from rapidsai.

@viclafargue
Author

Unfortunately, the only options available for cuML's PCA svd_solver parameter are 'full' and 'jacobi'. Because of this, I can only include cuML in the comparison for svd_solver='full'.

With svd_solver='full' and iterated_power=2:
Without GPU : runtime: 1.197s, explained variance: 6.967
With CuPy : runtime: 0.016s, explained variance: 6.967
With cuML : runtime: 0.007s, explained variance: 6.967

With svd_solver='full' and iterated_power=10:
Without GPU : runtime: 1.210s, explained variance: 6.967
With CuPy : runtime: 0.016s, explained variance: 6.967
With cuML : runtime: 0.007s, explained variance: 6.967

With svd_solver='randomized' and iterated_power=2:
Without GPU : runtime: 0.032s, explained variance: 6.745
With CuPy : runtime: 0.012s, explained variance: 6.693

With svd_solver='randomized' and iterated_power=10:
Without GPU : runtime: 0.204s, explained variance: 6.945
With CuPy : runtime: 0.040s, explained variance: 6.949

3 participants