 import sys
 import shlex
 
+import importlib.util
+import pathlib
+import fnmatch
+
 from setuptools import setup, find_packages
 from setuptools.extension import Extension
 from setuptools.command.build_ext import build_ext
@@ -91,6 +95,30 @@ def build_extensions(self):
 # Cuda detection
 
 
+# Partly taken from JAX:
+# https://github.com/google/jax/blob/4cca2335220dcc953edd2ac764b2387e53527495/jax/_src/lib/__init__.py#L129
+def get_cuda_paths_from_nvidia_pypi():
+    # Check whether nvidia-cuda-nvcc-cu* is installed. To do so we need the
+    # site-packages directory of this install, which we locate through mpi4py
+    # (which must be installed).
+    mpi4py_spec = importlib.util.find_spec("mpi4py")
+    depot_path = pathlib.Path(os.path.dirname(mpi4py_spec.origin)).parent
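+    # For example (hypothetical path, for illustration only): if mpi4py resolves to
+    # .../site-packages/mpi4py/__init__.py, then depot_path is .../site-packages.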
+
+    # If the pip package nvidia-cuda-nvcc-cu11 is installed, it should have
+    # both of the things XLA looks for in the cuda path, namely bin/ptxas and
+    # nvvm/libdevice/libdevice.10.bc
+    #
+    # The files are split in two sets of directories, so we return both.
+    maybe_cuda_paths = [
+        depot_path / "nvidia" / "cuda_nvcc",
+        depot_path / "nvidia" / "cuda_runtime",
+    ]
+    if all(p.is_dir() for p in maybe_cuda_paths):
+        return [str(p) for p in maybe_cuda_paths]
+    else:
+        return []
+
+
 # Taken from CUPY (MIT License)
 def get_cuda_path():
     nvcc_path = search_on_path(("nvcc", "nvcc.exe"))
@@ -169,22 +197,86 @@ def get_sycl_info():
 sycl_info = get_sycl_info()
 
 
+def find_files(bases, pattern):
+    """Return list of files matching pattern in base folders and subfolders."""
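+    # Hypothetical usage example (paths are illustrative):
+    #   find_files(["/opt/cuda"], "libcudart.so*")
+    #   -> ["/opt/cuda/lib64/libcudart.so", "/opt/cuda/lib64/libcudart.so.12"]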
+    if isinstance(bases, (str, pathlib.Path)):
+        bases = [bases]
+
+    result = []
+    for base in bases:
+        for root, dirs, files in os.walk(base):
+            for name in files:
+                if fnmatch.fnmatch(name, pattern):
+                    result.append(os.path.join(root, name))
+    return result
+
+
 def get_cuda_info():
-    cuda_info = {"compile": [], "libdirs": [], "libs": []}
-    cuda_path = get_cuda_path()
-    if not cuda_path:
-        return cuda_info
+    cuda_info = {"compile": [], "libdirs": [], "libs": [], "rpaths": []}
 
-    incdir = os.path.join(cuda_path, "include")
-    if os.path.isdir(incdir):
-        cuda_info["compile"].append(incdir)
+    # First check whether the nvidia-cuda-nvcc-cu* package is installed. In that case
+    # we ignore CUDA_ROOT, which is the same behaviour as JAX.
+    cuda_paths = get_cuda_paths_from_nvidia_pypi()
 
-    for libdir in ("lib64", "lib"):
-        full_dir = os.path.join(cuda_path, libdir)
-        if os.path.isdir(full_dir):
-            cuda_info["libdirs"].append(full_dir)
+    # If not, try to find the CUDA path by hand
+    if len(cuda_paths) > 0:
+        nvidia_pypi_package = True
+    else:
+        nvidia_pypi_package = False
+        _cuda_path = get_cuda_path()
+        if _cuda_path is None:
+            cuda_paths = []
+        else:
+            cuda_paths = [_cuda_path]
 
-    cuda_info["libs"].append("cudart")
+    if len(cuda_paths) == 0:
+        return cuda_info
+
+    for cuda_path in cuda_paths:
+        incdir = os.path.join(cuda_path, "include")
+        if os.path.isdir(incdir):
+            cuda_info["compile"].append(incdir)
+
+        for libdir in ("lib64", "lib"):
+            full_dir = os.path.join(cuda_path, libdir)
+            if os.path.isdir(full_dir):
+                cuda_info["libdirs"].append(full_dir)
+
+    # We need to link against libcudart.so:
+    # - If we are using a standard CUDA installation, we simply add a link flag for
+    #   libcudart.so.
+    # - If we are using the nvidia-cuda-nvcc-cu* package, we need to find the exact
+    #   version of libcudart.so to link against, because the package does not provide
+    #   a generic symlink to libcudart.so but only libcudart.so.XX.
+    #
+    # Moreover, if we are using nvidia-cuda-nvcc we must add an rpath (runtime search
+    # path), because we cannot expect the user to point LD_LIBRARY_PATH at the
+    # nvidia-cuda-nvcc package.
+    if not nvidia_pypi_package:
+        cuda_info["libs"].append("cudart")
+    else:
+        possible_libcudart = find_files(cuda_paths, "libcudart.so*")
+
+        if any(os.path.basename(p) == "libcudart.so" for p in possible_libcudart):
+            # If the generic symlink is present, use the standard linker flag.
+            # In theory with nvidia-cuda-nvcc-cu12 we should never reach this point,
+            # but in the future they might fix it.
+            cuda_info["libs"].append("cudart")
+        elif len(possible_libcudart) > 0:
+            # This should be the standard case for nvidia-cuda-nvcc-cu*, where we find
+            # a library libcudart.so.XX. The syntax to link against a specific version
+            # is -l:libcudart.so.XX.
+            # We arbitrarily choose the first match and add the runtime search path
+            # accordingly.
+            lib_to_link = possible_libcudart[0]
+            cuda_info["libs"].append(f":{os.path.basename(lib_to_link)}")
+            cuda_info["rpaths"].append(os.path.dirname(lib_to_link))
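+            # With a hypothetical match such as libcudart.so.12, this results in the
+            # link flag -l:libcudart.so.12 plus an rpath entry for its directory.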
+        else:
+            # If we cannot find libcudart.so, we cannot build the extension.
+            # This should never happen with the nvidia-cuda-nvcc-cu* package.
+            cuda_info["libs"].append("cudart")
+
+    print("\n\nCUDA INFO:", cuda_info, "\n\n")
     return cuda_info
 
 
@@ -237,13 +329,18 @@ def get_extensions():
     )
 
     if cuda_info["compile"] and cuda_info["libdirs"]:
+        extra_extension_args = {}
+        if len(cuda_info["rpaths"]) > 0:
+            extra_extension_args["runtime_library_dirs"] = cuda_info["rpaths"]
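+            # runtime_library_dirs is typically emitted as rpath linker flags
+            # (e.g. -Wl,-rpath,<dir> with GCC on Linux), so the built extension can
+            # locate libcudart.so.XX at runtime without LD_LIBRARY_PATH.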
+
         extensions.append(
             Extension(
                 name=f"{CYTHON_SUBMODULE_NAME}.mpi_xla_bridge_gpu",
                 sources=[f"{CYTHON_SUBMODULE_PATH}/mpi_xla_bridge_gpu.pyx"],
                 include_dirs=cuda_info["compile"],
                 library_dirs=cuda_info["libdirs"],
                 libraries=cuda_info["libs"],
+                **extra_extension_args,
             )
         )
     else: