Skip to content

BLD Reduce generated build file path lengths to avoid Windows path length limitation #31212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 5, 2025

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Apr 16, 2025

This should reduce the path length of our build files, making more likely to avoid the long path issues when building from source on Windows, see #31149 (comment) and #31123 (comment).

I feel this PR doesn't over-complicate the meson.build files while making it less likely that the long path limitation is reached on Windows.

FWIW, scipy is using generator, maybe in part for historical reasons, since Cython was not supported by meson at the beginning.

cc @rgommers in case you have heard of similar cases of meson generated files path length limitations on Windows. Also do you think it is worth opening a meson issue to ask if they would consider creating the generate .c or .cpp files in a less nested folder (maybe there is a reason behind it, not sure ...)?

How it works

Our longest generated file relative path is currently 201 characters , with this PR it would be 151 characters. What matters is the absolute path length and with the temporary build env created by pip you easily reach the long path limit (plus the default temp dir %TMP is quite long on Windows C:\Users\<user_name>\AppData\Local\Temp). The Windows long path limit is 260 characters (i.e. something like C:\path-with-max-256-characeters<NULL_CHARACTER>) according to the Windows doc.

By using a meson generator to run cython rather than meson built-in cython functionality, the generated .c or .cpp file is created in a less nested folder. For example the longest generated file relative path right now is the following one (201 characters)

build/cp313/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-313-x86_64-linux-gnu.so.p/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.pyx.cpp

With this PR it is 151 characters (notice the second sklearn/metrics/_pairwise_distances_reduction has been removed + .pyx.cpp -> .cpp. In total, 50 characters has been removed):

build/cp313/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-313-x86_64-linux-gnu.so.p/_radius_neighbors_classmode.cpp

@lesteve lesteve changed the title Reduce meson build nesting BLD Use meson generator to reduce build file path lengths Apr 16, 2025
Copy link

github-actions bot commented Apr 16, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 32c2698. Link to the linter CI: here

@lesteve lesteve marked this pull request as ready for review April 16, 2025 12:03
@lesteve lesteve changed the title BLD Use meson generator to reduce build file path lengths BLD Reduce generated build file path lengths to avoid Windows path length limitation Apr 16, 2025
@lesteve lesteve added this to the 1.7 milestone Apr 18, 2025
@lesteve
Copy link
Member Author

lesteve commented Apr 18, 2025

Weird failure in Ubuntu Atlas CI build doesn't seem related ? From failing build log. Merging main build log seems all fine 🤔 🤷 ...

_____ test_ridge_regression_check_arguments_validity[sag-array-None-False] _____
[gw0] linux -- Python 3.12.3 /home/vsts/work/1/s/testvenv/bin/python

return_intercept = False, sample_weight = None
        equal_nan  = True
        err_msg    = ''
        rtol       = 0
        verbose    = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7fc514a2ede0>, array([0.99996399, 2.00006083, 0.10015876]), array([1. , 2. , 0.1]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0, atol=0.0001', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=0, atol=0.0001
E           
E           Mismatched elements: 1 / 3 (33.3%)
E           Max absolute difference: 0.00015876
E           Max relative difference: 0.00158762
E            x: array([0.999964, 2.000061, 0.100159])
E            y: array([1. , 2. , 0.1])

args       = (<function assert_allclose.<locals>.compare at 0x7fc514a2ede0>, array([0.99996399, 2.00006083, 0.10015876]), array([1. , 2. , 0.1]))
func       = <function assert_array_compare at 0x7fc5255cac00>
kwds       = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0, atol=0.0001', 'verbose': True}
self       = <contextlib._GeneratorContextManager object at 0x7fc5255a7320>

/usr/lib/python3.12/contextlib.py:81: AssertionError

@rgommers
Copy link
Contributor

cc @rgommers in case you have heard of similar cases of meson generated files path length limitations on Windows. Also do you think it is worth opening a meson issue to ask if they would consider creating the generate .c or .cpp files in a less nested folder (maybe there is a reason behind it, not sure ...)?

I haven't run into this before. I agree the generated paths look overly long, so perhaps that can be improved. @eli-schwartz may be able to confirm. The subfolder name repetition after path/to/target_filename.p/ looks unnecessary.

@eli-schwartz
Copy link

Previously: mesonbuild/meson#10788

@eli-schwartz
Copy link

eli-schwartz commented Apr 18, 2025

The subfolder name repetition after path/to/target_filename.p/ looks unnecessary.

This may possibly be motivated by a desire to identify and disambiguate between the possibility of having a single target which compiles multiple source files:

  • foo/bar/util.pyx
  • foo/baz/util.pyx

They can't both be generated outright as

build/mytarget.cpython-313-x86_64-linux-gnu.so.p/util.pyx.cpp

And the "simplest" answer for how to distinguish them is "just add the original path in there, and people can also investigate the build by simply matching the source path with the generated path".

At least in this case you don't have to deal with both path_max and filename_max...

@vitorpohlenz
Copy link
Contributor

@lesteve as suggested in the comment.
And despite the fact that I do not understand much about compilers, I could validate the situation that I've faced with MAX_PATH in Windows, and your solution solved, at least what I faced in the past.

I created a "long path" C:\Users\vitor.pohlenz\Documents\Python\long_path_to_test\scikit-learn and it breaks in the current version of v1.6.1 as #31149 comment.
But it works completely fine with your modifications in this PR (at least on my "machine" and "long path").

@lesteve
Copy link
Member Author

lesteve commented Apr 18, 2025

Thanks @vitorpohlenz for taking the time to test this PR 🙏! I was rather confident the work-around in this PR would make it less likely to hit long path limitation on Windows, but it's definitely useful to have an external confirmation.

@lesteve
Copy link
Member Author

lesteve commented Apr 18, 2025

This may possibly be motivated by a desire to identify and disambiguate between the possibility of having a single target which compiles multiple source files:

* foo/bar/util.pyx

* foo/baz/util.pyx

@eli-schwartz thanks for your inputs, super useful! I indeed thought about something like this but my guess is that using two .pyx in one extension module is actually is not possible in Cython (I am certainly not a Cython expert unfortunately so this would definitely need double-checking). Even if this was a Cython feature, I would expect that very few projects use it, since scipy and scikit-learn don't use it.

@lesteve
Copy link
Member Author

lesteve commented Apr 18, 2025

@eli-schwartz I had a quick and dirty attempt that seems to agree with my initial intuition that it isn't possible to have two .pyx in a single extension module, see below.

This SO question seems to go towards the same direction although I have to admit I did not read everything in details 😅.

Here is my quick and dirty attempt, I added the following to sklearn/cluster/meson.build and everything builds fine.

py.extension_module(
  'combined',
  ['test1.pyx', 'test2.pyx'],
  cython_args: cython_args,
  subdir: 'sklearn/cluster',
  install: true,
)


py.extension_module(
  'test1',
  ['test1.pyx'],
  cython_args: cython_args,
  subdir: 'sklearn/cluster',
  install: true,
)


py.extension_module(
  'test2',
  ['test2.pyx'],
  cython_args: cython_args,
  subdir: 'sklearn/cluster',
  install: true,
)

Here is what I get when I try to import the built modules:

  • python -c 'from sklearn.cluster import test1, test2' works fine
  • python -c 'from sklearn.cluster import combined' errors:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from sklearn.cluster import combined
ImportError: dynamic module does not define module export function (PyInit_combined)

The content of the test{1,2}.pyx for explicitness:

# test1.pyx
def test1(): pass
# test2.pyx
def test2(): pass

@rgommers
Copy link
Contributor

Indeed, it's not possible to combine two .pyx files AFAIK. It wouldn't work without special handling, since you must have exactly one PyInit_* function per extension module.

@eli-schwartz
Copy link

eli-schwartz commented Apr 18, 2025

@eli-schwartz thanks for your inputs, super useful! I indeed thought about something like this but my guess is that using two .pyx in one extension module is actually is not possible in Cython (I am certainly not a Cython expert unfortunately so this would definitely need double-checking). Even if this was a Cython feature, I would expect that very few projects use it, since scipy and scikit-learn don't use it.

Given that pxd exists I suppose having multiple pyx could be seen as rather redundant. The only actual potential for a distinction between the two would be the ability to parallel-compile the module unit in individual chunks.

This SO question seems to go towards the same direction although I have to admit I did not read everything in details 😅.

... which appears to be about having a single final *.so which the python interpreter can import multiple modules from, and that's indeed madness and not actually a decision that cython gets to make, but it wasn't what I was talking about.

Because it (multiple source files per installable output artifact) is, however, very much allowed for say vala, which is a language meson supports similarly to cython, in that it is transpiled to C and then the C is compiled.

Maybe it makes sense to only use the file basename, for cython specifically. That's entirely possible, and if this is changed directly in meson then you can take advantage of builtin cython support while still playing nice with the Windows tempdir etc.

Indeed, it's not possible to combine two .pyx files AFAIK. It wouldn't work without special handling, since you must have exactly one PyInit_* function per extension module.

... and e.g. vala handles this because in vala, you have to write your own main() entrypoint, thus it doesn't try to autogenerate one into every file and you don't end up with two entrypoints in one built installable artifact.

@lesteve
Copy link
Member Author

lesteve commented Apr 19, 2025

Maybe it makes sense to only use the file basename, for cython specifically.

That would indeed seem like a good option if it is feasible on the meson side.

if this is changed directly in meson then you can take advantage of builtin cython support while still playing nice with the Windows tempdir etc.

I will be AFK for a few weeks, so it's definitely not going to happen short-term, but it did not seem like the automatic shortening of path from mesonbuild/meson#10788 actually was kicking in in the issues we saw on Windows in a scikit-learn context. Maybe the heuristics of for "filepaths with >5 components" was not met? Full disclosure: I did not read mesonbuild/meson#10788 in enough details, so I'll try to look at it in more details when I am back.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1 to go with this solution for now

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice for meson to fix this, but for now I guess this can do.

@adrinjalali adrinjalali merged commit c26fa16 into scikit-learn:main May 5, 2025
36 checks passed
@lesteve lesteve deleted the reduce-meson-build-nesting branch May 5, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants