flang: experiment with Ninja #235365
Conversation
LGTM
-DLLVM_RAM_PER_COMPILE_JOB=5000
-DLLVM_RAM_PER_LINK_JOB=10000
Helped on arm64 macOS (3 -> 1.5 hours).
14-x86_64 was slower (1 -> 1.75 hours); however, that may just be an unlucky run, as 13-x86_64 is the same and LLVM will limit the max jobs to the number of logical cores, so the 64 GB / 5 GB ≈ 12 jobs figure should be ignored.
May try a couple more things.
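For context, a rough sketch of how these cache variables could be passed alongside the usual Homebrew CMake arguments (the source layout and argument names around the two flags are assumptions, not the actual formula diff):

    # Hypothetical snippet: hint LLVM's Ninja job pools at ~5 GB per compile
    # and ~10 GB per link so parallelism is capped by available memory.
    args = %w[
      -DLLVM_RAM_PER_COMPILE_JOB=5000
      -DLLVM_RAM_PER_LINK_JOB=10000
    ]
    system "cmake", "-S", "llvm", "-B", "build", "-G", "Ninja", *args, *std_cmake_args
    system "cmake", "--build", "build"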
For comparison, Fedora approximates 6.114 GB per process, though that seems to be applied to the total process count like our MAKEFLAGS:
- https://src.fedoraproject.org/rpms/flang/blob/rawhide/f/flang.spec#_72
EDIT: Though I think GCC + ld.bfd usually has worse memory usage than Clang + Apple ld / lld.
-DLLVM_RAM_PER_COMPILE_JOB=5000
-DLLVM_RAM_PER_LINK_JOB=10000
Would be nice to read the available RAM somehow instead of just hard-coding this.
On macOS, you can get this with sysctl -n hw.memsize. On Linux, you can try reading /proc/meminfo, though this might not work in GitHub Actions/Docker.
It should be scaled on available RAM, as it sets a limit on total processes based on available ÷ input, clamped to [1, logical_cores].
The "best" input values will likely change per release (though not enough to impact CI runners). I may try to get a more accurate approximation by doing local builds, but would still need to hard-code it.
Something like this (generated by ChatGPT, but tested locally and in a codespace):
    def total_ram_bytes
      if OS.mac?
        Utils.safe_popen_read("sysctl", "-n", "hw.memsize").to_i
      elsif OS.linux?
        if File.readable?("/proc/meminfo")
          mem_kib = File.read("/proc/meminfo")[/MemTotal:\s+(\d+)\s+kB/, 1]
          mem_kib.to_i * 1024
        else
          pages = Utils.safe_popen_read("getconf", "_PHYS_PAGES").to_i
          page_size = Utils.safe_popen_read("getconf", "PAGE_SIZE").to_i
          pages * page_size
        end
      else
        odie "Unsupported OS for RAM detection"
      end
    end
We can probably drop the getconf fallback on Linux, and maybe add it back if finding the MemTotal entry of /proc/meminfo doesn't work.
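A possible simplification along those lines (untested sketch, assuming MemTotal is always present when /proc/meminfo is readable):

    def total_ram_bytes
      if OS.mac?
        Utils.safe_popen_read("sysctl", "-n", "hw.memsize").to_i
      elsif OS.linux?
        # Rely on /proc/meminfo alone; fail loudly rather than falling back to getconf.
        mem_kib = File.read("/proc/meminfo")[/MemTotal:\s+(\d+)\s+kB/, 1]
        odie "Could not find MemTotal in /proc/meminfo" if mem_kib.nil?
        mem_kib.to_i * 1024
      else
        odie "Unsupported OS for RAM detection"
      end
    end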
Trying out Ninja again and may retry tuning memory to see if there is any improvement on arm64 Sequoia.
- make (total core parallel) took about 3 hours (15-arm64) and 1 hour (14 / linux)
- ninja (default parallel) cancelled after ~4 hours (15-arm64). Same for 14 / linux
- ninja (single link job) ...

Afterward, will explore options to handle resource-dir for arm64 Linux.
opt_lib.relative_path_from(bin)/"clang"/version.major