Skip to content

Conversation

agozillon
Copy link
Contributor

We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky alterations.

Main changes:
* Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with these so that we can default to std:: when it's available, or our own Flang implementation when it's not. * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros). * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront of variables passed to templates.

Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty or forwrd on to the builtin).

Co-author: Dan Palermo Dan.Palermo@amd.com

@llvmbot llvmbot added the flang Flang issues not falling into any other category label Aug 8, 2025
Copy link

github-actions bot commented Aug 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@agozillon agozillon force-pushed the flang-runtime-tweaks-for-amd-offload branch from a3304f0 to cc89cea Compare August 8, 2025 03:17
@@ -127,8 +127,8 @@ class Elementwise {
const Descriptor &instance_, *from_{nullptr};
std::size_t elements_{instance_.InlineElements()};
std::size_t elementAt_{0};
SubscriptValue subscripts_[common::maxRank];
SubscriptValue fromSubscripts_[common::maxRank];
SubscriptValue subscripts_[maxRank];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it if you wish, but this flags up as a warning in Clang I believe, that'll cause us to error out in with -werror @dpalermo will know a bit more about this alteration I think :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the warning from clang?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misremembering the reason, it's been a while, I've checked and it's no longer a problem, so I'm happy to revert this :-)

@@ -158,8 +158,8 @@ template <typename STORE, std::size_t minBuffer = 65536> class FileFrame {
// Avoid passing a null pointer, since it would result in an undefined
// behavior.
if (old != nullptr) {
std::memcpy(buffer_, old + start_, chunk);
std::memcpy(buffer_ + chunk, old, length_ - chunk);
Fortran::runtime::memcpy(buffer_, old + start_, chunk);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the Fortran prefix, and using it makes this code inconsistent with other namespace usage in the runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will make the alteration in the next update after the other discussions are resolved

@@ -161,4 +161,4 @@ void std::__libcpp_verbose_abort(char const *format, ...) noexcept(
}
#endif

RT_EXT_API_GROUP_END
RT_EXT_API_GROUP_END
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just deleted the terminal newline here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thank you! Will re-add in subsequent update.

@@ -23,6 +23,16 @@
#define STD_FILL_N_UNSUPPORTED 1
#endif

#if !defined(STD_MEMSET_UNSUPPORTED) && \
(defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__)
#define STD_MEMSET_UNSUPPORTED 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this change existing compilations that work with CUDA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove it if we wish, it was to maintain consistency with the rest of the CUDA support below (e.g. memmove) when we added the other bits and pieces, @vzakhari may know better if this is a reasonable addition or not or you of course might as well :-) so please do advise if either of you wish it removed or altered, happy to do so!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe std::memset and std::memcpy are supported by at least nvcc, so I would prefer these additions to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will do so on the next pass!

@@ -63,6 +73,24 @@
#define STD_TOUPPER_UNSUPPORTED 1
#endif

#if defined(OMP_OFFLOAD_BUILD) || defined(OMP_NOHOST_BUILD)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, I think we should not do this for the host part of the offload build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.

That makes sense, I believe it should be possible, I think Clang (which is what we use) should have an environment variable specified during compilation to notify what compiler is in use, so should be able to use that if I'm remembering correctly.

In addition, I think we should not do this for the host part of the offload build.

Makes sense as well, I'll change the || -> && and it should only be applicable to the offload build and fall back on the std or builtin implementations.

Copy link
Contributor Author

@agozillon agozillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some replies inline, I'll update the PR once the discussions have been concluded!

@@ -127,8 +127,8 @@ class Elementwise {
const Descriptor &instance_, *from_{nullptr};
std::size_t elements_{instance_.InlineElements()};
std::size_t elementAt_{0};
SubscriptValue subscripts_[common::maxRank];
SubscriptValue fromSubscripts_[common::maxRank];
SubscriptValue subscripts_[maxRank];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it if you wish, but this flags up as a warning in Clang I believe, that'll cause us to error out in with -werror @dpalermo will know a bit more about this alteration I think :-)

@@ -158,8 +158,8 @@ template <typename STORE, std::size_t minBuffer = 65536> class FileFrame {
// Avoid passing a null pointer, since it would result in an undefined
// behavior.
if (old != nullptr) {
std::memcpy(buffer_, old + start_, chunk);
std::memcpy(buffer_ + chunk, old, length_ - chunk);
Fortran::runtime::memcpy(buffer_, old + start_, chunk);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will make the alteration in the next update after the other discussions are resolved

@@ -161,4 +161,4 @@ void std::__libcpp_verbose_abort(char const *format, ...) noexcept(
}
#endif

RT_EXT_API_GROUP_END
RT_EXT_API_GROUP_END
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thank you! Will re-add in subsequent update.

@@ -23,6 +23,16 @@
#define STD_FILL_N_UNSUPPORTED 1
#endif

#if !defined(STD_MEMSET_UNSUPPORTED) && \
(defined(__CUDACC__) || defined(__CUDA__)) && defined(__CUDA_ARCH__)
#define STD_MEMSET_UNSUPPORTED 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove it if we wish, it was to maintain consistency with the rest of the CUDA support below (e.g. memmove) when we added the other bits and pieces, @vzakhari may know better if this is a reasonable addition or not or you of course might as well :-) so please do advise if either of you wish it removed or altered, happy to do so!

@@ -63,6 +73,24 @@
#define STD_TOUPPER_UNSUPPORTED 1
#endif

#if defined(OMP_OFFLOAD_BUILD) || defined(OMP_NOHOST_BUILD)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add this only for the particular OpenMP compiler you are using currently? I think some other OpenMP compilers may support some of these STD methods, so I would prefer not to disable them all for all compilers that may be used for out-of-tree flang-rt builds.

That makes sense, I believe it should be possible, I think Clang (which is what we use) should have an environment variable specified during compilation to notify what compiler is in use, so should be able to use that if I'm remembering correctly.

In addition, I think we should not do this for the host part of the offload build.

Makes sense as well, I'll change the || -> && and it should only be applicable to the offload build and fall back on the std or builtin implementations.

@agozillon agozillon force-pushed the flang-runtime-tweaks-for-amd-offload branch from fc16dff to be67c22 Compare August 11, 2025 17:34
@agozillon
Copy link
Contributor Author

Attempted to address all the feedback in the last commit (plus force pushed once more to fix some formatting)

Copy link
Contributor

@klausler klausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other APIs from std:: in the runtime that should be replaced?

@@ -15,6 +15,7 @@
#include "flang-rt/runtime/tools.h"
#include "flang-rt/runtime/type-info.h"
#include "flang-rt/runtime/work-queue.h"
#include "flang/Runtime/stop.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this additional include directive necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will double check if the includes are necessary!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No requirement for this one anymore, thank you for pointing it out!

@@ -10,6 +10,7 @@
#include "stack.h"
#include "flang-rt/runtime/descriptor.h"
#include "flang-rt/runtime/terminator.h"
#include "flang-rt/runtime/tools.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this additional include directive necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case it is unfortunately, due needing the memcpy from the fortran::runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But memcpy is defined in freestanding-tools.h, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thank you! I've replaced tools with freestanding-tools.h in the most recent update.

@agozillon
Copy link
Contributor Author

Are there other APIs from std:: in the runtime that should be replaced?

Not that we've found so far, but there will likely be others as the runtime gets built on! So we're happy to address them as they come up.

There is an issue with abort (C abort I think, rather than std::) currently, but the method we use to work around it downstream at the moment isn't particularly upstreamable, we create an local (to the .cpp source file with the issue) extern "C" function in-place of abort to forward to the fortran runtime's abort. Unfortunately, couldn't replicate similar functionality to the memset/memcpy in the PR as we end up with duplicate symbols.

@agozillon
Copy link
Contributor Author

Checked the pointed out includes in the last comments and removed one in the last commit, the other is required for the fortran::runtime::memcpy functionality it seems.

@agozillon agozillon force-pushed the flang-runtime-tweaks-for-amd-offload branch from 3353c60 to 0416575 Compare August 14, 2025 15:00
@agozillon
Copy link
Contributor Author

Updated the PR stack, the main addition was new RT_OFFLOAD_API_GROUP wrappers in temporary-stack.cpp.

But otherwise I rebased and reformatted the PR to fix the code formatting!

But the PR should address all current comments and pass the CI now, please do feel free to add more comments however if you wish anything further changed! :-)

@agozillon
Copy link
Contributor Author

Small ping for further review or approvals if possible please :-)

@@ -204,8 +204,8 @@ void FORTRAN_PROCEDURE_NAME(getarg)(
void FORTRAN_PROCEDURE_NAME(getlog)(char *arg, std::int64_t length) {
#if _REENTRANT || _POSIX_C_SOURCE >= 199506L
if (length >= 1 && getlogin_r(arg, length) == 0) {
auto loginLen{std::strlen(arg)};
std::memset(
auto loginLen{Fortran::runtime::strlen(arg)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just runtime:: is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went back over these to double check and removed Fortran:: from this one, the others appear to need the namespace prefix at the moment or get undeclared identifier errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? The identifiers are declared in the runtime namespace, no? Maybe you're missing an include or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible and I'm happy to investigate a bit, just assumed it's perhaps intended for those files as we do use Fortran::runtime:: elsewhere in it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I managed to get to the bottom of it, they reside outside of a namespace or in an anonymous namespace, so a using namespace Fortran is required. Should be addressed in the latest commit.

@@ -97,7 +99,7 @@ void DescriptorStorage<COPY_VALUES>::resize(size_type newCapacity) {
// Avoid passing a null pointer, since it would result in an undefined
// behavior.
if (data_ != nullptr) {
memcpy(newData, data_, capacity_ * sizeof(Descriptor *));
Fortran::runtime::memcpy(newData, data_, capacity_ * sizeof(Descriptor *));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just runtime:: suffices.

@@ -276,13 +276,13 @@ static void DateAndTimeUnavailable(Fortran::runtime::Terminator &terminator,
char *zone, std::size_t zoneChars,
const Fortran::runtime::Descriptor *values) {
if (date) {
std::memset(date, static_cast<int>(' '), dateChars);
Fortran::runtime::memset(date, static_cast<int>(' '), dateChars);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just runtime:: is needed here and in the following lines.

Copy link
Contributor Author

@agozillon agozillon Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is necessary to have the Fortran:: prefix in the remaining locations, it incurs a compile time error otherwise. And the Fortran:: prefix is used elsewhere in these files e.g. one of the input parameter in the above case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? what happens?

while (count--) {
*to++ = *from++;
}
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need a return; statement before the closing brace of a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Happy to adjust.

@klausler klausler requested review from klausler and removed request for klausler August 19, 2025 14:38
@agozillon agozillon force-pushed the flang-runtime-tweaks-for-amd-offload branch from 0416575 to de94a2c Compare August 19, 2025 15:12
@agozillon
Copy link
Contributor Author

rebased on top of main in the last PR and amended the requested changes into the last commit so had to do a force push, so my appologies for that.

The only additional changes should be the remove of the requested return; and the removal of one more Fortran:: namespace prefix, the remaining 2 require it at the moment it seems.

@klausler klausler requested review from klausler and removed request for klausler August 19, 2025 15:30
@agozillon
Copy link
Contributor Author

Last PR should address all the comments currently. I'll git squash and clang-format to make the CI happy once we're happy with the PR (so as to make reviewing any changes easier for the time being).

@agozillon
Copy link
Contributor Author

Small ping for reviewer attention on this please :-)

Copy link
Contributor

@vzakhari vzakhari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one minor comment. Thank you!

@@ -420,7 +423,7 @@ static void GetDateAndTime(Fortran::runtime::Terminator &terminator, char *date,
auto copyBufferAndPad{
[&](char *dest, std::size_t destChars, std::size_t len) {
auto copyLen{std::min(len, destChars)};
std::memcpy(dest, buffer, copyLen);
Fortran::runtime::memcpy(dest, buffer, copyLen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fortran::runtime::memcpy(dest, buffer, copyLen);
runtime::memcpy(dest, buffer, copyLen);

@agozillon
Copy link
Contributor Author

agozillon commented Aug 27, 2025

Thank you both very much! I'll address the last comment, squash and clang-format and land the PR in the next couple of days unless there's any further comments in between :-)

We have some modifications downstream to compile the flang runtime for amdgpu using clang OpenMP, some more
hacky than others to workaround (hopefully temporary) compiler issues. The additions here are the non-hacky
alterations.

Main changes:
	* Create freestanding versions of memcpy, strlen and memmove, and replace std:: references with
	  these so that we can default to std:: when it's available, or our own Flang implementation when
          it's not.
        * Wrap more bits and pieces of the library in declare target wrappers (RT_* macros).
        * Fix some warnings that'll pose issues with werror on, in this case having the namespace infront
          of variables passed to templates.

Another minor issues that'll likely still pop up depending on the program you're linking with is that abort will be
undefined, it is perhaps possible to solve it with a freestanding implementation as with memcpy etc. but we end up
with multiple definitions in this case. An alternative is to create an empty extern "c" version (which can be empty
or forwrd on to the builtin).

Co-author: Dan Palermo Dan.Palermo@amd.com
@agozillon agozillon force-pushed the flang-runtime-tweaks-for-amd-offload branch from 0e5ef24 to e6b83ed Compare August 29, 2025 20:33
@agozillon agozillon changed the title [Flang][OpenMP][Runtime] Minor Flang runtime for OpenMP AMDGPU [Flang][OpenMP][Runtime] Minor Flang runtime for OpenMP AMDGPU modifications Aug 29, 2025
@agozillon agozillon merged commit 30d2cb5 into llvm:main Aug 29, 2025
9 checks passed
mdenson pushed a commit to mdenson/llvm-project that referenced this pull request Aug 29, 2025
…cations (llvm#152631)

We have some modifications downstream to compile the flang runtime for
amdgpu using clang OpenMP, some more hacky than others to workaround
(hopefully temporary) compiler issues. The additions here are the
non-hacky alterations.

Main changes:
* Create freestanding versions of memcpy, strlen and memmove, and
replace std:: references with these so that we can default to std:: when
it's available, or our own Flang implementation when it's not. * Wrap
more bits and pieces of the library in declare target wrappers (RT_*
macros). * Fix some warnings that'll pose issues with werror on, in this
case having the namespace infront of variables passed to templates.

Another minor issues that'll likely still pop up depending on the
program you're linking with is that abort will be undefined, it is
perhaps possible to solve it with a freestanding implementation as with
memcpy etc. but we end up with multiple definitions in this case. An
alternative is to create an empty extern "c" version (which can be empty
or forwrd on to the builtin).

Co-author: Dan Palermo Dan.Palermo@amd.com
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Aug 30, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Aug 31, 2025
…cations (llvm#152631)

We have some modifications downstream to compile the flang runtime for
amdgpu using clang OpenMP, some more hacky than others to workaround
(hopefully temporary) compiler issues. The additions here are the
non-hacky alterations.

Main changes:
* Create freestanding versions of memcpy, strlen and memmove, and
replace std:: references with these so that we can default to std:: when
it's available, or our own Flang implementation when it's not. * Wrap
more bits and pieces of the library in declare target wrappers (RT_*
macros). * Fix some warnings that'll pose issues with werror on, in this
case having the namespace infront of variables passed to templates.

Another minor issues that'll likely still pop up depending on the
program you're linking with is that abort will be undefined, it is
perhaps possible to solve it with a freestanding implementation as with
memcpy etc. but we end up with multiple definitions in this case. An
alternative is to create an empty extern "c" version (which can be empty
or forwrd on to the builtin).

Co-author: Dan Palermo Dan.Palermo@amd.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants