Update MAIAHooksInterface to pin host memory in MAIA device #155541
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155541
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit e3237f0 with merge base 12b0213.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
…apli/maia_pin_memory
I'm a bit concerned by the direction here, actually. Let's sync on an issue or Slack about this before you start working on more of these PRs.
Thanks for raising this, @albanD. I understand your concern and agree that, long-term, using the open registration (PrivateUse1) mechanism is the right direction for out-of-tree backend integration. That said, our current backend is already deeply integrated through the older in-tree approach. We have implemented a wide range of kernels and system components (memory, distributed, collective, dispatch, etc.) on top of it, and the transition to PrivateUse1 is non-trivial for us in the short term. We're not opposed to making this shift, but given the amount of scaffolding already built, we would need to plan and migrate incrementally. We're happy to align with the OpenReg approach long-term and would appreciate guidance or best practices for gradually restructuring our integration around PrivateUse1. Let's sync up in this issue to discuss a migration path that minimizes disruption to our current functionality while moving toward the recommended direction. cc @wschin
@albanD, as you saw above, we built our code on the previous integration approach, so switching to OpenReg would add months of work before this simple change could land. Judging from the PRs to OpenReg, it still seems to be under development and lacks clear documentation. We are investigating a feasible migration plan to OpenReg. To unblock our production code and lower our risk, could you please reconsider whether this PR can be accepted? Some concerns about OpenReg: #155864 (comment)
Move to use |
This pull request introduces support for the `MAIA` device type in the PyTorch codebase by adding hooks, detection mechanisms, and interface updates. The changes ensure that `MAIA` is integrated alongside other device types like `CUDA` and `XPU`. Below is a summary of the key changes:

Integration of `MAIA` Device Type:

Device Hooks and Detection:
- Added `MAIA` hooks in the `Context` class to retrieve device-specific hooks and check device availability (`hasMAIA`). ([[1]](https://github.com/pytorch/pytorch/pull/155541/files#diff-468e440eb305e0a440c3d600087003dab97dc368fba20ebedb85e68b99f02e63R64-R65), [[2]](https://github.com/pytorch/pytorch/pull/155541/files#diff-468e440eb305e0a440c3d600087003dab97dc368fba20ebedb85e68b99f02e63R142-R144))
- Removed the `hasMAIA` implementation that used `DeviceGuardImpl`. ([aten/src/ATen/Context.hL182-L184](https://github.com/pytorch/pytorch/pull/155541/files#diff-468e440eb305e0a440c3d600087003dab97dc368fba20ebedb85e68b99f02e63L182-L184))
- Updated `DeviceAccelerator.cpp` to detect and validate `MAIA` as an accelerator device. ([[1]](https://github.com/pytorch/pytorch/pull/155541/files#diff-18f0936bc5ffb161d84cf5e7fd707f633eaedf27ba4f86bd51b2adbe47e71f25R45), [[2]](https://github.com/pytorch/pytorch/pull/155541/files#diff-18f0936bc5ffb161d84cf5e7fd707f633eaedf27ba4f86bd51b2adbe47e71f25R62))

MAIA Hooks Interface:
- Updated `MAIAHooksInterface` in `MAIAHooksInterface.h` with methods for initialization, context checks, and memory management. ([[1]](https://github.com/pytorch/pytorch/pull/155541/files#diff-441db538a658d9b86c83e8093e6116576f92cc6d7e67b3f90741169c2f68badfR3), [[2]](https://github.com/pytorch/pytorch/pull/155541/files#diff-441db538a658d9b86c83e8093e6116576f92cc6d7e67b3f90741169c2f68badfR12-R43))
- Added a `MAIA_HELP` message for error handling and replaced hardcoded error strings with this constant. ([aten/src/ATen/detail/MAIAHooksInterface.hR12-R43](https://github.com/pytorch/pytorch/pull/155541/files#diff-441db538a658d9b86c83e8093e6116576f92cc6d7e67b3f90741169c2f68badfR12-R43))
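
For readers unfamiliar with the hooks pattern the summary describes, here is a minimal, standalone C++ sketch. It is not the code from this PR and does not use ATen: `HooksInterface`, `FakeMaiaHooks`, `registerHooks`, `getHooks`, `hasMaia`, and the `kMaiaHelp` text are all hypothetical names chosen for illustration. The assumption is that a core library ships a default hooks object whose methods fail with a shared help message, and a backend loaded later swaps in its own implementation, including a pinned-memory check.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical help text, playing the role of a shared MAIA_HELP-style constant.
constexpr const char* kMaiaHelp =
    "The backend library must be loaded before MAIA can be used.";

// Simplified stand-in for a device hooks interface (not the real ATen class).
// The defaults describe the "backend not loaded" state.
struct HooksInterface {
  virtual ~HooksInterface() = default;

  // Initialization fails by default and reuses the shared help message.
  virtual void init() const {
    throw std::runtime_error(std::string("Cannot initialize MAIA. ") + kMaiaHelp);
  }

  // Device availability check; false until a backend registers itself.
  virtual bool hasDevice() const { return false; }

  // Memory management query: is this host pointer pinned by the backend?
  virtual bool isPinnedPtr(const void* /*data*/) const { return false; }
};

// Hypothetical backend implementation that treats one host buffer as pinned.
struct FakeMaiaHooks : HooksInterface {
  explicit FakeMaiaHooks(const void* pinned) : pinned_(pinned) {}

  void init() const override {}  // nothing to set up in this sketch
  bool hasDevice() const override { return true; }
  bool isPinnedPtr(const void* data) const override { return data == pinned_; }

 private:
  const void* pinned_;
};

// Storage for the hooks object registered by a backend, if any.
inline std::unique_ptr<HooksInterface>& registeredHooks() {
  static std::unique_ptr<HooksInterface> hooks;
  return hooks;
}

// Called by the backend when it is loaded.
inline void registerHooks(std::unique_ptr<HooksInterface> hooks) {
  assert(hooks && "hooks must not be null");
  registeredHooks() = std::move(hooks);
}

// Returns the registered hooks, or the failing defaults if none exist.
inline const HooksInterface& getHooks() {
  static const HooksInterface fallback{};
  const auto& hooks = registeredHooks();
  return hooks ? *hooks : fallback;
}

// Availability query routed through the hooks rather than a device guard.
inline bool hasMaia() { return getHooks().hasDevice(); }
```

In this sketch, `hasMaia()` returns false and `getHooks().init()` throws until a backend calls `registerHooks`; afterwards the same call sites reach the backend's implementation without the core library linking against it.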