[DO NOT MERGE] Autograd Onboarding Lab #160264
base: main
Conversation
Following parts 1 and 2 of https://github.com/pytorch/pytorch/wiki/Autograd-Onboarding-Lab

NOTE: Do NOT merge this diff!

Learnings:
- When deriving the backward function analytically, it is easiest to break the forward function out step by step and compute the gradient by applying the chain rule.
- grad_a shows that we must be careful to account for both the local gradient and the upstream gradient contributions.
- gradcheck and gradgradcheck are clever ways of validating the analytical solution against numerical (finite-difference) methods; see the sketch below.
- Generally, how to write a test and an operator.

Testing: Run `python3 test/test_autograd_lab.py`
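As a concrete illustration of the gradcheck/gradgradcheck point, here is a minimal sketch using a toy function as a stand-in for the lab's operator (the function itself is hypothetical, only the checking pattern is the point):

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

# Hypothetical stand-in for the lab's forward function; gradcheck/gradgradcheck
# compare the analytical gradients against finite-difference approximations.
def toy_forward(a, b):
    return (a * b).sigmoid().sum()

# gradcheck wants double precision and inputs with requires_grad=True.
inputs = (
    torch.randn(3, 4, dtype=torch.double, requires_grad=True),
    torch.randn(3, 4, dtype=torch.double, requires_grad=True),
)
assert gradcheck(toy_forward, inputs)       # checks first derivatives
assert gradgradcheck(toy_forward, inputs)   # checks second derivatives
```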
TSIA. Part 3 of the onboarding lab: https://github.com/pytorch/pytorch/wiki/Autograd-Onboarding-Lab#iii-write-native-composite-function-and-opinfo

Learnings:
- Using the -k option to run only a subset of tests can save a lot of time.
- Adding a native function involves 1) registration in `native_functions.yaml`, 2) implementation via .cpp and .h files, and 3) OpInfo registration in `common_methods_invocations.py`.
- In the C++ code, make sure to include any functions that are needed within the `at` namespace.
- Newly added native operators can be accessed via `torch.ops.aten.operator_name`. Tensors produced by these operators store a pointer to the backward function; see the sketch below.

Testing: Run `python3 test/test_ops.py -k attention` and `python3 test/test_autograd_lab.py`
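A small sketch of the `torch.ops.aten` access pattern mentioned above. It uses an existing aten op (`sin`) as a stand-in, since the lab's attention operator only exists on this branch:

```python
import torch

# Any registered native operator is reachable as torch.ops.aten.<name>;
# `sin` stands in here for the lab's newly added operator.
x = torch.randn(3, requires_grad=True)
y = torch.ops.aten.sin(x)

# The result holds a pointer to the node that will run the backward pass.
print(y.grad_fn)   # e.g. <SinBackward0 object at ...>
```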
…entation

TSIA. Following part 4 of the onboarding lab: https://github.com/pytorch/pytorch/wiki/Autograd-Onboarding-Lab

Learnings:
- Gradient expressions in `derivatives.yaml` are essentially templates for C++ code, with pre-defined variables for accessing forward results and their gradients.
- Consequently, you can create custom functions to call from `derivatives.yaml` by adding them to `FunctionsManual.cpp`.
- You should specify a gradient expression for each of your differentiable outputs!
- If you have multiple differentiable outputs, make sure to declare that in `derivatives.yaml` using `output_differentiability`!
- Make sure to update the corresponding entry's `dispatch` in `native_functions.yaml` to `CompositeExplicitAutograd` so that the derivative you defined in `derivatives.yaml` is actually used.
- Tensors can be undefined! If you're uncertain whether a tensor will be defined, check `tensor.defined()` before using it; otherwise avoid operating on it. For example, an output may not be used in the loss function, and therefore no gradient is computed for it (see the sketch below).

NOTE: `test_fake_autocast` kept failing on my code. I've elected to skip it since I don't have enough personal time to dedicate to debugging how this test works and why it is failing.

Testing: Run `python3 test/test_ops.py -k attention` and `python3 test/test_autograd_lab.py`
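The "undefined tensor" point has a Python-level analogue that may help build intuition: if an output never feeds into the loss, no gradient flows to or through it. A minimal sketch with a hypothetical example (not the lab's operator):

```python
import torch

x = torch.randn(4, requires_grad=True)
used, unused = x.sin(), x.cos()
loss = used.sum()                       # `unused` never contributes to the loss

# The gradient w.r.t. x flows only through `used`.
(gx,) = torch.autograd.grad(loss, x, retain_graph=True)

# Asking for a gradient w.r.t. a tensor that played no part requires allow_unused=True,
# and the result is None: the Python counterpart of an undefined Tensor in a C++ backward.
(g_unused,) = torch.autograd.grad(loss, unused, allow_unused=True)
print(gx, g_unused)                     # tensor([...]) None
```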
Attention! native_functions.yaml was changed. If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info. Caused by:
…operator implementation

Following recommendations from https://github.com/pytorch/pytorch/wiki/Autograd-Basics

Changelist:
- For the Python operator implementation, use `ctx.set_materialize_grads()` and update the backward function to handle/return `None` tensors (see the sketch below).
- Update the C++ code to use `sym_sizes` instead of `sizes`.
- Address test failures by adding an entry in `overrides.py` and adding expected test failures for the forward AD and vmap fallback cases.

Testing: Run `test_ops.py`, `test_overrides.py`, `functorch/test_aotdispatch.py`, `functorch/test_ops.py`, `functorch/test_vmap.py`, `test_proxy_tensor.py`, and `test_autograd_lab.py` with `-k attention`

NOTE: There are still some test failures in `functorch/test_ops.py` for `test_grad` and `test_vj`; I'm not sure why (some sort of issue with my implementation, or perhaps error accumulation from floating-point arithmetic?). I don't have much time to look into these, so I will leave them for now.
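A minimal sketch of the `ctx.set_materialize_grads()` pattern referenced above, using a hypothetical two-output Function rather than the lab's attention operator:

```python
import torch

class TwoOutputs(torch.autograd.Function):
    """Hypothetical two-output op (not the lab's operator)."""

    @staticmethod
    def forward(ctx, x):
        # Without this call, autograd would materialize zero-filled gradients
        # for unused outputs, and backward would never see None.
        ctx.set_materialize_grads(False)
        return 2 * x, 3 * x

    @staticmethod
    def backward(ctx, grad_a, grad_b):
        # Each incoming grad may be None when the corresponding output was not used.
        grad_x = None
        if grad_a is not None:
            grad_x = 2 * grad_a
        if grad_b is not None:
            grad_x = 3 * grad_b if grad_x is None else grad_x + 3 * grad_b
        return grad_x

x = torch.randn(5, requires_grad=True)
a, b = TwoOutputs.apply(x)
a.sum().backward()      # only `a` is used, so backward receives grad_b=None
print(x.grad)           # 2 * torch.ones(5)
```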
Hi! I'm interested in learning more about the internals of PyTorch and becoming a contributor to the project. As part of this, I've been following along with Core Frontend Onboarding. This branch contains my implementation of the Autograd Onboarding Lab.
For the maintainers of PyTorch, PTAL! There's no urgency in reviewing this stack of diffs, but I would appreciate feedback if you're able to give it!
cc @albanD (since the onboarding lab page mentions you)