
Fix an example: Resolve broadcasting error in attn_bias and attn_mask… #130209

Closed
wants to merge 2 commits into from

Conversation

xingyunjohn1
Contributor


Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method

  1. `attn_bias += attn_mask` causes a broadcasting error. Because `attn_bias` has shape (L, S), the output of the in-place addition is also expected to have shape (L, S). But when the input `attn_mask` has shape (N, num_heads, L, S), broadcasting is triggered and the result would have shape (N, num_heads, L, S), which cannot be written back into `attn_bias` (see the sketch below).
  2. `attn_bias` is a newly created variable in the method, and it is not assigned a device.
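
A minimal sketch of the failure and of the kind of out-of-place fix described above (the tensor sizes are illustrative, not taken from the PR):

```python
import torch

# Illustrative sizes (hypothetical, not from the PR): batch N, heads, lengths L and S.
N, num_heads, L, S = 2, 4, 8, 8
attn_mask = torch.randn(N, num_heads, L, S)  # additive mask with batch dims
attn_bias = torch.zeros(L, S)                # shape (L, S), as in the example

# Point 1: the in-place add must produce an (L, S) result, but broadcasting
# against attn_mask would yield (N, num_heads, L, S), so PyTorch raises.
try:
    attn_bias += attn_mask
except RuntimeError as e:
    print(e)  # e.g. output with shape [8, 8] doesn't match the broadcast shape [2, 4, 8, 8]

# The out-of-place add is free to broadcast.
attn_bias = attn_mask + attn_bias
print(attn_bias.shape)  # torch.Size([2, 4, 8, 8])

# Point 2: create new tensors on the input's device so the example still
# runs when the inputs live on an accelerator.
query = torch.randn(N, num_heads, L, 16)  # 16 = hypothetical head dim
attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
```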

This is my retry of #130200; I used the wrong account in that PR.

… addition, fix device assignment for newly created variables in method

pytorch-bot bot commented Jul 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130209

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 81dfdda with merge base 9983242:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jul 8, 2024
Contributor

@mikaylagawarecki mikaylagawarecki left a comment


fyi @drisspg

@mikaylagawarecki mikaylagawarecki requested a review from drisspg July 10, 2024 18:59
@albanD albanD removed their request for review July 11, 2024 22:00
A more elegant implementation of device assignment, from @mikaylagawarecki.

Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
@mikaylagawarecki
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jul 18, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@mikaylagawarecki mikaylagawarecki added the release notes: nn (release notes category) and topic: docs (topic category) labels Jul 18, 2024
@mikaylagawarecki
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@mikaylagawarecki
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method (pytorch#130209)

Pull Request resolved: pytorch#130209
Approved by: https://github.com/mikaylagawarecki
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method (pytorch#130209)
@xingyunjohn1
Contributor Author

xingyunjohn1 commented Aug 18, 2024

This should be reopened and merged again, as the code has been overridden. @mikaylagawarecki Thank you.

pytorchmergebot pushed a commit that referenced this pull request Sep 9, 2024
Fix example: Address broadcasting error in the addition of `attn_bias` and `attn_mask`, and correct device assignment for newly created variables in the method (#135427)

1. Adding `attn_bias += attn_mask` results in a broadcasting error. The expected shape of `attn_bias` is (L, S), so the output should also have the shape (L, S). However, when the input shape is (N, num_heads, L, S), broadcasting occurs, leading to an output shape of (N, num_heads, L, S), which is not desired.
2. `attn_bias` is a newly created variable within the method, but it is not assigned to the correct device.

**This is my retry of PR #130209. The PR was merged as commit `d4a79d4a7c746068d25fe5cf9333495561f4ce1f`, but the modifications were overwritten by subsequent commits.**

Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
@mikaylagawarecki  provided a more elegant implementation.
Pull Request resolved: #135427
Approved by: https://github.com/ezyang
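
For reference, here is a sketch of the example with both fixes applied, modeled on the `scaled_dot_product_attention` reference implementation in the docs that these PRs edit; the exact upstream code may differ:

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, attn_mask=None,
                                 dropout_p=0.0, is_causal=False, scale=None):
    L, S = query.size(-2), key.size(-2)
    scale_factor = 1 / math.sqrt(query.size(-1)) if scale is None else scale
    # Fix 2: create new tensors on query's device, not the default device.
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if is_causal:
        assert attn_mask is None
        temp_mask = torch.ones(L, S, dtype=torch.bool,
                               device=query.device).tril(diagonal=0)
        attn_bias.masked_fill_(temp_mask.logical_not(), float("-inf"))
    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            attn_bias.masked_fill_(attn_mask.logical_not(), float("-inf"))
        else:
            # Fix 1: out-of-place addition, so a (N, num_heads, L, S) mask
            # broadcasts against the (L, S) bias instead of failing in place.
            attn_bias = attn_mask + attn_bias
    attn_weight = query @ key.transpose(-2, -1) * scale_factor
    attn_weight += attn_bias
    attn_weight = torch.softmax(attn_weight, dim=-1)
    attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
    return attn_weight @ value
```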
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Fix example: Address broadcasting error in the addition of `attn_bias` and `attn_mask`, and correct device assignment for newly created variables in the method (pytorch#135427)
apakbin pushed a commit to apakbin/pytorch that referenced this pull request Feb 14, 2025
Fix example: Address broadcasting error in the addition of `attn_bias` and `attn_mask`, and correct device assignment for newly created variables in the method (pytorch#135427)
Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
release notes: nn (release notes category)
topic: docs (topic category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
5 participants