add try catch around provenance tracking #159266

Open. Wants to merge 1 commit into base: main.

yushangdi (Contributor) commented Jul 28, 2025

Summary: Wrap the provenance tracking logic in try-except to make it safer, so that a failure in this debugging code does not fail the run.

Test Plan:
CI

Rollback Plan:

Differential Revision: D79008234

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot bot commented Jul 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159266

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 11da12f with merge base 14d67ee:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D79008234

ezyang (Contributor) commented Jul 28, 2025

I'm kind of down on this sort of thing. While I do understand that it's annoying to have a run fail because of some debugging code, the alternative is that debugging information is just missing and no one knows why (because this sort of thing is a lot harder to fix.)

yushangdi (Contributor, Author)

> I'm kind of down on this sort of thing. While I do understand that it's annoying to have a run fail because of some debugging code, the alternative is that debugging information is just missing and no one knows why (because this sort of thing is a lot harder to fix.)

@ezyang We're trying to enable provenance tracking debugging information by default for all production models, and the enablement team wants these try-except blocks to make sure that no production model is affected when it is turned on by default. Because every exception is logged, we should be able to tell why debugging information is missing by looking at the logs.
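For readers following along, the pattern being described looks roughly like the sketch below. This is not the code in the PR: `best_effort` and `record_provenance` are hypothetical names made up for illustration, and the only assumption is that the tracking step is a plain Python function whose failure should be logged rather than propagated.

```python
import functools
import logging

log = logging.getLogger("provenance_sketch")  # hypothetical logger name


def best_effort(default=None):
    """Decorator: run the wrapped function, but never let it raise.

    On any Exception the full traceback is logged and `default` is returned,
    so a bug in debugging-only code cannot fail the surrounding run.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # log.exception records the message plus the traceback.
                log.exception("provenance tracking failed in %s", fn.__name__)
                return default
        return wrapper
    return decorator


@best_effort(default=None)
def record_provenance(node_name, mapping):
    """Hypothetical tracking step: map a graph node to a kernel name."""
    if not node_name:
        raise ValueError("node name must be non-empty")
    mapping[node_name] = f"kernel_for_{node_name}"
    return mapping
```

With this shape, a call such as `record_provenance("", {})` returns `None` and leaves a traceback in the log instead of raising. The cost is the one raised earlier in the thread: the mapping is silently incomplete unless someone reads that log.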

ezyang (Contributor) commented Aug 6, 2025

I wouldn't rely on logview to find these problems, in that case. Why not write it to our Scuba table using signpost or something similar?
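A rough sketch of what this suggestion could look like, under stated assumptions: `emit_event` below is a hypothetical stand-in for whatever structured-event call (signpost or similar) the codebase provides, and here it only appends to an in-memory list so the example is self-contained. The point is that each swallowed exception becomes a queryable record with a category, a name, and parameters, rather than a line of free-form log text.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real structured-event backend.
EVENTS: List[Dict[str, Any]] = []


def emit_event(category: str, name: str, parameters: Dict[str, Any]) -> None:
    """Stand-in for a signpost-style call; a real one would write to a table."""
    EVENTS.append({"category": category, "name": name, "parameters": parameters})


def run_best_effort(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn; on failure, emit a structured event and return None."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        emit_event(
            "provenance_tracking",  # hypothetical category name
            "failure",
            {
                "function": getattr(fn, "__name__", repr(fn)),
                "error_type": type(exc).__name__,
                "message": str(exc),
            },
        )
        return None


def build_mapping(nodes: List[str]) -> Dict[str, str]:
    """Hypothetical tracking step that fails on an empty node name."""
    if any(not n for n in nodes):
        raise ValueError("empty node name")
    return {n: f"kernel_for_{n}" for n in nodes}
```

Because the event carries the function name and error type as separate fields, failures can be counted and grouped across runs, which is harder to do with free-form log text.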
