handle state tensors in training ir path #137240
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137240
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit a6338cd with merge base e80f47f. FLAKY: the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D63802576
Force-pushed from 9737ee0 to 4fb24de
Force-pushed from 4fb24de to f6c61bd
Force-pushed from f6c61bd to 38b8fae
Force-pushed from 38b8fae to 603b89b
torch/export/_trace.py (outdated)

if node.name == name:
    add_nodes.append(node)
    node.users[output_node] = None
output_node.args = ((*add_nodes, *output_node.args[0]),)
In training IR, we are following the convention that we don't return mutated buffers. Could we just write it as an in-place update?
If the source code has `buf = new_value`, can we treat it the same as `buf.copy_(new_value)`?
I think this is OK as long as the metadata (i.e. shape) remains the same. I mean, when we unlift, we just write them as `copy_` anyway... so it seems OK? What do you think?
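For context, a minimal sketch of the equivalence being discussed (the module and buffer names are made up): reassigning a registered buffer inside forward is treated the same as copying the new value into it in place, provided metadata such as shape and dtype stays the same, which is also how the unlifted graph expresses the mutation.

import torch

class StatefulModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.zeros(3))

    def forward(self, x):
        # Buffer reassignment in the source code...
        self.state = self.state + x
        return self.state

class StatefulModuleInplace(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.zeros(3))

    def forward(self, x):
        # ...treated as an in-place update of the same buffer.
        self.state.copy_(self.state + x)
        return self.state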
torch/export/_trace.py (outdated)

output_node = None
output_node = list(gm.graph.nodes)[-1]
for buf, name in assigned_buffers.items():  # type: ignore[possibly-undefined]
    buffers_to_mutate[name] = buf
Same here: Can we make it a helper function in aot_autograd and use it here?
This part is going to be substantially different, adding copy nodes instead of mutating the buffer. I'll dedup the other part.
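A rough sketch, not the PR's actual implementation, of what "adding copy nodes instead of mutating the buffer" can look like at the FX level; the function name and arguments are hypothetical, while the FX calls (`inserting_before`, `call_function`, `recompile`) are real APIs.

import torch
from torch.fx import GraphModule, Node

def insert_buffer_copy_node(gm: GraphModule, buffer_node: Node, new_value_node: Node) -> GraphModule:
    # FX graphs end with a single output node; insert buffer.copy_(new_value)
    # right before it so the mutation stays explicit in the graph rather than
    # being returned as an extra output (the training IR convention above).
    output_node = list(gm.graph.nodes)[-1]
    with gm.graph.inserting_before(output_node):
        gm.graph.call_function(torch.ops.aten.copy_.default, (buffer_node, new_value_node))
    gm.recompile()
    return gm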
torch/export/_trace.py (outdated)

assigned_buffers = {}

def _map_assigned_buffer_to_proxy(_mod, name, buffer):
Could we make this a util function in aot_autograd.py and use that here? I think this logic also exists in aot_autograd.py?
Mostly true, although there we see buffers as `Functional(Fake(...))` and here just `Fake(...)`.
Could we filter out the other parts? What I'm worried about is that this part is a bit complicated to understand, so it would be better if we had only one place to change stuff. Maybe the util function can take what instance we want to assert on lol.
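For readers following along, a hedged sketch of the mechanism `_map_assigned_buffer_to_proxy` plugs into: a global buffer-registration hook that records buffers assigned while tracing. The `FakeTensor` check stands in for the filtering discussed above (aot_autograd sees functional wrappers around fake tensors, make_fx sees plain fake tensors); apart from the hook API itself, the names are illustrative.

import torch
from torch._subclasses.fake_tensor import FakeTensor

assigned_buffers = {}

def _record_assigned_buffer(_mod, name, buffer):
    # Hook signature is (module, name, buffer); returning None keeps the value unchanged.
    # Assumption in this sketch: the hook also fires when an existing buffer is reassigned.
    if isinstance(buffer, FakeTensor):
        assigned_buffers[name] = buffer
    return None

handle = torch.nn.modules.module.register_module_buffer_registration_hook(_record_assigned_buffer)
try:
    pass  # trace the module with make_fx here
finally:
    handle.remove()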
Force-pushed from 603b89b to d8c5687
Force-pushed from d8c5687 to 9140277
Summary:
Pull Request resolved: pytorch#137240

We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed.

Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export`

Differential Revision: D63802576
Force-pushed from 9140277 to a6338cd
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since the Phabricator diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary: We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed.

Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export`

Differential Revision: D63802576
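A hedged sketch of the kind of case `test_state_tensors` covers, using the public `torch.export.export` entry point for illustration rather than the internal `export` the test exercises; the module and shapes are made up, and the example assumes the buffer-assignment handling this PR adds.

import torch
from torch.export import export

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("state", torch.zeros(2, 3))

    def forward(self, x):
        self.state = self.state + x  # buffer assignment rather than copy_
        return self.state

ep = export(M(), (torch.ones(2, 3),))
# The reassigned buffer should show up as a mutation in the graph signature.
print(ep.graph_signature.buffers_to_mutate)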