unflatten with specialized graphs per submodule call #137013
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137013
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 964a4f1 with merge base 2b329d3 (FLAKY - The following job failed but was likely due to flakiness present on trunk)
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D63642479
58d66af to 015c00c Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
015c00c to 2c069f4 Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
2c069f4 to 57e6e93 Compare
Summary: Pull Request resolved: pytorch#137013
Test Plan: added test
Differential Revision: D63642479
test/export/test_export.py
Outdated
@@ -5974,6 +5974,166 @@ def forward(self, x):
        self.assertEqual(gm_flat_non_strict(*inp), gm_flat_strict(*inp))

    def test_unflatten_multiple_graphs(self):
One clarifying question: what would module swapping look like from the user side? If they don't make distinctions between specialized graphs, it seems like they'd have to manually switch out n, n@1, and potentially p, p@1 if they're also aliasing? Or if they make distinctions for aliasing/specializations, then some subset of those.
I don't have context for what swapping looks like today with aliasing - is it one or multiple swaps? - but the main point is: does this strictly introduce more work for swapping, and could we introduce some unflattener API for this?
Yeah, they will have to update all f@i variants to the new f. Maybe we should build some convenience APIs to grab these fqn variants.
torch/export/unflatten.py
Outdated
for k, seen_module in self.seen_modules[self.module_id][:-1]:
    num_calls[k] = num_calls.get(k, 0) + 1
    seen_child_fqn = _call_name(k, num_calls[k])
    if _check_graph_equivalence(seen_module, self.module):
Does this mean that differently specialized graphs (e.g. N(x, False), N(x, True)) won't share state? As in, if we do attribute swaps on foo.bar, it won't have the same change for foo.bar@1 if the computation is different.
I think they will share state because the same params / buffer objects have been assigned to all variants. See assign_attr.
The intended effect is that these variants are like different methods on the same "moral" instance.
Oh, I meant if someone were to modify params based on the original FQNs, like foo.bar.attr = foo.bar.attr.bfloat16(), then foo.bar@1.attr won't see the same change? I've debugged some internal FSDP sharding pipelines that do this to modify parameter dtype at runtime.
I think this falls under the same category of convenience APIs though, so no big deal.
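(For concreteness, a small sketch of the distinction being discussed, using hypothetical stand-in modules: because the same parameter object is bound to every variant, in-place mutation is visible everywhere, but rebinding the attribute on one variant is not.)

```python
import torch

# Stand-ins for two call-specialized variants that were assigned the
# *same* Parameter object (as assign_attr does for shared state).
shared = torch.nn.Parameter(torch.ones(2))
bar, bar_at_1 = torch.nn.Module(), torch.nn.Module()
bar.attr = shared
bar_at_1.attr = shared

# In-place mutation goes through the shared object: both variants see it.
with torch.no_grad():
    bar.attr.mul_(2)
assert torch.equal(bar_at_1.attr, torch.full((2,), 2.0))

# Rebinding the attribute (e.g. a dtype change) only affects one variant.
bar.attr = torch.nn.Parameter(bar.attr.detach().bfloat16())
assert bar_at_1.attr.dtype == torch.float32  # bar_at_1 still holds the old param
```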
Yeah, I don't think it will see those changes unless they do it for all variants at the same time, so yes, we need that API. Any suggestions for what that API should look like?
For module swapping we can probably just patch the logic into Angela's _swap_modules() method in unflatten.py? For attributes, probably something similar: def _swap_attributes(ep: ExportedProgram, attrs_to_swap: Dict[str, Union[Any, Callable[[Any], Any]]]):
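A minimal sketch of what such a helper could look like; for simplicity it operates on the unflattened nn.Module rather than the ExportedProgram in the suggested signature, assumes the @-suffixed call-name variants are reachable via named_modules(), and its name, signature, and matching logic are hypothetical rather than an existing API:

```python
from typing import Any, Callable, Dict, Union

import torch


def _swap_attributes(
    unflattened: torch.nn.Module,
    attrs_to_swap: Dict[str, Union[Any, Callable[[Any], Any]]],
) -> None:
    # Apply each swap to the module fqn *and* all of its call-specialized
    # variants (foo, foo@1, foo@2, ...), so every specialization sees the change.
    for fqn, new_value in attrs_to_swap.items():
        module_path, _, attr_name = fqn.rpartition(".")
        for name, submod in unflattened.named_modules():
            # Match the original fqn as well as its "@k" call-name variants.
            if name != module_path and not name.startswith(module_path + "@"):
                continue
            old = getattr(submod, attr_name)
            value = new_value(old) if callable(new_value) else new_value
            # Re-wrap plain tensors as Parameters when replacing a Parameter,
            # since nn.Module.__setattr__ rejects a bare Tensor in that slot.
            if (
                isinstance(old, torch.nn.Parameter)
                and isinstance(value, torch.Tensor)
                and not isinstance(value, torch.nn.Parameter)
            ):
                value = torch.nn.Parameter(value, requires_grad=old.requires_grad)
            setattr(submod, attr_name, value)


# e.g. change the dtype of foo.bar.attr across foo.bar, foo.bar@1, ...:
# _swap_attributes(unflat, {"foo.bar.attr": lambda t: t.bfloat16()})
```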
57e6e93 to 65214dd Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
65214dd to cef6e26 Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
cef6e26 to b335a3b Compare
78ed1fe to 2de7009 Compare
Summary: Pull Request resolved: pytorch#137013
Test Plan: added test
Reviewed By: pianpwk
Differential Revision: D63642479
2de7009 to d1a438f Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
d1a438f to 69b5b90 Compare
69b5b90 to e9c333f Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
e9c333f to 8728f9e Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
8728f9e to f63d2a6 Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
f63d2a6 to dd1f753 Compare
Summary: Pull Request resolved: pytorch#137013
Test Plan: added test
Reviewed By: pianpwk
Differential Revision: D63642479
dd1f753 to 964a4f1 Compare
This pull request was exported from Phabricator. Differential Revision: D63642479
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Previously we were making a fairly restrictive assumption when unflattening an exported program: for any submodule, we would assert that the graph of every call to that submodule must be the same. This assertion is load-bearing, i.e., if we simply remove the assertion then we can get incorrect results, as shown by the following example.
However, this goes against the spirit of specializing graphs when exporting: we should expect that for every call to a submodule we might generate a different graph. The goal of this PR is to fix unflattening to handle multiple specialized graphs corresponding to multiple calls to the same submodule.
The idea is simple: for every call to a child module foo, we will create potentially different child modules foo, foo@1, foo@2, etc., and use those names as targets of call_module instructions in the parent graph. An immediate consequence of this is that the list of fqns in an unflattened module may not be the same as in the exported module. Note that all these variants share the same parameters / buffers, so that multiple calls to the same submodule can share state as expected.

However, as described so far this scheme may end up with needlessly many submodules. Thus, between calls to the same submodule, if graphs are equal then we optimize away the extra submodules and reuse call names as much as possible. Moreover, when submodules are shared across fqns, we also try to de-duplicate graphs corresponding to their calls as much as possible. Note that no matter what, information about which submodule was called is still preserved, so that if a submodule has to be swapped with another, one can still find all calls to the former submodule and replace them with calls to the latter.
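As a rough sketch of the user-side shape of this, following the N(x, True) / N(x, False) pattern discussed in the review; the exact variant names printed (e.g. n@1) are illustrative:

```python
import torch
from torch.export import export, unflatten


class N(torch.nn.Module):
    def forward(self, x, b):
        # The Python branch on b is specialized away at export time, so the
        # two call sites below can trace to different graphs for this submodule.
        if b:
            return x + 1
        return x - 1


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.n = N()

    def forward(self, x):
        x = self.n(x, True)   # first call: unflattens to target "n"
        x = self.n(x, False)  # second call: may unflatten to a variant like "n@1"
        return x


inp = (torch.randn(3),)
ep = export(M(), inp)
unflat = unflatten(ep)
# The unflattened module may now contain call-specialized variants as
# call_module targets, e.g. ['', 'n', 'n@1'].
print([name for name, _ in unflat.named_modules()])
torch.testing.assert_close(unflat(*inp), M()(*inp))
```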
A note on the choice of naming scheme for call names: instead of generating "sibling" modules foo@1, foo@2, etc. for foo, we had considered generating "children" modules foo._1, foo._2, etc. of foo. However, this can cause spurious cycles when de-duplicating graphs. E.g., suppose that foo is an alias for bar._1 and foo._1 is an alias for bar; then we must either introduce a cycle or drop the opportunity to optimize. Another idea would be to make foo a dummy module that contains foo._0 corresponding to the first call, but this necessitates too many changes to existing tests and hurts the common case.

Differential Revision: D63642479
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o