codecache: pull out some Graph serialization code into common helpers #141502

aorenste · 2024-11-25T19:30:42Z

Moved some code from FxGraphCache.lookup_graph() which dealt with serializing and deserializing CompiledFxGraph into CompiledFxGraph itself so it can be reused later by Async Compile.

Async Compile will need to serialize the compiled CompiledFxGraph from one process and deserialize it in another - so it's very similar to the cache.
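The round trip described above can be sketched as follows. This is a minimal illustration, not the actual CompiledFxGraph code: `SketchCompiledGraph`, `from_payload`, `round_trip`, and the `cache_dir` parameter are hypothetical names, and the real helpers may have different signatures. The idea is that the live callable is dropped before pickling and rebuilt from the on-disk source after unpickling.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Optional


class SketchCompiledGraph:
    """Hypothetical stand-in for a compiled graph (not the real CompiledFxGraph)."""

    def __init__(self, source_code: str, entry_point: str) -> None:
        self.source_code = source_code
        self.entry_point = entry_point
        self.current_callable: Optional[Callable[..., object]] = None

    def prepare_for_serialization(self) -> bytes:
        # Drop the live callable and pickle only the data needed to rebuild it.
        self.current_callable = None
        return pickle.dumps(
            {"source_code": self.source_code, "entry_point": self.entry_point}
        )

    @classmethod
    def from_payload(cls, payload: bytes) -> "SketchCompiledGraph":
        fields = pickle.loads(payload)
        return cls(fields["source_code"], fields["entry_point"])

    def after_deserialization(self, cache_dir: Path) -> str:
        # Write the source to disk, load it, and restore the callable.
        path = cache_dir / "artifact.py"
        path.write_text(self.source_code)
        namespace: dict = {}
        exec(compile(path.read_text(), str(path), "exec"), namespace)
        self.current_callable = namespace[self.entry_point]
        return str(path)


def round_trip(graph: SketchCompiledGraph, cache_dir: Path) -> SketchCompiledGraph:
    # Serialize in "one process", deserialize in "another" (simulated in-process here).
    payload = graph.prepare_for_serialization()
    restored = SketchCompiledGraph.from_payload(payload)
    restored.after_deserialization(cache_dir)
    return restored
```

The same two hooks would serve both a cache lookup and a cross-process handoff, which is the reuse this PR is aiming for.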

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot · 2024-11-25T19:30:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141502

✅ You can merge normally! (2 Unrelated Failures)

As of commit 530614a with merge base 4959784:

UNSTABLE - The following jobs failed, likely due to flakiness present on trunk, and have been marked as unstable:

inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#141703)
convnext_base
inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#141498)
convnext_base

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu · 2024-12-03T17:09:43Z

torch/_inductor/output_code.py

+        # models to disk.
+        self.current_callable = None
+
+    def after_deserialization(self, gm: Optional[torch.fx.GraphModule]) -> str:


@ezyang , this is basically just pulling out the code you wanted to refactor into post_compile, right?

Also qq (maybe for @aorenste) do we do this serialization even when the outputcode is generated via AOTI (not CompiledFxGraph)? I.e. does it belong in OutputCode, or does it belong in CompiledFxGraph?

I'm not sure why we want this separate from post_compile (there's probably a good reason, I'm just not seeing it). I am not sure why this isn't a shared method in the top level protocol.

There may not be a good reason. I did all this code before post_compile/OutputCode existed - so it may make sense to merge it all together better.

jamesjwu

The actual purpose of the PR seems good, but we may want to change/bikeshed some names around to figure out the order these functions should be called.

I think it's confusing to have both a post_compile step and an after_deserialization step, but I also think after_deserialization is a better name. Concretely, assuming this after_deserialization always runs on all types of OutputCode, I'd suggest:

Define OutputCode.post_serialize to be this after_deserialization function
Rename CompiledFxGraph.post_compile to CompiledFxGraph.post_serialize, and have CompiledFxGraph inherit from OutputCode's definition by calling OutputCode first.

That said, today CompiledFxGraph isn't even an actual child class of OutputCode. Not sure what's stopping us from making that happen though. Will wait for @ezyang's thoughts.
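The proposal above could look roughly like this, assuming the graph class were made a real subclass of the top-level output type. `SketchOutputCode`, `SketchCompiledFxGraph`, and the `steps` list are hypothetical stand-ins used only to show the call order; they are not the classes in torch/_inductor/output_code.py.

```python
from typing import List


class SketchOutputCode:
    """Hypothetical base type owning the shared post-deserialization step."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def post_serialize(self) -> None:
        # Shared work: what after_deserialization does in this PR.
        self.steps.append("base: restore callable from disk")


class SketchCompiledFxGraph(SketchOutputCode):
    """Hypothetical subclass adding its own work after the shared step."""

    def post_serialize(self) -> None:
        # Run the base definition first, then the graph-specific work
        # (what post_compile does today).
        super().post_serialize()
        self.steps.append("graph: post-compile fixups")
```

With this shape, callers only ever invoke one hook, and the ordering question is settled by the `super()` call.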

ezyang · 2024-12-03T17:36:20Z

torch/_inductor/codecache.py

        except OSError:
            # Not expected, but in case the PyCodeCache entry is removed from
            # underneath us, treat it as a cache miss and recompile.
-            log.error("Failed to load cached artifact: %s", artifact_path)


Removing the log line seems worse

The log still happens - just inside after_deserialization which will raise an exception (because it doesn't know if ignoring the exception is the right thing or not).
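A small sketch of that division of responsibility: the helper logs and re-raises, and the caller decides that the failure means a cache miss. `load_artifact` and `lookup` are hypothetical names for illustration, not functions from codecache.py.

```python
import logging
import os
import tempfile
from typing import Optional

log = logging.getLogger("codecache_sketch")


def load_artifact(artifact_path: str) -> str:
    # The helper cannot know whether ignoring the failure is correct,
    # so it logs the problem and lets the exception propagate.
    try:
        with open(artifact_path) as f:
            return f.read()
    except OSError:
        log.error("Failed to load artifact: %s", artifact_path)
        raise


def lookup(artifact_path: str) -> Optional[str]:
    # The cache knows a missing artifact is just a miss, so it
    # swallows the error and signals "recompile" by returning None.
    try:
        return load_artifact(artifact_path)
    except OSError:
        return None
```

A different caller, such as a cross-process handoff, could let the same exception surface instead of treating it as a miss.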

ezyang · 2024-12-03T17:37:52Z

torch/_inductor/codecache.py

@@ -1223,25 +1204,17 @@ def iterate_over_candidates() -> Generator[CompiledFxGraph, None, None]:
                if len(meta.cached_kernel_names) > 0:
                    get_metrics_context().increment("num_triton_bundles", 1)

-        inductor_meta = autotune_cache.inductor_meta_from_config()
-        AutotuneCacheBundler.begin_compile(inductor_meta, code=code)


Moving this after the OSError catch looks like a bug fix

It's not really. The problem is that in order to look up the autotune cache we need the artifact code - but we don't have the artifact code until after after_deserialization() is called. AutotuneCacheBundler.begin_compile() just needs to get called sometime after you have the artifact code and before the first autotune occurs.

The bug being fixed is that we would begin the compile and then immediately error out (without ending the compile).

I see what you're saying. Yeah - the AutotuneCacheBundler is a weird thing because our compile doesn't really have a good place to "wrap" (it would have to be somewhere in the dynamo call bytecode handler). begin_compile really just means "go fetch the compiled artifacts and spit them out on disk as local artifacts". end_compile really just means "collect all the local artifacts and bundle them into a single remote artifact". So not having an end doesn't really mean anything bad - it's supposed to handle that case properly (since it can happen in lots of ways).
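The begin/end semantics described above can be sketched with a toy bundler. `SketchBundler`, its dict-backed `remote` store, and the `local_dir` layout are assumptions made for illustration; this is not the AutotuneCacheBundler implementation.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


class SketchBundler:
    """Hypothetical bundler: a remote bundle maps file names to contents."""

    def __init__(self, remote: Dict[str, Dict[str, str]], local_dir: Path) -> None:
        self.remote = remote
        self.local_dir = local_dir
        self.key: Optional[str] = None

    def begin_compile(self, key: str) -> None:
        # "Fetch the compiled artifacts and spit them out on disk as local artifacts."
        self.key = key
        for name, data in self.remote.get(key, {}).items():
            (self.local_dir / name).write_text(data)

    def end_compile(self) -> None:
        # "Collect all the local artifacts and bundle them into a single remote artifact."
        if self.key is None:
            return
        self.remote[self.key] = {
            p.name: p.read_text() for p in sorted(self.local_dir.iterdir())
        }
        self.key = None
```

In this sketch a begin without a matching end simply leaves the remote bundle untouched, which matches the point that a missing end is not harmful.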

ezyang · 2024-12-03T18:36:10Z

torch/_inductor/output_code.py

+        # so we serialize their PyCodeCache disk cache location instead.
+        # TODO: This could be better if we're ever able to serialize compiled
+        # models to disk.
+        self.current_callable = None


This is OK for refactoring but I don't want a "mutate the object so I can serialize it" API to be the final way of doing this. Definitely want a better API.

ezyang · 2024-12-03T18:38:48Z

torch/_inductor/output_code.py

+        from .graph import GraphLowering
+
+        # This is used by tests to check the output for specific details.
+        GraphLowering.save_output_code(code)


was moved here

ezyang

This seems like forward progress, probably need to work on this code more though

- Turn fx_codegen_and_compile() into a class (FxCompile) so we can override the implementation.
- Pull the current body into an implementation (_InProcessFxCompile) which just performs the existing behavior.
- Add an async interface. (See below)

The intended future behavior of Async Compile will be to allow dynamo functions to start compiling in the background (and on a separate machine) while we continue to run eager in the foreground. As such we'll need to put the compilation behind some kind of Future implementation - it makes sense to reuse the existing python futures for that. An async function is just a syntactic way to return an asyncio.Future. Because asyncio.run() adds confusion to the stack traces when the called function isn't actually being used in an asynchronous way we also provide a synchronous interface which can be directly called.

Pull Request resolved: #141505
Approved by: https://github.com/ezyang
ghstack dependencies: #141502
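The class shape described in that commit message might look like the sketch below. `SketchFxCompile`, `SketchInProcessFxCompile`, the method names, and the string-based "graph" are all hypothetical; the real FxCompile interface is defined in #141505 and may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchFxCompile(ABC):
    """Hypothetical overridable compile step with sync and async entry points."""

    @abstractmethod
    def codegen_and_compile(self, graph: str) -> str:
        """Synchronous interface, callable directly without an event loop."""

    async def codegen_and_compile_async(self, graph: str) -> str:
        # Default async interface: just run the synchronous body.
        # A remote implementation could override this to await real I/O.
        return self.codegen_and_compile(graph)


class SketchInProcessFxCompile(SketchFxCompile):
    """Performs the 'existing behavior' in the current process."""

    def codegen_and_compile(self, graph: str) -> str:
        return f"compiled({graph})"
```

Keeping the synchronous method as the primary path means callers that never go async avoid `asyncio.run()` frames in their stack traces, which is the motivation given in the commit message.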

…pytorch#141502) Moved some code from FxGraphCache.lookup_graph() which dealt with serializing and deserializing CompiledFxGraph into CompiledFxGraph itself so it can be reused later by Async Compile. Async Compile will need to serialize the compiled CompiledFxGraph from one process and deserialize it in another - so it's very similar to the cache. Pull Request resolved: pytorch#141502 Approved by: https://github.com/ezyang

WIP: prepare_for_serialization

057ba69

[ghstack-poisoned]

pytorch-bot bot added ciflow/inductor module: inductor labels Nov 25, 2024

aorenste changed the title ~~WIP: prepare_for_serialization~~ codecache: make some Graph serialization common Nov 25, 2024

aorenste changed the title ~~codecache: make some Graph serialization common~~ codecache: pull out some Graph serialization code into common helpers Nov 25, 2024

aorenste mentioned this pull request Nov 25, 2024

Structured compile_fx #141505

Closed

aorenste added the topic: not user facing topic category label Nov 25, 2024

aorenste requested a review from oulgen November 26, 2024 17:31

aorenste marked this pull request as ready for review November 26, 2024 17:31

aorenste added 2 commits November 26, 2024 09:37

aorenste mentioned this pull request Nov 27, 2024

pickler for GraphModule #141659

Closed

aorenste requested a review from jamesjwu November 27, 2024 15:21

aorenste mentioned this pull request Nov 27, 2024

[POC] AOTInductor as Inductor backend #141700

Closed

aorenste added 2 commits December 2, 2024 09:01

aorenste mentioned this pull request Dec 3, 2024

MetaTensorDesc changes for reconstructing proper FakeTensors #141926

Closed

jamesjwu requested a review from ezyang December 3, 2024 17:08

jamesjwu reviewed Dec 3, 2024

ezyang reviewed Dec 3, 2024

ezyang reviewed Dec 3, 2024

ezyang approved these changes Dec 3, 2024

pytorchmergebot added the Merged label Dec 3, 2024

pytorchmergebot closed this in 02147fe Dec 3, 2024

github-actions bot deleted the gh/aorenste/145/head branch January 3, 2025 02:07
