[aoti] Remove dir after packaging #140022
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140022
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 464e2a8 with merge base 22dfb5b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This doesn't solve the problem when different runs can generate different cubin files. We can end up including unnecessary cubin files.
I think a better way to solve this is in codecache.py: use a unique subdirectory to store the .so and other relevant files, and package afterwards. This way, all the previous auto-tuning results are still kept and we will not package unnecessary files.
Also, your test script should still be added as a unit test. It should work with some tweaks.
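The subdirectory idea above can be sketched as follows. This is a hedged illustration, not the actual codecache.py code; `package_from_unique_subdir` and its signature are hypothetical:

```python
import os
import tempfile
import zipfile

def package_from_unique_subdir(cache_root: str, artifacts: dict, archive_path: str) -> str:
    """Write one compile run's outputs into a fresh subdirectory of the
    cache, then package only that subdirectory's contents.

    Earlier runs (and their auto-tuning results) elsewhere under
    cache_root stay untouched, and stale cubin files from previous runs
    are never swept into the archive.
    """
    # Unique per-run subdirectory inside the cache root.
    run_dir = tempfile.mkdtemp(prefix="aoti_", dir=cache_root)
    for name, data in artifacts.items():
        with open(os.path.join(run_dir, name), "wb") as f:
            f.write(data)
    # Package only the files this run produced.
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in sorted(os.listdir(run_dir)):
            zf.write(os.path.join(run_dir, name), arcname=name)
    return run_dir
```

With this layout, "keep the cache" and "don't package junk" stop being in tension: the cache root accumulates runs, while each archive is built from exactly one run's directory.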
I see you are actually deleting the whole directory afterwards. This doesn't address the request to keep caching that Henry raised.
This can unblock (previously we would run into errors), but I would like to see a way to cache things so iteration speed can be better. For torch.compile, subsequent runs take 1/10 of the time to compile due to the local cache. It would be nice if AOTI had a similar feature.
Force-pushed from e7a3730 to dbc512a.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 48bb762 to b4339ac.
LGTM overall.
The CI "out of disk" failure is real. I think it's related to the fact that when we call aoti_load_package, we unzip the files to the top-level /tmp directory, which is not removed after running each benchmark. The large weight files gradually eat up disk space and eventually trigger the error.
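The leak described above could be avoided by extracting into a per-load temporary directory that is removed at process exit. A minimal sketch, not the actual `aoti_load_package` implementation (`extract_package` is a hypothetical helper):

```python
import atexit
import shutil
import tempfile
import zipfile

def extract_package(pt2_path: str) -> str:
    """Unzip a .pt2 package into a unique temp directory.

    Using a fresh directory per load (instead of a shared /tmp path)
    and registering it for cleanup keeps repeated benchmark runs from
    accumulating large weight files on disk.
    """
    extract_dir = tempfile.mkdtemp(prefix="aoti_pkg_")
    # Best-effort cleanup when the process exits.
    atexit.register(shutil.rmtree, extract_dir, ignore_errors=True)
    with zipfile.ZipFile(pt2_path) as zf:
        zf.extractall(extract_dir)
    return extract_dir
```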
Force-pushed from b4339ac to f7ebff5.
@@ -143,7 +146,8 @@ def aot_compile(
     options: Optional dict of config options. See `torch._inductor.config`.

 Returns:
-    Path to the generated shared library
+    Path to the generated shared library, or a list of files generated by
This feels like a complicated API. Why not always return a list? For a .so it would just be a single-element list.
I don't want to break any existing callsites 😅 but yes! I can fix the rest of the callsites to return a list in a followup.
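The proposed follow-up could be as small as a normalizing wrapper at the callsites. A sketch under the assumption that the compile path returns either a single .so path or a list of files (`as_file_list` is a hypothetical helper, not a torch API):

```python
def as_file_list(result) -> list:
    """Normalize aot_compile's return value to a list of file paths.

    Historically a single .so path string was returned; with
    aot_inductor.package=True a list of generated files is returned.
    Treating both uniformly keeps existing callsites working while the
    API migrates to always returning a list.
    """
    if isinstance(result, str):
        return [result]
    return list(result)
```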
if file == "":
    continue

if file.endswith(".so"):
Perhaps a discussion for a different PR, but should it really be .so on all platforms? Wouldn't it be more reasonable to check for sysconfig.get_config_var('EXT_SUFFIX')?
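The platform-aware check suggested here could look like the following sketch (`is_shared_library` is a hypothetical helper; the fallback to ".so" is an assumption for builds where EXT_SUFFIX is unset):

```python
import sysconfig

def is_shared_library(filename: str) -> bool:
    """Check for the platform's extension-module suffix.

    EXT_SUFFIX is e.g. ".cpython-311-x86_64-linux-gnu.so" on Linux and
    ends in ".pyd" on Windows, so a hard-coded ".so" check would miss
    Windows artifacts. Keep ".so" as a fallback for plain shared
    libraries that don't carry the full Python ABI tag.
    """
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return filename.endswith(suffix) or filename.endswith(".so")
```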
Force-pushed from f3e385f to 423e0cb.
This reverts commit ba136a7. Reverted #140022 on behalf of https://github.com/angelayi due to sorry I realized I need to land from internal ([comment](#140022 (comment)))
@angelayi your PR has been successfully reverted.
Summary: Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully #140053.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov
Reviewed By: pianpwk
Differential Revision: D65862850
Pulled By: angelayi
Force-pushed from bd5b0d1 to 464e2a8.
This pull request was exported from Phabricator. Differential Revision: D65862850
Summary: Reland #140022 Test Plan: CI Differential Revision: D65929964
Summary: Reland #140022 Test Plan: CI Differential Revision: D65929964 Pull Request resolved: #140675 Approved by: https://github.com/desertfire
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully pytorch#140053. Pull Request resolved: pytorch#140022 Approved by: https://github.com/larryliu0820, https://github.com/desertfire, https://github.com/malfet
This reverts commit 8c6abe5. Reverted pytorch#140022 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the lint failure is legit ([comment](pytorch#140022 (comment)))
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully pytorch#140053. Pull Request resolved: pytorch#140022 Approved by: https://github.com/larryliu0820, https://github.com/desertfire, https://github.com/malfet
This reverts commit ba136a7. Reverted pytorch#140022 on behalf of https://github.com/angelayi due to sorry I realized I need to land from internal ([comment](pytorch#140022 (comment)))
Summary: Reland pytorch#140022 Test Plan: CI Differential Revision: D65929964 Pull Request resolved: pytorch#140675 Approved by: https://github.com/desertfire
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list.
This should fix the caching issue and hopefully #140053.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
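The packaging side of the fix described above can be sketched as follows. This is an illustration of the idea (package only the files AOTI reported generating), not the actual torch._inductor.package code; `package_generated_files` is a hypothetical helper:

```python
import os
import zipfile

def package_generated_files(generated_files: list, archive_path: str) -> None:
    """Package exactly the files AOTI reported generating.

    Because the archive is built from the returned file list rather
    than by globbing the cache directory, stale artifacts left behind
    by earlier cached runs are never included.
    """
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in generated_files:
            # The generated list may contain empty entries; skip them.
            if path == "":
                continue
            zf.write(path, arcname=os.path.basename(path))
```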