[aoti] Remove dir after packaging #140022
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140022
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 464e2a8 with merge base 22dfb5b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This doesn't solve the problem when different runs can generate different cubin files. We can end up including unnecessary cubin files.
I think a better way to solve this is in codecache.py: use a unique subdirectory to store the .so and other relevant files, and package afterwards. This way, all the previous auto-tuning results are still kept and we will not package unnecessary files.
Also, your test script should still be added as a unit test. It should work with some tweaks.
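The subdirectory idea above can be sketched as follows. This is a hedged illustration, not the actual codecache.py code; `package_from_unique_subdir` and its signature are hypothetical:

```python
import os
import tempfile
import zipfile

def package_from_unique_subdir(cache_root: str, artifacts: dict, archive_path: str) -> str:
    """Write one compile run's outputs into a fresh subdirectory of the
    cache, then package only that subdirectory's contents.

    Earlier runs (and their auto-tuning results) elsewhere under
    cache_root stay untouched, and stale cubin files from previous runs
    are never swept into the archive.
    """
    # Unique per-run subdirectory inside the cache root.
    run_dir = tempfile.mkdtemp(prefix="aoti_", dir=cache_root)
    for name, data in artifacts.items():
        with open(os.path.join(run_dir, name), "wb") as f:
            f.write(data)
    # Package only the files this run produced.
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in sorted(os.listdir(run_dir)):
            zf.write(os.path.join(run_dir, name), arcname=name)
    return run_dir
```

With this layout, "keep the cache" and "don't package junk" stop being in tension: the cache root accumulates runs, while each archive is built from exactly one run's directory.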
I see you are actually deleting the whole directory afterwards. This doesn't address the request to keep caching that Henry raised.
This can unblock (previously we would run into errors), but I would like to see a way to cache things so iteration speed can be better. For torch.compile, subsequent runs take 1/10 of the time to compile due to the local cache. It would be nice if AOTI had a similar feature.
Force-pushed from e7a3730 to dbc512a.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 48bb762 to b4339ac.
LGTM overall.
The CI "out of disk" failure is real. I think it's related to the fact that when we call aoti_load_package, we unzip the files to the top-level /tmp directory, which is not removed after running each benchmark. The large weight files gradually eat up disk space and eventually trigger the error.
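The leak described above could be avoided by extracting into a per-load temporary directory that is removed at process exit. A minimal sketch, not the actual `aoti_load_package` implementation (`extract_package` is a hypothetical helper):

```python
import atexit
import shutil
import tempfile
import zipfile

def extract_package(pt2_path: str) -> str:
    """Unzip a .pt2 package into a unique temp directory.

    Using a fresh directory per load (instead of a shared /tmp path)
    and registering it for cleanup keeps repeated benchmark runs from
    accumulating large weight files on disk.
    """
    extract_dir = tempfile.mkdtemp(prefix="aoti_pkg_")
    # Best-effort cleanup when the process exits.
    atexit.register(shutil.rmtree, extract_dir, ignore_errors=True)
    with zipfile.ZipFile(pt2_path) as zf:
        zf.extractall(extract_dir)
    return extract_dir
```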
Force-pushed from b4339ac to f7ebff5.
@@ -143,7 +146,8 @@ def aot_compile(
     options: Optional dict of config options. See `torch._inductor.config`.

 Returns:
-    Path to the generated shared library
+    Path to the generated shared library, or a list of files generated by
This feels like a complicated API. Why not always return a list? For a .so it would just be a single-element list.
I don't want to break any existing callsites 😅 but yes! I can fix the rest of the callsites to return a list in a followup.
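The proposed follow-up could be as small as a normalizing wrapper at the callsites. A sketch under the assumption that the compile path returns either a single .so path or a list of files (`as_file_list` is a hypothetical helper, not a torch API):

```python
def as_file_list(result) -> list:
    """Normalize aot_compile's return value to a list of file paths.

    Historically a single .so path string was returned; with
    aot_inductor.package=True a list of generated files is returned.
    Treating both uniformly keeps existing callsites working while the
    API migrates to always returning a list.
    """
    if isinstance(result, str):
        return [result]
    return list(result)
```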
if file == "":
    continue

if file.endswith(".so"):
Perhaps a discussion for a different PR, but should it really be .so on all platforms? Wouldn't it be more reasonable to check for sysconfig.get_config_var('EXT_SUFFIX')?
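The platform-aware check suggested here could look like the following sketch (`is_shared_library` is a hypothetical helper; the fallback to ".so" is an assumption for builds where EXT_SUFFIX is unset):

```python
import sysconfig

def is_shared_library(filename: str) -> bool:
    """Check for the platform's extension-module suffix.

    EXT_SUFFIX is e.g. ".cpython-311-x86_64-linux-gnu.so" on Linux and
    ends in ".pyd" on Windows, so a hard-coded ".so" check would miss
    Windows artifacts. Keep ".so" as a fallback for plain shared
    libraries that don't carry the full Python ABI tag.
    """
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return filename.endswith(suffix) or filename.endswith(".so")
```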
Force-pushed from f3e385f to 423e0cb.
This reverts commit ba136a7. Reverted #140022 on behalf of https://github.com/angelayi due to sorry I realized I need to land from internal ([comment](#140022 (comment)))
@angelayi your PR has been successfully reverted.
Summary: Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully #140053.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov
Reviewed By: pianpwk
Differential Revision: D65862850
Pulled By: angelayi
Force-pushed from bd5b0d1 to 464e2a8.
This pull request was exported from Phabricator. Differential Revision: D65862850
Summary: Reland #140022 Test Plan: CI Differential Revision: D65929964
Summary: Reland #140022 Test Plan: CI Differential Revision: D65929964 Pull Request resolved: #140675 Approved by: https://github.com/desertfire
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully pytorch#140053. Pull Request resolved: pytorch#140022 Approved by: https://github.com/larryliu0820, https://github.com/desertfire, https://github.com/malfet
This reverts commit 8c6abe5. Reverted pytorch#140022 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the lint failure is legit ([comment](pytorch#140022 (comment)))
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list. This should fix the [caching issue](https://fb.workplace.com/groups/1028545332188949/permalink/1081702043539944/) and hopefully pytorch#140053. Pull Request resolved: pytorch#140022 Approved by: https://github.com/larryliu0820, https://github.com/desertfire, https://github.com/malfet
This reverts commit ba136a7. Reverted pytorch#140022 on behalf of https://github.com/angelayi due to sorry I realized I need to land from internal ([comment](pytorch#140022 (comment)))
Summary: Reland pytorch#140022 Test Plan: CI Differential Revision: D65929964 Pull Request resolved: pytorch#140675 Approved by: https://github.com/desertfire
Update AOTI to return a list of files that it generates when `aot_inductor.package=True`. Then we will only package the files that are in that list.
This should fix the caching issue and hopefully #140053.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
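The packaging side of the fix described above can be sketched as follows. This is an illustration of the idea (package only the files AOTI reported generating), not the actual torch._inductor.package code; `package_generated_files` is a hypothetical helper:

```python
import os
import zipfile

def package_generated_files(generated_files: list, archive_path: str) -> None:
    """Package exactly the files AOTI reported generating.

    Because the archive is built from the returned file list rather
    than by globbing the cache directory, stale artifacts left behind
    by earlier cached runs are never included.
    """
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in generated_files:
            # The generated list may contain empty entries; skip them.
            if path == "":
                continue
            zf.write(path, arcname=os.path.basename(path))
```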