[inductor][memory] restructuring memory.py and turn on the flag #137205
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/137205
Note: Links to docs will display an error until the docs builds have been completed. No failures as of commit 3cba60a with merge base 10a34dc. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@@ -28,48 +29,35 @@ class MemoryPlanningInfoForBuffer:
    succ_nodes: OrderedSet[BaseSchedulerNode] = dataclasses.field(
        default_factory=OrderedSet
    )
    outdegree: int = 0  # this is used only in topological_sort_lpmf
In `MemoryPlanningInfoForBuffer`, I am only keeping the attributes that are static. For dynamically set attributes (e.g., `outdegree`), I will keep track of them inside the function that needs them. Similarly, I removed, e.g., `memory_to_free` and `indegree` from `MemoryPlanningInfoForNode`.
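To illustrate the split, here is a minimal sketch only; the class body and the `topological_sort_lpmf` stub below are simplified stand-ins for the real code in `torch/_inductor/memory.py`, not its actual implementation. Static dependency information stays on the dataclass, while per-run counters live in a local dict inside the one function that needs them.

```python
import dataclasses
from typing import Dict, List, Set


class BaseSchedulerNode:  # toy stand-in for the real scheduler node type
    pass


@dataclasses.dataclass
class MemoryPlanningInfoForBuffer:
    # only static information is stored on the buffer itself
    succ_nodes: Set[BaseSchedulerNode] = dataclasses.field(default_factory=set)


def topological_sort_lpmf(bufs: List[MemoryPlanningInfoForBuffer]) -> None:
    # dynamic, per-run state (previously an `outdegree` field on the dataclass)
    # now lives in a local dict inside the only function that uses it
    outdegree: Dict[int, int] = {id(buf): len(buf.succ_nodes) for buf in bufs}
    # ... the LPMF ordering would decrement these counters as successor nodes
    # are scheduled and treat a buffer as freeable once its count hits zero ...
```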
Looks great! I had one concern about a potential O(n^2) blowup, but it turns out that is not a real issue (commenting on the other PR for continuity). Can we please submit a dashboard run first? If we are at all worried about the riskiness of the change, it can also make sense to add a JK for enablement, or enable in OSS first and then in fbcode.
@@ -259,7 +259,7 @@ def autotune_remote_cache_default() -> Optional[bool]:
]

# enable operator reordering for peak memory optimization
reorder_for_peak_memory = os.environ.get("TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY") == "1"
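For anyone trying this out, here is a sketch of how the flag could be toggled based on the config line above. It assumes the environment variable is read when `torch._inductor.config` is first imported and that patching the config attribute directly has the same effect.

```python
import os

# Option 1 (assumed): opt in via the environment variable from the diff above;
# set it before torch._inductor.config is first imported.
os.environ["TORCHINDUCTOR_REORDER_FOR_PEAK_MEMORY"] = "1"

import torch
import torch._inductor.config as inductor_config

# Option 2 (assumed): flip the config knob directly for the current process.
inductor_config.reorder_for_peak_memory = True


@torch.compile
def f(x):
    return (x @ x).relu().sum()


print(f(torch.randn(64, 64)))
```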
Did we do a dashboard run yet? Could we do that before landing?
It's easiest to do via ghstack, but I think you can also push to origin/main or something, I forget exactly; I would recommend ghstack. If you've already submitted this PR, the easiest way to do that is just `git reset --soft HEAD~1; git commit -m "tmp"; ghstack` and benchmark this commit as another PR.
Let me try to figure this out.
Sorry, one other thing: we should test out the various implementations on CI runs as well, if possible, unless you already have a clear idea that one is Pareto-optimal (compilation time, average memory change, worst memory change).
> if we are at all worried about the riskiness of the change, it can also make sense to add a JK for enablement, or enable in OSS first and then in fbcode

I was initially very concerned about this, but as I try this flag on more internal models, I am becoming less concerned. There may be situations I am not aware of, though, so I think it would be best to leave this decision to the PyTorch team. You guys are the experts here :) One possibility is that once we turn it on, I can monitor it carefully for a while?
Seems all the dashboard runs are done. This is the screenshot of the summary (seems nothing is significant).

Individual breakdowns can be found here:
- gh/xuanzhang816/1/head -- using all three heuristics
- gh/xuanzhang816/2/head -- lpmf only
- gh/xuanzhang816/3/head -- bfs only
- gh/xuanzhang816/4/head -- dfs only
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot drci
Commits updated from cd8531c to d046fea.
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
@pytorchmergebot merge -f "remaining tests queued for too long"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Addressing additional comments given in PR #134874
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec