CARVIEW |
Select Language
HTTP/2 200
date: Tue, 22 Jul 2025 23:16:11 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, X-Requested-With,Accept-Encoding, Accept, X-Requested-With
x-repository-download: git clone https://github.com/pytorch/pytorch.git
etag: W/"d79957a721e224fd38093213cb688bfa"
cache-control: max-age=0, private, must-revalidate
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: no-referrer-when-downgrade
content-security-policy: default-src 'none'; base-uri 'self'; child-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com *.rel.tunnels.api.visualstudio.com wss://*.rel.tunnels.api.visualstudio.com objects-origin.githubusercontent.com copilot-proxy.githubusercontent.com proxy.individual.githubcopilot.com proxy.business.githubcopilot.com proxy.enterprise.githubcopilot.com *.actions.githubusercontent.com wss://*.actions.githubusercontent.com productionresultssa0.blob.core.windows.net/ productionresultssa1.blob.core.windows.net/ productionresultssa2.blob.core.windows.net/ productionresultssa3.blob.core.windows.net/ productionresultssa4.blob.core.windows.net/ productionresultssa5.blob.core.windows.net/ productionresultssa6.blob.core.windows.net/ productionresultssa7.blob.core.windows.net/ productionresultssa8.blob.core.windows.net/ productionresultssa9.blob.core.windows.net/ productionresultssa10.blob.core.windows.net/ productionresultssa11.blob.core.windows.net/ productionresultssa12.blob.core.windows.net/ productionresultssa13.blob.core.windows.net/ productionresultssa14.blob.core.windows.net/ productionresultssa15.blob.core.windows.net/ productionresultssa16.blob.core.windows.net/ productionresultssa17.blob.core.windows.net/ productionresultssa18.blob.core.windows.net/ productionresultssa19.blob.core.windows.net/ github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com api.githubcopilot.com api.individual.githubcopilot.com api.business.githubcopilot.com api.enterprise.githubcopilot.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com copilot-workspace.githubnext.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: blob: github.githubassets.com media.githubusercontent.com camo.githubusercontent.com identicons.github.com avatars.githubusercontent.com private-avatars.githubusercontent.com github-cloud.s3.amazonaws.com objects.githubusercontent.com release-assets.githubusercontent.com secured-user-images.githubusercontent.com/ user-images.githubusercontent.com/ private-user-images.githubusercontent.com opengraph.githubassets.com copilotprodattachments.blob.core.windows.net/github-production-copilot-attachments/ github-production-user-asset-6210df.s3.amazonaws.com customer-stories-feed.github.com spotlights-feed.github.com objects-origin.githubusercontent.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/ secured-user-images.githubusercontent.com/ private-user-images.githubusercontent.com github-production-user-asset-6210df.s3.amazonaws.com gist.github.com; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; upgrade-insecure-requests; worker-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/
server: github.com
content-encoding: gzip
accept-ranges: bytes
set-cookie: _gh_sess=GLDwtyMh%2FFLYUo3X7zIderAno6CZ1ejjnIx1N6pk5a4Ox5ZBfgw7tJC2VxqYFipsIo1wWkPr1uY2X4uYggWRQROt4NoqWVQgZwN%2BcnQT1zcVYtOwRLwn8D9M3bW35mpPxMQN6jMgyI4ybtAlgn3VcgXlERB98WXKUgtyPu0XGvymKn8OjfFnLVfyiy3aqsbLQsrsj%2FAOy5SyN3KX8kxJi5lMZxgWmD3zJ3N2K%2BoreTslYZLtMoA4Jx9uBzk06NUP9IlEZnCGS1A7gxA8gkDerw%3D%3D--XbKhJA9uHp7u1IgL--VSWzmF8vi7Re4M5vDtGPkA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.221058980.1753226170; Path=/; Domain=github.com; Expires=Wed, 22 Jul 2026 23:16:10 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Wed, 22 Jul 2026 23:16:10 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: 9BDC:36F97A:1BCEEF:25E7DF:68801BBA
[c10d][Partial-Graph Overlap] Support calling .wait_tensor() within c… · pytorch/pytorch@362ca54 · GitHub
Copy file name to clipboardExpand all lines: test/distributed/test_c10d_functional_native.py
Copy file name to clipboardExpand all lines: test/distributed/test_c10d_nccl.py
Copy file name to clipboardExpand all lines: test/distributed/test_inductor_collectives.py
Copy file name to clipboardExpand all lines: torch/csrc/distributed/c10d/Functional.cpp
Copy file name to clipboard
Skip to content
Navigation Menu
{{ message }}
-
Notifications
You must be signed in to change notification settings - Fork 24.7k
Commit 362ca54
[c10d][Partial-Graph Overlap] Support calling .wait_tensor() within compiled region on output tensor of eager
This PR aims to support the following use case:
```python
def all_reduce_eager(x):
y = x * x
req = dist.all_reduce(y, op=dist.ReduceOp.SUM, async_op=True)
assert isinstance(req, torch.distributed.Work)
return y
@torch.compile(fullgraph=True)
def all_reduce_wait_compiled(y):
torch.ops.c10d_functional.wait_tensor(y)
return y * y
```
where the collective is issued in eager (with `async_op=True`) but waited in compiled region.
This is important for internal use cases such as TorchRec, where we issue collectives in eager for SparseArch all_to_all but want to wait for them in compiled region at beginning of OverArch, so that the all_to_all can be overlapped with the DenseArch compute that runs in parallel.
------
Test commands:
- `pytest -rA test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_eager_async_allreduce_inductor_wait`
- `pytest -rA test/test_fx.py::TestDCE::test_keep_collectives`
- `pytest -rA test/test_fx.py::TestDCE::test_keep_collectives_no_overload`
- `pytest -rA test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_unwaited`
- `pytest -rA test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_work_registry`
- `pytest -rA test/distributed/test_c10d_nccl.py::CommTest::test_unwaited`
- `pytest -rA test/distributed/test_c10d_nccl.py::CommTest::test_work_registry`
- `pytest -rA test/distributed/_tensor/test_tensor_ops.py::DistTensorOpsTest::test_equal`
- `pytest -rA test/distributed/_tensor/test_random_ops.py::DistTensorRandomOpTest::test_manual_seed`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_baseline_aot_eager_multiprocess`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_aot_eager`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_setattr`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_inline`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_no_inline`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_asymmetric_compilation`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_scalar`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_speculation_divergence`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_automatic_dynamic_tensor`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_dim_mismatch`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_graph_break_empty_graph_still_collective`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_missing_source`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_scalar_missing_source`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_compiler_collectives_type_mismatch`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_activation_checkpointing`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_ddp_baseline_aot_eager_multiprocess`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_activation_checkpointing`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_aot_eager`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_inductor`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_setattr`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_inline`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_fsdp_unspecialized_forced_getattr_no_inline`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_aot_eager`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_aot_eager_static_graph`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_inductor`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_ddp_inductor_static_graph`
- `pytest -rA test/distributed/test_dynamo_distributed.py::TestMultiProc::test_hf_bert_fsdp_activation_checkpointing`
- `pytest -rA test/distributed/_tensor/test_experimental_ops.py::DistOtherOpsTest::test_bernoulli`
- `pytest -rA test/distributed/_tensor/test_dtensor_compile.py::TestDTensorCompileE2E::test_tp_compile_fullgraph_is_seq_parallel_True`
- `pytest -rA test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allreduce_inductor_cudagraph_trees`
- `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --total-partitions 2 --partition-id 1 --output inference_torchbench.csv --only moco`
------
Differential Revision: [D64511994](https://our.internmc.facebook.com/intern/diff/D64511994)
Pull Request resolved: #137763
Approved by: https://github.com/yifuwangasync_op=True
collective (#137763)1 parent a170ff4 commit 362ca54Copy full SHA for 362ca54
File tree
Expand file treeCollapse file tree
13 files changed
+417
-125
lines changedFilter options
- test/distributed
- torch/csrc/distributed/c10d
Expand file treeCollapse file tree
13 files changed
+417
-125
lines changedtest/distributed/test_c10d_functional_native.py
Copy file name to clipboardExpand all lines: test/distributed/test_c10d_functional_native.py+18Lines changed: 18 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
405 | 405 |
| |
406 | 406 |
| |
407 | 407 |
| |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
408 | 424 |
| |
409 | 425 |
| |
410 | 426 |
| |
411 | 427 |
| |
412 | 428 |
| |
413 | 429 |
| |
414 | 430 |
| |
| 431 | + | |
415 | 432 |
| |
416 | 433 |
| |
417 | 434 |
| |
418 | 435 |
| |
419 | 436 |
| |
| 437 | + | |
420 | 438 |
| |
421 | 439 |
| |
422 | 440 |
| |
|
test/distributed/test_c10d_nccl.py
Copy file name to clipboardExpand all lines: test/distributed/test_c10d_nccl.py+42Lines changed: 42 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
3088 | 3088 |
| |
3089 | 3089 |
| |
3090 | 3090 |
| |
| 3091 | + | |
| 3092 | + | |
| 3093 | + | |
| 3094 | + | |
| 3095 | + | |
| 3096 | + | |
| 3097 | + | |
| 3098 | + | |
| 3099 | + | |
| 3100 | + | |
| 3101 | + | |
| 3102 | + | |
| 3103 | + | |
| 3104 | + | |
| 3105 | + | |
| 3106 | + | |
| 3107 | + | |
| 3108 | + | |
| 3109 | + | |
| 3110 | + | |
| 3111 | + | |
| 3112 | + | |
| 3113 | + | |
| 3114 | + | |
| 3115 | + | |
| 3116 | + | |
| 3117 | + | |
| 3118 | + | |
| 3119 | + | |
| 3120 | + | |
| 3121 | + | |
| 3122 | + | |
| 3123 | + | |
| 3124 | + | |
| 3125 | + | |
| 3126 | + | |
| 3127 | + | |
| 3128 | + | |
| 3129 | + | |
| 3130 | + | |
| 3131 | + | |
| 3132 | + | |
3091 | 3133 |
| |
3092 | 3134 |
| |
3093 | 3135 |
| |
|
test/distributed/test_inductor_collectives.py
Copy file name to clipboardExpand all lines: test/distributed/test_inductor_collectives.py+71-1Lines changed: 71 additions & 1 deletion
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
1 | 1 |
| |
| 2 | + | |
2 | 3 |
| |
3 | 4 |
| |
4 | 5 |
| |
| |||
14 | 15 |
| |
15 | 16 |
| |
16 | 17 |
| |
17 |
| - | |
| 18 | + | |
18 | 19 |
| |
19 | 20 |
| |
20 | 21 |
| |
| |||
28 | 29 |
| |
29 | 30 |
| |
30 | 31 |
| |
| 32 | + | |
31 | 33 |
| |
32 | 34 |
| |
33 | 35 |
| |
| |||
245 | 247 |
| |
246 | 248 |
| |
247 | 249 |
| |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
248 | 318 |
| |
249 | 319 |
| |
250 | 320 |
| |
|
torch/csrc/distributed/c10d/Functional.cpp
Copy file name to clipboardExpand all lines: torch/csrc/distributed/c10d/Functional.cpp+4-95Lines changed: 4 additions & 95 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
6 | 6 |
| |
7 | 7 |
| |
8 | 8 |
| |
9 |
| - | |
10 | 9 |
| |
11 | 10 |
| |
12 | 11 |
| |
13 | 12 |
| |
14 |
| - | |
15 |
| - | |
16 |
| - | |
17 |
| - | |
18 |
| - | |
19 |
| - | |
20 |
| - | |
21 |
| - | |
22 |
| - | |
23 |
| - | |
24 |
| - | |
25 |
| - | |
26 |
| - | |
27 |
| - | |
28 |
| - | |
29 |
| - | |
30 |
| - | |
31 |
| - | |
32 |
| - | |
33 |
| - | |
34 |
| - | |
35 |
| - | |
36 |
| - | |
37 |
| - | |
38 |
| - | |
39 |
| - | |
40 |
| - | |
41 |
| - | |
42 |
| - | |
43 |
| - | |
44 |
| - | |
45 |
| - | |
46 |
| - | |
47 |
| - | |
48 |
| - | |
49 |
| - | |
50 |
| - | |
51 |
| - | |
52 |
| - | |
53 |
| - | |
54 |
| - | |
55 |
| - | |
56 |
| - | |
57 |
| - | |
58 |
| - | |
59 |
| - | |
60 |
| - | |
61 |
| - | |
62 |
| - | |
63 |
| - | |
64 |
| - | |
65 |
| - | |
66 |
| - | |
67 |
| - | |
68 |
| - | |
69 |
| - | |
70 |
| - | |
71 |
| - | |
72 |
| - | |
73 |
| - | |
74 |
| - | |
75 |
| - | |
76 |
| - | |
77 |
| - | |
78 |
| - | |
79 |
| - | |
80 |
| - | |
81 |
| - | |
82 |
| - | |
83 | 13 |
| |
84 | 14 |
| |
85 | 15 |
| |
| |||
112 | 42 |
| |
113 | 43 |
| |
114 | 44 |
| |
115 |
| - | |
116 | 45 |
| |
117 | 46 |
| |
118 | 47 |
| |
| |||
135 | 64 |
| |
136 | 65 |
| |
137 | 66 |
| |
138 |
| - | |
139 |
| - | |
140 |
| - | |
141 | 67 |
| |
142 | 68 |
| |
143 | 69 |
| |
| |||
178 | 104 |
| |
179 | 105 |
| |
180 | 106 |
| |
181 |
| - | |
182 |
| - | |
183 |
| - | |
184 | 107 |
| |
185 | 108 |
| |
186 | 109 |
| |
| |||
202 | 125 |
| |
203 | 126 |
| |
204 | 127 |
| |
205 |
| - | |
206 | 128 |
| |
207 | 129 |
| |
208 | 130 |
| |
| |||
238 | 160 |
| |
239 | 161 |
| |
240 | 162 |
| |
241 |
| - | |
242 |
| - | |
243 |
| - | |
244 | 163 |
| |
245 | 164 |
| |
246 | 165 |
| |
| |||
272 | 191 |
| |
273 | 192 |
| |
274 | 193 |
| |
275 |
| - | |
276 | 194 |
| |
277 | 195 |
| |
278 | 196 |
| |
| |||
284 | 202 |
| |
285 | 203 |
| |
286 | 204 |
| |
287 |
| - | |
288 | 205 |
| |
289 | 206 |
| |
290 | 207 |
| |
| |||
296 | 213 |
| |
297 | 214 |
| |
298 | 215 |
| |
299 |
| - | |
300 |
| - | |
301 |
| - | |
302 |
| - | |
303 |
| - | |
304 |
| - | |
305 |
| - | |
306 |
| - | |
307 | 216 |
| |
308 | 217 |
| |
309 | 218 |
| |
| |||
389 | 298 |
| |
390 | 299 |
| |
391 | 300 |
| |
392 |
| - | |
| 301 | + | |
393 | 302 |
| |
394 | 303 |
| |
395 | 304 |
| |
| |||
438 | 347 |
| |
439 | 348 |
| |
440 | 349 |
| |
441 |
| - | |
| 350 | + | |
442 | 351 |
| |
443 | 352 |
| |
444 | 353 |
| |
| |||
493 | 402 |
| |
494 | 403 |
| |
495 | 404 |
| |
496 |
| - | |
| 405 | + | |
497 | 406 |
| |
498 | 407 |
| |
499 | 408 |
| |
| |||
549 | 458 |
| |
550 | 459 |
| |
551 | 460 |
| |
552 |
| - | |
| 461 | + | |
553 | 462 |
| |
554 | 463 |
| |
555 | 464 |
| |
|
torch/csrc/distributed/c10d/Functional.hpp
Copy file name to clipboard+1-9Lines changed: 1 addition & 9 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
1 | 1 |
| |
2 | 2 |
| |
3 |
| - | |
4 |
| - | |
5 |
| - | |
6 |
| - | |
7 |
| - | |
8 |
| - | |
9 |
| - | |
10 |
| - | |
11 |
| - | |
| 3 | + |
You can’t perform that action at this time.
0 commit comments