Fix rendezvous error due to EtcdStore get method not waiting in some … · pytorch/pytorch@18525e1 · GitHub
Commit 18525e1
Fix rendezvous error due to EtcdStore get method not waiting in some cases (#137056)
Fixes #132950
This fixes an issue in `torch/distributed/elastic/rendezvous/etcd_store.py` where the [get method](https://github.com/pytorch/pytorch/blob/v2.4.0/torch/distributed/elastic/rendezvous/etcd_store.py#L60) does not wait as expected when no keys have been written under the store prefix yet (and therefore the store prefix key does not exist). This was because the `_try_wait_get` method would error out immediately [here](https://github.com/alenawang/pytorch/blob/main/torch/distributed/elastic/rendezvous/etcd_store.py#L179) if the prefix was not found instead of continuing to the etcd watch.
This was causing upstream issues where distributed jobs using etcd-v2 could not get past the initial rendezvous at all (details in issue #132950).
We added a test demonstrating this issue and the fix. Without the fix the test fails with `etcd.EtcdKeyNotFound: Key not found : /torch/elastic/store` instead of waiting for the first key to be written; with the fix the test waits properly.
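The wait-instead-of-fail behaviour described above can be sketched without an etcd server. This is a minimal illustration under stated assumptions, not the code from this commit: `InMemoryStore`, `KeyNotFound`, `read_all`, `watch`, and `wait_get` are hypothetical names standing in for the etcd client, `etcd.EtcdKeyNotFound`, and `_try_wait_get`. The point it shows is that a "prefix not found" error is treated as "nothing written yet", so the loop falls through to the watch rather than raising.

```python
import threading
import time


class KeyNotFound(Exception):
    """Hypothetical stand-in for a 'prefix does not exist' error."""


class InMemoryStore:
    """Toy versioned key-value store (an assumption, not the etcd client API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = {}
        self._version = 0

    def set(self, key, value):
        with self._cond:
            self._data[key] = value
            self._version += 1
            self._cond.notify_all()

    def read_all(self):
        """Return (version, snapshot); raise KeyNotFound until a first write."""
        with self._cond:
            if not self._data:
                raise KeyNotFound("prefix not found")
            return self._version, dict(self._data)

    def watch(self, after_version, timeout):
        """Block until the version exceeds after_version; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._version > after_version, timeout=timeout
            )


def wait_get(store, keys, timeout):
    """Wait up to `timeout` seconds for all `keys`; return a dict or None."""
    deadline = time.monotonic() + timeout
    while True:
        version = 0
        try:
            version, snapshot = store.read_all()
            found = {k: snapshot[k] for k in keys if k in snapshot}
            if len(found) == len(keys):
                return found
        except KeyNotFound:
            # Prefix does not exist yet: keep waiting instead of failing.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Watch from the last seen version so a write that lands between
        # the read and the watch is not missed.
        store.watch(version, remaining)
```

In this sketch, calling `wait_get` on an empty store blocks until another thread calls `store.set(...)` or the timeout expires. Re-raising `KeyNotFound` in the `except` branch would reproduce the immediate failure the commit message describes.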
Co-authored-by: tarat44 <32471142+tarat44@users.noreply.github.com>
Pull Request resolved: #137056
Approved by: https://github.com/fduwjj
1 parent f108f88 commit 18525e1
2 files changed: 48 additions, 11 deletions
- test/distributed/elastic/rendezvous
- torch/distributed/elastic/rendezvous

test/distributed/elastic/rendezvous/etcd_rendezvous_backend_test.py
Lines changed: 32 additions & 0 deletions
[Diff body not captured. Additions at new-file lines 10-11, 23, and 152-180.]
torch/distributed/elastic/rendezvous/etcd_store.py
Lines changed: 16 additions & 11 deletions
[Diff body not captured. Original lines 179-188 replaced by new lines 179-192; new line 199 added; original line 199 replaced by new line 204.]