[CI] Introduces experiment `awsa100` to `inductor-perf-compare.yml` workflow using `_runner-determinator.yml` #138204
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138204
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 970eb24 with merge base 354bc3a:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
…orkflow using `_runner-determinator.yml`
Successfully rebased
68e70f2 to 0688c84
@@ -13,30 +13,42 @@ concurrency:
permissions: read-all

jobs:
  get-label-type:
    name: get-label-type
  get-build-label-type:
nit: Feels kinda weird to call this the build label since this is technically the default, cross-fleet label. It makes sense in the context of this file, but this isn't a convention that can be copied over to other workflows
Naming things is hard
Should we just keep the job name as `get-label-type`? Feels confusing to me TBH
Perhaps `default-label-prefix` could be a new convention?
Yay, this is finally here. Let me know if you plan to switch to a different runner label, as `awsa100.linux.gpu.a100` sounds weird
@huydhn indeed, it sounds very weird. This is done so we can leverage the current experiment stack without a complex rewrite of the tooling. The idea is to migrate to
@pytorchbot merge -f "changes can't impact trunk"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
…orkflow using `_runner-determinator.yml` (#138204)

Adds the job `get-test-label-type` in `.github/workflows/inductor-perf-compare.yml`, which checks for the experiment `awsa100`. It is then used by the job `linux-focal-cuda12_1-py3_10-gcc9-inductor-build` to define the prefix for the runners that will run the benchmark.

Those runners temporarily accept the labels `awsa100.linux.gcp.a100` and `linux.aws.a100`. This lets us migrate from `linux.gcp.a100` via experimentation. After successfully experimenting with those instances, we will remove those labels, update the workflows to use `linux.aws.a100`, and decommission the GCP fleet.

Pull Request resolved: #138204
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
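
For readers unfamiliar with the label-prefix approach, the description above can be illustrated roughly as follows. This is only a sketch, not the actual `_runner-determinator.yml` logic: the helper name `runner_label` and the assumption that an opted-in job receives the prefix `awsa100.` (and an empty prefix otherwise) are hypothetical; only the label strings themselves come from the PR description.

```python
def runner_label(base_label: str, experiment_prefix: str = "") -> str:
    """Hypothetical helper: prepend an experiment prefix to a base runner label."""
    # An empty prefix is assumed to mean the job is not opted into any
    # experiment, so the label is left unchanged.
    return f"{experiment_prefix}{base_label}"


# Not in the experiment: the job keeps targeting the existing GCP runners.
print(runner_label("linux.gcp.a100"))

# Opted into `awsa100`: the job targets runners that accept the prefixed label.
print(runner_label("linux.gcp.a100", "awsa100."))
```

Because the new runners temporarily accept the prefixed label, the workflow only has to prepend a prefix rather than rewrite the runner name, which matches the stated goal of reusing the current experiment stack.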