CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 24.7k
[CI] Adds support for selecting experiments for workflows on runner determinator #137614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137614
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 3118452 with merge base b71d0ac ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@@ -174,6 +182,13 @@ def parse_args() -> Any: | |||
required=True, | |||
help="Current GitHub ref type, branch or tag", | |||
) | |||
parser.add_argument( | |||
"--check-experiments", | |||
type=_str_comma_separated_to_set, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: you can pass in a function here!
do_check = True | ||
if check_experiments: | ||
if experiment_name not in check_experiments: | ||
exp_list = ", ".join(check_experiments) | ||
log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}") | ||
do_check = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When an experiment is not a default experiment and not a check-experiment, even if the user is explicitly opted in the experiment should be disabled for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh good point...
exp_list = ", ".join(check_experiments) | ||
log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}") | ||
do_check = False | ||
elif not experiment_settings.default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if check experiments is enabled, then we should ignore the default
setting and only use the check
experiments settings.
Else you'll pull in the lf
prefix as well
@@ -174,6 +182,13 @@ def parse_args() -> Any: | |||
required=True, | |||
help="Current GitHub ref type, branch or tag", | |||
) | |||
parser.add_argument( | |||
"--check-experiments", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming suggestion:
"--check-experiments", | |
"--eligible-experiments", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please commit the suggested changes from pytorch's linter.
""" | ||
prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"}) | ||
self.assertEqual( | ||
"otherExp.", prefix, "Runner prefix not correct for User2" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"}) | |
self.assertEqual( | |
"otherExp.", prefix, "Runner prefix not correct for User2" | |
) | |
""" | |
prefix = rd.get_runner_prefix( | |
settings_text, ["User2"], USER_BRANCH, {"otherExp"} | |
) | |
self.assertEqual("otherExp.", prefix, "Runner prefix not correct for User2") | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please commit the suggested changes from pytorch's linter.
for experiment_name, experiment_settings in settings.experiments.items(): | ||
enabled = False | ||
|
||
if not experiment_settings.all_branches and is_exception_branch(branch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for experiment_name, experiment_settings in settings.experiments.items(): | |
enabled = False | |
if not experiment_settings.all_branches and is_exception_branch(branch): | |
for experiment_name, experiment_settings in settings.experiments.items(): | |
if not experiment_settings.all_branches and is_exception_branch(branch): |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…eterminator (#137614) adds a `default` tag to experiment configurations, allowing to remove some experiments by default on the random draw: ``` experiments: lf: rollout_perc: 25 otherExp: rollout_perc: 25 default: false --- ``` and includes the configuration to filter what experiments are of interest for a particular workflow (comma separated): ``` get-test-label-type: name: get-test-label-type uses: ./.github/workflows/_runner-determinator.yml with: ... check_experiments: "awsa100" ``` The end goal, is to enable us to run multiple experiments, that are independent from one another. For example, while we still runs the LF infra experiment, we want to migrate other runners leveraging the current solution. A immediate UC is for the A100 instances, where we want to migrate to AWS. Those new instances will during the migration period be labeled both `awsa100.linux.gcp.a100` and `linux.aws.a100`. Once the experiment ends, we will remove the first confusing one. ``` jobs: get-build-label-type: name: get-build-label-type uses: ./.github/workflows/_runner-determinator.yml with: ... get-test-label-type: name: get-test-label-type uses: ./.github/workflows/_runner-determinator.yml with: ... check_experiments: "awsa100" linux-focal-cuda12_1-py3_10-gcc9-inductor-build: name: cuda12.1-py3.10-gcc9-sm80 uses: ./.github/workflows/_linux-build.yml needs: - get-build-label-type - get-test-label-type with: runner_prefix: "${{ needs.get-build-label-type.outputs.label-type }}" ... test-matrix: | { include: [ { config: "inductor_huggingface_perf_compare", shard: 1, num_shards: 1, runner: "${{ needs.get-test-label-type.outputs.label-type }}linux.gcp.a100" }, ... ]} ... ``` ``` experiments: lf: rollout_perc: 50 awsa100: rollout_perc: 50 default: false ``` Pull Request resolved: #137614 Approved by: https://github.com/malfet
adds a
default
tag to experiment configurations, allowing to remove some experiments by default on the random draw:and includes the configuration to filter what experiments are of interest for a particular workflow (comma separated):
The end goal, is to enable us to run multiple experiments, that are independent from one another. For example, while we still runs the LF infra experiment, we want to migrate other runners leveraging the current solution. A immediate UC is for the A100 instances, where we want to migrate to AWS.
Those new instances will during the migration period be labeled both
awsa100.linux.gcp.a100
andlinux.aws.a100
. Once the experiment ends, we will remove the first confusing one.