[CI] Adds support for selecting experiments for workflows on runner determinator #137614

jeanschmidt · 2024-10-09T18:22:27Z

Adds a default tag to experiment configurations, so that an experiment can be excluded from the random draw by default:

        experiments:
            lf:
                rollout_perc: 25
            otherExp:
                rollout_perc: 25
                default: false
        ---
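
A minimal sketch of how such a flag could interact with the random draw. The `Experiment` tuple and `draw_experiments` helper are hypothetical names for illustration, and treating a missing `default` as `True` is an assumption, not something the configuration above states:

```python
import random
from typing import Dict, List, NamedTuple


class Experiment(NamedTuple):
    rollout_perc: float = 0  # percentage of runs picked by the random draw
    default: bool = True     # assumption: experiments take part unless set to False


def draw_experiments(experiments: Dict[str, Experiment], rng: random.Random) -> List[str]:
    """Return the names of experiments enabled by the random draw."""
    enabled = []
    for name, exp in experiments.items():
        if not exp.default:
            # Non-default experiments never enter the draw.
            continue
        if rng.uniform(0, 100) < exp.rollout_perc:
            enabled.append(name)
    return enabled


experiments = {
    "lf": Experiment(rollout_perc=25),
    "otherExp": Experiment(rollout_perc=25, default=False),
}
# "otherExp" can never appear in the result; "lf" appears in roughly 25% of draws.
print(draw_experiments(experiments, random.Random()))
```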

It also adds a workflow input that restricts which experiments are considered for a particular workflow (comma-separated):

  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"

The end goal is to let us run multiple experiments that are independent of one another. For example, while we are still running the LF infra experiment, we want to migrate other runners using the current solution. An immediate use case is the A100 instances, which we want to migrate to AWS.

During the migration period, those new instances will be labeled both awsa100.linux.gcp.a100 and linux.aws.a100. Once the experiment ends, we will remove the first, confusing label.

jobs:
  get-build-label-type:
    name: get-build-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"
      
  linux-focal-cuda12_1-py3_10-gcc9-inductor-build:
    name: cuda12.1-py3.10-gcc9-sm80
    uses: ./.github/workflows/_linux-build.yml
    needs:
      - get-build-label-type
      - get-test-label-type
    with:
      runner_prefix: "${{ needs.get-build-label-type.outputs.label-type }}"
      ...
      test-matrix: |
        { include: [
          { config: "inductor_huggingface_perf_compare", shard: 1, num_shards: 1, runner: "${{ needs.get-test-label-type.outputs.label-type }}linux.gcp.a100" },
          ...
        ]}
      ...

experiments:
    lf:
        rollout_perc: 50
    awsa100:
        rollout_perc: 50
        default: false
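
To make the label composition in the test matrix concrete, here is a small sketch. It assumes the determinator's label-type output is either empty or the experiment name followed by a dot (consistent with the "otherExp." prefix in the unit test quoted later in this thread); `runner_label` is a hypothetical helper, not part of the workflow:

```python
def runner_label(prefix: str, base_label: str) -> str:
    """Concatenate the determinator's prefix and the base runner label."""
    return f"{prefix}{base_label}"


# Experiment selected: the job targets the migrated AWS instances.
print(runner_label("awsa100.", "linux.gcp.a100"))  # awsa100.linux.gcp.a100
# Experiment not selected: empty prefix, the job keeps the GCP label.
print(runner_label("", "linux.gcp.a100"))  # linux.gcp.a100
```

This is why the new instances need to carry the awsa100.linux.gcp.a100 label during the migration: the workflow only prepends a prefix and never rewrites the base label.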

pytorch-bot · 2024-10-09T18:22:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137614

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3118452 with merge base b71d0ac ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ZainRizvi · 2024-10-09T21:24:44Z

.github/scripts/runner_determinator.py

@@ -174,6 +182,13 @@ def parse_args() -> Any:
        required=True,
        help="Current GitHub ref type, branch or tag",
    )
+    parser.add_argument(
+        "--check-experiments",
+        type=_str_comma_separated_to_set,


TIL: you can pass in a function here!

ZainRizvi · 2024-10-09T21:26:54Z

.github/scripts/runner_determinator.py

+            do_check = True
+            if check_experiments:
+                if experiment_name not in check_experiments:
+                    exp_list = ", ".join(check_experiments)
+                    log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}")
+                    do_check = False


When an experiment is neither a default experiment nor listed in check-experiments, it should be disabled even for users who have explicitly opted in.

ohh good point...

ZainRizvi · 2024-10-09T21:28:41Z

.github/scripts/runner_determinator.py

+                    exp_list = ", ".join(check_experiments)
+                    log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}")
+                    do_check = False
+            elif not experiment_settings.default:


If check experiments is set, we should ignore the default setting and use only the check experiments list.

Otherwise you'll pull in the lf prefix as well.
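
The precedence suggested in these review comments can be sketched as a small pure function. This is an illustration of the reviewer's proposal, not the merged code; `is_eligible` and its parameters are hypothetical names:

```python
from typing import AbstractSet


def is_eligible(name: str, is_default: bool, check_experiments: AbstractSet[str]) -> bool:
    """Decide whether an experiment should be considered at all.

    If check_experiments is non-empty it is the only source of truth and the
    `default` flag is ignored; otherwise only default experiments qualify.
    User opt-ins would be evaluated only after this check passes.
    """
    if check_experiments:
        return name in check_experiments
    return is_default


# With check_experiments={"awsa100"}, the default "lf" experiment is skipped,
# so its prefix is not pulled into the A100 workflow.
print(is_eligible("lf", True, {"awsa100"}))        # False
print(is_eligible("awsa100", False, {"awsa100"}))  # True
print(is_eligible("awsa100", False, set()))        # False
```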

ZainRizvi · 2024-10-09T21:30:08Z

.github/scripts/runner_determinator.py

@@ -174,6 +182,13 @@ def parse_args() -> Any:
        required=True,
        help="Current GitHub ref type, branch or tag",
    )
+    parser.add_argument(
+        "--check-experiments",


naming suggestion:

Suggested change

-        "--check-experiments",
+        "--eligible-experiments",

github-actions

Please commit the suggested changes from pytorch's linter.

github-actions · 2024-10-10T23:54:15Z

.github/scripts/test_runner_determinator.py

+        """
+        prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"})
+        self.assertEqual(
+            "otherExp.", prefix, "Runner prefix not correct for User2"
+        )
+


Suggested change

-        """
-        prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"})
-        self.assertEqual(
-            "otherExp.", prefix, "Runner prefix not correct for User2"
-        )
+        """
+        prefix = rd.get_runner_prefix(
+            settings_text, ["User2"], USER_BRANCH, {"otherExp"}
+        )
+        self.assertEqual("otherExp.", prefix, "Runner prefix not correct for User2")

github-actions

Please commit the suggested changes from pytorch's linter.

github-actions · 2024-10-11T00:00:20Z

.github/scripts/runner_determinator.py

    for experiment_name, experiment_settings in settings.experiments.items():
-        enabled = False

        if not experiment_settings.all_branches and is_exception_branch(branch):


Suggested change

-    for experiment_name, experiment_settings in settings.experiments.items():
-        enabled = False
-        if not experiment_settings.all_branches and is_exception_branch(branch):
+    for experiment_name, experiment_settings in settings.experiments.items():
+        if not experiment_settings.all_branches and is_exception_branch(branch):

outdated

issues addressed

jeanschmidt · 2024-10-11T16:38:43Z

@pytorchbot merge

pytorchmergebot · 2024-10-11T16:40:36Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…eterminator (#137614)

Pull Request resolved: #137614
Approved by: https://github.com/malfet

Adds support for select experiment on runner determinator

a3afe45

jeanschmidt requested a review from a team as a code owner October 9, 2024 18:22

pytorch-bot bot added the topic: not user facing topic category label Oct 9, 2024

jeanschmidt added 10 commits October 9, 2024 12:01

.

1c1d3c6

.

fc3ea8a

.

ba7661b

.

fecd267

.

040d907

.

aafccfa

.

ec48c95

.

8d25e13

.

f2cf288

.

a86f7c3

jeanschmidt changed the title ~~Adds support for select experiment on runner determinator~~ Adds support for selecting experiments for workflows on runner determinator Oct 9, 2024

removing test label gather

3293788

ZainRizvi previously requested changes Oct 9, 2024

jeanschmidt added 4 commits October 9, 2024 16:05

.

7738b0f

.

c81ac84

.

a0abe3f

.

fb9a363

github-actions bot requested changes Oct 10, 2024

jeanschmidt added 2 commits October 10, 2024 16:55

.

556033c

.

c031aca

github-actions bot previously requested changes Oct 11, 2024

.

9c3fa3d

jeanschmidt requested a review from ZainRizvi October 11, 2024 00:08

malfet approved these changes Oct 11, 2024

.

3118452

jeanschmidt changed the title ~~Adds support for selecting experiments for workflows on runner determinator~~ [CI] Adds support for selecting experiments for workflows on runner determinator Oct 11, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2024

pytorchmergebot added the merging label Oct 11, 2024

jeanschmidt self-assigned this Oct 11, 2024

zxiiro self-requested a review October 11, 2024 18:18

pytorchmergebot added the Merged label Oct 11, 2024

pytorchmergebot closed this in 2cb983a Oct 11, 2024

pytorchmergebot removed the merging label Oct 11, 2024

github-actions bot deleted the jeanschmidt/runner_determinator_default branch November 11, 2024 02:06
