Use amazon linux 2023 runners for Docker builds #136544
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136544
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (4 Unrelated Failures)
As of commit be8ec13 with merge base eac04fe:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
dismissing until the docker builds actually work
b0ba475 to 6863c4c
537d542 to e618fc4
Why not resolve the fix in test-infra first and then merge the pytorch PR after it starts consuming those test-infra changes? That would help test the changes in a clean environment
.ci/docker/conda/Dockerfile
Outdated
@@ -5,6 +5,7 @@ FROM centos:7 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN echo "ulimit is $(ulimit -n)"
nit: remove
@pytorchbot merge -f "Lint is green, let's test it in prod..."
@@ -37,6 +37,12 @@ esac

(
set -x
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
nit: please add a link to the issue that you discovered which explains the problem in more detail
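For readers who want to see what the `sed` substitution in the hunk above does without touching a real host, here is a minimal sketch. It assumes a scratch file standing in for `/usr/lib/systemd/system/docker.service`; the file contents and variable names are hypothetical, and the sketch writes to a second file instead of using `sed -i` so that it needs no `sudo`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch files; the real target in the PR is
# /usr/lib/systemd/system/docker.service, which is NOT touched here.
unit_file="$(mktemp)"
patched_file="$(mktemp)"
printf '[Service]\nLimitNOFILE=infinity\n' > "$unit_file"

# Same substitution expression as in the diff, written to a copy
# instead of editing in place.
sed 's/LimitNOFILE=infinity/LimitNOFILE=1048576/' "$unit_file" > "$patched_file"

# Extract the resulting setting so it can be inspected.
patched_line="$(grep '^LimitNOFILE=' "$patched_file")"
echo "$patched_line"

rm -f "$unit_file" "$patched_file"
```

On a real systemd host, an edit to a unit file is normally followed by `systemctl daemon-reload` and a restart of the service before it takes effect; whether the CI script does this is not visible in the hunk shown here.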
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Migrate these builds to Amazon Linux 2023. We want to build and test the Docker images in CD.
We appear to be hitting docker/buildx#379 when building Docker images on Amazon Linux 2023: the Conda Docker build times out, while the Manywheel build runs but fails because BUILDKIT is turned off: https://github.com/pytorch/pytorch/actions/runs/11036043157/job/30653543264?pr=136544
The proposed solution is to fix it in user_data; see pytorch/test-infra#5712.
Docker builds execute successfully here: https://github.com/pytorch/pytorch/actions/runs/11040149229/job/30667448668?pr=136544
Work around the timeout problem (reported in https://bugzilla.redhat.com/show_bug.cgi?id=1537564) by setting the number of open files per container to 1048576.
Pull Request resolved: pytorch#136544
Approved by: https://github.com/ZainRizvi
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
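Since the workaround hinges on the open-files limit, a quick way to inspect the limit a shell actually inherited may help when debugging similar timeouts. The sketch below is illustrative only: the `target` value is taken from the commit message, while the variable names and the comparison logic are assumptions, not part of the PR.

```shell
#!/usr/bin/env bash
set -eu

# Value the PR configures; everything else here is illustrative.
target=1048576

# Soft and hard limits on open file descriptors for this shell.
soft_limit="$(ulimit -Sn)"
hard_limit="$(ulimit -Hn)"
echo "open files: soft=${soft_limit} hard=${hard_limit}"

# ulimit may report the literal string "unlimited", so handle it
# before attempting a numeric comparison.
if [ "$soft_limit" = "unlimited" ]; then
  verdict="unlimited"
elif [ "$soft_limit" -gt "$target" ]; then
  verdict="above target"
else
  verdict="at or below target"
fi
echo "soft limit is ${verdict} (target ${target})"
```

Running the same commands inside a container (for example in a `RUN` step, as the temporary debug line in the Dockerfile hunk did) shows the limit the build steps see, which can differ from the host's.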