CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Windows snapshotter touch ups and new functionality #6918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2fd2bba
to
907ddeb
Compare
High level comments:
|
@dcantah just let me know when you're ready for review. |
|
2d46b08
to
e0b0e4a
Compare
@dcantah - Yea I was getting ahead of myself I guess. Was thinking about |
Also, I should have read 2 first lol. You just blew my mind. Are you telling me that the writable layer is intrinsically tied to the prepared layers once a single container is running? That doesn't sound right. It's just an empty disk right? Its irrespective of the chain underneath it right? |
Sorry that was poorly worded/just plain wrong. wcifs will have a handle to files that are accessed in the scratch during the containers uptime. |
a9851ad
to
d7d66fe
Compare
@@ -183,7 +182,9 @@ func (c *criService) CreateContainer(ctx context.Context, r *runtime.CreateConta | |||
|
|||
log.G(ctx).Debugf("Container %q spec: %#+v", id, spew.NewFormatter(spec)) | |||
|
|||
snapshotterOpt := snapshots.WithLabels(snapshots.FilterInheritedLabels(config.Annotations)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not pass annotations as snapshot labels anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the reasoning is that the only label names that passed this check either have two /
, or are containerd.io/snapshot.ref
. The first seems valid as long as we decide to not care about CRI API outside of Kubelet (a decent decision IMO). The second we should double check to see if anyone is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I got rid of it because for actual Kubernetes usage it was doing nothing, and to be more specific we'd never even get as far as Containerd as an annotation in the containerd.io/snapshot/* form would fail validation. It seemed sane to me to not leave in something that would only take effect for crictl/some other cri client that isn't the Kubelet itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that's a good point on the second.. let me check. I think that's actually used for remote snapshotters now that I'm thinking of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, the label is set by the client itself during unpack. I think we're ok here. Don't see anybody passing this as an annotation to be set for the rootfs snapshot (which wouldn't make sense afaik)
containerd/pkg/unpack/unpacker.go
Line 297 in 42c6be8
labels[labelSnapshotRef] = chainID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is safe enough since k8s doesn't seem to provide a mechanism for the user to set annotations directly on the container spec, and even if they it seems unlikely someone would be manually setting containerd.io/snapshot.ref
with a correct chain ID and depending on this behavior. So while technically a breaking change this seems okay to me. Let's leave this thread open to see if another maintainer agrees though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
d7d66fe
to
65314c9
Compare
65314c9
to
c16b04a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
c16b04a
to
ca09fca
Compare
ca09fca
to
bdf4bda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Danny!
Seems this needs a rebase. Will do shortly |
This change does a couple things to remove some cruft/unused functionality in the Windows snapshotter, as well as add a way to specify the rootfs size in bytes for a Windows container via a new field added in the CRI api in k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working prior to this but turns out not to be the case. Previously I'd added a change to pass any annotations in the containerd snapshot form (containerd.io/snapshot/*) as labels for the containers rootfs snapshot. This was added as a means for a client to be able to provide containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an annotation and have that be translated to a label and ultimately set the size for the scratch volume in Windows. However, this actually only worked if interfacing with the CRI api directly (crictl) as Kubernetes itself will fail to validate annotations that if split by "/" end up with > 2 parts, which the snapshot labels will (containerd.io / snapshot / foobarbaz). With this in mind, passing the annotations and filtering to containerd.io/snapshot/* is moot, so I've removed this code in favor of a new `snapshotterOpts()` function that will return platform specific snapshotter options if ones exist. Now on Windows we can just check if RootfsSizeInBytes is set on the WindowsContainerResources struct and then return a snapshotter option that sets the right label. So all in all this change: - Gets rid of code to pass CRI annotations as labels down to snapshotters. - Gets rid of the functionality to create a 1GB sized scratch disk if the client provided a size < 20GB. This code is not used currently and has a few logical shortcomings as it won't be able to create the disk if a container is already running and using the same base layer. WCIFS (driver that handles the unioning of windows container layers together) holds open handles to some files that we need to delete to create the 1GB scratch disk is the underlying problem. - Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb label in favor of a new containerd.io/snapshot/windows/rootfs.sizebytes label. The previous label/annotation wasn't being used by us, and from a cursory github search wasn't being used by anyone else either. Now that there is a CRI field to specify the size, this should just be a field that users can set on their pod specs and don't need to concern themselves with what it eventually gets translated to, but non-CRI clients can still use the new label/deprecated label as usual. - Add test to cri integration suite to validate expanding the rootfs size. Signed-off-by: Daniel Canter <dcanter@microsoft.com>
bdf4bda
to
44e12dc
Compare
This change does a couple things to remove some cruft/unused functionality
in the Windows snapshotter, as well as add a way to specify the rootfs
size in bytes for a Windows container via a new field added in the CRI api in
k8s 1.24. Setting the rootfs/scratch volume size was assumed to be working
prior to this but turns out not to be the case.
Previously I'd added a change to pass any annotations in the containerd
snapshot form (containerd.io/snapshot/*) as labels for the containers
rootfs snapshot. This was added as a means for a client to be able to provide
containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb as an
annotation and have that be translated to a label and ultimately set the
size for the scratch volume in Windows. However, this actually only worked if
interfacing with the CRI api directly (crictl) as Kubernetes itself will
fail to validate annotations that if split by "/" end up with > 2 parts,
which the snapshot labels will (containerd.io / snapshot / foobarbaz).
With this in mind, passing the annotations and filtering to
containerd.io/snapshot/* is moot, so I've removed this code in favor of
a new
snapshotterOpts()
function that will return platform specificsnapshotter options if ones exist. Now on Windows we can just check if
RootfsSizeInBytes is set on the WindowsContainerResources struct and
then return a snapshotter option that sets the right label.
So all in all this change:
Gets rid of code to pass CRI annotations as labels down to snapshotters.
Gets rid of the functionality to create a 1GB sized scratch disk if
the client provided a size < 20GB. This code is not used currently and
has a few logical shortcomings as it won't be able to create the disk
if a container is already running and using the same base layer. WCIFS
(driver that handles the unioning of windows container layers together)
holds open handles to some files that we need to delete to create the
1GB scratch disk is the underlying problem.
Deprecates the containerd.io/snapshot/io.microsoft.container.storage.rootfs.size-gb
label in favor of a new containerd.io/snapshot.rootfs.size-bytes label.
The previous label/annotation wasn't being used by us, and from a cursory
github search wasn't being used by anyone else either. Now that there is a CRI
field to specify the size, this should just be a field that users can set
on their pod specs and don't need to concern themselves with what it eventually
gets translated to, but I've added a more generic label for non-CRI clients.
Verified rootfs size work via: