CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Use Windows matcher when on Windows platform in all code paths #6491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @jsturtevant. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
5635531
to
f01414f
Compare
/ok-to-test |
/hold |
This looks good based on the previous investigation I did: #6168 (comment). Although we probably don't want to unset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests? Windows' platform handling seems really tricky...
There is a bit more going on. I have a partial fix that I can push up but it caused some other issues around LOCW support on windows. The LCOW support doesn't really work since the matcher would never pass the osPrefix version check The main thing is on Windows some code paths (such as image pulling) are using the windows matcher and other it is using linux matcher. This is because the Ordered function news up a matcher that will be Linux default matcher that doesn't include OSVersion Prefix matching. In the case of Windows image pull scenario with ctr this means it matched and pulled both Windows version as described in containerd/nerdctl#751 (comment: |
I added 5ae90c6 commit to demonstrate my comments in #6491 (comment) In particular that:
|
I just opened a PR to address the
Would LCOW use the default matcher though? Given that in the LCOW case, the container effectively runs "remotely", wouldn't it require a matcher that's created based on that environment? |
platforms/compare_test.go
Outdated
@@ -75,6 +81,18 @@ func TestOnly(t *testing.T) { | |||
}, | |||
}, | |||
}, | |||
{ | |||
platform: fmt.Sprintf("windows/amd64//%s", buildStr+".1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly orthogonal, but wondering if the string representation for osversion (and features) was already "formalised" . I recall there were some discussion about this in the past (but don't recall the outcome). We should probably have some of this added to the OCI specs at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if this valid string. The parser doesn't actually work in this case but was trying to demonstrate this isn't really handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Yes, I also was looking at Parse() not yet supporting it (and left that for a later exercise).
Empty //
may work for missing components (but could potentially be problematic if someone (eg) uses path.Join()
to construct the string).
Perhaps there's alternatives (/-/
for explicitly omitting a field, or /*/
); in any case, likely something for further discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing with someone else, and an alternative for the format could be os/arch/variant@os-version[+feature+feature...]
(we should look at the +
symbol though, in case it may be used in URLs (where +
can be ambiguous)
This might be the case but I not sure every architecture/variant of Linux will run on Windows so we may need a custom one? Maybe @kevpar knows? |
At least But I guess @kevpar and/or @katiewasnothere may have more insight |
58af887
to
e77fbb9
Compare
I've updated to use the appropriate matcher on a given OS. On Windows it will use a Windows Matcher which uses the default matcher and only adds the additional checks if the platform specific is Windows. Otherwise, the behavior should be the same which is to use the default matcher. This should cover the comment #6491 (comment) |
e77fbb9
to
b41c18e
Compare
This last commit works with the original issue reported (containerd/nerdctl#751 )
|
platforms/platforms.go
Outdated
@@ -127,15 +127,15 @@ type Matcher interface { | |||
Match(platform specs.Platform) bool | |||
} | |||
|
|||
// NewMatcher returns a simple matcher based on the provided platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best to have the exported function in the non-platform specific file then call into non-exported implementations. This will also need an other implementation besides just Linux and Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also need an other implementation besides just Linux and Windows
I've updated to include other implementations, it uses the same defaults as previous except Windows.
Best to have the exported function in the non-platform specific file then call into non-exported implementations
I don't think I fully understood what I should do here. could you explain a bit more? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewMatcher
is an exported function. So it should be in platforms.go which calls the un-exported newDefaultMatcher
in the various platforms_.go files. But actually thats another point. Why is NewDefaultMatcher
exported? There are no callers right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I see what I did and I've updated
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
12e159e
to
83c8038
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I've addressed all the feedback and this is ready for review. The node test failure was a Prow pod timeout. not sure if the same commands work here: |
One thing I just realised; so this changes the default on windows to match on architecture and os-version, correct? I think that's correct for containers running with process isolation, but would this be an issue when running with hyper-v isolation? (not sure of that's supported by containerd, but I know dockerd allows switching) |
@kevpar - Is there such a thing as x86 hypervisor isolated on Windows? I've never seen such an image to my knowledge. Would Seb's comment actually be possible? |
Hyper-V isolation is going to be messy... You'd have to expose a flag to Or you could always pull via the Hyper-V matching rules (a superset of the process isolation matching rules), and warn the user if the resulting image choice would not work in process isolation. Both of those options seem undesirable to me. The least-awful option might well be: Note in the documentation that if you want to use Hyper-V isolation to run an older image than your current host, pull the specific image you want to use, not an image index. That said, since the client code can provide its own matcher, perhaps this is a client thing to implement? Defaulting to process-isolation rules seems fine to me. Edit: Reading the above comments, I realise I'm assuming we're talking about Hyper-V isolation in regards to wider version matching. I'd never even considered the idea of cross-architecture support. I didn't think x86 would even be a thing in containers (32-bits is so last millennium...) and I don't expect to see Arm64 Windows Server outside my most feverish of dreams for a while, yet. |
Yea, ..., no good answer here imo. I think I agree with @thaJeztah earlier comment. Windows should match Windows process execution by default. Only the caller has enough context to know the intent of the container that will be scheduled next. IE: The caller knows if the container config is isolated or not. But it does not tell that to containerd when pulling images. But it CAN by providing a different matcher when pulling images. So I think the right default is Windows process works as expected, and if hypervisor is needed caller can control it. |
The windows matcher did match on os-version already but wasn't being used in every code path. For example, If a client used the I am under the impression that Hyper-v containers are not supported right now and agree that we should use Windows Process as the default and leave it up the client to provide a different matcher for now. Maybe @dcantah or @kevpar knows more about the state of hyper-v contianers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
SGTM |
@jsturtevant - Any chance you have seen this?
Seems related to the change. But I can look closer if you are unaware. Oddly Images pull just fine, and unpack just fine. Here is for the gen index:
You can see the above warning. Here is for the
|
I am not, my suspicion is the matcher or string to match with is wired up wrong for that code path in ctr. This looks similar to what I was seeing nerdctl when I turned on debug mode. As a point of reference, We have a nightly job for containerd which has been running well since this change merged https://testgrid.k8s.io/sig-windows-master-release#aks-engine-windows-containerd-nightly |
Oh, I see. It looks like we have broken the unpacker for platform agnostic variants when the Host os doesn't match (meaning won't unpack so Xenon cant even run). So for example: On WS2022:
But if you add the
The |
@TBBle - I've already forgotten :(. Did we decide it was "ok" to break hv isolated pulls when the host os doesnt match the image hahaha? And that to do this would require the caller to pass in a custom matcher? I think my brain is giving out |
The fact that it worked before was accidental IMO. I am still not sure if hyper-v upstream is fully functional, talking with @dcantah it sounds like it isn't |
My $0.02: I think for cases like nerdctl and ctr we should probably have separate flag(s) to control the exact matching behavior so that cases like hypervisor-isolation can work. In CRI my view is we should add fields to One difficulty is last I checked containerd doesn't have great support for multi-arch images. I think there were places where it's assumed we're only working with a single platform at a time. |
So funny. |
@kevpar - Agreed. I think when I get around to it I will add something like: |
Yeah, that's what I took away from the conversation as well: Take Linux out of the containerd default Windows-host matcher, and let tools that are happy to support Hyper-V isolation and/or WCOW (i.e. Xenon aka UVM) provide a matcher to suit. It might be feasible to add a function that creates a Xenon-capable matcher, to save some code duplication in clients, and hence centralise any particular complex or specific rules that might appear. It might also make sense to put something like that in hcsshim or other central-but-Windows-host-specific repo instead. But that assumes a tool that wants to integrate Xenon wants to accept both kinds of matches, so maybe it's not that valuable after all. For CRI, I noticed that
It does make sense that if you are using |
@TBBle - That's exactly why we added I think we have consensus haha. We are good to go with the code as is. |
When looking into containerd/nerdctl#751 I found that the platform Normalization function is dropping the
OSVersion
because it was marked as depreciated but this value is used throughout the windowscontainerd/platforms/defaults_windows.go
Line 36 in 9c676e9
There maybe more going on here than just this change. For example, this may be why this code uses
Ordered
here and then adds theosVersionPrefix
.This change does make it so that nerdctl will work with multiarch manifests but there may need to be more work done here.