-
Notifications
You must be signed in to change notification settings - Fork 41.1k
kubelet: invoke part of image verification policy earlier #133114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Hi @atykhyy. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
pkg/kubelet/images/image_manager.go
Outdated
@@ -178,7 +178,8 @@ func (m *imageManager) EnsureImageExists(ctx context.Context, objRef *v1.ObjectR | |||
return "", message, err | |||
} | |||
|
|||
if imageRef != "" && !utilfeature.DefaultFeatureGate.Enabled(features.KubeletEnsureSecretPulledImages) { | |||
// run the part of image verification policy that does not depend on credentials before looking them up | |||
if imageRef != "" && !(utilfeature.DefaultFeatureGate.Enabled(features.KubeletEnsureSecretPulledImages) && m.imagePullManager.MustAttemptImagePull(requestedImage, imageRef, nil, nil)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. BTW, after the feature goes to GA, this if
block can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the feature goes to GA, this if block can be removed
If this if block is removed, the problem described in #133079 will reappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the feature goes to GA, this if block can be removed
If this if block is removed, the problem described in #133079 will reappear.
Then just fine for now. But it seems that this still needs an update once this is Beta or GA.
/lgtm cancel |
e62c461
to
b46d36d
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: atykhyy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ac91d11
to
edb3a0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should keep MustAttemptImagePull()
the single place that evaluates the policies and performs the decision whether to pull based on the underlying cache state.
We can do that by performing the actual credential pull JIT and caching the result.
This PR is also missing unit tests for the fixed behavior.
pkg/kubelet/images/image_manager.go
Outdated
if err != nil { | ||
return "", err.Error(), err | ||
} | ||
cachedCreds := m.imagePullManager.LookupImagePullCredentials(image, imageRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should not be externalized from MustAttemptImagePull()
to keep a single point for authorization decisions based on the policy and cache credentials.
Instead, I would expect the keyring.Lookup(repoToPull)
to be called JIT inside of MustAttemptImagePull()
.
This way, you would end up having something like this:
type CachedImageCredentials interface {
Get() ([]credentialprovider.TrackedAuthConfig, error)
}
type getCachedImageCredentials func() ([]credentialprovider.TrackedAuthConfig, error)
func (f getCachedCredentials) Get() ([]credentialprovider.TrackedAuthConfig, error) {
return f()
}
func getImageCredentialsForPodFunc(
fullyQualifiedImage string,
pullSecrets []v1.Secret,
podSandboxConfig *runtimeapi.PodSandboxConfig,
) getCachedImageCredentials {
var (
called bool
credsCached []credentialprovider.TrackedAuthConfig
errCached error
)
return func() (creds []credentialprovider.TrackedAuthConfig, err error) {
if called {
return credsCached, errCached
}
defer func() {
credsCached = creds
errCached = err
called = true
}()
// setup the keyrings and do a Lookup(fullyQualifiedImage)
})
}
You can then update the MustAttemptImagePull()
to take CachedCredentials
as an argument.
From image_manager.go
, you then just do:
cachedCredentials := getImageCredentialsForPodFunc(<arguments>)
if len(imageRef) > 0 {
...
m.imagePullManager.MustAttemptImagePull(<arguments>, cachedCredentials)
}
...
credentials, err := cachedCredentials.Get()
if err != nil {
return "", err.Error(), err
}
return m.pullImage(..., credentials, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should not be externalized from
MustAttemptImagePull()
to keep a single point for authorization decisions based on the policy and cache credentials.
OK, I'll rewrite it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a reason to make pullmanager
aware of []credentialprovider.TrackedAuthConfig
so I sliced it in a slightly different way.
pkg/kubelet/images/image_manager.go
Outdated
@@ -257,6 +245,37 @@ func (m *imageManager) EnsureImageExists(ctx context.Context, objRef *v1.ObjectR | |||
return m.pullImage(ctx, logPrefix, objRef, pod.UID, requestedImage, spec, pullCredentials, podSandboxConfig) | |||
} | |||
|
|||
func (m *imageManager) getPullCredentials(image string, pod *v1.Pod, pullSecrets []v1.Secret, podSandboxConfig *runtimeapi.PodSandboxConfig) ([]credentialprovider.TrackedAuthConfig, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pod
is unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for pod.Spec.ServiceAccountName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, sorry, must have overlooked it
|
||
func (f *PullManager) MustAttemptImagePull(image, imageRef string, cachedCreds kubeletconfiginternal.ImagePullCredentials, podSecrets []kubeletconfiginternal.ImagePullSecret, podServiceAccount *kubeletconfiginternal.ImagePullServiceAccount) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below cachedCreds.NodePodsAccessible
block is where I'd expect the JIT call to CachedCredentials.Get()
@@ -52,6 +52,11 @@ type ImagePullManager interface { | |||
// | |||
// `image` is the content of the pod's container `image` field. | |||
RecordImagePullFailed(image string) | |||
// LookupImagePullCredentials gets cached image pull credentials matching `image`, | |||
// and evaluates the part of the policy which does not depend on credentials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// and evaluates the part of the policy which does not depend on credentials.
A function called "Lookup" shouldn't do any kinds of policy evaluations. That's another reason why I'd rather have a JIT credential call fetch built inside of MustAttemptImagePull
@@ -62,7 +67,7 @@ type ImagePullManager interface { | |||
// was found in the cache. | |||
// | |||
// `image` is the content of the pod's container `image` field. | |||
MustAttemptImagePull(image, imageRef string, credentials []kubeletconfiginternal.ImagePullSecret, serviceAccount *kubeletconfiginternal.ImagePullServiceAccount) bool | |||
MustAttemptImagePull(image, imageRef string, cachedCreds kubeletconfiginternal.ImagePullCredentials, secrets []kubeletconfiginternal.ImagePullSecret, serviceAccount *kubeletconfiginternal.ImagePullServiceAccount) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably have to add an error to the return of MustAttemptImagePull()
for the cases where initializing the keyrings fails as we would ideally want the caller to short-circuit.
965d77d
to
4f00f5c
Compare
pkg/kubelet/images/image_manager.go
Outdated
m.logIt(objRef, v1.EventTypeNormal, events.PulledImage, logPrefix, msg, klog.Info) | ||
return imageRef, msg, nil | ||
} | ||
lookupPullCredentials := sync.OnceValues(func() ([]credentialprovider.TrackedAuthConfig, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using sync.OnceValues
here like this would likely result in a continuous memory growth, which is why I avoided it in my original suggestion. I was thinking about it too in the very beginning though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or maybe not? Would sync.OnceValues
ever be garbage collected? Tbh I am not sure at this point.
Either way, I think we can extract lookupPullCredentials
from here to its own function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
sync.OnceValues
here like this would likely result in a continuous memory growth,
Why? Wouldn't that be a bug in Go runtime if sync.Once
leaked memory? The getPodCredentials
closure isn't stored in any struct. Looking at the source code, I see nothing that would prevent the Once
object closed over by OnceValues
from getting garbage collected once EnsureImageExists()
returns. I tried a simple test that mimics this use of OnceValues
and everything seems to get cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract
lookupPullCredentials
I.e. so that that place looks like this?
lookupPullCredentials := sync.OnceValues(func() ([]credentialprovider.TrackedAuthConfig, error) {
return m.lookupPullCredentials(spec.Image, pod, podSandboxConfig, pullSecrets, externalCredentialProviderKeyring)
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Wouldn't that be a bug in Go runtime if sync.Once leaked memory?
I guess so. sync.Once
is generally used a bit differently, even throughout the codebase here, which is what throws me off. I looked at the source, too, aside from the atomic and _nocopy, nothing appears to be that much out of ordinary. I suppose it's safe?
As for the lookup credentials, something more like
func lookupPullCredentialsFunc(<args>) (func() ([]credentialprovider.TrackedAuthConfig, error)) {
return sync.OnceValues(func() ([]credentialprovider.TrackedAuthConfig, error) {
// lookup code here
})
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it's safe?
I believe so. sync.OnceFunc()
and friends are old and established, if they leaked memory in regular use it would have been caught a decade ago, and their return values being closures users can do nothing bad with them (like copying a sync.Once
or a mutex by value).
As for the lookup credentials, something more like
I see what you mean, but what does that achieve? Do you plan to use this new function elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just for clarity, makes the code less cluttered.
edc594b
to
01a2cc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, otherwise lgtm
@@ -736,7 +741,9 @@ type mockImagePullManagerWithTracking struct { | |||
recordedCredentials *kubeletconfiginternal.ImagePullCredentials | |||
} | |||
|
|||
func (m *mockImagePullManagerWithTracking) MustAttemptImagePull(image, imageRef string, podSecrets []kubeletconfiginternal.ImagePullSecret, podServiceAccount *kubeletconfiginternal.ImagePullServiceAccount) bool { | |||
func (m *mockImagePullManagerWithTracking) MustAttemptImagePull(image, imageRef string, getPodCredentials pullmanager.GetPodCredentials) (bool, error) { | |||
podSecrets, podServiceAccount, _ := getPodCredentials() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move right above m.lastSecrets
, return error if it occured
@@ -435,6 +435,7 @@ func TestFileBasedImagePullManager_MustAttemptImagePull(t *testing.T) { | |||
pullingFiles []string | |||
expectedPullRecord *kubeletconfiginternal.ImagePulledRecord | |||
want bool | |||
wantPodCred bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wantPodCredsRequested
would be a more fitting name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but then there would be a lot of spurious diffs due to go fmt
whitespace changes. I prefer not to pollute git blame
if I can help it. But if it's okay then sure I'll change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It's true it was easier to review this way, but feel free to change the name now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR went one more size up :|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, wouldn't it be better to disregard whitespace changes when computing PR size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, wouldn't it be better to disregard whitespace changes when computing PR size?
I can see how that would be useful in most if not all cases, yes
Some decisions about image pull credential verification can be applied without reference to image pull credentials: e.g. if the policy is NeverVerify, or if it is NeverVerifyPreloadedImages and the image is preloaded, or if is NeverVerifyAllowListedImages and the image is white-listed. In these cases, there is no need to look up credentials.
01a2cc3
to
f920aea
Compare
/lgtm |
LGTM label has been added. Git tree hash: a59a38a0eeabc7071457b229db20b3873c24d026
|
@atykhyy: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
return m.pullImage(ctx, logPrefix, objRef, pod.UID, requestedImage, spec, pullCredentials, podSandboxConfig) | ||
} | ||
|
||
func (m *imageManager) makeLookupPullCredentialsFunc(image string, pod *v1.Pod, pullSecrets []v1.Secret, podSandboxConfig *runtimeapi.PodSandboxConfig) func() ([]credentialprovider.TrackedAuthConfig, error) { | ||
return sync.OnceValues(func() ([]credentialprovider.TrackedAuthConfig, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to wrap it in OnceValues? It is only called once anyways as far as I can tell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the returned lookup function might be called twice in some cases:
- JIT during the
MustAttemptImagePull()
- just before calling
m.pullImage()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my question was mostly because I do not know exact behavior of sync.OnceValues
. Will it retry if the first attempt resulted in err
? Or we want to cache the error. So I thought it is easier just not have it =).
I will need to review the sequence you suggested. I plan to take it again later during 1.35, please ping if you think it is critical to do earleir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it retry if the first attempt resulted in
err
?
No. It caches both values.
Or we want to cache the error
An error in that place shows that something is seriously misconfigured, so it is unlikely that retrying makes sense. A mere failure to lookup credentials from the keyring does not return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mere failure to lookup credentials from the keyring does not return an error.
Not because of this PR, I mean. The relevant method's signature does not include error
.
// pkg/credentialprovider/keyring.go
type DockerKeyring interface {
Lookup(image string) ([]TrackedAuthConfig, bool)
}
I plan to take it again later during 1.35, please ping if you think it is critical to do earleir
I think this change should be in place when KubeletEnsureSecretPulledImages
feature gate goes GA and default to enabled, because at that point it will become enabled by some managed (cloud) Kubernetes providers with no option to disable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error in that place shows that something is seriously misconfigured, so it is unlikely that retrying makes sense. A mere failure to lookup credentials from the keyring does not return an error.
Besides, I would like to make sure the result of the credential lookup matches what the authz check received in the context of a single EnsureImageExists()
call.
I will need to review the sequence you suggested. I plan to take it again later during 1.35, please ping if you think it is critical to do earlier
This likely prevents us from moving the feature to Beta. It's not critical right now, but let's please make sure we pay attention to this PR early in 1.35.
@stlaz in the KEP secretly pulled images - when it says "reverification" of secrets, is it really a re-verification that secrets are valid or that they are the same as was used to pull? Changing the sequence like this changes the assumption on whether secrets should be there. I wanted to ensure that this is exactly what is expected. |
} | ||
|
||
if imageRef != "" { | ||
if !utilfeature.DefaultFeatureGate.Enabled(features.KubeletEnsureSecretPulledImages) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping this code block where it was before will simlpify the review. Took me some time to understand what happened here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the code originally looked like before #133079. I'd rather it stayed here.
It verifies the pod has access to the coordinates of a secret previously used to pull successfully, OR access to identical content (hashed) of a credential previously used to pull successfully. |
Ok, it means we definitely do not need to pull these creds if the setting is to not verify. Thank you |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Some decisions about image pull credential verification can be applied without reference to image pull credentials: if the policy is NeverVerify, or if it is NeverVerifyPreloadedImages and the image is preloaded, or if is NeverVerifyAllowListedImages and the image is white-listed. In these cases, there is no need to look up credentials.
Which issue(s) this PR is related to:
Related PR: #133079
Special notes for your reviewer:
Does this PR introduce a user-facing change?
No.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: