-
Notifications
You must be signed in to change notification settings - Fork 41.1k
client-go certificate: context-aware APIs and logging #129338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client-go certificate: context-aware APIs and logging #129338
Conversation
/retest pod scheduling timeouts |
/assign @liggitt |
/assign @stlaz |
@@ -385,29 +402,26 @@ func (m *manager) ServerHealthy() bool { | |||
func (m *manager) Stop() { | |||
m.clientAccessLock.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need the lock anymore, either - while its description says otherwise, as far as I can tell it was overloaded here only to prevent concurrent access to the stopped
variable. Cancelling a context should be thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Removed.
if m.logf == nil { | ||
m.logf = func(format string, args ...interface{}) { klog.V(2).Infof(format, args...) } | ||
// The name gets included through contextual logging. | ||
logger := klog.Background().V(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, if I wanted to do a debug print further in the code using level 4 as the debug level, with this default logger that's V(2)
, whereas with the logger imported from context that's V(4)
.
The verbose level should likely be explicitly specified everywhere, or we should keep logf
in some shape and form for logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verbose level should likely be explicitly specified everywhere
Okay. It's a bit more obvious that way.
I didn't do it before because theoretically, a user could have provided a config.Logf
implementation which logged at a verbosity higher than V(2)
. This is not supported anymore because the caller can only make a logger less verbose, but not more verbose (go-logr/logr#335 (comment)).
But I think that's fine.
// we need to map it's cancellation to our Done method. | ||
go func() { | ||
<-ctx.Done() | ||
m.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is sad but seems necessary 🙁
} | ||
close(m.stopCh) | ||
m.stopped = true | ||
m.cancel(errors.New("asked to stop")) | ||
} | ||
|
||
// Start will start the background work of rotating the certificates. | ||
func (m *manager) Start() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we followed the common pattern from Kube controllers, this would be the place where you'd wire in your context.
Would this be an unacceptable API break? (/me is fearfully looking back at 1.18's https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1601-client-go-context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be an API break, which is why I decided to not do it. It would break existing users, too (just picking one example):
https://github.com/kubevirt/kubevirt/blob/4af3a3b5901e412327cf211ed1446891d49d3183/cmd/virt-exportproxy/virt-exportproxy.go#L87
This is similar to how http.Request was extended in the standard library: a request can optionally be constructed with a context, which then gets stored and used by methods.
if config.Ctx != nil && (*config.Ctx).Done() != nil { | ||
ctx := *config.Ctx | ||
// If we have been passed a context and it has a Done channel, then | ||
// we need to map it's cancellation to our Done method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
// we need to map it's cancellation to our Done method. | |
// we need to map its cancellation to our Done method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
return false | ||
} | ||
} | ||
|
||
return true | ||
} | ||
|
||
func (m *manager) certSatisfiesTemplate() bool { | ||
func (m *manager) certSatisfiesTemplate(ctx context.Context) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only needs the logger, not the full context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
if sleepInterval := deadline.Sub(m.now()); sleepInterval > 0 { | ||
m.logf("%s: Waiting %v for next certificate rotation", m.name, sleepInterval) | ||
logger.Info("Waiting for next certificate rotation", "delay", sleepInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this would help me a bit more when debugging
logger.Info("Waiting for next certificate rotation", "delay", sleepInterval) | |
logger.Info("Waiting for next certificate rotation", "sleep", sleepInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
@@ -606,22 +622,23 @@ func (m *manager) rotateCerts() (bool, error) { | |||
// the template will not trigger a renewal. | |||
// | |||
// Requires certAccessLock to be locked. | |||
func (m *manager) certSatisfiesTemplateLocked() bool { | |||
func (m *manager) certSatisfiesTemplateLocked(ctx context.Context) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only needs the logger, not the full context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. This also rippled through to some other internal methods, which is fine.
@@ -127,7 +144,7 @@ func (s *fileStore) Current() (*tls.Certificate, error) { | |||
if pairFileExists, err := fileExists(pairFile); err != nil { | |||
return nil, err | |||
} else if pairFileExists { | |||
klog.Infof("Loading cert/key pair from %q.", pairFile) | |||
s.logger.Info("Loading cert/key pair", "file", pairFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
s.logger.Info("Loading cert/key pair", "file", pairFile) | |
s.logger.Info("Loading cert/key pair from a file", "filePath", pairFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
@@ -67,14 +68,30 @@ type FileStore interface { | |||
// updates will be written to the ${certDirectory} directory and | |||
// ${certDirectory}/${pairNamePrefix}-current.pem will be created as a soft | |||
// link to the currently selected cert/key pair. | |||
// | |||
// Contextual logging: NewFileStoreWithLogger should be used instead of NewFileStore in code which supports contextual logging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one not getting deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not all users are required to support contextual logging. For those who don't care, continuing to use NewFileStore
is fine. We don't want to create unnecessary churn in the ecosystem.
4176039
to
e00d62c
Compare
e00d62c
to
6e0fde9
Compare
/test pull-kubernetes-apidiff
That was a conscious choice, because maintaining Logf would make the code more complex and this should break less consumers. |
/test pull-kubernetes-unit
Looks like performance wasn't as expected. Unrelated, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the comments. Mostly only nits now.
It might be worth checking if we can unit test the Config.Ctx (nil, non-nil) is being passed properly through.
@@ -732,22 +746,22 @@ func (m *manager) generateCSR() (template *x509.CertificateRequest, csrPEM []byt | |||
// Generate a new private key. | |||
privateKey, err := ecdsa.GenerateKey(elliptic.P256(), cryptorand.Reader) | |||
if err != nil { | |||
return nil, nil, nil, nil, fmt.Errorf("%s: unable to generate a new private key: %v", m.name, err) | |||
return nil, nil, nil, nil, fmt.Errorf("unable to generate a new private key: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're fixing these parts, let's also use %w
instead of just %v
for errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but then let's also do it consistently everywhere (there was one additional line that I hadn't touched): https://github.com/kubernetes/kubernetes/compare/5cbf34e8cf258aa9f97b7c5cb8b3747236d6dc3b..f9051901cee8d8ff4aed3db27ff495a706f1a487
FWIW, I am not a fan of this blanket "always wrap errors" linter check because it should be a case-by-case decision ("does the caller need to know about the underlying error?"). In this case, it might actually be useful.
} | ||
der, err := x509.MarshalECPrivateKey(privateKey) | ||
if err != nil { | ||
return nil, nil, nil, nil, fmt.Errorf("%s: unable to marshal the new key to DER: %v", m.name, err) | ||
return nil, nil, nil, nil, fmt.Errorf("unable to marshal the new key to DER: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use %w
instead of just %v
for errors
} | ||
csrPEM, err = cert.MakeCSRFromTemplate(privateKey, template) | ||
if err != nil { | ||
return nil, nil, nil, nil, fmt.Errorf("%s: unable to create a csr from the private key: %v", m.name, err) | ||
return nil, nil, nil, nil, fmt.Errorf("unable to create a csr from the private key: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use %w
instead of just %v
for errors
@@ -38,6 +38,7 @@ import ( | |||
"k8s.io/client-go/kubernetes/fake" | |||
certificatesclient "k8s.io/client-go/kubernetes/typed/certificates/v1beta1" | |||
clienttesting "k8s.io/client-go/testing" | |||
"k8s.io/klog/v2/ktesting" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we're not testing that the Config.Ctx
is actually being successfully passed through since Start()
is historically untested via unit tests. Ot's likely subject to integration tests though.
Is unit testing the Start()
method something that we could now reasonably do, at least in terms of the context and logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this seems like a gap, also without this PR.
I added two tests, TestStop (no context) and TestContext (with context). Both are similar (start with no certificate, depend on background goroutine to retrieve cert), with TestContext using context cancellation and checking that contextual logging works. Here's its output:
go test -v -run=TestContext
=== RUN TestContext
certificate_manager.go:422: I0228 10:24:54.718214] certificate: Certificate rotation is enabled
certificate_manager.go:566: I0228 10:24:54.718442] certificate: Rotating certificates
reflector.go:430: I0228 10:24:54.719542] certificate: Caches populated type="*v1.CertificateSigningRequest" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"
csr.go:270: I0228 10:24:54.719780] certificate: Certificate signing request is issued csr=""
certificate_manager_test.go:1170: Waiting for manager to stop...
--- PASS: TestContext (0.00s)
PASS
ok k8s.io/client-go/util/certificate 0.026s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have use goleak to check that goroutines spawned by Start
really stopped, but that's ugly. Instead I turned Start
into run
with tracking of worker goroutines. This could be added to the public API, except that extending an interface is also an API break - not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for this. I know the Start()
method wasn't really tested, which is awkward, and so I'm really happy about the test changes you made 👍
/test pull-kubernetes-unit |
6e0fde9
to
5cbf34e
Compare
For NewManager, the Config struct gets changed (not extended!) so that the caller can provide a logger instead of just a logging function. Breaking the API was chosen because it avoids having to maintain two different log calls in various places (one for printf-style logging, one for structured logging). RequestCertificateWithContext is an extension. It enables getting rid of context.TODO calls. NewFileStoreWithLogger also is an extension.
5cbf34e
to
f905190
Compare
@stlaz: updated. Can I perhaps motivate you to look at some of the other related pending PRs in https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+author%3Apohly+label%3Awg%2Fstructured-logging? 😄 |
@pohly: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/lgtm I'm not sure I'll be able to look into your other PRs today but I'll see what I can do next week 🙂 |
LGTM label has been added. Git tree hash: b074d99f7b3bce024d2a133fccba1d40cab5473a
|
/assign @sttts For approval. Review is done. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Part of #129125
Special notes for your reviewer:
For NewManager, the Config struct gets changed (not extended!) so that the caller can provide a logger instead of just a logging function. Breaking the API was chosen because it avoids having to maintain two different log calls in various places (one for printf-style logging, one for structured logging).
RequestCertificateWithContext is an extension. It enables getting rid of context.TODO calls.
NewFileStoreWithLogger also is an extension.
Does this PR introduce a user-facing change?
/wg structured-logging