Skip to content

client-go certificate: context-aware APIs and logging #129338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 28, 2025

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Dec 20, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Part of #129125

Special notes for your reviewer:

For NewManager, the Config struct gets changed (not extended!) so that the caller can provide a logger instead of just a logging function. Breaking the API was chosen because it avoids having to maintain two different log calls in various places (one for printf-style logging, one for structured logging).

RequestCertificateWithContext is an extension. It enables getting rid of context.TODO calls.

NewFileStoreWithLogger also is an extension.

Does this PR introduce a user-facing change?

NONE

/wg structured-logging

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 20, 2024
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Dec 20, 2024
@pohly
Copy link
Contributor Author

pohly commented Dec 20, 2024

/retest

pod scheduling timeouts

pohly added a commit to pohly/kubernetes that referenced this pull request Dec 20, 2024
@fedebongio
Copy link
Contributor

/assign @liggitt
Jordan, who could be another good person for reviewing this? Thanks!
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 14, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 14, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 20, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 22, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 28, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Feb 7, 2025
@ibihim
Copy link
Contributor

ibihim commented Feb 24, 2025

/assign @stlaz

@@ -385,29 +402,26 @@ func (m *manager) ServerHealthy() bool {
func (m *manager) Stop() {
m.clientAccessLock.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need the lock anymore, either - while its description says otherwise, as far as I can tell it was overloaded here only to prevent concurrent access to the stopped variable. Cancelling a context should be thread-safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Removed.

if m.logf == nil {
m.logf = func(format string, args ...interface{}) { klog.V(2).Infof(format, args...) }
// The name gets included through contextual logging.
logger := klog.Background().V(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, if I wanted to do a debug print further in the code using level 4 as the debug level, with this default logger that's V(2), whereas with the logger imported from context that's V(4).

The verbose level should likely be explicitly specified everywhere, or we should keep logf in some shape and form for logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verbose level should likely be explicitly specified everywhere

Okay. It's a bit more obvious that way.

I didn't do it before because theoretically, a user could have provided a config.Logf implementation which logged at a verbosity higher than V(2). This is not supported anymore because the caller can only make a logger less verbose, but not more verbose (go-logr/logr#335 (comment)).

But I think that's fine.

// we need to map it's cancellation to our Done method.
go func() {
<-ctx.Done()
m.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is sad but seems necessary 🙁

}
close(m.stopCh)
m.stopped = true
m.cancel(errors.New("asked to stop"))
}

// Start will start the background work of rotating the certificates.
func (m *manager) Start() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we followed the common pattern from Kube controllers, this would be the place where you'd wire in your context.

Would this be an unacceptable API break? (/me is fearfully looking back at 1.18's https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1601-client-go-context)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be an API break, which is why I decided to not do it. It would break existing users, too (just picking one example):
https://github.com/kubevirt/kubevirt/blob/4af3a3b5901e412327cf211ed1446891d49d3183/cmd/virt-exportproxy/virt-exportproxy.go#L87

This is similar to how http.Request was extended in the standard library: a request can optionally be constructed with a context, which then gets stored and used by methods.

if config.Ctx != nil && (*config.Ctx).Done() != nil {
ctx := *config.Ctx
// If we have been passed a context and it has a Done channel, then
// we need to map it's cancellation to our Done method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
// we need to map it's cancellation to our Done method.
// we need to map its cancellation to our Done method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return false
}
}

return true
}

func (m *manager) certSatisfiesTemplate() bool {
func (m *manager) certSatisfiesTemplate(ctx context.Context) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only needs the logger, not the full context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

if sleepInterval := deadline.Sub(m.now()); sleepInterval > 0 {
m.logf("%s: Waiting %v for next certificate rotation", m.name, sleepInterval)
logger.Info("Waiting for next certificate rotation", "delay", sleepInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this would help me a bit more when debugging

Suggested change
logger.Info("Waiting for next certificate rotation", "delay", sleepInterval)
logger.Info("Waiting for next certificate rotation", "sleep", sleepInterval)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@@ -606,22 +622,23 @@ func (m *manager) rotateCerts() (bool, error) {
// the template will not trigger a renewal.
//
// Requires certAccessLock to be locked.
func (m *manager) certSatisfiesTemplateLocked() bool {
func (m *manager) certSatisfiesTemplateLocked(ctx context.Context) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only needs the logger, not the full context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. This also rippled through to some other internal methods, which is fine.

@@ -127,7 +144,7 @@ func (s *fileStore) Current() (*tls.Certificate, error) {
if pairFileExists, err := fileExists(pairFile); err != nil {
return nil, err
} else if pairFileExists {
klog.Infof("Loading cert/key pair from %q.", pairFile)
s.logger.Info("Loading cert/key pair", "file", pairFile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
s.logger.Info("Loading cert/key pair", "file", pairFile)
s.logger.Info("Loading cert/key pair from a file", "filePath", pairFile)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@@ -67,14 +68,30 @@ type FileStore interface {
// updates will be written to the ${certDirectory} directory and
// ${certDirectory}/${pairNamePrefix}-current.pem will be created as a soft
// link to the currently selected cert/key pair.
//
// Contextual logging: NewFileStoreWithLogger should be used instead of NewFileStore in code which supports contextual logging.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one not getting deprecated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not all users are required to support contextual logging. For those who don't care, continuing to use NewFileStore is fine. We don't want to create unnecessary churn in the ecosystem.

@pohly pohly force-pushed the log-client-go-certificates branch from 4176039 to e00d62c Compare February 27, 2025 10:35
@pohly
Copy link
Contributor Author

pohly commented Feb 27, 2025

@stlaz: Thanks for the review! I think I addressed all of your comments, please take another look.

@pohly pohly force-pushed the log-client-go-certificates branch from e00d62c to 6e0fde9 Compare February 27, 2025 10:40
@pohly
Copy link
Contributor Author

pohly commented Feb 27, 2025

/test pull-kubernetes-apidiff

Incompatible changes:
- ./util/certificate.Config.Logf: removed

That was a conscious choice, because maintaining Logf would make the code more complex and this should break less consumers.

@pohly
Copy link
Contributor Author

pohly commented Feb 27, 2025

/test pull-kubernetes-unit

    queueset_test.go:382: TestTooWide client 0 last=true expectFair=true margin=0.33 got an Average of 1.3955555555555557 and the expected average was 1.2

Looks like performance wasn't as expected. Unrelated, right?

Copy link
Member

@stlaz stlaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the comments. Mostly only nits now.

It might be worth checking if we can unit test the Config.Ctx (nil, non-nil) is being passed properly through.

@@ -732,22 +746,22 @@ func (m *manager) generateCSR() (template *x509.CertificateRequest, csrPEM []byt
// Generate a new private key.
privateKey, err := ecdsa.GenerateKey(elliptic.P256(), cryptorand.Reader)
if err != nil {
return nil, nil, nil, nil, fmt.Errorf("%s: unable to generate a new private key: %v", m.name, err)
return nil, nil, nil, nil, fmt.Errorf("unable to generate a new private key: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're fixing these parts, let's also use %w instead of just %v for errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but then let's also do it consistently everywhere (there was one additional line that I hadn't touched): https://github.com/kubernetes/kubernetes/compare/5cbf34e8cf258aa9f97b7c5cb8b3747236d6dc3b..f9051901cee8d8ff4aed3db27ff495a706f1a487

FWIW, I am not a fan of this blanket "always wrap errors" linter check because it should be a case-by-case decision ("does the caller need to know about the underlying error?"). In this case, it might actually be useful.

}
der, err := x509.MarshalECPrivateKey(privateKey)
if err != nil {
return nil, nil, nil, nil, fmt.Errorf("%s: unable to marshal the new key to DER: %v", m.name, err)
return nil, nil, nil, nil, fmt.Errorf("unable to marshal the new key to DER: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use %w instead of just %v for errors

}
csrPEM, err = cert.MakeCSRFromTemplate(privateKey, template)
if err != nil {
return nil, nil, nil, nil, fmt.Errorf("%s: unable to create a csr from the private key: %v", m.name, err)
return nil, nil, nil, nil, fmt.Errorf("unable to create a csr from the private key: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use %w instead of just %v for errors

@@ -38,6 +38,7 @@ import (
"k8s.io/client-go/kubernetes/fake"
certificatesclient "k8s.io/client-go/kubernetes/typed/certificates/v1beta1"
clienttesting "k8s.io/client-go/testing"
"k8s.io/klog/v2/ktesting"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're not testing that the Config.Ctx is actually being successfully passed through since Start() is historically untested via unit tests. Ot's likely subject to integration tests though.

Is unit testing the Start() method something that we could now reasonably do, at least in terms of the context and logger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this seems like a gap, also without this PR.

I added two tests, TestStop (no context) and TestContext (with context). Both are similar (start with no certificate, depend on background goroutine to retrieve cert), with TestContext using context cancellation and checking that contextual logging works. Here's its output:

go test -v -run=TestContext
=== RUN   TestContext
    certificate_manager.go:422: I0228 10:24:54.718214] certificate: Certificate rotation is enabled
    certificate_manager.go:566: I0228 10:24:54.718442] certificate: Rotating certificates
    reflector.go:430: I0228 10:24:54.719542] certificate: Caches populated type="*v1.CertificateSigningRequest" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"
    csr.go:270: I0228 10:24:54.719780] certificate: Certificate signing request is issued csr=""
    certificate_manager_test.go:1170: Waiting for manager to stop...
--- PASS: TestContext (0.00s)
PASS
ok  	k8s.io/client-go/util/certificate	0.026s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have use goleak to check that goroutines spawned by Start really stopped, but that's ugly. Instead I turned Start into run with tracking of worker goroutines. This could be added to the public API, except that extending an interface is also an API break - not worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for this. I know the Start() method wasn't really tested, which is awkward, and so I'm really happy about the test changes you made 👍

@stlaz
Copy link
Member

stlaz commented Feb 27, 2025

/test pull-kubernetes-unit
failed to find an available node

@pohly pohly force-pushed the log-client-go-certificates branch from 6e0fde9 to 5cbf34e Compare February 28, 2025 09:27
For NewManager, the Config struct gets changed (not extended!) so that the
caller can provide a logger instead of just a logging function. Breaking the
API was chosen because it avoids having to maintain two different log calls in
various places (one for printf-style logging, one for structured logging).

RequestCertificateWithContext is an extension. It enables getting rid of
context.TODO calls.

NewFileStoreWithLogger also is an extension.
@pohly pohly force-pushed the log-client-go-certificates branch from 5cbf34e to f905190 Compare February 28, 2025 09:37
@pohly
Copy link
Contributor Author

pohly commented Feb 28, 2025

@stlaz: updated.

Can I perhaps motivate you to look at some of the other related pending PRs in https://github.com/kubernetes/kubernetes/pulls?q=is%3Aopen+is%3Apr+author%3Apohly+label%3Awg%2Fstructured-logging? 😄

@k8s-ci-robot
Copy link
Contributor

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-apidiff 6e0fde9 link false /test pull-kubernetes-apidiff
pull-kubernetes-apidiff-client-go f905190 link false /test pull-kubernetes-apidiff-client-go
pull-kubernetes-linter-hints f905190 link false /test pull-kubernetes-linter-hints

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@stlaz
Copy link
Member

stlaz commented Feb 28, 2025

/lgtm
Happy to see the unit tests improvements.

I'm not sure I'll be able to look into your other PRs today but I'll see what I can do next week 🙂

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: b074d99f7b3bce024d2a133fccba1d40cab5473a

@pohly
Copy link
Contributor Author

pohly commented Feb 28, 2025

/assign @sttts

For approval. Review is done.

@sttts
Copy link
Contributor

sttts commented Feb 28, 2025

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit c92dde6 into kubernetes:master Feb 28, 2025
13 of 15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Feb 28, 2025
@github-project-automation github-project-automation bot moved this from Changes Requested to Closed / Done in SIG Auth Feb 28, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants