Skip to content

client-go auth: contextual logging #129336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Dec 20, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Part of #129125

Special notes for your reviewer:

No API changes are needed. In one case, a context is passed in via the http.Request. In others there is simply no need to change the calls.

Does this PR introduce a user-facing change?

NONE

/wg structured-logging

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 20, 2024
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Dec 20, 2024
Comment on lines 108 to 109
//nolint:logcheck // This looks like debug output which shouldn't occur in practice.
klog.V(2).InfoS("unexpected exec plugin return error type", "type", reflect.TypeOf(err).String(), "err", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we update this instead of ignoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can update to structured logging, but doing contextual logging didn't seem worth the chain of API changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, it is already using structured logging. The suppression is for the "contextual logging" aspect.

I can rewrite as klog.Background().V(2).Info instead of suppressing the linter, but proper conversion (= avoiding the global logger) probably isn't worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look at how this code here could get connected to a context of its caller. I got as far as this, which is incomplete:

diff --git a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec.go b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec.go
index 652e59a2db4..f1441d82f49 100644
--- a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec.go
+++ b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec.go
@@ -18,6 +18,7 @@ package exec
 
 import (
 	"bytes"
+	"context"
 	"crypto/tls"
 	"crypto/x509"
 	"errors"
@@ -206,7 +207,10 @@ func newAuthenticator(c *cache, isTerminalFunc func(int) bool, config *api.ExecC
 
 	// these functions are made comparable and stored in the cache so that repeated clientset
 	// construction with the same rest.Config results in a single TLS cache and Authenticator
-	a.getCert = &transport.GetCertHolder{GetCert: a.cert}
+	a.getCert = &transport.GetCertHolder{GetCert: func() (*tls.Certificate, error) {
+		// TODO (?): change or extend the c.TLS.GetCertHolder API
+		return a.cert(context.TODO())
+	}}
 	a.dial = &transport.DialHolder{Dial: defaultDialer.DialContext}
 
 	return c.put(key, a), nil
@@ -343,7 +347,7 @@ func (r *roundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
 	}
 
 	ctx := req.Context()
-	creds, err := r.a.getCreds()
+	creds, err := r.a.getCreds(ctx)
 	if err != nil {
 		return nil, fmt.Errorf("getting credentials: %v", err)
 	}
@@ -356,7 +360,7 @@ func (r *roundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
 		return nil, err
 	}
 	if res.StatusCode == http.StatusUnauthorized {
-		if err := r.a.maybeRefreshCreds(creds); err != nil {
+		if err := r.a.maybeRefreshCreds(ctx, creds); err != nil {
 			klog.FromContext(ctx).Error(err, "Refreshing credentials failed")
 		}
 	}
@@ -370,15 +374,15 @@ func (a *Authenticator) credsExpired() bool {
 	return a.now().After(a.exp)
 }
 
-func (a *Authenticator) cert() (*tls.Certificate, error) {
-	creds, err := a.getCreds()
+func (a *Authenticator) cert(ctx context.Context) (*tls.Certificate, error) {
+	creds, err := a.getCreds(ctx)
 	if err != nil {
 		return nil, err
 	}
 	return creds.cert, nil
 }
 
-func (a *Authenticator) getCreds() (*credentials, error) {
+func (a *Authenticator) getCreds(ctx context.Context) (*credentials, error) {
 	a.mu.Lock()
 	defer a.mu.Unlock()
 
@@ -386,7 +390,7 @@ func (a *Authenticator) getCreds() (*credentials, error) {
 		return a.cachedCreds, nil
 	}
 
-	if err := a.refreshCredsLocked(); err != nil {
+	if err := a.refreshCredsLocked(ctx); err != nil {
 		return nil, err
 	}
 
@@ -395,7 +399,7 @@ func (a *Authenticator) getCreds() (*credentials, error) {
 
 // maybeRefreshCreds executes the plugin to force a rotation of the
 // credentials, unless they were rotated already.
-func (a *Authenticator) maybeRefreshCreds(creds *credentials) error {
+func (a *Authenticator) maybeRefreshCreds(ctx context.Context, creds *credentials) error {
 	a.mu.Lock()
 	defer a.mu.Unlock()
 
@@ -406,12 +410,12 @@ func (a *Authenticator) maybeRefreshCreds(creds *credentials) error {
 		return nil
 	}
 
-	return a.refreshCredsLocked()
+	return a.refreshCredsLocked(ctx)
 }
 
 // refreshCredsLocked executes the plugin and reads the credentials from
 // stdout. It must be called while holding the Authenticator's mutex.
-func (a *Authenticator) refreshCredsLocked() error {
+func (a *Authenticator) refreshCredsLocked(ctx context.Context) error {
 	interactive, err := a.interactiveFunc()
 	if err != nil {
 		return fmt.Errorf("exec plugin cannot support interactive mode: %w", err)
@@ -443,7 +447,7 @@ func (a *Authenticator) refreshCredsLocked() error {
 	}
 
 	err = cmd.Run()
-	incrementCallsMetric(err)
+	incrementCallsMetric(ctx, err)
 	if err != nil {
 		return a.wrapCmdRunErrorLocked(err)
 	}
diff --git a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec_test.go b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec_test.go
index 291590eca86..606037b9496 100644
--- a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec_test.go
+++ b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/exec_test.go
@@ -42,6 +42,7 @@ import (
 	"k8s.io/client-go/pkg/apis/clientauthentication"
 	"k8s.io/client-go/tools/clientcmd/api"
 	"k8s.io/client-go/transport"
+	"k8s.io/klog/v2/ktesting"
 	testingclock "k8s.io/utils/clock/testing"
 )
 
@@ -790,6 +791,7 @@ func TestRefreshCreds(t *testing.T) {
 
 	for _, test := range tests {
 		t.Run(test.name, func(t *testing.T) {
+			_, ctx := ktesting.NewTestContext(t)
 			c := test.config
 
 			if c.Command == "" {
@@ -813,7 +815,7 @@ func TestRefreshCreds(t *testing.T) {
 			a.stderr = stderr
 			a.environ = func() []string { return nil }
 
-			if err := a.refreshCredsLocked(); err != nil {
+			if err := a.refreshCredsLocked(ctx); err != nil {
 				if !test.wantErr {
 					t.Errorf("get token %v", err)
 				} else if !strings.Contains(err.Error(), test.wantErrSubstr) {
@@ -1190,6 +1192,7 @@ func TestInstallHintRateLimit(t *testing.T) {
 
 	for _, test := range tests {
 		t.Run(test.name, func(t *testing.T) {
+			_, ctx := ktesting.NewTestContext(t)
 			c := api.ExecConfig{
 				Command:         "does not exist",
 				APIVersion:      "client.authentication.k8s.io/v1beta1",
@@ -1209,7 +1212,7 @@ func TestInstallHintRateLimit(t *testing.T) {
 
 			count := 0
 			for i := 0; i < test.calls; i++ {
-				err := a.refreshCredsLocked()
+				err := a.refreshCredsLocked(ctx)
 				if strings.Contains(err.Error(), c.InstallHint) {
 					count++
 				}
diff --git a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics.go b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics.go
index d25122f360a..cad7750b098 100644
--- a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics.go
+++ b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics.go
@@ -17,6 +17,7 @@ limitations under the License.
 package exec
 
 import (
+	"context"
 	"errors"
 	"io/fs"
 	"os/exec"
@@ -90,7 +91,7 @@ func (c *certificateExpirationTracker) set(a *Authenticator, t time.Time) {
 // incrementCallsMetric increments a global metrics counter for the number of calls to an exec
 // plugin, partitioned by exit code. The provided err should be the return value from
 // exec.Cmd.Run().
-func incrementCallsMetric(err error) {
+func incrementCallsMetric(ctx context.Context, err error) {
 	execExitError := &exec.ExitError{}
 	execError := &exec.Error{}
 	pathError := &fs.PathError{}
@@ -105,8 +106,7 @@ func incrementCallsMetric(err error) {
 		metrics.ExecPluginCalls.Increment(failureExitCode, pluginNotFoundError)
 
 	default: // We don't know about this error type.
-		//nolint:logcheck // This looks like debug output which shouldn't occur in practice.
-		klog.V(2).InfoS("unexpected exec plugin return error type", "type", reflect.TypeOf(err).String(), "err", err)
+		klog.FromContext(ctx).V(2).Info("unexpected exec plugin return error type", "type", reflect.TypeOf(err).String(), "err", err)
 		metrics.ExecPluginCalls.Increment(failureExitCode, clientInternalError)
 	}
 }
diff --git a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics_test.go b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics_test.go
index 61360abdb7c..43219795839 100644
--- a/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics_test.go
+++ b/staging/src/k8s.io/client-go/plugin/pkg/client/auth/exec/metrics_test.go
@@ -25,6 +25,7 @@ import (
 	"github.com/google/go-cmp/cmp"
 	"k8s.io/client-go/tools/clientcmd/api"
 	"k8s.io/client-go/tools/metrics"
+	"k8s.io/klog/v2/ktesting"
 )
 
 type mockExpiryGauge struct {
@@ -115,6 +116,7 @@ func (f *mockCallsMetricCounter) Increment(exitCode int, errorType string) {
 }
 
 func TestCallsMetric(t *testing.T) {
+	_, ctx := ktesting.NewTestContext(t)
 	const (
 		goodOutput = `{
 			"kind": "ExecCredential",
@@ -152,7 +154,7 @@ func TestCallsMetric(t *testing.T) {
 		// Run refresh creds twice so that our test validates that the metrics are set correctly twice
 		// in a row with the same authenticator.
 		refreshCreds := func() {
-			if err := a.refreshCredsLocked(); (err == nil) != (exitCode == 0) {
+			if err := a.refreshCredsLocked(ctx); (err == nil) != (exitCode == 0) {
 				if err != nil {
 					t.Fatalf("wanted no error, but got %q", err.Error())
 				} else {
@@ -182,7 +184,7 @@ func TestCallsMetric(t *testing.T) {
 			t.Fatal(err)
 		}
 		a.stderr = io.Discard
-		if err := a.refreshCredsLocked(); err == nil {
+		if err := a.refreshCredsLocked(ctx); err == nil {
 			t.Fatal("expected the authenticator to fail because the plugin does not exist")
 		}
 		wantCallsMetrics = append(wantCallsMetrics, mockCallsMetric{exitCode: 1, errorType: "plugin_not_found_error"})

The problem is that

// Callback that returns a TLS client certificate. CertData, CertFile, KeyData and KeyFile supercede this field.
// This struct indirection is used to make transport configs cacheable.
GetCertHolder *GetCertHolder
is a public API and cannot simply be changed to take a context. Or do we feel that such a breaking change is justified?

Extending it in a backward-compatible manner is difficult because wrapper functions or structs break the caching aspect of GetCertHolder - I tried...

Instead of //nolint:logcheck, a klog.TODO is more appropriate, with a comment explaining that addressing the TODO is difficult and perhaps not worth it. I'll push an update with that.

Comment on lines 129 to 130
//nolint:logcheck // Should not happen.
klog.V(2).Infof("%s auth provider field depricated, refresh request don't send scopes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can totally happen so we should update it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But only if used incorrectly, right?

Do you prefer structured logging here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structured logging is possible, but contextual logging implies changing the factory API:

// Factory generates an AuthProvider plugin.
//
// clusterAddress is the address of the current cluster.
// config is the initial configuration for this plugin.
// persister allows the plugin to save updated configuration.
type Factory func(clusterAddress string, config map[string]string, persister AuthProviderConfigPersister) (AuthProvider, error)

I'm leaving it as klog.TODO with a comment explaining this.

@pohly
Copy link
Contributor Author

pohly commented Dec 20, 2024

/test pull-kubernetes-e2e-gce

"failed to create fsnotify watcher: too many open files"

pohly added a commit to pohly/kubernetes that referenced this pull request Dec 20, 2024
@fedebongio
Copy link
Contributor

/cc @deads2k
/triage accepted

@k8s-ci-robot k8s-ci-robot requested a review from deads2k January 2, 2025 21:14
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2025
@pohly pohly force-pushed the log-client-go-auth branch from a392191 to 2738b12 Compare January 13, 2025 17:13
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 13, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign mikedanese for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

No API changes are needed. In one case, a context is passed in via the
http.Request. In others there is simply no need to change the calls.
@pohly pohly force-pushed the log-client-go-auth branch from 2738b12 to e4a9be7 Compare January 13, 2025 19:02
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 13, 2025
@pohly
Copy link
Contributor Author

pohly commented Jan 13, 2025

@enj: did my changes address your comments? Please take another look.

@k8s-ci-robot
Copy link
Contributor

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-capz-azure-file-vmss e4a9be7 link false /test pull-kubernetes-e2e-capz-azure-file-vmss
pull-kubernetes-e2e-capz-azure-file-windows e4a9be7 link false /test pull-kubernetes-e2e-capz-azure-file-windows
pull-kubernetes-e2e-capz-azure-disk-vmss e4a9be7 link false /test pull-kubernetes-e2e-capz-azure-disk-vmss
pull-kubernetes-e2e-capz-azure-file e4a9be7 link false /test pull-kubernetes-e2e-capz-azure-file

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

pohly added a commit to pohly/kubernetes that referenced this pull request Jan 14, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 14, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 20, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 22, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 28, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Feb 7, 2025
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 13, 2025
@pohly
Copy link
Contributor Author

pohly commented Apr 14, 2025

/remove-lifecycle stale

ping @enj

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2025
pohly added a commit to pohly/kubernetes that referenced this pull request Apr 16, 2025
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 13, 2025
@pohly
Copy link
Contributor Author

pohly commented Jul 14, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Status: Changes Requested
Development

Successfully merging this pull request may close these issues.

5 participants