Skip to content

authentication: when cert verification fails store details in audit log #130540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ import (
asn1util "k8s.io/apimachinery/pkg/apis/asn1"
utilerrors "k8s.io/apimachinery/pkg/util/errors"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/apiserver/pkg/audit"
"k8s.io/apiserver/pkg/authentication/authenticator"
"k8s.io/apiserver/pkg/authentication/user"
"k8s.io/apiserver/pkg/features"
Expand All @@ -38,6 +39,10 @@ import (
"k8s.io/component-base/metrics/legacyregistry"
)

const (
CertificateErrorAuditAnnotation = "authentication.k8s.io/certificate-error"
)

/*
* By default, the following metric is defined as falling under
* ALPHA stability level https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/1209-metrics-stability/kubernetes-control-plane-metrics-stability.md#stability-classes)
Expand Down Expand Up @@ -134,6 +139,15 @@ func NewDynamic(verifyOptionsFn VerifyOptionFunc, user UserConversion) *Authenti
return &Authenticator{verifyOptionsFn, user}
}

func certificateErrorIdentifier(c *x509.Certificate) string {
return fmt.Sprintf(
"CN=%s, Issuer=%s, SN=%d",
c.Subject.CommonName,
c.Issuer.CommonName,
c.SerialNumber,
)[:512]
}

// AuthenticateRequest authenticates the request using presented client certificates
func (a *Authenticator) AuthenticateRequest(req *http.Request) (*authenticator.Response, bool, error) {
if req.TLS == nil || len(req.TLS.PeerCertificates) == 0 {
Expand Down Expand Up @@ -184,6 +198,14 @@ func (a *Authenticator) AuthenticateRequest(req *http.Request) (*authenticator.R
clientCertificateExpirationHistogram.WithContext(req.Context()).Observe(remaining.Seconds())
chains, err := req.TLS.PeerCertificates[0].Verify(optsCopy)
if err != nil {
ctx := req.Context()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean - we need this context so that AddAuditAnnotation would store annotations in it

audit.AddAuditAnnotation(ctx,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an integration test to show that all the wiring actually works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need rate limiting on this to prevent audit log spam if a broken client is hot looping requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not issuing a new audit log event, this merely updates the existing one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that, you are still increasing the size of audit events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we can truncate CN and Issuer CN in GetHumanCertDetail to avoid audit log growing too large, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why are we concerned about the size of audit log event - users can enable audit-log-truncate-enabled and tweak it with audit-log-truncate-max-event-size setting.

if a client makes a request which contains an unacceptable client-cert AND a valid bearer token, will this check the cert, fail, audit-log the cert details, then validate the token, authenticate the request, and result in an audit event with authentication.k8s.io/certificate-error annotations that actually succeeded?

Yes, iiuc unionAuthRequestHandler will pass it to the next handler. I think its desirable as it the only way I can think of to notify admins that certificate has expired / no longer valid

I don't have a firm handle on the exposure implications of logging issuer / subject for failed requests

Note that "logging" is a good word here - we're not storing certificate details in kube-apiserver logs, we're keeping them alongside other important private info - source IPs, query args for requests, sometimes even entire request / responses etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why are we concerned about the size of audit log event - users can enable audit-log-truncate-enabled and tweak it with audit-log-truncate-max-event-size setting.

The annotations field in the audit event says values should be short, and we pay serialization cost for excessively large values even if we then truncate them. If this PR is intended to assist admins in finding invalid certificates, let's just pick a reasonable bound to prevent abuse and truncate the issuer and subject CN.

Yes, iiuc unionAuthRequestHandler will pass it to the next handler. I think its desirable as it the only way I can think of to notify admins that certificate has expired / no longer valid

If the request is actually failing as a result, including the issuer/subject CN of a presented client cert is probably ok? If the request actually succeeded, it's pretty weird to do. I'd still like feedback from the folks mentioned in https://github.com/kubernetes/kubernetes/pull/130540/files#r2058912959 on how concerning it is to log the issuer / subject of every unknown cert we encounter

Copy link
Contributor Author

@vrutkovs vrutkovs May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My calculations for maximum length of this string: max length(common name) + len("usages, serving") + len(" groups[]") + max length(organization) + len(" validServingFor=[]") + 3 * max length(domain defined attribute value length for IP) + 3 * max length(domain defined attribute value length for DNS names) + len(" issuer=") + len(" ( to (now=))") + 3*timestamp based on RFC3280 assuming that the certificate has 3 IPs and 3 DNS names. 64 + 16 + 10 + 64 + 20 + 384 + 384 + 9 + 15 + 90 makes 1056, so it should be safe to truncate the annotation to 1024 (we may cut part of current time, which isn't crucial since audit event will have the timestamp anyway). WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need anything but the issuer and subject common names for identifying the certificate, the SANs / groups / timestamps seem less relevant for figuring out what client has the failing cert

Copy link
Contributor Author

@vrutkovs vrutkovs May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I started with that in 7d29cf0 and then Mo recommended reusing GetHumanCertDetail. I'll update it to use custom function and limit it to 512 to fit serial number and two common names

CertificateErrorAuditAnnotation,
fmt.Sprintf("certificate %s failed: %v", certificateErrorIdentifier(req.TLS.PeerCertificates[0]),
err),
)
req = req.WithContext(ctx)

return nil, false, fmt.Errorf(
"verifying certificate %s failed: %w",
certificateIdentifier(req.TLS.PeerCertificates[0]),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -62,5 +62,5 @@ func GetHumanCertDetail(certificate *x509.Certificate) string {
}

return fmt.Sprintf("%q [%s]%s%s issuer=%q (%v to %v (now=%v))", humanName, strings.Join(usages, ","), groupString, servingString, signerHumanName, certificate.NotBefore.UTC(), certificate.NotAfter.UTC(),
time.Now().UTC())
time.Now().Truncate(time.Second).UTC())
}
150 changes: 150 additions & 0 deletions test/integration/auth/certs_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
package auth

import (
"context"
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"fmt"
"math/big"
"os"
"testing"
"time"

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
auditinternal "k8s.io/apiserver/pkg/apis/audit"
auditv1 "k8s.io/apiserver/pkg/apis/audit/v1"
kubex509 "k8s.io/apiserver/pkg/authentication/request/x509"

clientset "k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
utiltesting "k8s.io/client-go/util/testing"
kubeapiservertesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
"k8s.io/kubernetes/test/integration/framework"
"k8s.io/kubernetes/test/utils"
)

var (
auditPolicy = `
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
resources:
- group: ""
resources: ["pods"]
verbs: ["get"]
`
)

func TestCerts(t *testing.T) {
logFile, err := os.CreateTemp("", "audit.log")
if err != nil {
t.Fatalf("Failed to create audit log file: %v", err)
}
defer utiltesting.CloseAndRemove(t, logFile)

policyFile, err := os.CreateTemp("", "audit-policy.yaml")
if err != nil {
t.Fatalf("Failed to create audit policy file: %v", err)
}
if _, err := policyFile.Write([]byte(auditPolicy)); err != nil {
t.Fatalf("Failed to write audit policy file: %v", err)
}

s := kubeapiservertesting.StartTestServerOrDie(t, kubeapiservertesting.NewDefaultTestServerOptions(), []string{
"--audit-policy-file", policyFile.Name(),
"--audit-log-version", "audit.k8s.io/v1",
"--audit-log-mode", "blocking",
"--audit-log-path", logFile.Name(),
}, framework.SharedEtcd())
defer s.TearDownFn()

// Generate self-signed certificate
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatal(err)
}
keyRaw, err := x509.MarshalECPrivateKey(key)
if err != nil {
t.Fatal(err)
}
serialNumberLimit := new(big.Int).Lsh(big.NewInt(1), 128)
serialNumber, err := rand.Int(rand.Reader, serialNumberLimit)
if err != nil {
t.Fatal(err)
}
commonName := "test-cn"
notBefore := time.Now().Truncate(time.Second)
notAfter := notBefore.Add(time.Hour)
cert := &x509.Certificate{
SerialNumber: serialNumber,
Subject: pkix.Name{CommonName: commonName},
NotBefore: notBefore,
NotAfter: notAfter,

KeyUsage: x509.KeyUsageKeyEncipherment | x509.KeyUsageDigitalSignature,
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
BasicConstraintsValid: true,
}
certRaw, err := x509.CreateCertificate(rand.Reader, cert, cert, key.Public(), key)
if err != nil {
t.Fatal(err)
}

// Use self-signed certificate in client config
clientConfig := rest.CopyConfig(s.ClientConfig)
clientConfig.BearerToken = ""
clientConfig.CertData = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: certRaw})
clientConfig.KeyData = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyRaw})
client, err := clientset.NewForConfig(clientConfig)
if err != nil {
t.Fatalf("error occurred: %v", err)
}

// Make a request using the client
podName := "foobar"
currentTime := time.Now().Truncate(time.Second)
_, err = client.CoreV1().Pods("default").Get(context.TODO(), podName, metav1.GetOptions{})
if err == nil {
t.Fatal("expected error, but it was nil")
}

// Verify that audit log has the expected entry
stream, err := os.OpenFile(logFile.Name(), os.O_RDWR, 0600)
if err != nil {
t.Errorf("unexpected error: %v", err)
}
defer stream.Close()

annotationDetails := fmt.Sprintf(`certificate "%s" [client] issuer="<self>" (%s to %s (now=%s)) failed: x509: certificate signed by unknown authority`, commonName, notBefore.UTC(), notAfter.UTC(), currentTime.UTC())
expectedEvents := []utils.AuditEvent{
{
Level: auditinternal.LevelRequest,
Stage: auditinternal.StageResponseStarted,
RequestURI: fmt.Sprintf("/api/v1/namespaces/default/pods/%s", podName),
Verb: "get",
Resource: "pods",
Namespace: "default",
Code: 401,
CustomAuditAnnotations: map[string]string{
kubex509.CertificateErrorAuditAnnotation: annotationDetails,
},
},
}

auditAnnotationFilter := func(key, val string) bool {
return key == kubex509.CertificateErrorAuditAnnotation
}

missing, err := utils.CheckAuditLinesFiltered(stream, expectedEvents, auditv1.SchemeGroupVersion, auditAnnotationFilter)
if err != nil {
t.Errorf("unexpected error checking audit lines: %v", err)
}
if len(missing.MissingEvents) > 0 {
t.Errorf("failed to get expected events -- missing: %s", missing)
}
}