Skip to content

authentication: when cert verification fails store details in audit log #130540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vrutkovs
Copy link
Contributor

@vrutkovs vrutkovs commented Mar 3, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

In some cases finding the invalid cert by its serial number is tedious. However we can't expose more PII in kube-apiserver logs.

This commit will add an audit log annotations with additional details: common name, issuer common name and serial number

Alternative solution to #130137

Which issue(s) this PR fixes:

Fixes #129600

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Audit log entry for a request failed due to invalid certificate now has additional information - namely certificate's Common Name, Issuer Common Name and Serial number

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 3, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @vrutkovs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from dims and liggitt March 3, 2025 17:21
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Mar 3, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 3, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Auth Mar 3, 2025
@vrutkovs
Copy link
Contributor Author

vrutkovs commented Mar 3, 2025

/cc @enj @stlaz

@k8s-ci-robot k8s-ci-robot requested review from enj and stlaz March 3, 2025 17:24
@yongruilin
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Mar 6, 2025
@benjaminapetersen
Copy link
Member

/label api-review

And @vrutkovs mind adding an integration test? Thx!

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Mar 17, 2025
@benjaminapetersen benjaminapetersen moved this from Needs Triage to Changes Requested in SIG Auth Mar 17, 2025
Copy link
Member

@enj enj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass

@@ -184,6 +194,14 @@ func (a *Authenticator) AuthenticateRequest(req *http.Request) (*authenticator.R
clientCertificateExpirationHistogram.WithContext(req.Context()).Observe(remaining.Seconds())
chains, err := req.TLS.PeerCertificates[0].Verify(optsCopy)
if err != nil {
ctx := req.Context()
audit.AddAuditAnnotation(ctx,
"authentication.k8s.io/certificate-error",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that needs API review.

@@ -184,6 +194,14 @@ func (a *Authenticator) AuthenticateRequest(req *http.Request) (*authenticator.R
clientCertificateExpirationHistogram.WithContext(req.Context()).Observe(remaining.Seconds())
chains, err := req.TLS.PeerCertificates[0].Verify(optsCopy)
if err != nil {
ctx := req.Context()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean - we need this context so that AddAuditAnnotation would store annotations in it

@@ -184,6 +194,14 @@ func (a *Authenticator) AuthenticateRequest(req *http.Request) (*authenticator.R
clientCertificateExpirationHistogram.WithContext(req.Context()).Observe(remaining.Seconds())
chains, err := req.TLS.PeerCertificates[0].Verify(optsCopy)
if err != nil {
ctx := req.Context()
audit.AddAuditAnnotation(ctx,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an integration test to show that all the wiring actually works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need rate limiting on this to prevent audit log spam if a broken client is hot looping requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not issuing a new audit log event, this merely updates the existing one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that, you are still increasing the size of audit events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we can truncate CN and Issuer CN in GetHumanCertDetail to avoid audit log growing too large, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why are we concerned about the size of audit log event - users can enable audit-log-truncate-enabled and tweak it with audit-log-truncate-max-event-size setting.

if a client makes a request which contains an unacceptable client-cert AND a valid bearer token, will this check the cert, fail, audit-log the cert details, then validate the token, authenticate the request, and result in an audit event with authentication.k8s.io/certificate-error annotations that actually succeeded?

Yes, iiuc unionAuthRequestHandler will pass it to the next handler. I think its desirable as it the only way I can think of to notify admins that certificate has expired / no longer valid

I don't have a firm handle on the exposure implications of logging issuer / subject for failed requests

Note that "logging" is a good word here - we're not storing certificate details in kube-apiserver logs, we're keeping them alongside other important private info - source IPs, query args for requests, sometimes even entire request / responses etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why are we concerned about the size of audit log event - users can enable audit-log-truncate-enabled and tweak it with audit-log-truncate-max-event-size setting.

The annotations field in the audit event says values should be short, and we pay serialization cost for excessively large values even if we then truncate them. If this PR is intended to assist admins in finding invalid certificates, let's just pick a reasonable bound to prevent abuse and truncate the issuer and subject CN.

Yes, iiuc unionAuthRequestHandler will pass it to the next handler. I think its desirable as it the only way I can think of to notify admins that certificate has expired / no longer valid

If the request is actually failing as a result, including the issuer/subject CN of a presented client cert is probably ok? If the request actually succeeded, it's pretty weird to do. I'd still like feedback from the folks mentioned in https://github.com/kubernetes/kubernetes/pull/130540/files#r2058912959 on how concerning it is to log the issuer / subject of every unknown cert we encounter

Copy link
Contributor Author

@vrutkovs vrutkovs May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My calculations for maximum length of this string: max length(common name) + len("usages, serving") + len(" groups[]") + max length(organization) + len(" validServingFor=[]") + 3 * max length(domain defined attribute value length for IP) + 3 * max length(domain defined attribute value length for DNS names) + len(" issuer=") + len(" ( to (now=))") + 3*timestamp based on RFC3280 assuming that the certificate has 3 IPs and 3 DNS names. 64 + 16 + 10 + 64 + 20 + 384 + 384 + 9 + 15 + 90 makes 1056, so it should be safe to truncate the annotation to 1024 (we may cut part of current time, which isn't crucial since audit event will have the timestamp anyway). WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need anything but the issuer and subject common names for identifying the certificate, the SANs / groups / timestamps seem less relevant for figuring out what client has the failing cert

Copy link
Contributor Author

@vrutkovs vrutkovs May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I started with that in 7d29cf0 and then Mo recommended reusing GetHumanCertDetail. I'll update it to use custom function and limit it to 512 to fit serial number and two common names

@@ -110,6 +111,15 @@ func certificateIdentifier(c *x509.Certificate) string {
)
}

func certificateErrorIdentifier(c *x509.Certificate) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetHumanCertDetail might be better, though it could make the audit log large?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps GetPrivateCertDetail to underline that it contains sensitive data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetHumanCertDetail is an existing function in k8s.io/apiserver/pkg/server/dynamiccertificates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, got it. Using it now - although I had to update it to truncate current time to seconds to pass unit tests (not before and not after are already being displayed like that now)

@vrutkovs vrutkovs force-pushed the audit-log-annotation-for-invalid-certificates branch from 7d29cf0 to b09fa5a Compare March 18, 2025 08:36
@k8s-ci-robot k8s-ci-robot removed the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 18, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vrutkovs
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 18, 2025
@vrutkovs vrutkovs force-pushed the audit-log-annotation-for-invalid-certificates branch from b09fa5a to a50d1d6 Compare March 18, 2025 08:48
@Jefftree
Copy link
Member

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Mar 18, 2025
@liggitt liggitt moved this to Assigned in API Reviews Apr 10, 2025
@liggitt liggitt moved this from Assigned to Changes requested in API Reviews Apr 24, 2025
@liggitt liggitt moved this from Changes requested to In progress in API Reviews Apr 24, 2025
@liggitt liggitt moved this from In progress to Changes requested in API Reviews May 8, 2025
@vrutkovs vrutkovs force-pushed the audit-log-annotation-for-invalid-certificates branch from a50d1d6 to 9b3ed9d Compare May 13, 2025 16:02
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 13, 2025
In some cases finding the invalid cert by its serial number is tedious.
However we can't expose more PII in kube-apiserver logs.

This commit will add an audit log annotations with additional details:
common name, issuer common name and serial number
@vrutkovs vrutkovs force-pushed the audit-log-annotation-for-invalid-certificates branch from 9b3ed9d to b4713f3 Compare May 13, 2025 16:04
@jpbetz
Copy link
Contributor

jpbetz commented May 13, 2025

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2025
@liggitt liggitt moved this from Changes requested to In progress in API Reviews May 22, 2025
@liggitt liggitt moved this from In progress to Assigned in API Reviews Jul 17, 2025
@liggitt liggitt moved this from Assigned to Backlog in API Reviews Jul 22, 2025
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Backlog
Status: Changes Requested
Development

Successfully merging this pull request may close these issues.

Certificate info in expire logs
9 participants