Skip to content

ROX-29232: Validate user-provided dockerConfigs for non-UTF-8 chars #15270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 3, 2025

Conversation

vikin91
Copy link
Contributor

@vikin91 vikin91 commented May 12, 2025

Description

This PR is to prevent GRPC marshaling and unmarshaling errors (leading to crash of the GRPC connection) when data modeled as string in the proto contain non-UTF-8 characters.

Note: this is a very limited fix, as it only validates selected types of data provided from the user. It is still possible that other not yet validated user-provided inputs may lead to grpc crash due to non-UTF-8 characters.

Update ⚠️

This PR won't help with the problem. Here is what I know so far:

  1. It is impossible to read a non-UTF-8 characters from the orchestrator (from a secret, configmap, annotation etc) and crash the grpc connection. Reading it from k8s makes it non-lethal, as most probably the non-utf-8 characters are fixed in the process of reading or writing to the secret.
  2. It is possible to crash grpc connection with a Marshaling error when Sensor is sending to Central only when:
    1. We use version 4.6 or earlier (due to the codec change in 4.7)
    2. We craft a broken message containing non-utf-8 in the code of Sensor
  3. It is possible to crash grpc connection with a Unmarshaling error when Central is receiving only when:
    1. On any version
    2. We craft a broken message containing non-utf-8 in the code of Sensor.
    3. On version 4.7 the marshaling error on Sensor side does not happen.

Thus: Validation of the secrets read from k8s makes no sense and this PR will be closed.

Update 2 ⚠️

Apparently storing secrets with non-uft-8 characters in k8s to make them harmless works only for some selected non-utf-8 characters. @dcaravel found that the following secret can lead to a crash even if stored and read from k8s:

apiVersion: v1
data:
  .dockercfg: ewogICAgImV4YW1wbGUuY29tIjogewogICAgICAgICJhdXRoIjoibnVsbE9wN3BaUT09IgogICAgfQp9
kind: Secret
metadata:
  name: badsec
  namespace: stackrox
type: kubernetes.io/dockercfg

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Reproduction

First, I hardcoded a dockerConfig secret (in the code) with non-UTF-8 characters*. I did something like:

		registries = append(registries, &storage.ImagePullSecret_Registry{
			Name:     registryAddr + "a\xc5z",
			Username: dce.Username,
		})

around this place:

registries = append(registries, &storage.ImagePullSecret_Registry{

and run that with local sensor:

go run tools/local-sensor/main.go -kubeconfig $HOME/.cluster1/kubeconfig -no-mem-prof -no-cpu-prof -connect-central localhost:8443

then in Sensor logs I observed the following error:

common/sensor: 2025/05/12 16:08:37.341123 central_receiver_impl.go:57: Info: Stopping with error: rpc error: code = Internal desc = grpc: failed to unmarshal the received message: string field contains invalid UTF-8
common/sensor: 2025/05/12 16:08:37.341145 central_sender_impl.go:116: Error: unable to send to stream: EOF
common/sensor: 2025/05/12 16:08:37.341152 central_communication_impl.go:73: Error: Stopping connection due to error: rpc error: code = Internal desc = grpc: failed to unmarshal the received message: string field contains invalid UTF-8

*) I did that, because I am unable to type a non-UTF-8 character from my keyboard directly into a k8s secret.

Copy link

openshift-ci bot commented May 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Contributor

rhacs-bot commented May 12, 2025

Images are ready for the commit at d4453b1.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.8.x-828-gd4453b164d.

Copy link

codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 6 lines in your changes missing coverage. Please review.

Project coverage is 49.24%. Comparing base (dbad869) to head (a4fc30d).
Report is 11 commits behind head on master.

Files with missing lines Patch % Lines
sensor/kubernetes/listener/resources/secrets.go 66.66% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #15270      +/-   ##
==========================================
- Coverage   49.24%   49.24%   -0.01%     
==========================================
  Files        2578     2578              
  Lines      189182   189298     +116     
==========================================
+ Hits        93158    93212      +54     
- Misses      88688    88744      +56     
- Partials     7336     7342       +6     
Flag Coverage Δ
go-unit-tests 49.24% <66.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@dcaravel dcaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few top of mind thoughts

  • This seems like it would work just fine, bit shocked we haven't run into this in the past.
  • Is the issue only with gRPC such that k8s / CRI-O still work OK even when a secret has a 'non-utf-8' character in it? (ie: is it only ACS that cannot interpret this data)
    • If so, wondering if worth looking at how k8s/cri-o handle this - perhaps mimcing any encoding / types / validation would be better? (granted introduces some other complexes on either end of the comms, and when persisting the data to Central DB may run into the same error during marshalling)
    • Or perhaps there is some gRPC flag to not be strict on invalid utf-8 chars? <- Not so sure I like this - is a cover up
    • Or perhaps strings.ToValidUTF8 can help? <- this also makes me uneasy as the data is being changed.

Would it help to also add logging to the error message on either end of the comms to print the 'type' information of the message that failed to be marshalled/unmarshalled? (ie: error failed to marshall, bad utf8, in MsgFromSensor.SensorEvent_ImageIntegration) so that we would know (for example) it is a secret that had the bad data vs. a deployment vs. something else.

@vikin91
Copy link
Contributor Author

vikin91 commented May 13, 2025

Would it help to also add logging to the error message on either end of the comms to print the 'type' information of the message that failed to be marshalled/unmarshalled? (ie: error failed to marshall, bad utf8, in MsgFromSensor.SensorEvent_ImageIntegration) so that we would know (for example) it is a secret that had the bad data vs. a deployment vs. something else.

It is a second case like this that I am handling and I would love to have the information that you suggest. The tricky part here is that if there is an error returned from grpc (on receiving), then the msg object is nil. In order to get to that object, we would need to change the automatically-generated grpc code. So shortly speaking, it is impossible in current case.

However, it is possible to do that on the sending side. And I think, we should have it.

@janisz
Copy link
Contributor

janisz commented May 13, 2025

I made a small test and looks like VT proto properly handles non-utf8 characters. From your example it looks proto is called instead of vt proto. This may happen because codec is not setup properly (in theory it should as it's in init and sensor imports pkg/grpc). I curious why this was not happening before as gogo also reports an error on non-utf8 character.

package protocompat

import (
	"testing"

	gogo "github.com/gogo/protobuf/proto"
	"github.com/stackrox/rox/pkg/protoassert"
	"github.com/stretchr/testify/assert"
	"google.golang.org/protobuf/proto"

	"github.com/stackrox/rox/generated/storage"
)

var name = string([]byte{0xC0})

func TestUnmarshalGogo(t *testing.T) {
	m := &storage.NamespaceMetadata{Name: name}
	_, err := gogo.Marshal(m)
	assert.EqualError(t, err, `proto: field ".Name" contains invalid UTF-8`)
}

func TestUnmarshalVT(t *testing.T) {
	m := &storage.NamespaceMetadata{Name: name}
	a, err := m.MarshalVT()
	assert.NoError(t, err)
	n := &storage.NamespaceMetadata{}
	err = n.UnmarshalVT(a)
	assert.NoError(t, err)
	protoassert.Equal(t, m, n)
}

func TestUnmarshalProto(t *testing.T) {
	m := &storage.NamespaceMetadata{Name: name}
	_, err := proto.Marshal(m)
	assert.EqualError(t, err, `string field contains invalid UTF-8`)
}

@vikin91
Copy link
Contributor Author

vikin91 commented May 13, 2025

This may happen because codec is not setup properly (in theory it should as it's in init and sensor imports pkg/grpc)

@janisz thanks for sharing the insights! Do you have an idea how to verify that the codec is setup properly?
I believe that the communication in both ways between Sensor and Central may be affected by that, as I saw the errors on both sides and in both marshaling and unmarshaling.

@dcaravel
Copy link
Contributor

It is a second case like this that I am handling and I would love to have the information that you suggest. The tricky part here is that if there is an error returned from grpc (on receiving), then the msg object is nil. In order to get to that object, we would need to change the automatically-generated grpc code. So shortly speaking, it is impossible in current case.

However, it is possible to do that on the sending side. And I think, we should have it.

Agree not possible on the receiving side, would be on the sending side.

@janisz
Copy link
Contributor

janisz commented May 13, 2025

We can create a unit test for straming like here https://github.com/stackrox/stackrox/pull/14083/files#diff-80aadd25196a5db848dfb2c5d9240d09512010efc442f90766ac247fbcfc4180
Or you can change the error log line to "+v" to get stack trace if error is wrapped

@vikin91
Copy link
Contributor Author

vikin91 commented May 15, 2025

I am closing this PR. The reasons for that are in the PR description in the "Update" section.

@vikin91 vikin91 closed this May 15, 2025
@vikin91
Copy link
Contributor Author

vikin91 commented May 15, 2025

Temporary, one can use branch piotr/ROX-29232-broken-docker-secrets to make sensor send broken messages to Central with command:

ROX_LOCAL_IMAGE_SCANNING_ENABLED=true go run tools/local-sensor/main.go -kubeconfig $HOME/.cluster1/kubeconfig -no-mem-prof -no-cpu-prof -connect-central localhost:8443

@vikin91 vikin91 reopened this May 16, 2025
Copy link

gitguardian bot commented May 16, 2025

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
17208197 Triggered Generic High Entropy Secret 9da1da2 sensor/kubernetes/listener/resources/secrets_test.go View secret
17207583 Triggered Kubernetes Docker Secret 8f6765e sensor/kubernetes/listener/resources/secrets_test.go View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@vikin91
Copy link
Contributor Author

vikin91 commented May 16, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

This is a dummy secret created only for the tests.

@vikin91 vikin91 marked this pull request as ready for review May 16, 2025 12:37
@vikin91 vikin91 requested review from a team as code owners May 16, 2025 12:37
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @vikin91 - I've reviewed your changes - here's some feedback:

  • In the update/delete flows you drop the event entirely when all registries are invalid, which can leave stale image integrations behind—consider emitting a delete event in that case to clean up existing integrations.
  • The function name validateSecret is misleading since it only validates a single DockerConfig entry; consider renaming it (e.g. validateDockerConfigEntry) and extracting common UTF-8 checks for reuse.
  • Rather than ad-hoc UTF-8 checks on registry, username, and password, you may want to enforce UTF-8 validation at the gRPC/message boundary or switch to byte fields to avoid similar crashes elsewhere.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟡 Testing: 1 issue found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vikin91 vikin91 requested a review from dcaravel May 16, 2025 13:47
Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, do you know what happens in K8s? As in, does K8s see these registries as valid/are they actually valid?

@vikin91
Copy link
Contributor Author

vikin91 commented May 16, 2025

Out of curiosity, do you know what happens in K8s? As in, does K8s see these registries as valid/are they actually valid?

Unfortunately, I don't know. I would need to do an experiment.

Copy link
Contributor

@dcaravel dcaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth preventing the connection from being killed for any invalid message? Will that be too much overhead? Since this can occur for any message sent to Central from Sensor (or vice versa).

I realize that k8s does some degree of validation for more structured fields (such as resource names) - should we treat all info from k8s events as 'untrusted input'?

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@vikin91
Copy link
Contributor Author

vikin91 commented May 27, 2025

@dcaravel I addressed some of the issues; others I commented on as I am not sure how to incorporate them into this PR. Do you think we can merge this for 4.8 or shall we improve more?

Copy link
Contributor

@janisz janisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a changelog entry for this with information that we do not support non UTF-8 docker configs.

Co-authored-by: David Caravello <119438707+dcaravel@users.noreply.github.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@vikin91
Copy link
Contributor Author

vikin91 commented Jun 3, 2025

@dcaravel I acknowledge that this PR is not fixing all the issues, but I believe that having it in 4.8 would be valuable.
Thus, I plan to merge it, unless you give it a ❌. Would that be okay?

@dcaravel
Copy link
Contributor

dcaravel commented Jun 3, 2025

I'm reviewing right now - give me 10 mins

Copy link
Contributor

@dcaravel dcaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

openshift-ci bot commented Jun 3, 2025

@vikin91: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-sensor-integration-tests d4453b1 link false /test gke-sensor-integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vikin91
Copy link
Contributor Author

vikin91 commented Jun 3, 2025

The integration tests failure looks unrelated:

    sync_test.go:100: 
        	Error Trace:	/go/src/github.com/stackrox/stackrox/sensor/tests/complianceoperator/sync_test.go:100
        	            				/go/src/github.com/stackrox/stackrox/sensor/tests/helper/helper.go:396
        	            				/go/src/github.com/stackrox/stackrox/sensor/tests/helper/helper.go:352
        	            				/go/src/github.com/stackrox/stackrox/sensor/tests/helper/helper.go:231
        	            				/go/src/github.com/stackrox/stackrox/sensor/tests/complianceoperator/sync_test.go:89
        	Error:      	Received unexpected error:
        	            	object is being deleted: customresourcedefinitions.apiextensions.k8s.io "scansettings.compliance.openshift.io" already exists
        	Test:       	Test_ComplianceOperatorScanConfigSync

@vikin91 vikin91 enabled auto-merge (squash) June 3, 2025 15:41
@vikin91 vikin91 merged commit d5216e5 into master Jun 3, 2025
85 of 87 checks passed
@vikin91 vikin91 deleted the piotr/ROX-29232-broken-docker-secrets branch June 3, 2025 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants