camilamacedo86
Contributor

When upgrading operators, CRD validation errors can be very large (50KB+). Kubernetes rejects status updates over 32KB with "Too long: may not be more than 32768 bytes". This causes ClusterExtension upgrades to fail and get stuck.

Added a truncateMessage() function that truncates messages longer than 30,000 bytes. It is applied to the status condition functions that handle large errors:

  • setStatusProgressing() - handles CRD validation errors
  • ensureAllConditionsWithReason() - handles resolution errors
  • setInstalledStatusConditionUnknown() - handles bundle errors

Truncated messages keep the important information at the start and end with a "... [message truncated]" suffix. Upgrades now complete successfully even with large CRD validation errors.

Added unit tests for truncation logic and CRD error scenarios.

Reviewer Checklist

  • [N/A] API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • [N/A] Links to related GitHub Issue(s)

Assisted-by: Cursor
@camilamacedo86 camilamacedo86 requested a review from a team as a code owner August 27, 2025 13:18

netlify bot commented Aug 27, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 3fd05a8
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/68af05a39e81250008b07b97
😎 Deploy Preview https://deploy-preview-2169--olmv1.netlify.app

openshift-ci bot commented Aug 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign thetechnick for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@camilamacedo86 camilamacedo86 changed the title from "🐛 Fix: Truncate large error messages in status conditions" to "🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567)" Aug 27, 2025
@joelanford
Member

Can you include some details of the messages that are too long? I feel like arbitrarily truncating the message is sort of papering over the underlying issue, which is that 30k-byte messages in conditions are a poor UX, and the real solution would be to make the message shorter to begin with.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2025
@@ -27,6 +27,23 @@ import (
 	ocv1 "github.com/operator-framework/operator-controller/api/v1"
 )

+const (
+	// maxConditionMessageLength is the Kubernetes limit minus some buffer for safety
+	maxConditionMessageLength = 30000
Member

If the maximum length is 32768 according to kubernetes validation, we can use all of that. There's no need to have an extra bit of space for safety.

Contributor Author

We need some extra room, at least for truncationSuffix = "\n\n... [message truncated]".
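
The two positions in this thread are compatible: the full 32768 bytes can be used as long as the space for the suffix is reserved inside that limit. A small sketch of the arithmetic follows; `maxPrefixLength` is a hypothetical name, and setting the limit to 32768 is the reviewer's suggestion, not what the diff currently does.

```go
package main

import "fmt"

const (
	// Limit reported by Kubernetes validation, per the error quoted in the PR.
	maxConditionMessageLength = 32768
	// Suffix text quoted in the review thread.
	truncationSuffix = "\n\n... [message truncated]"
	// Hypothetical name: how many bytes of the original message can be kept
	// when the suffix has to fit inside the limit as well.
	maxPrefixLength = maxConditionMessageLength - len(truncationSuffix)
)

func main() {
	fmt.Printf("suffix=%d bytes, prefix budget=%d bytes, total=%d bytes\n",
		len(truncationSuffix), maxPrefixLength, maxPrefixLength+len(truncationSuffix))
}
```

Because `len` of a constant string is itself a constant in Go, the budget is computed at compile time and stays correct if the suffix text changes.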

@@ -160,7 +160,7 @@ func ensureAllConditionsWithReason(ext *ocv1.ClusterExtension, reason v1alpha1.C
 			Type:    condType,
 			Status:  metav1.ConditionFalse,
 			Reason:  string(reason),
-			Message: message,
+			Message: truncateMessage(message),
Member

If there's a limit imposed on all condition messages, it seems like we need to make sure that we truncate all condition messages.

This is one of many places where we set condition messages, right?

We may need to implement a wrapper around meta.SetStatusCondition() that:

  1. truncates messages
  2. is used everywhere throughout our project.

Contributor Author

Yes, I think a wrapper would be better as well. +1

@camilamacedo86 camilamacedo86 changed the title from "🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567)" to "WIP 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567)" Aug 27, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 27, 2025
@camilamacedo86
Contributor Author

Hi @joelanford

Thank you for your fast review.
I just pushed it :-)

> I feel like arbitrarily truncating the message is sort of papering over the underlying issue,

I remember that when we discussed this in the past, the idea was just to truncate.
The problem is the preflight checks: they produce a huge amount of data.
I will think about it.

2 participants