Refactor Volume Auto Grow to support additional volume types #4239

Merged: 1 commit merged into CrunchyData:main from autogrow-refactor on Aug 14, 2025

Conversation

tjmoore4 (Contributor):

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?
    • Have you added automated tests?

Type of Changes:

  • New feature
  • Bug fix
  • Documentation
  • Testing enhancement
  • Other

What is the new behavior (if this is a feature change)?

  • Breaking change (fix or feature that would cause existing functionality to change)
    This commit refactors the existing pgData volume auto grow code to better support upcoming feature enhancements relating to auto grow capability for the pgBackRest repository volume and pg_wal volume.

Other Information:
Issue: PGO-2606

volumeType, instanceSpecName string,
volumeRequestSize *resource.Quantity) (string, error) {

switch volumeType {
tjmoore4 (Contributor, Author):

📝 This switch statement will likely be very similar to the one used by limitIsSet. The difference is that this function references status objects that don't yet exist, whereas limitIsSet required nothing new outside of itself, so it made sense to move forward a bit there but not here.
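For concreteness, a rough sketch of the kind of switch being described. Only the pgData lookup mirrors existing code; the pg_wal and repo status fields are assumptions, since (as noted) those status objects don't exist yet:

    import "fmt"

    // Sketch only: DesiredPGDataVolume mirrors the existing status field,
    // while DesiredPGWALVolume and DesiredRepoVolume are assumed names for
    // status objects that don't exist yet.
    type volumeStatus struct {
        DesiredPGDataVolume map[string]string
        DesiredPGWALVolume  map[string]string // assumption
        DesiredRepoVolume   map[string]string // assumption
    }

    func desiredRequest(status volumeStatus, volumeType, instanceSpecName string) (string, error) {
        switch volumeType {
        case "pgData":
            return status.DesiredPGDataVolume[instanceSpecName], nil
        case "pgWAL":
            return status.DesiredPGWALVolume[instanceSpecName], nil
        case "repo":
            return status.DesiredRepoVolume[instanceSpecName], nil
        }
        return "", fmt.Errorf("unexpected volume type %q", volumeType)
    }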

@tjmoore4 force-pushed the autogrow-refactor branch 3 times, most recently from 74853dc to b303aea on August 13, 2025 at 17:26
// status. If the value has grown, create an Event.
func (r *Reconciler) storeDesiredRequest(
ctx context.Context, cluster *v1beta1.PostgresCluster,
volumeType, instanceSetName, desiredRequest, desiredRequestBackup string,
jmckulk (Collaborator):

Would these be accurate descriptions of these variables? Not necessarily a suggestion... just checking my understanding

Suggested change
volumeType, instanceSetName, desiredRequest, desiredRequestBackup string,
volumeType, instanceSetName, currentRequestString, previousRequestString string,

tjmoore4 (Contributor, Author):

Yeah, that's right. The reason it's referred to as a 'backup' here is that that's how it would be used: in case there's a parsing issue, etc., this value gives us something to roll back to.

jmckulk (Collaborator):

why do we need something to roll back to? wouldn't we typically bail if we hit an issue, then assume the change would pass on the next reconcile?

tjmoore4 (Contributor, Author):

It is also used for comparison purposes by the function, so it's doing a couple of different things. Part of what's happening is we're going from a string to a parsed value, so that has the potential to cause issues. In terms of naming, really 'previous' or 'backup' are both accurate. I guess the difference is a backup is always previous but previous isn't always a backup? Maybe? 🤷‍♂️
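In other words, roughly this pattern (a sketch under the assumption that the actual code parses with resource.ParseQuantity; the helper name is made up):

    import "k8s.io/apimachinery/pkg/api/resource"

    // Sketch: the previous value doubles as a comparison point and as a
    // known-good fallback when the new string fails to parse.
    func requestOrRollBack(desiredRequest, desiredRequestBackup string) string {
        if _, err := resource.ParseQuantity(desiredRequest); err != nil {
            return desiredRequestBackup // roll back to the last good value
        }
        return desiredRequest
    }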

@tjmoore4 marked this pull request as ready for review on August 13, 2025 at 18:21

// Cycle through the instance sets to ensure the correct limit is identified.
case "pgData":
for _, specInstance := range cluster.Spec.InstanceSets {
jmckulk (Collaborator):

if we passed around instance instead of instanceSetName would that mean we didn't need to loop over each instance ❓ 💭

jmckulk (Collaborator):

I wonder if we could make this even more generic. We would use this same function for a pgBackRest repo 🤔 and the repo wouldn't be tied to an instance... right?

jmckulk (Collaborator):

Honestly, most of these functions pass around some reference to the instance. Does that break when we start thinking about the pgBackRest repo ❓

tjmoore4 (Contributor, Author):

This function is ultimately being triggered by observeInstances, so we don't really have a good way to access the instance object itself if I remember correctly.

We'll need to know the volume type, etc so that we can drill down to the correct portion of the cluster spec, so I don't know if we can simplify this much further.

I think a lot of the logic around ensuring the various instance volumes grow in tandem won't be needed for the repo volume work. There can be more than one instance set and each instance set can have more than one replica. All the volumes for a given instance should grow the same, but each instance set can be different (and have different limits). Repo volumes on the other hand should be (assuming I'm not forgetting anything) 1:1 per repo definition. All that to say, I don't know that it will be quite as complicated even though it might require more code.

For this particular function, I was expecting to just pass in an empty string for the instanceSetName and key off of the volumeType, where that will be repo1, repo2, etc. That said, we'll have to see how things work out. It might make sense to do things a different way, but I can't say for certain at this point.
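A hypothetical illustration of that call shape, reusing the tail of the partial signature from the diff above (purely speculative; any leading arguments are omitted, and the instance name and size are made up):

    // Illustrative only: leading arguments (ctx, cluster, ...), if any, omitted.
    size := resource.MustParse("1Gi")

    // pgData volumes are keyed by instance set name...
    dpv, err := getDesiredVolumeSize("pgData", "instance1", &size)

    // ...while a repo volume would pass an empty instanceSetName and let the
    // volume type itself ("repo1", "repo2", ...) identify the volume.
    dpv, err = getDesiredVolumeSize("repo1", "", &size)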

jmckulk (Collaborator):

what if we just passed in a VolumeClaimSpec 🤔

tjmoore4 (Contributor, Author):

I'm not really following. We'd have to associate the VolumeClaimSpec with the appropriate instance name otherwise we can't associate it with a request.

jmckulk (Collaborator):

would this be in the autogrow file 🤔

tjmoore4 (Contributor, Author):

That's used by config.go in the postgres package. If we moved it to the autogrow.go file, we'd have to import postgrescluster, which I don't think we want to do.

Comment on lines +362 to 366
status.DesiredPGDataVolume[instance.Name] = r.storeDesiredRequest(ctx, cluster, "pgData",
name, status.DesiredPGDataVolume[instance.Name], previousDesiredRequests[instance.Name])
jmckulk (Collaborator):

related to https://github.com/CrunchyData/postgres-operator/pull/4239/files#r2274537730

if we passed instance in here instead of name would we have access to everything we need... ❓

Suggested change
status.DesiredPGDataVolume[instance.Name] = r.storeDesiredRequest(ctx, cluster, "pgData",
name, status.DesiredPGDataVolume[instance.Name], previousDesiredRequests[instance.Name])
status.DesiredPGDataVolume[instance.Name] = r.storeDesiredRequest(ctx, cluster, "pgData",
instance, status.DesiredPGDataVolume[instance.Name], previousDesiredRequests[instance.Name])

tjmoore4 (Contributor, Author):

I suppose, but we're still getting that from the cluster. I think the idea behind passing just the name is to make sure we're updating the statuses one at a time. storeDesiredRequest really only works with a single instance at a time, which I think is what we want to do here.

} else if feature.Enabled(ctx, feature.AutoGrowVolumes) {

// determine the appropriate volume request based on what's set in the status
if dpv, err := getDesiredVolumeSize(
jmckulk (Collaborator):

what do we think the "P" in dpv means?

jmckulk (Collaborator):

I'm also not seeing where dpv is used in the rest of this function besides when we log an error.
am I missing something 🔧

jmckulk (Collaborator):

Oh I see, volumeRequestSize is a pointer that is passed in and updated.

it took me a minute to get here - maybe it's worth changing something here ♻️

  • if we aren't using the return value maybe we don't return it
  • or maybe return the value instead of updating the pointer
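
As a sketch of that second option (illustrative names, assuming k8s.io/apimachinery's resource package; not the merged code):

    import "k8s.io/apimachinery/pkg/api/resource"

    // Sketch: instead of writing through the *resource.Quantity out-parameter,
    // return the computed request so the update is explicit at the call site.
    func getDesiredVolumeSize(desired string) (resource.Quantity, error) {
        return resource.ParseQuantity(desired)
    }

    // Caller: the assignment makes the data flow visible.
    //   volumeRequestSize, err := getDesiredVolumeSize(dpv)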

jmckulk (Collaborator):

thinking about the logic in this function

    if volumeLimitFromSpec.IsZero() { return }
    if request > limit { // use limit + warn }
    if autogrow {
        dpv = getDesired // this changes the value of request
        if request >= limit { // use limit + warn }
    }

if limit isn't set, autogrow isn't enabled so do nothing
if request has exceeded the limit - use the limit
if autogrow is enabled we see if the request is different
use getDesiredVolumeSize to override the request
if value has met or exceeded the limit - use the limit

maybe...

    if volumeLimitFromSpec.IsZero() {
        return
    }

    request = pvc.Spec.Resources.Requests.Storage()
    if feature.Enabled(ctx, feature.AutoGrowVolumes) {
        request = getDesiredVolumeSize()
    }

    if request >= limit {
        // use limit and warn
    }

this is assuming we will always use the "desired" volume size if autogrow is enabled and I might be wrong in that assumption 💭

tjmoore4 (Contributor, Author):

If we simplified to that logic, I think that eliminates the warning event for a defined request that is higher than the limit. That's a useful warning in my opinion, because it helps explain unexpected behavior (due to a misconfiguration on the user's part; you shouldn't even define a request that's higher than the limit you define in the same object).

tjmoore4 (Contributor, Author):

For the return by reference, we could definitely return it explicitly. I'm not returning anything that's not used, it's just that the one value is only used in the error log, which feels ok to me.

jmckulk (Collaborator):

Which event would we be eliminating?

dpv doesn't seem to be needed outside of getDesiredVolumeSize beyond logging the error. If we log the error in the function, the return isn't necessary. We could pass in ctx to the func and have access to the same logger 📝
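
A sketch of that idea, assuming controller-runtime's context logger (the operator's actual logging setup may differ, and the signature here is simplified):

    import (
        "context"

        "k8s.io/apimachinery/pkg/api/resource"
        "sigs.k8s.io/controller-runtime/pkg/log"
    )

    // Sketch: log the parse failure inside the helper so callers don't need
    // the returned value just for error reporting.
    func getDesiredVolumeSize(ctx context.Context, desired string) string {
        if _, err := resource.ParseQuantity(desired); err != nil {
            log.FromContext(ctx).Error(err, "unable to parse desired volume request")
        }
        return desired
    }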

tjmoore4 (Contributor, Author):

It would eliminate the VolumeRequestOverLimit Event if I'm following correctly.

I was trying to keep the pulled-out function simple, which also makes the test less complicated. It seemed better to just return the string in that case. 🤷‍♂️

@jmckulk (Collaborator) left a comment:

The code looks like it works to me. I still need to dig in to understand how auto grow works, but I can do that as part of future changes.

This commit refactors the existing pgData volume auto grow code
to better support upcoming feature enhancements relating to auto
grow capability for the pgBackRest repository volume and pg_wal
volume.

Issue: PGO-2606
@tjmoore4 merged commit 8d323cc into CrunchyData:main on Aug 14, 2025
16 of 17 checks passed