databases keep restarting after the upgrade from 5.3 to 5.7.2 #4069
Comments
Even after upgrading k3s to 1.26.3, nothing has changed and the databases still keep restarting.
At least now it restarts at random, after running for 5-40 minutes, and prints the message below before it goes down. I assume something is trying to apply a configuration change to the database that requires a restart, and the database restarts once the change has been applied. The interesting part is that it doesn't say which configuration change is being applied.
@benjaminjb do you have any idea what could be the cause?
Hey @batulziiy sorry you're running into a problem. I am real curious why one k8s cluster has errors and the other is fine, but just to check that I understand the situation:
Do I have that right? I have to say, I am curious that the original 1.26 K8s cluster was working fine with PGO 5.7, because according to our support matrix, 5.7 is compatible with 1.28 - 1.32. So, for hopefully a quick fix, can you upgrade your k8s clusters from 1.26 to 1.28?
Thanks @benjaminjb for the quick response. You understood it correctly, and all the bullet items you wrote above are correct. I'm still wondering why it runs on one cluster but not on the other. Both k3s clusters have been running stably for more than 2 years now, but one was running on 1.25 until yesterday.
For now I've paused the pg cluster reconciliation process by running the command below, and I haven't seen the 'receiving SIGHUP' message since then. It's not a permanent solution, though, because the operator loses the ability to control the database while reconciliation is paused.
So far, these are the differences I've found between the 2 clusters.
Questions
I upgraded postgres-operator from 5.2 to 5.7.2 yesterday, and it seemed to be working fine after the upgrade. I was able to connect to the database and run queries. However, this morning I found that the clusters keep restarting after running normally for ~1-2 mins. I did the exact same upgrade in 2 different environments; one works fine while the other keeps restarting.
The only difference I found is the Kubernetes version: it works fine on the cluster running v1.26.3+k3s1, and the issue occurs on the cluster running v1.25.4+k3s1. I'm not sure whether that makes much of a difference.
In the pgo pod log, I can see the following error:
time="2025-01-13T09:11:21Z" level=error msg="Reconciler error" PostgresCluster=postgres-operator/hippo controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="Operation cannot be fulfilled on Pod \"hippo-instance1-f9rv-0\": the ResourceVersion in the precondition (1043902053) does not match the ResourceVersion in record (1043902088). The object might have been modified" file="internal/controller/postgrescluster/instance.go:879" func="postgrescluster.(*Reconciler).rolloutInstance" name=hippo namespace=postgres-operator reconcileID=6f692192-9b74-4921-891e-ac91a86b00ea version=5.7.2-0
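For readers unfamiliar with this class of error: the message describes an optimistic-concurrency conflict, where a request carries the resource version the caller last saw and is rejected if the stored object has changed since. The sketch below is a toy model only. `ToyStore` and `Conflict` are hypothetical names invented for illustration; this is not the Kubernetes client or the operator's code, and it says nothing about which writer modified the Pod in this case.

```python
class Conflict(Exception):
    """Raised when a precondition no longer matches the stored object."""


class ToyStore:
    """Hypothetical in-memory stand-in for an API server (illustration only)."""

    def __init__(self):
        self._objects = {}
        self._counter = 0

    def put(self, name, data):
        # Every write bumps the version, as a versioned object store would.
        self._counter += 1
        self._objects[name] = {"data": data, "resource_version": self._counter}
        return self._counter

    def get(self, name):
        return dict(self._objects[name])

    def delete(self, name, expected_version):
        # Refuse the delete if someone else wrote the object after we read it.
        current = self._objects[name]["resource_version"]
        if current != expected_version:
            raise Conflict(
                f"precondition ({expected_version}) does not match record ({current})"
            )
        del self._objects[name]


store = ToyStore()
store.put("hippo-instance1-f9rv-0", {"phase": "Running"})
seen = store.get("hippo-instance1-f9rv-0")["resource_version"]

# Another writer updates the object between our read and our delete.
store.put("hippo-instance1-f9rv-0", {"phase": "Running", "note": "updated elsewhere"})

try:
    store.delete("hippo-instance1-f9rv-0", seen)
except Conflict as err:
    print("conflict:", err)
```

In this model the caller would normally re-read the object and retry, so a single conflict is usually harmless; a conflict that repeats on every attempt suggests something else keeps writing to the same object.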
The pod log also shows exit code 137, which is a bit strange because the cluster has plenty of memory available.
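A note on reading that exit code: by the usual shell convention, an exit status above 128 means the process was terminated by a signal, with the signal number being the status minus 128. So 137 corresponds to signal 9 (SIGKILL). An out-of-memory kill is one common source of SIGKILL, but so is a forced stop after a termination grace period expires, so the code alone does not prove a memory problem. A minimal sketch of the decoding, where `describe_exit_code` is a hypothetical helper name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

Checking the container's last state (for example, whether the termination reason is reported as OOMKilled) would help tell the two cases apart.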
Have you ever run into the same thing? I would appreciate it if you could share your thoughts here, thanks.