update ccloud with master changes #3


Open · wants to merge 134 commits into ccloud

Conversation

stefanhipfel (Member)

No description provided.

alexeyklyukin and others added 30 commits April 9, 2018 18:07
Compare each pod's controller revision with the statefulset's current
one to determine whether the pod is running the latest revision and,
therefore, whether a rolling update is necessary. This comparison is
performed only during operator start; afterwards, the rolling update
status stored locally in the cluster structure is used for all
rolling update decisions.
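A minimal sketch of this revision check, assuming apps/v1 types from
client-go (the function name is illustrative, not the operator's actual
code; the operator still used v1beta statefulsets at this point):

```go
package cluster

import (
	appsv1 "k8s.io/api/apps/v1"
	v1 "k8s.io/api/core/v1"
)

// podRunsLatestRevision reports whether a pod was created from the
// statefulset's current controller revision; if not, the pod needs a
// rolling update.
func podRunsLatestRevision(pod *v1.Pod, sts *appsv1.StatefulSet) bool {
	// Every statefulset pod carries the revision it was created from
	// in the controller-revision-hash label.
	return pod.Labels[appsv1.StatefulSetRevisionLabel] == sts.Status.UpdateRevision
}
```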
It was set to `endTimestamp`, but it should be `timestamp`.
…ervice-issues

Tolerate issues of the Teams API
…eploy-service-account

Deploy service account for pod creation on demand
Avoid showing "there is no service in the cluster" when syncing a
service for a cluster if the operator was restarted after the cluster
had been created.
…_service_definition_during_sync

Fix a bug with syncing services
…og-to-s3

Set up an S3 bucket for the postgres logs
zalando#281)

* Improve the pod moving behavior during the Kubernetes cluster upgrade.

Fix an issue of not waiting for at least one replica to become ready
(if the Statefulset indicates there are replicas) when moving the master
pod off the decommissioned node. Resolves the first part of zalando#279.

Small fixes to error messages.

* Eliminate a race condition during the switchover.

When the operator initiates a failover (switchover) that fails and
then retries it a second time, it may happen that the previous
waitForPodChannel is still active. As a result, the operator subscribes
to the former master pod twice, causing a panic.

The problem was that the original code didn't bother to cancel the
waitForPodLabel for the new master pod in the case when the failover
fails. This commit fixes it by adding a stop channel to that function.
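A sketch of the stop-channel pattern this describes (names are
illustrative, not the operator's actual code):

```go
package cluster

import "fmt"

// waitForLabel subscribes to pod events until a result arrives, or
// until stopCh is closed by a caller whose failover attempt failed,
// so no stale subscription is left behind for a retry to trip over.
func waitForLabel(events <-chan string, stopCh <-chan struct{}) (string, error) {
	select {
	case ev := <-events:
		return ev, nil
	case <-stopCh:
		return "", fmt.Errorf("wait for the pod label cancelled")
	}
}
```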

Code review by @zerg-junior
* Sanity checks for the cluster name, improve tests.

- check that both the normal and the clone cluster name comply with the
  valid service name format. For a clone cluster, only do it if the clone
  timestamp is not set; with a clone timestamp set, the clone name points
  to the S3 bucket

- add tests and improve existing ones, making sure we don't call the
  Error() method on an empty error, and that we don't miss cases where
  the expected error is non-empty but the call under test returns no
  error.

Code review by @zerg-junior and @Jan-M
alexeyklyukin and others added 30 commits July 2, 2018 16:25
* Define sidecars in the operator configuration.

Right now only the name and the docker image can be defined, but with
the help of the pod_environment_configmap parameter arbitrary
environment variables can be passed to the sidecars, as sketched after
this commit message.

* Refactoring around generatePodTemplate.

Original implementation of per-cluster sidecars by @theRealWardo 

Per review by @zerg-junior and @Jan-M
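One plausible way to wire this up, assuming client-go core/v1 types
(the function name is hypothetical, and EnvFrom is a simplification;
the real generatePodTemplate logic may inject variables differently):

```go
package cluster

import v1 "k8s.io/api/core/v1"

// generateSidecar builds a sidecar container from the name and docker
// image given in the operator configuration, importing arbitrary
// environment variables from the configmap named by the
// pod_environment_configmap parameter.
func generateSidecar(name, dockerImage, podEnvConfigMap string) v1.Container {
	container := v1.Container{Name: name, Image: dockerImage}
	if podEnvConfigMap != "" {
		container.EnvFrom = []v1.EnvFromSource{{
			ConfigMapRef: &v1.ConfigMapEnvSource{
				LocalObjectReference: v1.LocalObjectReference{Name: podEnvConfigMap},
			},
		}}
	}
	return container
}
```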
The old way of specifying it with the annotation is deprecated and not
available in recent Kubernetes versions. We will keep it there anyway
until upgrading to the new go-client that is incompatible with those
versions.

Per report from @schmitch
…ndo#343)

* Switchover must wait for the inner goroutine before it returns.

Otherwise, two corner cases may happen:

 - waitForPodLabel writes to the podLabelErr channel that has already
   been closed by the outer routine

 - the outer routine exits and the caller subscribes to the pod
   the inner goroutine has already subscribed to, resulting in a panic.

 The previous commit zalando@fe47f9e
 that touched that code added the cancellation channel, but didn't bother
 to actually wait for the goroutine to be cancelled.

 Per report and review from @valer-cara.
 Original issue: zalando#342
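A sketch of the wait-before-close pattern this fix describes
(illustrative names; run is assumed to send exactly one result):

```go
package cluster

// switchoverWait reads the inner goroutine's result, signals
// cancellation, and then waits for the goroutine to exit before
// closing the shared channel, so a late write to a closed channel
// is impossible.
func switchoverWait(run func(chan<- error, <-chan struct{})) error {
	podLabelErr := make(chan error, 1)
	stopCh := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		run(podLabelErr, stopCh) // must send exactly one result
	}()

	err := <-podLabelErr // result of the wait inside the goroutine
	close(stopCh)        // tell the goroutine to stop any remaining waits
	<-done               // wait for it to actually exit
	close(podLabelErr)   // only now is it safe to close the channel
	return err
}
```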
* Up until now, the operator read its own configuration from the
configmap. That has a number of limitations, e.g. when a configuration
value is not a scalar but a map or a list. We use custom code based on
github.com/kelseyhightower/envconfig to decode non-scalar values out of
plain text keys, but that breaks when the data inside the keys contains
both YAML-special elements (i.e. commas) and complex quotes; one good
example is search_path inside `team_api_role_configuration`. In
addition, reliance on the configmap forced a flat structure on the
configuration, making it hard to write and to read (see
zalando#308 (comment)).

The changes allow supplying the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and providing an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both the old configmap and the new CRD configuration are supported, so
there are no compatibility issues; however, in the future I'd like to
deprecate the configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD-based one doesn't embed defaults
into the operator code; however, one can use
manifests/postgresql-operator-default-configuration.yaml as a starting
point when building a custom configuration.

Since the `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to
create the CRD were previously taken from the operator configuration,
which is not possible if the configuration itself is stored in a CRD
object, I've added the ability to specify them as the environment
variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT`
respectively.
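A hypothetical helper for this kind of environment-variable fallback
(the operator's actual parsing and default values may differ); e.g.
durationFromEnv("CRD_READY_WAIT_INTERVAL", 5*time.Second), where the
5-second default is an assumption:

```go
package controller

import (
	"os"
	"time"
)

// durationFromEnv reads a duration such as CRD_READY_WAIT_INTERVAL
// from the environment, falling back to the given default when the
// variable is unset or unparsable.
func durationFromEnv(name string, fallback time.Duration) time.Duration {
	v, ok := os.LookupEnv(name)
	if !ok {
		return fallback
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return fallback
	}
	return d
}
```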

Per review by @zerg-junior  and  @Jan-M.
* During initial event processing, submit the service account for pods and bind it to a cluster role that allows Patroni to start successfully. The cluster role is assumed to be created by the k8s cluster administrator.
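A sketch of what that submission could look like, under client-go
v8-era signatures (newer client-go versions also take a context
argument); the account and cluster role names here are hypothetical:

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deployPodServiceAccount creates the pod service account in the
// cluster's namespace and binds it to a pre-existing cluster role
// (created by the k8s cluster administrator) so that Patroni can
// start successfully.
func deployPodServiceAccount(client kubernetes.Interface, namespace, saName, clusterRole string) error {
	sa := &v1.ServiceAccount{
		ObjectMeta: metav1.ObjectMeta{Name: saName, Namespace: namespace},
	}
	if _, err := client.CoreV1().ServiceAccounts(namespace).Create(sa); err != nil {
		return err
	}
	rb := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: saName, Namespace: namespace},
		Subjects: []rbacv1.Subject{
			{Kind: rbacv1.ServiceAccountKind, Name: saName, Namespace: namespace},
		},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     clusterRole,
		},
	}
	_, err := client.RbacV1().RoleBindings(namespace).Create(rb)
	return err
}
```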
Do not show 'persistent volumes are not compatible' errors for volumes
that failed to be resized for other reasons (e.g. the new size is
smaller than the existing one).
* Improve generating of the Scalyr container environment.

Avoid duplicating POD_NAME and POD_NAMESPACE, which are already bundled
with every sidecar.

Do not complain about the lack of SCALYR_SERVER_HOST, since it is set to
https://upload.eu.scalyr.com in the container we use.

Do not mention SCALYR_SERVER_HOST in the error messages, since it is
derived from the cluster name automatically.
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the sync scan. The sync
scan still remains useful to fix the consequences of external
actions (e.g. someone deletes a postgres-related service by mistake)
that happen unbeknownst to the operator.

The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).

A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.
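A minimal sketch of such a loop, with stand-in types (failingClusters
and queueRepairEvent are hypothetical names, not the operator's code):

```go
package controller

import "time"

type clusterName string

// Controller here is a stand-in with just enough fields for the sketch.
type Controller struct {
	failingClusters  func() []clusterName // clusters whose last add/update/sync failed
	queueRepairEvent func(clusterName)    // enqueue a repair (sync) event for a worker
}

// runRepairLoop queues repair events for failing clusters only, every
// repair_period; a worker later discards the event if it finds the
// cluster is no longer failing.
func (c *Controller) runRepairLoop(repairPeriod time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(repairPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, name := range c.failingClusters() {
				c.queueRepairEvent(name)
			}
		case <-stop:
			return
		}
	}
}
```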

Review by @zerg-junior
There are shortcuts in this code, e.g. we created the deepcopy function
using the deepcopy package instead of the generated code; these will
be addressed once we migrate to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta; this will also
be addressed in further commits once the changes have stabilized.
Among other things, fix a few issues with the deepcopy implementation.
* Allow configuring pod priority globally and per cluster.

Allow specifying a pod priority class for all pods managed by the
operator, as well as for those belonging to individual clusters.

Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.

See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.
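A small sketch of the fallback order implied by these two settings
(the function name is illustrative, not the operator's actual code):

```go
package cluster

import v1 "k8s.io/api/core/v1"

// applyPriorityClass sets the pod priority class, preferring the
// per-cluster manifest option over the operator-wide configuration
// parameter, and leaving the spec untouched if neither is set.
func applyPriorityClass(spec *v1.PodSpec, clusterClass, operatorClass string) {
	switch {
	case clusterClass != "":
		spec.PriorityClassName = clusterClass // podPriorityClassName manifest option
	case operatorClass != "":
		spec.PriorityClassName = operatorClass // pod_priority_class_name operator config
	}
}
```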

Some import order changes are due to go fmt.
Removal of the deprecated OrphanDependents field.

Code review by @zerg-junior
* Draft codeowners, update maintainers

* Minor reformatting
Run more linters in gometalinter, e.g. deadcode, megacheck,
nakedret, dupl.

More consistent code formatting, remove two dead functions, eliminate
a bunch of naked returns, refactor a few functions to avoid code
duplication.
Not many changes, except for one function that has been deprecated.
However, unless we find a way to use semantic version comparisons like
'^' on a branch name, we will have to update the apimachinery,
apiextensions-apiserver and code-generator dependencies manually.

Also, silence a linter warning about RoleOriginUnknown being unused.
Previously it had been supported by the operator, but the validity check
excluded it for no reason.
During startup, populate the controller's list of clusters from the
up-to-date list of Postgres manifests in Kubernetes.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
zalando#363

Restructure the code to decouple the list-clusters function required by
the postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce an explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on the implicit one
coming from the list function of the PostgreSQL informer.

Some minor refactoring.
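A sketch of that startup sequence, with stand-in types (listManifests
and queueSync are hypothetical names, not the operator's code):

```go
package controller

// Illustrative stand-ins; the operator's real types differ.
type postgresManifest struct{ Name string }

type clusterController struct {
	clusters      map[string]postgresManifest // guarded by a lock in real code
	listManifests func() []postgresManifest   // backs the postgresql informer's list
	queueSync     func(postgresManifest)      // emits a cluster sync event
}

// acquireInitialListOfClusters populates the in-memory cluster list
// before any node migration routine can run, then queues an explicit
// initial sync for every cluster instead of relying on the informer's
// implicit one.
func (c *clusterController) acquireInitialListOfClusters() {
	for _, m := range c.listManifests() {
		c.clusters[m.Name] = m
		c.queueSync(m)
	}
}
```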

Review by @zerg-junior
…alando#361)

Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had two adverse effects:
 - pg_hba.conf was shadowed by the Spilo default section in the local
   postgresql configuration
 - when updating pg_hba in the cluster manifest, the updated lines were
   not propagated to DCS, since the key was defined in the bootstrap
   section of Patroni.
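A sketch of the two placements in Go struct form, following Patroni's
documented config layout; this is not the operator's actual type
definition:

```go
package cluster

// patroniConfiguration contrasts the two places pg_hba can live:
// bootstrap settings are read only once, at cluster initialization,
// while the postgresql section is kept in the DCS and re-applied when
// the manifest changes.
type patroniConfiguration struct {
	Bootstrap struct {
		// pg_hba used to live here, so it was applied only at bootstrap
		// time and shadowed by Spilo's defaults afterwards.
	} `json:"bootstrap"`
	PostgreSQL struct {
		PgHba []string `json:"pg_hba,omitempty"` // updates reach the DCS
	} `json:"postgresql"`
}
```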

Include some minor refactoring, making methods unexported where
possible and commenting out the usage of md5, so that gosec won't complain.

Per zalando#330

Review by @zerg-junior
Bump manifest to use v1.0.0 operator
Client-go provides the https://github.com/kubernetes/code-generator package, which offers an API for working with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.

Use this package to generate deepcopy methods (required for CRDs) instead of using an external deepcopy package. We also generate the APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and the CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.
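For illustration, this is roughly what the generated clientset enables;
the import path and the AcidV1 group accessor are assumptions based on
the layout described in this commit, and the signatures are client-go
v8-era (newer client-go versions also take a context argument):

```go
package controller

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	// Generated package; the exact import path is an assumption.
	acid "github.com/zalando/postgres-operator/pkg/generated/clientset/versioned"
)

// listPostgresqls shows how the generated, typed clientset replaces
// the previous generic CRD REST calls.
func listPostgresqls(config *rest.Config, namespace string) error {
	clientset, err := acid.NewForConfig(config)
	if err != nil {
		return err
	}
	_, err = clientset.AcidV1().Postgresqls(namespace).List(metav1.ListOptions{})
	return err
}
```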

All generated code resides in /pkg/generated, with the exception of zz_deepcopy.go in apis/acid.zalan.do/v1.

Rename the postgres-operator-configuration CRD to OperatorConfiguration, since the former broke the naming convention in the code-generator.

Move Postgresql, PostgresqlList, OperatorConfiguration, OperatorConfigurationList and the other types used by them into apis/acid.zalan.do/v1.

Change the type of the Error field in the Postgresql CRD to a string, so that client-go can generate a deepcopy for it.

Use generated code to set the status of CRD objects as well. Right now this is done with a patch; however, Kubernetes 1.11 introduces the /status subresource, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code compatible with earlier versions of Kubernetes.

Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.

Update client-go dependencies.

Minor reformatting and renaming.
* Document code generation
* Improve error reporting for short cluster names

* Revert to clusterName
…do#371)

Added support for superuser team in addition to the admin team that owns the postgres cluster.
8 participants