Commit 7e3732c

fix(clustering) constant cache flips on dp because of invalidations event (Kong#7112)
### Summary

@murillopaula reported a hybrid mode behavior that caused constant flips of data plane configuration even when no change was being made through the Admin API.

It turned out that the `clustering` module subscribed to the `invalidations` cluster event and, on each such event, started pushing config to the data planes connected to it. On closer inspection, the `clustering` module itself generates `invalidations` events when it updates the `clustering_data_planes` table with information about a connected data plane (e.g. last seen status). This caused a constant stream of `invalidations` events across the control plane cluster, which in turn caused constant config pushes and config flips on data planes.

The solution is to make the check in the `dao:crud` event handler stricter, and to have that handler create a separate cluster event, `clustering:push_config`, which in turn triggers the config push. This indirection lets us filter out invalidation events that do not correspond to an actual database change. The `clustering` module no longer subscribes to the generic `invalidations` event, whose scope is broader than database entity invalidations.
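
The filtering step described in the summary can be sketched in isolation. This is a minimal sketch and not Kong's implementation: `fake_kong`, `broadcasts`, and `worker_posts` are hypothetical stand-ins for `kong.cluster_events` and `kong.worker_events`, so the handler can run in a plain Lua interpreter. The handler body mirrors the check in the diff, and the `db_export = false` flag on `clustering_data_planes` is an assumption made for illustration.

```lua
-- Hypothetical stand-ins that record calls instead of talking to a real cluster.
local broadcasts, worker_posts = {}, {}

local fake_kong = {
  cluster_events = {
    -- called with `:`, so the first parameter is the table itself
    broadcast = function(self, channel, data)
      broadcasts[#broadcasts + 1] = channel .. " " .. data
      return true
    end,
  },
  worker_events = {
    -- called with `.`, matching the call style used in the diff
    post = function(source, event)
      worker_posts[#worker_posts + 1] = source .. ":" .. event
      return true
    end,
  },
}

-- Mirrors the stricter check: ignore anything that is not an exported entity.
local function handle_dao_crud_event(data)
  if type(data) ~= "table" or data.schema == nil or data.schema.db_export == false then
    return false
  end

  fake_kong.cluster_events:broadcast("clustering:push_config",
                                     data.schema.name .. ":" .. data.operation)
  fake_kong.worker_events.post("clustering", "push_config")
  return true
end

-- A data plane status update: assumed to be excluded from config export,
-- so it must not trigger a push.
local ignored = handle_dao_crud_event({
  schema = { name = "clustering_data_planes", db_export = false },
  operation = "update",
})

-- A change to a regular entity: triggers exactly one cluster event and one worker event.
local pushed = handle_dao_crud_event({
  schema = { name = "routes" },
  operation = "create",
})

print(ignored, pushed, #broadcasts, broadcasts[1])
```

Because only the `dao:crud` handler emits `clustering:push_config`, status writes that never pass the check cannot feed back into another round of config pushes.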
1 parent 74fb82b commit 7e3732c

1 file changed

+28
-15
lines changed

kong/clustering.lua

Lines changed: 28 additions & 15 deletions
@@ -84,7 +84,7 @@ local function update_config(config_table, update_cache)
   end

   if declarative.get_current_hash() == new_hash then
-    ngx_log(ngx_DEBUG, "same config received from control plane,",
+    ngx_log(ngx_DEBUG, "same config received from control plane, ",
             "no need to reload")
     return true
   end
@@ -942,34 +942,47 @@ function _M.init_worker(conf)
     local push_config_semaphore = semaphore.new()

     -- Sends "clustering", "push_config" to all workers in the same node, including self
-    local function post_push_config_event_to_node_workers(data)
-      if type(data) == "table" and data.schema and
-         data.schema.db_export == false
-      then
+    local function post_push_config_event()
+      local res, err = kong.worker_events.post("clustering", "push_config")
+      if not res then
+        ngx_log(ngx_ERR, "unable to broadcast event: ", err)
+      end
+    end
+
+    -- Handles "clustering:push_config" cluster event
+    local function handle_clustering_push_config_event(data)
+      ngx_log(ngx_DEBUG, "received clustering:push_config event for ", data)
+      post_push_config_event()
+    end
+
+
+    -- Handles "dao:crud" worker event and broadcasts "clustering:push_config" cluster event
+    local function handle_dao_crud_event(data)
+      if type(data) ~= "table" or data.schema == nil or data.schema.db_export == false then
         return
       end

+      kong.cluster_events:broadcast("clustering:push_config", data.schema.name .. ":" .. data.operation)
+
       -- we have to re-broadcast event using `post` because the dao
       -- events were sent using `post_local` which means not all workers
       -- can receive it
-      local res, err = kong.worker_events.post("clustering", "push_config")
-      if not res then
-        ngx_log(ngx_ERR, "unable to broadcast event: " .. err)
-      end
+      post_push_config_event()
     end

-    -- The "invalidations" cluster event gets inserted in the cluster when there's a crud change
-    -- (like an insertion or deletion). Only one worker per kong node receives this callback.
-    -- This makes such node post push_config events to all the cp workers on its node
-    kong.cluster_events:subscribe("invalidations", post_push_config_event_to_node_workers)
+    -- The "clustering:push_config" cluster event gets inserted in the cluster when there's
+    -- a crud change (like an insertion or deletion). Only one worker per kong node receives
+    -- this callback. This makes such node post push_config events to all the cp workers on
+    -- its node
+    kong.cluster_events:subscribe("clustering:push_config", handle_clustering_push_config_event)

     -- The "dao:crud" event is triggered using post_local, which eventually generates an
-    -- "invalidations" cluster event. It is assumed that the workers in the
+    -- ""clustering:push_config" cluster event. It is assumed that the workers in the
     -- same node where the dao:crud event originated will "know" about the update mostly via
     -- changes in the cache shared dict. Since DPs don't use the cache, nodes in the same
     -- kong node where the event originated will need to be notified so they push config to
     -- their DPs
-    kong.worker_events.register(post_push_config_event_to_node_workers, "dao:crud")
+    kong.worker_events.register(handle_dao_crud_event, "dao:crud")

     -- When "clustering", "push_config" worker event is received by a worker,
     -- it loads and pushes the config to its the connected DPs
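
Two small logging changes in the diff are easy to miss: the error message now passes `err` as a separate argument instead of concatenating it with `..`, and the debug message gained a trailing space. A minimal sketch of why both matter, assuming a hypothetical `fake_log` that joins its arguments the way `ngx.log` is documented to (no separator, `nil` rendered as the string "nil"). It is a stand-in for illustration, not the real `ngx.log`.

```lua
-- Hypothetical stand-in for ngx.log: stringifies each argument and joins
-- them with no separator, returning the line instead of writing it.
local function fake_log(...)
  local parts = {}
  for i = 1, select("#", ...) do
    parts[i] = tostring((select(i, ...)))
  end
  return table.concat(parts)
end

local err = nil  -- e.g. a failure path that returned no error message

-- Old style: `..` raises when err is nil, so the logging call itself fails.
local ok_concat = pcall(function()
  return fake_log("unable to broadcast event: " .. err)
end)

-- New style: err is passed as its own argument and stringified safely.
local ok_varargs, line = pcall(fake_log, "unable to broadcast event: ", err)

-- No separator is inserted between arguments, hence the trailing space
-- added after "control plane," in the first hunk.
local joined = fake_log("same config received from control plane, ",
                        "no need to reload")

print(ok_concat, ok_varargs, line)
print(joined)
```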
