Skip to content

Rejigger API Priority and Fairness config #130459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

MikeSpreitzer
Copy link
Member

@MikeSpreitzer MikeSpreitzer commented Feb 26, 2025

What type of PR is this?

/kind cleanup
/kind bug

What this PR does / why we need it:

This PR tweaks the default configuration objects for API Priority and Fairness. There are two changes.

  1. Introduce a FlowSchema and priority level for events.
  2. Generally reshuffle the nominal concurrency shares and borrowing parameters so that the baseline is more in line with actual priority and need and can respond via borrowing. This is good because the default configuration was originally designed before borrowing was introduced and has some inversions among the nominal concurrency shares. See [APF] low priorities have larger effective shares than high priorities #121982 (comment) for most of this. Also, @linxiulei has been recommending increasing the nominal concurrency shares for leader election, and testing supports this change. See Increase leader-election nominal concurrency shares from 10 to 40 #129646 and the other PRs and tests referenced in Increase leader-election nominal concurrency shares from 10 to 40 #129646 (comment) .

This PR subsumes #129646 .

Also, as suggested in #129646 (comment) , this PR introduces a FeatureGate that allows administrators to opt out of this set of configuration changes. The FeatureGate is named "APFNewConifg".

Which issue(s) this PR fixes:

Fixes #121982

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The default configuration for API Priority and Fairness has changed, to work better under stress and take advantage of the borrowing of concurrency between priority levels. Note that during a rollout, while a cluster has a mixed population of kube-apiservers, they will periodically overwrite each other's configurations; this is expected to be a usable compromise. There is a new FeatureGate named "APFNewConifg" that can be disabled to opt out of the new configuration.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 26, 2025
@k8s-ci-robot k8s-ci-robot added area/apiserver area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 26, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2025
@MikeSpreitzer
Copy link
Member Author

/cc @wojtek-t
/cc @tkashem
/cc @deads2k
/cc @linxiulei

@MikeSpreitzer MikeSpreitzer changed the title Rejigger APF config Rejigger API Priority and Fairness config Feb 26, 2025
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@MikeSpreitzer MikeSpreitzer force-pushed the rejigger-apf-config branch 3 times, most recently from 6605f14 to 26ac0eb Compare February 27, 2025 06:38
@MikeSpreitzer
Copy link
Member Author

MikeSpreitzer commented Mar 18, 2025

I have a problem with the new integration test. The pull-kubernetes-verify job complained about the way that the integration test flaps the Feature, saying

 Test files may not access mutable global feature gates directly:
test/integration/apiserver/flowcontrol/concurrency_test.go:			case featuregate.MutableFeatureGate:
Use this invocation instead:
  featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.<FeatureName>, <value>) 

But https://github.com/kubernetes/component-base/blob/v0.32.3/featuregate/testing/feature_gate.go#L40 says that it cannot flap one Feature.

So I am looking for a clue about whether there is a way to make an integration test like this, or I have to give up and make an E2E test instead, or a test of less-than-full servers.

@MikeSpreitzer
Copy link
Member Author

Erm, well, maybe that comment about not setting the same Feature twice in one test is just too cautious? I looked at the body of that func, and do not see a failure. I tested, and also did not get a failure. I have added a commit that switches to using that func.

return
}
tCtx.Cancel("test function done")
}
Copy link
Contributor

@tkashem tkashem Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do something like below and use the framework.StartTestServer with defaults

func TestConfigFlap(t *testing.T) {
    t.Run("first", func(t *testing.T){
        featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, genericfeatures.APFv134Config, false)
        ...
    })

    t.Run("second", func(t *testing.T){
        featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, genericfeatures.APFv134Config, true)
        ...
    })

    t.Run("third", func(t *testing.T){
        featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, genericfeatures.APFv134Config, false)
        ...
    })
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, of course. Done.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2025
Because some testing shows leader election being starved.

Signed-off-by: Mike Spreitzer <mspreitz@us.ibm.com>
.. so that it takes no larger bite from system capacity than before
(modulo small difference due to total nominal shares).
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2025
@MikeSpreitzer
Copy link
Member Author

The force push to a80c450 is a rebase onto master.

Make APFv134Config Alpha at 1.33
.. and use it multiple times on the same Feature in the same test.
The comment on this func says it will fail in that case, but it
appears to actually succeed. Examining the body of the func, I do
not see why it should fail.
@MikeSpreitzer
Copy link
Member Author

And the force-push to e3e70fc fixes a botch in that rebase.

@MikeSpreitzer
Copy link
Member Author

And the force-push to 5444b71 fixes a botch in the new MakeGate func, linter insists on checking the error from FeatureGate.Set.

.. instead of feature values.
@MikeSpreitzer
Copy link
Member Author

The force-push to 58b961e changes a variable name to not trigger a sloppy linter (hack/verify-non-mutating-validation.sh).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 19, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dims dims added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2025
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[APF] low priorities have larger effective shares than high priorities
8 participants