Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multisocket plugin #6882

Merged
merged 12 commits into from
Nov 13, 2024
Merged

multisocket plugin #6882

merged 12 commits into from
Nov 13, 2024

Conversation

Shmillerov
Copy link
Contributor

@Shmillerov Shmillerov commented Sep 20, 2024

1. Why is this pull request needed and what does it do?

Added the ability to run multiple servers on the same port using the multisocket plugin. This increases the throughput of CoreDNS in environments with a large number of CPU cores.

Syntax:

multisocket [NUM_SOCKETS]

default value of NUM_SOCKETS is equal to GOMAXPROCS.

Performance testing:
#6882 (comment)

2. Which issues (if any) are related?

#5595

3. Which documentation changes (if any) need to be made?

I added a readme for the plugin. After merging to CoreDNS repo, need to re-generate https://coredns.io/plugins/ using https://github.com/coredns/coredns.io

4. Does this introduce a backward incompatible change or deprecation?

Changes are backward compatible

SuperQ
SuperQ previously requested changes Sep 20, 2024
Copy link
Collaborator

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to implement this. Multi-process makes a number of things much worse. For example monitoring metrics and similar things.

I think we should put more effort into making multi-threaded scaling better rather than paper over the issue with multiple processes.

@chrisohaver
Copy link
Member

This is more or less doable currently by running multiple instances of coredns.

@johnbelamaric
Copy link
Member

I think we should put more effort into making multi-threaded scaling better rather than paper over the issue with multiple processes.

Isn't that what this does? It allows multiple internal servers (same process) to run on the same address and port. Basically, in #5595 it was determined that the single socket used to read the queries was a bottleneck; this alleviates that, IIUC.

@Shmillerov
Copy link
Contributor Author

Let me tell you about our scenario.
We want to run СoreDNS as a DaemonSet in Kubernetes. This means that there is only one pod per node. We can't scale CoreDNS horizontally in that case. And vertically too, because no matter how many resources you add, the CoreDNS will not perform better (#5595)

This fix allows to scale CoreDNS vertically.

This is more or less doable currently by running multiple instances of coredns.

In general, I agree. But it is more difficult to control these instances.

If we run several processes in a container, it is not clear how to detect main process. If the instance goes down, who will restart it? We would like to have one process so that Kubernetes itself controls the lifecycle of our container.

@SuperQ
Copy link
Collaborator

SuperQ commented Sep 23, 2024

Sorry, I clearly didn't read the code / proposal well enough. Setting up multiple listeners from the same process is the goal.

I think this is fine for now, at least until Go has better support for this.

@chrisohaver
Copy link
Member

Perhaps we can consider an alternate plugin name? “reuseport” is a bit confusing given that coredns already binds using the reuseport option. Perhaps “multiprocess” or “multiserver” or something.

@johnbelamaric
Copy link
Member

Perhaps we can consider an alternate plugin name? “reuseport” is a bit confusing given that coredns already binds using the reuseport option. Perhaps “multiprocess” or “multiserver” or something.

"numsockets" would most clearly describe what it does, I think. It's not about multiple processes.

@glebkin
Copy link

glebkin commented Sep 23, 2024

Hi!

Please take into account, that we also need additional changes in Caddy server - coredns/caddy#6 to address this issue - #5595 (comment) with performance degradation after server reload.

@Shmillerov Shmillerov changed the title reuseport plugin numsockets plugin Sep 23, 2024
@SuperQ
Copy link
Collaborator

SuperQ commented Sep 23, 2024

Maybe this should be a command line flag and not something reloadable?

@Shmillerov
Copy link
Contributor Author

Maybe this should be a command line flag and not something reloadable?

During reloading, the server restarts. When restarting, all servers start listening to a single socket because here: https://github.com/coredns/caddy/blob/master/caddy.go#L221, we have a map of server addresses to PacketConn.

With numsockets, we are starting multiple servers on the same address. Therefore, reloading is a a problem for us.

If we want this functionality with numsockets, we need to fix this somehow. It doesn't matter whether it is a plugin or a CLI flag.

@glebkin
Copy link

glebkin commented Sep 26, 2024

Can anyone please take a look at this PR - coredns/caddy#6?

@johnbelamaric
Copy link
Member

Or do you think this is enough? :)

whoami shows that the reading of the requests and the replies work properly still, and of course the restart test is critical. So I think that's probably good.

@johnbelamaric
Copy link
Member

Shouldn't there be at least one failing test, since we haven't merged coredns/caddy#6 ? I had expected to see that failing, then merge the caddy change, then update the go.mod to point to the new caddy version, and see the test clear up. The fact that it's not failing now makes me think our testing is incomplete?

@Shmillerov
Copy link
Contributor Author

https://github.com/coredns/coredns/pull/6882/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R146
In this PR I did replace like

// Temp replace. Will be removed from this PR when the caddy changes are merged
replace github.com/coredns/caddy v1.1.1 => github.com/glebkin/caddy v0.0.0-20240924070517-c7def4b62439

So when caddy changes will be merged, I will remove it from here and will update the caddy version

Copy link

codecov bot commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 32.29167% with 65 lines in your changes missing coverage. Please review.

Project coverage is 57.75%. Comparing base (93c57b6) to head (df5da4d).
Report is 1317 commits behind head on master.

Files with missing lines Patch % Lines
core/dnsserver/register.go 0.00% 65 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6882      +/-   ##
==========================================
+ Coverage   55.70%   57.75%   +2.05%     
==========================================
  Files         224      270      +46     
  Lines       10016    17371    +7355     
==========================================
+ Hits         5579    10033    +4454     
- Misses       3978     6726    +2748     
- Partials      459      612     +153     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@johnbelamaric
Copy link
Member

@Shmillerov if you happen to be at KubeCon NA in Salt Lake City next month, would be great if you want to talk about this in the CoreDNS maintainer track session. Otherwise I will cover it. Great stuff.

@Shmillerov Shmillerov mentioned this pull request Oct 30, 2024
@Shmillerov
Copy link
Contributor Author

@johnbelamaric, thank you, but I can't participate :) I believe you will cover me!

@Shmillerov Shmillerov changed the title numsockets plugin multisocket plugin Nov 6, 2024
Shmillerov and others added 12 commits November 6, 2024 14:54
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
- update doc

Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
- default as GOMAXPROCS
- update README

Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
Signed-off-by: Viktor Rodionov <33463837+Shmillerov@users.noreply.github.com>
@johnbelamaric
Copy link
Member

@Shmillerov just FYI since I don't think you are on the CoreDNS Slack. We are hoping to do the following in the next few days:

  • merge low risk PRs and cut a patch release
  • merge this (and maybe others?) and cut a minor release

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not merge until we cut a patch release with low-risk changes first.

@johnbelamaric johnbelamaric requested a review from SuperQ November 6, 2024 17:21
@johnbelamaric johnbelamaric dismissed SuperQ’s stale review November 13, 2024 17:38

Already resolved

@johnbelamaric johnbelamaric merged commit 6c39f4b into coredns:master Nov 13, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants