feat: Add high availability for multiple replicas #4555

Merged: 86 commits, Oct 17, 2022
86 commits
35b2fed
feat: HA tailnet coordinator
coadler Sep 22, 2022
68a812b
fixup! feat: HA tailnet coordinator
coadler Sep 23, 2022
774c5da
fixup! feat: HA tailnet coordinator
coadler Sep 23, 2022
bd82c5e
remove printlns
coadler Sep 23, 2022
02e079d
Merge branch 'main' into colin/pg-coordinate
coadler Oct 7, 2022
fbad8d0
close all connections on coordinator
coadler Oct 7, 2022
46803aa
impelement high availability feature
coadler Oct 7, 2022
d38391e
fixup! impelement high availability feature
coadler Oct 7, 2022
a0bcd64
fixup! impelement high availability feature
coadler Oct 7, 2022
1f33018
fixup! impelement high availability feature
coadler Oct 7, 2022
b6a5070
fixup! impelement high availability feature
coadler Oct 7, 2022
1883430
Add replicas
kylecarbs Oct 12, 2022
7dc968c
Add DERP meshing to arbitrary addresses
kylecarbs Oct 12, 2022
1dcf0d0
Move packages to highavailability folder
kylecarbs Oct 12, 2022
5c43d63
Merge branch 'main' into colin/pg-coordinate
kylecarbs Oct 12, 2022
4804269
Merge branch 'colin/pg-coordinate' into replica
kylecarbs Oct 12, 2022
289e139
Move coordinator to high availability package
kylecarbs Oct 12, 2022
585bc1d
Add flags for HA
kylecarbs Oct 12, 2022
fdb3557
Rename to replicasync
kylecarbs Oct 13, 2022
9124b00
Denest packages for replicas
kylecarbs Oct 13, 2022
d5555f6
Add test for multiple replicas
kylecarbs Oct 13, 2022
8dfc261
Fix coordination test
kylecarbs Oct 13, 2022
ff5968b
Add HA to the helm chart
kylecarbs Oct 13, 2022
557b390
Rename function pointer
kylecarbs Oct 13, 2022
186a5e2
Add warnings for HA
kylecarbs Oct 13, 2022
de5b13b
Add the ability to block endpoints
kylecarbs Oct 13, 2022
9a50ac4
Add flag to disable P2P connections
kylecarbs Oct 14, 2022
6fa941f
Wow, I made the tests pass
kylecarbs Oct 14, 2022
abff96b
Add replicas endpoint
kylecarbs Oct 14, 2022
d6ce216
Ensure close kills replica
kylecarbs Oct 14, 2022
c3786a5
Merge branch 'main' into replica
kylecarbs Oct 14, 2022
d7cc0ff
Update sql
kylecarbs Oct 14, 2022
9914840
Add database latency to high availability
kylecarbs Oct 15, 2022
c1aa3d2
Pipe TLS to DERP mesh
kylecarbs Oct 15, 2022
0cc4263
Fix DERP mesh with TLS
kylecarbs Oct 15, 2022
f9177e4
Add tests for TLS
kylecarbs Oct 15, 2022
ee59d88
Fix replica sync TLS
kylecarbs Oct 15, 2022
8641e58
Fix RootCA for replica meshing
kylecarbs Oct 15, 2022
3dfb796
Remove ID from replicasync
kylecarbs Oct 15, 2022
ec2c1f1
Fix getting certificates for meshing
kylecarbs Oct 15, 2022
590f0f8
Remove excessive locking
kylecarbs Oct 15, 2022
d8580d1
Fix linting
kylecarbs Oct 15, 2022
ae956fb
Store mesh key in the database
kylecarbs Oct 15, 2022
d703e2d
Fix replica key for tests
kylecarbs Oct 15, 2022
9bb021c
Fix types gen
kylecarbs Oct 15, 2022
76c9e2c
Fix unlocking unlocked
kylecarbs Oct 15, 2022
09e87b0
Fix race in tests
kylecarbs Oct 15, 2022
18c0464
Update enterprise/derpmesh/derpmesh.go
kylecarbs Oct 15, 2022
6f25b2d
Rename to syncReplicas
kylecarbs Oct 15, 2022
efb6ece
Merge branch 'replica' of github.com:coder/coder into replica
kylecarbs Oct 15, 2022
1e85039
Reuse http client
kylecarbs Oct 15, 2022
ae0aa5f
Delete old replicas on a CRON
kylecarbs Oct 15, 2022
332d435
Merge branch 'main' into replica
kylecarbs Oct 15, 2022
bd7fb13
Fix race condition in connection tests
kylecarbs Oct 15, 2022
bb5b347
Fix linting
kylecarbs Oct 15, 2022
76e0511
Fix nil type
kylecarbs Oct 15, 2022
1ff5f7d
Move pubsub to in-memory for twenty test
kylecarbs Oct 16, 2022
b732184
Add comment for configuration tweaking
kylecarbs Oct 16, 2022
38465ac
Fix leak with transport
kylecarbs Oct 16, 2022
72555e2
Fix close leak in derpmesh
kylecarbs Oct 16, 2022
e54072a
Fix race when creating server
kylecarbs Oct 16, 2022
27d5f40
Remove handler update
kylecarbs Oct 16, 2022
4d0b1d8
Skip test on Windows
kylecarbs Oct 16, 2022
129f5ba
Fix DERP mesh test
kylecarbs Oct 16, 2022
4e5d30e
Wrap HTTP handler replacement in mutex
kylecarbs Oct 16, 2022
0359a7e
Fix error message for relay
kylecarbs Oct 16, 2022
f364d1f
Fix API handler for normal tests
kylecarbs Oct 16, 2022
423a47e
Fix speedtest
kylecarbs Oct 16, 2022
c3a77fe
Fix replica resend
kylecarbs Oct 16, 2022
729f8a0
Fix derpmesh send
kylecarbs Oct 16, 2022
ae0bc5d
Ping async
kylecarbs Oct 16, 2022
d7d50db
Increase wait time of template version jobd
kylecarbs Oct 16, 2022
77d23dc
Fix race when closing replica sync
kylecarbs Oct 16, 2022
435bbbb
Add name to client
kylecarbs Oct 16, 2022
9b7c41a
Log the derpmap being used
kylecarbs Oct 17, 2022
9615402
Don't connect if DERP is empty
kylecarbs Oct 17, 2022
bcb97ac
Improve agent coordinator logging
kylecarbs Oct 17, 2022
e2f6a19
Fix lock in coordinator
kylecarbs Oct 17, 2022
c855c9b
Fix relay addr
kylecarbs Oct 17, 2022
a0e5cab
Fix race when updating durations
kylecarbs Oct 17, 2022
9878fc5
Fix client publish race
kylecarbs Oct 17, 2022
7a40bf8
Run pubsub loop in a queue
kylecarbs Oct 17, 2022
08b9681
Store agent nodes in order
kylecarbs Oct 17, 2022
79991a9
Fix coordinator locking
kylecarbs Oct 17, 2022
020171b
Check for closed pipe
kylecarbs Oct 17, 2022
6a57554
Merge branch 'main' into replica
kylecarbs Oct 17, 2022
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -19,6 +19,7 @@
"derphttp",
"derpmap",
"devel",
"dflags",
"drpc",
"drpcconn",
"drpcmux",
@@ -86,8 +87,10 @@
"ptytest",
"quickstart",
"reconfig",
"replicasync",
"retrier",
"rpty",
"SCIM",
"sdkproto",
"sdktrace",
"Signup",
1 change: 1 addition & 0 deletions agent/agent.go
@@ -170,6 +170,7 @@ func (a *agent) runTailnet(ctx context.Context, derpMap *tailcfg.DERPMap) {
if a.isClosed() {
return
}
a.logger.Debug(ctx, "running tailnet with derpmap", slog.F("derpmap", derpMap))
if a.network != nil {
a.network.SetDERPMap(derpMap)
return
6 changes: 2 additions & 4 deletions agent/agent_test.go
@@ -465,7 +465,7 @@ func TestAgent(t *testing.T) {

conn, _ := setupAgent(t, codersdk.WorkspaceAgentMetadata{}, 0)
require.Eventually(t, func() bool {
_, err := conn.Ping()
_, err := conn.Ping(context.Background())
return err == nil
}, testutil.WaitMedium, testutil.IntervalFast)
conn1, err := conn.DialContext(context.Background(), l.Addr().Network(), l.Addr().String())
@@ -483,9 +483,7 @@ func TestAgent(t *testing.T) {

t.Run("Speedtest", func(t *testing.T) {
t.Parallel()
if testing.Short() {
t.Skip("The minimum duration for a speedtest is hardcoded in Tailscale to 5s!")
}
t.Skip("This test is relatively flakey because of Tailscale's speedtest code...")
derpMap := tailnettest.RunDERPAndSTUN(t)
conn, _ := setupAgent(t, codersdk.WorkspaceAgentMetadata{
DERPMap: derpMap,
14 changes: 6 additions & 8 deletions cli/agent_test.go
@@ -7,8 +7,6 @@ import (
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"cdr.dev/slog"

"github.com/coder/coder/cli/clitest"
"github.com/coder/coder/coderd/coderdtest"
"github.com/coder/coder/provisioner/echo"
@@ -67,11 +65,11 @@ func TestWorkspaceAgent(t *testing.T) {
if assert.NotEmpty(t, workspace.LatestBuild.Resources) && assert.NotEmpty(t, resources[0].Agents) {
assert.NotEmpty(t, resources[0].Agents[0].Version)
}
dialer, err := client.DialWorkspaceAgentTailnet(ctx, slog.Logger{}, resources[0].Agents[0].ID)
dialer, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, nil)
require.NoError(t, err)
defer dialer.Close()
require.Eventually(t, func() bool {
_, err := dialer.Ping()
_, err := dialer.Ping(ctx)
return err == nil
}, testutil.WaitMedium, testutil.IntervalFast)
cancelFunc()
@@ -128,11 +126,11 @@ func TestWorkspaceAgent(t *testing.T) {
if assert.NotEmpty(t, resources) && assert.NotEmpty(t, resources[0].Agents) {
assert.NotEmpty(t, resources[0].Agents[0].Version)
}
dialer, err := client.DialWorkspaceAgentTailnet(ctx, slog.Logger{}, resources[0].Agents[0].ID)
dialer, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, nil)
require.NoError(t, err)
defer dialer.Close()
require.Eventually(t, func() bool {
_, err := dialer.Ping()
_, err := dialer.Ping(ctx)
return err == nil
}, testutil.WaitMedium, testutil.IntervalFast)
cancelFunc()
@@ -189,11 +187,11 @@ func TestWorkspaceAgent(t *testing.T) {
if assert.NotEmpty(t, resources) && assert.NotEmpty(t, resources[0].Agents) {
assert.NotEmpty(t, resources[0].Agents[0].Version)
}
dialer, err := client.DialWorkspaceAgentTailnet(ctx, slog.Logger{}, resources[0].Agents[0].ID)
dialer, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, nil)
require.NoError(t, err)
defer dialer.Close()
require.Eventually(t, func() bool {
_, err := dialer.Ping()
_, err := dialer.Ping(ctx)
return err == nil
}, testutil.WaitMedium, testutil.IntervalFast)
cancelFunc()
5 changes: 5 additions & 0 deletions cli/config/file.go
@@ -13,6 +13,11 @@ func (r Root) Session() File {
return File(filepath.Join(string(r), "session"))
}

// ReplicaID is a unique identifier for the Coder server.
func (r Root) ReplicaID() File {
return File(filepath.Join(string(r), "replica_id"))
}

func (r Root) URL() File {
return File(filepath.Join(string(r), "url"))
}
3 changes: 1 addition & 2 deletions cli/configssh_test.go
@@ -19,7 +19,6 @@ import (
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"cdr.dev/slog"
"cdr.dev/slog/sloggers/slogtest"

"github.com/coder/coder/agent"
@@ -115,7 +114,7 @@ func TestConfigSSH(t *testing.T) {
_ = agentCloser.Close()
}()
resources := coderdtest.AwaitWorkspaceAgents(t, client, workspace.ID)
agentConn, err := client.DialWorkspaceAgentTailnet(context.Background(), slog.Logger{}, resources[0].Agents[0].ID)
agentConn, err := client.DialWorkspaceAgent(context.Background(), resources[0].Agents[0].ID, nil)
require.NoError(t, err)
defer agentConn.Close()

7 changes: 7 additions & 0 deletions cli/deployment/flags.go
@@ -85,6 +85,13 @@ func Flags() *codersdk.DeploymentFlags {
Description: "Addresses for STUN servers to establish P2P connections. Set empty to disable P2P connections.",
Default: []string{"stun.l.google.com:19302"},
},
DerpServerRelayAddress: &codersdk.StringFlag{
Name: "DERP Server Relay Address",
Flag: "derp-server-relay-address",
EnvVar: "CODER_DERP_SERVER_RELAY_ADDRESS",
Description: "An HTTP address that is accessible by other replicas to relay DERP traffic. Required for high availability.",
Enterprise: true,
},
DerpConfigURL: &codersdk.StringFlag{
Name: "DERP Config URL",
Flag: "derp-config-url",
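
The new flag's description says the relay address must be an HTTP address reachable by other replicas. The diff does not show how the value is validated, so the sketch below is purely illustrative: `parseRelayAddress` is a hypothetical helper, and treating an empty value as "relay not configured" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseRelayAddress is a hypothetical helper, not part of the PR. It treats an
// empty value as "relay not configured" and otherwise requires an http or
// https URL that includes a host.
func parseRelayAddress(raw string) (*url.URL, error) {
	if raw == "" {
		return nil, nil
	}
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse relay address: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("relay address must use http or https")
	}
	if u.Host == "" {
		return nil, errors.New("relay address must include a host")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{"", "http://10.0.0.2:8080", "10.0.0.2:8080"} {
		u, err := parseRelayAddress(raw)
		fmt.Printf("%q -> url=%v err=%v\n", raw, u, err)
	}
}
```

Rejecting a bare `host:port` early gives the operator a clear error at startup rather than a failed mesh connection later.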
5 changes: 2 additions & 3 deletions cli/portforward.go
@@ -16,7 +16,6 @@ import (
"github.com/spf13/cobra"
"golang.org/x/xerrors"

"cdr.dev/slog"
"github.com/coder/coder/agent"
"github.com/coder/coder/cli/cliflag"
"github.com/coder/coder/cli/cliui"
@@ -96,7 +95,7 @@ func portForward() *cobra.Command {
return xerrors.Errorf("await agent: %w", err)
}

conn, err := client.DialWorkspaceAgentTailnet(ctx, slog.Logger{}, workspaceAgent.ID)
conn, err := client.DialWorkspaceAgent(ctx, workspaceAgent.ID, nil)
if err != nil {
return err
}
@@ -156,7 +155,7 @@ func portForward() *cobra.Command {
case <-ticker.C:
}

_, err = conn.Ping()
_, err = conn.Ping(ctx)
if err != nil {
continue
}
6 changes: 4 additions & 2 deletions cli/root.go
@@ -4,6 +4,7 @@ import (
"context"
"flag"
"fmt"
"io"
"net/http"
"net/url"
"os"
@@ -100,8 +101,9 @@ func Core() []*cobra.Command {
}

func AGPL() []*cobra.Command {
all := append(Core(), Server(deployment.Flags(), func(_ context.Context, o *coderd.Options) (*coderd.API, error) {
return coderd.New(o), nil
all := append(Core(), Server(deployment.Flags(), func(_ context.Context, o *coderd.Options) (*coderd.API, io.Closer, error) {
api := coderd.New(o)
return api, api, nil
}))
return all
}
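
Returning an `io.Closer` alongside the API lets a wrapping build supply its own shutdown logic while callers keep working with the same API type. The sketch below illustrates the idea with stand-in types. `api`, `enterpriseAPI`, `closerFunc`, `newAGPL`, and `newEnterprise` are all hypothetical and are not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"io"
)

// api stands in for *coderd.API (hypothetical, simplified).
type api struct{ closed bool }

func (a *api) Close() error {
	a.closed = true
	return nil
}

// closerFunc adapts a plain function to io.Closer.
type closerFunc func() error

func (f closerFunc) Close() error { return f() }

// enterpriseAPI is a hypothetical wrapper that owns extra resources which
// must be released before the core API shuts down.
type enterpriseAPI struct {
	core  *api
	extra []io.Closer
}

// Close releases the extra resources first, then the core API, and returns
// the first error encountered.
func (e *enterpriseAPI) Close() error {
	var first error
	for _, c := range e.extra {
		if err := c.Close(); err != nil && first == nil {
			first = err
		}
	}
	if err := e.core.Close(); err != nil && first == nil {
		first = err
	}
	return first
}

// newAGPL mirrors the shape in the diff: the API is its own closer.
func newAGPL() (*api, io.Closer, error) {
	a := &api{}
	return a, a, nil
}

// newEnterprise returns the same API type but a different closer.
func newEnterprise(extra ...io.Closer) (*api, io.Closer, error) {
	a := &api{}
	return a, &enterpriseAPI{core: a, extra: extra}, nil
}

func main() {
	a, closer, err := newEnterprise(closerFunc(func() error {
		fmt.Println("closing replica sync")
		return nil
	}))
	if err != nil {
		panic(err)
	}
	if err := closer.Close(); err != nil {
		panic(err)
	}
	fmt.Println("core closed:", a.closed)
}
```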
23 changes: 16 additions & 7 deletions cli/server.go
@@ -69,7 +69,7 @@ import (
)

// nolint:gocyclo
func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *coderd.Options) (*coderd.API, error)) *cobra.Command {
func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *coderd.Options) (*coderd.API, io.Closer, error)) *cobra.Command {
root := &cobra.Command{
Use: "server",
Short: "Start a Coder server",
@@ -167,9 +167,10 @@ func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *code
}
defer listener.Close()

var tlsConfig *tls.Config
if dflags.TLSEnable.Value {
listener, err = configureServerTLS(
listener, dflags.TLSMinVersion.Value,
tlsConfig, err = configureTLS(
dflags.TLSMinVersion.Value,
dflags.TLSClientAuth.Value,
dflags.TLSCertFiles.Value,
dflags.TLSKeyFiles.Value,
@@ -178,6 +179,7 @@ func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *code
if err != nil {
return xerrors.Errorf("configure tls: %w", err)
}
listener = tls.NewListener(listener, tlsConfig)
}

tcpAddr, valid := listener.Addr().(*net.TCPAddr)
@@ -328,6 +330,9 @@ func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *code
Experimental: ExperimentalEnabled(cmd),
DeploymentFlags: dflags,
}
if tlsConfig != nil {
options.TLSCertificates = tlsConfig.Certificates
}

if dflags.OAuth2GithubClientSecret.Value != "" {
options.GithubOAuth2Config, err = configureGithubOAuth2(accessURLParsed,
@@ -471,11 +476,14 @@ func Server(dflags *codersdk.DeploymentFlags, newAPI func(context.Context, *code
), dflags.PromAddress.Value, "prometheus")()
}

coderAPI, err := newAPI(ctx, options)
// We use a separate closer so the Enterprise API
// can have its own close functions. This is cleaner
// than abstracting the Coder API itself.
coderAPI, closer, err := newAPI(ctx, options)
if err != nil {
return err
}
defer coderAPI.Close()
defer closer.Close()

client := codersdk.New(localURL)
if dflags.TLSEnable.Value {
@@ -893,7 +901,7 @@ func loadCertificates(tlsCertFiles, tlsKeyFiles []string) ([]tls.Certificate, er
return certs, nil
}

func configureServerTLS(listener net.Listener, tlsMinVersion, tlsClientAuth string, tlsCertFiles, tlsKeyFiles []string, tlsClientCAFile string) (net.Listener, error) {
func configureTLS(tlsMinVersion, tlsClientAuth string, tlsCertFiles, tlsKeyFiles []string, tlsClientCAFile string) (*tls.Config, error) {
tlsConfig := &tls.Config{
MinVersion: tls.VersionTLS12,
}
@@ -929,6 +937,7 @@ func configureServerTLS(listener net.Listener, tlsMinVersion, tlsClientAuth stri
if err != nil {
return nil, xerrors.Errorf("load certificates: %w", err)
}
tlsConfig.Certificates = certs
tlsConfig.GetCertificate = func(hi *tls.ClientHelloInfo) (*tls.Certificate, error) {
// If there's only one certificate, return it.
if len(certs) == 1 {
@@ -963,7 +972,7 @@ func configureServerTLS(listener net.Listener, tlsMinVersion, tlsClientAuth stri
tlsConfig.ClientCAs = caPool
}

return tls.NewListener(listener, tlsConfig), nil
return tlsConfig, nil
}

func configureGithubOAuth2(accessURL *url.URL, clientID, clientSecret string, allowSignups bool, allowOrgs []string, rawTeams []string, enterpriseBaseURL string) (*coderd.GithubOAuth2Config, error) {
6 changes: 4 additions & 2 deletions cli/speedtest.go
@@ -55,7 +55,9 @@ func speedtest() *cobra.Command {
if cliflag.IsSetBool(cmd, varVerbose) {
logger = logger.Leveled(slog.LevelDebug)
}
conn, err := client.DialWorkspaceAgentTailnet(ctx, logger, workspaceAgent.ID)
conn, err := client.DialWorkspaceAgent(ctx, workspaceAgent.ID, &codersdk.DialWorkspaceAgentOptions{
Logger: logger,
})
if err != nil {
return err
}
@@ -68,7 +70,7 @@ func speedtest() *cobra.Command {
return ctx.Err()
case <-ticker.C:
}
dur, err := conn.Ping()
dur, err := conn.Ping(ctx)
if err != nil {
continue
}
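
Across this diff, `DialWorkspaceAgentTailnet(ctx, logger, id)` becomes `DialWorkspaceAgent(ctx, id, opts)`, where `opts` may be `nil`. Below is a minimal sketch of that optional-options pattern using stand-in types. `Logger`, `DialOptions`, `Conn`, and `dialAgent` are hypothetical, and only the `Logger` field is confirmed by the diff.

```go
package main

import "fmt"

// Logger stands in for slog.Logger (hypothetical, simplified).
type Logger struct{ Name string }

// DialOptions mirrors only what the diff shows of
// codersdk.DialWorkspaceAgentOptions: a Logger field.
type DialOptions struct {
	Logger Logger
}

// Conn is a hypothetical connection that records the options it was given.
type Conn struct {
	AgentID string
	Logger  Logger
}

// dialAgent accepts nil options and falls back to zero-value defaults, so
// callers that need no customization can simply pass nil.
func dialAgent(agentID string, opts *DialOptions) *Conn {
	if opts == nil {
		opts = &DialOptions{}
	}
	return &Conn{AgentID: agentID, Logger: opts.Logger}
}

func main() {
	quiet := dialAgent("agent-1", nil)
	verbose := dialAgent("agent-1", &DialOptions{Logger: Logger{Name: "debug"}})
	fmt.Printf("%q %q\n", quiet.Logger.Name, verbose.Logger.Name)
}
```

An options struct keeps the signature stable when further settings are added later, which is one plausible reason for the rename.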
4 changes: 1 addition & 3 deletions cli/ssh.go
@@ -20,8 +20,6 @@ import (
"golang.org/x/term"
"golang.org/x/xerrors"

"cdr.dev/slog"

"github.com/coder/coder/cli/cliflag"
"github.com/coder/coder/cli/cliui"
"github.com/coder/coder/coderd/autobuild/notify"
@@ -86,7 +84,7 @@ func ssh() *cobra.Command {
return xerrors.Errorf("await agent: %w", err)
}

conn, err := client.DialWorkspaceAgentTailnet(ctx, slog.Logger{}, workspaceAgent.ID)
conn, err := client.DialWorkspaceAgent(ctx, workspaceAgent.ID, nil)
if err != nil {
return err
}
6 changes: 4 additions & 2 deletions coderd/activitybump_test.go
@@ -72,7 +72,7 @@ func TestWorkspaceActivityBump(t *testing.T) {
"deadline %v never updated", firstDeadline,
)

require.WithinDuration(t, database.Now().Add(time.Hour), workspace.LatestBuild.Deadline.Time, time.Second)
require.WithinDuration(t, database.Now().Add(time.Hour), workspace.LatestBuild.Deadline.Time, 3*time.Second)
}
}

@@ -82,7 +82,9 @@ func TestWorkspaceActivityBump(t *testing.T) {
client, workspace, assertBumped := setupActivityTest(t)

resources := coderdtest.AwaitWorkspaceAgents(t, client, workspace.ID)
conn, err := client.DialWorkspaceAgentTailnet(ctx, slogtest.Make(t, nil), resources[0].Agents[0].ID)
conn, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, &codersdk.DialWorkspaceAgentOptions{
Logger: slogtest.Make(t, nil),
})
require.NoError(t, err)
defer conn.Close()
