Commit 0244d74

Merge remote-tracking branch 'origin/main' into yevhenii/510-reconciliation-loop-v2
2 parents e354956 + ae44ecf commit 0244d74

259 files changed, +12267 -1759 lines changed

.cursorrules

+122
@@ -0,0 +1,122 @@
+# Cursor Rules
+
+This project is called "Coder" - an application for managing remote development environments.
+
+Coder provides a platform for creating, managing, and using remote development environments (also known as Cloud Development Environments or CDEs). It leverages Terraform to define and provision these environments, which are referred to as "workspaces" within the project. The system is designed to be extensible, secure, and provide developers with a seamless remote development experience.
+
+# Core Architecture
+
+The heart of Coder is a control plane that orchestrates the creation and management of workspaces. This control plane interacts with separate Provisioner processes over gRPC to handle workspace builds. The Provisioners consume workspace definitions and use Terraform to create the actual infrastructure.
+
+The CLI package serves dual purposes - it can be used to launch the control plane itself and also provides client functionality for users to interact with an existing control plane instance. All user-facing frontend code is developed in TypeScript using React and lives in the `site/` directory.
+
+The database layer uses PostgreSQL with SQLC for generating type-safe database code. Database migrations are carefully managed to ensure both forward and backward compatibility through paired `.up.sql` and `.down.sql` files.
+
+# API Design
+
+Coder's API architecture combines REST and gRPC approaches. The REST API is defined in `coderd/coderd.go` and uses Chi for HTTP routing. This provides the primary interface for the frontend and external integrations.
+
+Internal communication with Provisioners occurs over gRPC, with service definitions maintained in `.proto` files. This separation allows for efficient binary communication with the components responsible for infrastructure management while providing a standard REST interface for human-facing applications.
+
+# Network Architecture
+
+Coder implements a secure networking layer based on Tailscale's Wireguard implementation. The `tailnet` package provides connectivity between workspace agents and clients through DERP (Designated Encrypted Relay for Packets) servers when direct connections aren't possible. This creates a secure overlay network allowing access to workspaces regardless of network topology, firewalls, or NAT configurations.
+
+## Tailnet and DERP System
+
+The networking system has three key components:
+
+1. **Tailnet**: An overlay network implemented in the `tailnet` package that provides secure, end-to-end encrypted connections between clients, the Coder server, and workspace agents.
+
+2. **DERP Servers**: These relay traffic when direct connections aren't possible. Coder provides several options:
+- A built-in DERP server that runs on the Coder control plane
+- Integration with Tailscale's global DERP infrastructure
+- Support for custom DERP servers for lower latency or offline deployments
+
+3. **Direct Connections**: When possible, the system establishes peer-to-peer connections between clients and workspaces using STUN for NAT traversal. This requires both endpoints to send UDP traffic on ephemeral ports.
+
+## Workspace Proxies
+
+Workspace proxies (in the Enterprise edition) provide regional relay points for browser-based connections, reducing latency for geo-distributed teams. Key characteristics:
+
+- Deployed as independent servers that authenticate with the Coder control plane
+- Relay connections for SSH, workspace apps, port forwarding, and web terminals
+- Do not make direct database connections
+- Managed through the `coder wsproxy` commands
+- Implemented primarily in the `enterprise/wsproxy/` package
+
+# Agent System
+
+The workspace agent runs within each provisioned workspace and provides core functionality including:
+- SSH access to workspaces via the `agentssh` package
+- Port forwarding
+- Terminal connectivity via the `pty` package for pseudo-terminal support
+- Application serving
+- Healthcheck monitoring
+- Resource usage reporting
+
+Agents communicate with the control plane using the tailnet system and authenticate using secure tokens.
+
+# Workspace Applications
+
+Workspace applications (or "apps") provide browser-based access to services running within workspaces. The system supports:
+
+- HTTP(S) and WebSocket connections
+- Path-based or subdomain-based access URLs
+- Health checks to monitor application availability
+- Different sharing levels (owner-only, authenticated users, or public)
+- Custom icons and display settings
+
+The implementation is primarily in the `coderd/workspaceapps/` directory with components for URL generation, proxying connections, and managing application state.
+
+# Implementation Details
+
+The project structure separates frontend and backend concerns. React components and pages are organized in the `site/src/` directory, with Jest used for testing. The backend is primarily written in Go, with a strong emphasis on error handling patterns and test coverage.
+
+Database interactions are carefully managed through migrations in `coderd/database/migrations/` and queries in `coderd/database/queries/`. All new queries require proper database authorization (dbauthz) implementation to ensure that only users with appropriate permissions can access specific resources.
+
+# Authorization System
+
+The database authorization (dbauthz) system enforces fine-grained access control across all database operations. It uses role-based access control (RBAC) to validate user permissions before executing database operations. The `dbauthz` package wraps the database store and performs authorization checks before returning data. All database operations must pass through this layer to ensure security.
+
+# Testing Framework
+
+The codebase has a comprehensive testing approach with several key components:
+
+1. **Parallel Testing**: All tests must use `t.Parallel()` to run concurrently, which improves test suite performance and helps identify race conditions.
+
+2. **coderdtest Package**: This package in `coderd/coderdtest/` provides utilities for creating test instances of the Coder server, setting up test users and workspaces, and mocking external components.
+
+3. **Integration Tests**: Tests often span multiple components to verify system behavior, such as template creation, workspace provisioning, and agent connectivity.
+
+4. **Enterprise Testing**: Enterprise features have dedicated test utilities in the `coderdenttest` package.
+
+# Open Source and Enterprise Components
+
+The repository contains both open source and enterprise components:
+
+- Enterprise code lives primarily in the `enterprise/` directory
+- Enterprise features focus on governance, scalability (high availability), and advanced deployment options like workspace proxies
+- The boundary between open source and enterprise is managed through a licensing system
+- The same core codebase supports both editions, with enterprise features conditionally enabled
+
+# Development Philosophy
+
+Coder emphasizes clear error handling, with specific patterns required:
+- Concise error messages that avoid phrases like "failed to"
+- Wrapping errors with `%w` to maintain error chains
+- Using sentinel errors with the "err" prefix (e.g., `errNotFound`)
+
+All tests should run in parallel using `t.Parallel()` to ensure efficient testing and expose potential race conditions. The codebase is rigorously linted with golangci-lint to maintain consistent code quality.
+
+Git contributions follow a standard format with commit messages structured as `type: <message>`, where type is one of `feat`, `fix`, or `chore`.
+
+# Development Workflow
+
+Development can be initiated using `scripts/develop.sh` to start the application after making changes. Database schema updates should be performed through the migration system using `create_migration.sh <name>` to generate migration files, with each `.up.sql` migration paired with a corresponding `.down.sql` that properly reverts all changes.
+
+If the development database gets into a bad state, it can be completely reset by removing the PostgreSQL data directory with `rm -rf .coderv2/postgres`. This will destroy all data in the development database, requiring you to recreate any test users, templates, or workspaces after restarting the application.
+
+Code generation for the database layer uses `coderd/database/generate.sh`, and developers should refer to `sqlc.yaml` for the appropriate style and patterns to follow when creating new queries or tables.
+
+The focus should always be on maintaining security through proper database authorization, clean error handling, and comprehensive test coverage to ensure the platform remains robust and reliable.

.github/.linkspector.yml

+1
@@ -23,5 +23,6 @@ ignorePatterns:
   - pattern: "wiki.ubuntu.com"
   - pattern: "mutagen.io"
   - pattern: "docs.github.com"
+  - pattern: "claude.ai"
 aliveStatusCodes:
   - 200

.github/workflows/ci.yaml

+7 -4
@@ -252,7 +252,7 @@ jobs:
         run: |
           go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.30
           go install storj.io/drpc/cmd/protoc-gen-go-drpc@v0.0.34
-          go install golang.org/x/tools/cmd/goimports@latest
+          go install golang.org/x/tools/cmd/goimports@v0.31.0
           go install github.com/mikefarah/yq/v4@v4.44.3
           go install go.uber.org/mock/mockgen@v0.5.0

@@ -299,6 +299,9 @@ jobs:
       - name: Setup Node
         uses: ./.github/actions/setup-node

+      - name: Check Go version
+        run: IGNORE_NIX=true ./scripts/check_go_versions.sh
+
       # Use default Go version
       - name: Setup Go
         uses: ./.github/actions/setup-go
@@ -674,8 +677,8 @@ jobs:
         variant:
           - premium: false
             name: test-e2e
-          - premium: true
-            name: test-e2e-premium
+          #- premium: true
+          # name: test-e2e-premium
     # Skip test-e2e on forks as they don't have access to CI secrets
     if: (needs.changes.outputs.go == 'true' || needs.changes.outputs.ts == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main') && !(github.event.pull_request.head.repo.fork)
     timeout-minutes: 20
@@ -860,7 +863,7 @@ jobs:
         run: |
           go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.30
           go install storj.io/drpc/cmd/protoc-gen-go-drpc@v0.0.34
-          go install golang.org/x/tools/cmd/goimports@latest
+          go install golang.org/x/tools/cmd/goimports@v0.31.0
           go install github.com/mikefarah/yq/v4@v4.44.3
           go install go.uber.org/mock/mockgen@v0.5.0


.golangci.yaml

+1 -2
@@ -164,6 +164,7 @@ linters-settings:
       - name: unnecessary-stmt
       - name: unreachable-code
       - name: unused-parameter
+        exclude: "**/*_test.go"
       - name: unused-receiver
       - name: var-declaration
       - name: var-naming
@@ -195,8 +196,6 @@ issues:
         - errcheck
         - forcetypeassert
         - exhaustruct # This is unhelpful in tests.
-        - revive # TODO(JonA): disabling in order to update golangci-lint
-        - gosec # TODO(JonA): disabling in order to update golangci-lint
     - path: scripts/*
       linters:
         - exhaustruct

.vscode/settings.json

+4 -1
@@ -57,5 +57,8 @@
   "[css][html][markdown][yaml]": {
     "editor.defaultFormatter": "esbenp.prettier-vscode"
   },
-  "typos.config": ".github/workflows/typos.toml"
+  "typos.config": ".github/workflows/typos.toml",
+  "[markdown]": {
+    "editor.defaultFormatter": "DavidAnson.vscode-markdownlint"
+  }
 }

agent/agent.go

+14 -7
@@ -1773,15 +1773,22 @@ func (a *agent) Close() error {
 	a.setLifecycle(codersdk.WorkspaceAgentLifecycleShuttingDown)

 	// Attempt to gracefully shut down all active SSH connections and
-	// stop accepting new ones.
-	err := a.sshServer.Shutdown(a.hardCtx)
+	// stop accepting new ones. If all processes have not exited after 5
+	// seconds, we just log it and move on as it's more important to run
+	// the shutdown scripts. A typical shutdown time for containers is
+	// 10 seconds, so this still leaves a bit of time to run the
+	// shutdown scripts in the worst-case.
+	sshShutdownCtx, sshShutdownCancel := context.WithTimeout(a.hardCtx, 5*time.Second)
+	defer sshShutdownCancel()
+	err := a.sshServer.Shutdown(sshShutdownCtx)
 	if err != nil {
-		a.logger.Error(a.hardCtx, "ssh server shutdown", slog.Error(err))
-	}
-	err = a.sshServer.Close()
-	if err != nil {
-		a.logger.Error(a.hardCtx, "ssh server close", slog.Error(err))
+		if errors.Is(err, context.DeadlineExceeded) {
+			a.logger.Warn(sshShutdownCtx, "ssh server shutdown timeout", slog.Error(err))
+		} else {
+			a.logger.Error(sshShutdownCtx, "ssh server shutdown", slog.Error(err))
+		}
 	}
+
 	// wait for SSH to shut down before the general graceful cancel, because
 	// this triggers a disconnect in the tailnet layer, telling all clients to
 	// shut down their wireguard tunnels to us. If SSH sessions are still up,

agent/agentssh/agentssh.go

+53 -13
@@ -582,6 +582,12 @@ func (s *Server) sessionStart(logger slog.Logger, session ssh.Session, env []str
 func (s *Server) startNonPTYSession(logger slog.Logger, session ssh.Session, magicTypeLabel string, cmd *exec.Cmd) error {
 	s.metrics.sessionsTotal.WithLabelValues(magicTypeLabel, "no").Add(1)

+	// Create a process group and send SIGHUP to child processes,
+	// otherwise context cancellation will not propagate properly
+	// and SSH server close may be delayed.
+	cmd.SysProcAttr = cmdSysProcAttr()
+	cmd.Cancel = cmdCancel(session.Context(), logger, cmd)
+
 	cmd.Stdout = session
 	cmd.Stderr = session.Stderr()
 	// This blocks forever until stdin is received if we don't
@@ -926,7 +932,12 @@ func (s *Server) CreateCommand(ctx context.Context, script string, env []string,
 // Serve starts the server to handle incoming connections on the provided listener.
 // It returns an error if no host keys are set or if there is an issue accepting connections.
 func (s *Server) Serve(l net.Listener) (retErr error) {
-	if len(s.srv.HostSigners) == 0 {
+	// Ensure we're not mutating HostSigners as we're reading it.
+	s.mu.RLock()
+	noHostKeys := len(s.srv.HostSigners) == 0
+	s.mu.RUnlock()
+
+	if noHostKeys {
 		return xerrors.New("no host keys set")
 	}

@@ -1054,43 +1065,72 @@ func (s *Server) Close() error {
 	}
 	s.closing = make(chan struct{})

+	ctx := context.Background()
+
+	s.logger.Debug(ctx, "closing server")
+
+	// Stop accepting new connections.
+	s.logger.Debug(ctx, "closing all active listeners", slog.F("count", len(s.listeners)))
+	for l := range s.listeners {
+		_ = l.Close()
+	}
+
 	// Close all active sessions to gracefully
 	// terminate client connections.
+	s.logger.Debug(ctx, "closing all active sessions", slog.F("count", len(s.sessions)))
 	for ss := range s.sessions {
 		// We call Close on the underlying channel here because we don't
 		// want to send an exit status to the client (via Exit()).
 		// Typically OpenSSH clients will return 255 as the exit status.
 		_ = ss.Close()
 	}
-
-	// Close all active listeners and connections.
-	for l := range s.listeners {
-		_ = l.Close()
-	}
+	s.logger.Debug(ctx, "closing all active connections", slog.F("count", len(s.conns)))
 	for c := range s.conns {
 		_ = c.Close()
 	}

-	// Close the underlying SSH server.
+	s.logger.Debug(ctx, "closing SSH server")
 	err := s.srv.Close()

 	s.mu.Unlock()
+
+	s.logger.Debug(ctx, "waiting for all goroutines to exit")
 	s.wg.Wait() // Wait for all goroutines to exit.

 	s.mu.Lock()
 	close(s.closing)
 	s.closing = nil
 	s.mu.Unlock()

+	s.logger.Debug(ctx, "closing server done")
+
 	return err
 }

-// Shutdown gracefully closes all active SSH connections and stops
-// accepting new connections.
-//
-// Shutdown is not implemented.
-func (*Server) Shutdown(_ context.Context) error {
-	// TODO(mafredri): Implement shutdown, SIGHUP running commands, etc.
+// Shutdown stops accepting new connections. The current implementation
+// calls Close() for simplicity instead of waiting for existing
+// connections to close. If the context times out, Shutdown will return
+// but Close() may not have completed.
+func (s *Server) Shutdown(ctx context.Context) error {
+	ch := make(chan error, 1)
+	go func() {
+		// TODO(mafredri): Implement shutdown, SIGHUP running commands, etc.
+		// For now we just close the server.
+		ch <- s.Close()
+	}()
+	var err error
+	select {
+	case <-ctx.Done():
+		err = ctx.Err()
+	case err = <-ch:
+	}
+	// Re-check for context cancellation precedence.
+	if ctx.Err() != nil {
+		err = ctx.Err()
+	}
+	if err != nil {
+		return xerrors.Errorf("close server: %w", err)
+	}
 	return nil
 }

