fix: Add resiliency to daemon connections #1116

Merged: 1 commit, Apr 25, 2022
kylecarbs
Member

Connections could fail when very large payloads were transmitted.
This fixes an upstream bug in dRPC where the connection would
end with a "context canceled" error if a message was too large.

This also adds retransmission of completions and failures. If
Coder loses its connection to a provisioner daemon, the state
will be reported correctly on the next connection.

kylecarbs requested a review from coadler April 24, 2022 23:03
kylecarbs self-assigned this Apr 24, 2022
codecov bot commented Apr 24, 2022

Codecov Report

Merging #1116 (ebf01b5) into main (7496c3d) will decrease coverage by 0.03%.
The diff coverage is 62.18%.

@@            Coverage Diff             @@
##             main    #1116      +/-   ##
==========================================
- Coverage   66.61%   66.57%   -0.04%     
==========================================
  Files         257      257              
  Lines       16011    16082      +71     
  Branches      156      156              
==========================================
+ Hits        10665    10707      +42     
- Misses       4266     4280      +14     
- Partials     1080     1095      +15     
| Flag | Coverage Δ | |
|---|---|---|
| unittest-go-macos-latest | 53.61% <62.18%> (+0.04%) | ⬆️ |
| unittest-go-postgres- | 65.92% <62.18%> (-0.02%) | ⬇️ |
| unittest-go-ubuntu-latest | 56.16% <62.18%> (+0.16%) | ⬆️ |
| unittest-go-windows-2022 | 53.15% <59.66%> (+0.13%) | ⬆️ |
| unittest-js | 67.28% <ø> (ø) | |

| Impacted Files | Coverage Δ | |
|---|---|---|
| provisionersdk/transport.go | 85.10% <ø> (ø) | |
| provisionerd/provisionerd.go | 77.83% <57.94%> (-3.20%) | ⬇️ |
| coderd/provisionerdaemons.go | 63.29% <100.00%> (+1.89%) | ⬆️ |
| codersdk/provisionerdaemons.go | 65.67% <100.00%> (+5.97%) | ⬆️ |
| coderd/database/db.go | 55.17% <0.00%> (-13.80%) | ⬇️ |
| provisionersdk/serve.go | 35.13% <0.00%> (-8.11%) | ⬇️ |
| peerbroker/dial.go | 77.04% <0.00%> (-6.56%) | ⬇️ |
| coderd/httpapi/httpapi.go | 66.25% <0.00%> (-6.25%) | ⬇️ |
| peer/conn.go | 80.45% <0.00%> (+0.50%) | ⬆️ |

... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

 // TransportPipe creates an in-memory pipe for dRPC transport.
 func TransportPipe() (*yamux.Session, *yamux.Session) {
 	clientReader, clientWriter := io.Pipe()
 	serverReader, serverWriter := io.Pipe()
 	yamuxConfig := yamux.DefaultConfig()
-	yamuxConfig.LogOutput = io.Discard
+	yamuxConfig.LogOutput = os.Stderr
Contributor

Was this turned on for debugging?

Member Author

Fixed!

t.Run("PayloadTooBig", func(t *testing.T) {
	t.Parallel()
	if runtime.GOOS == "windows" {
		// Takes too long to allocate memory on Windows!
Contributor

lol

Member Author

🪟🪟🪟

kylecarbs force-pushed the bigsocket branch 3 times, most recently from 5fb7b25 to b401270 April 25, 2022 01:03
kylecarbs merged commit db7ed4d into main Apr 25, 2022
kylecarbs deleted the bigsocket branch April 25, 2022 01:33