refactor: Add nightly test stability workflow (aka The Gauntlet) #343

Merged
merged 20 commits into from
Mar 19, 2022
Commits
19cf57b
refactor: Add nightly test stability workflow
bryphe-coder Feb 21, 2022
18dae0e
Fix branch name
bryphe-coder Feb 21, 2022
41fca23
Temporarily switch branch
bryphe-coder Feb 21, 2022
a6d3592
Add TODO comment
bryphe-coder Feb 21, 2022
75b8df4
Merge branch 'main' into bryphe/refactor/add-stability-workflow
bryphe-coder Feb 23, 2022
4afef61
Add os name to stability job
bryphe-coder Feb 23, 2022
5b867a1
Add input parameter for iterationCount
bryphe-coder Feb 23, 2022
33be0c8
Fix copy-paste error
bryphe-coder Feb 23, 2022
a0c1800
Add default, since inputs only get populated on workflow dispatches
bryphe-coder Feb 23, 2022
e5c4e18
Add multiple parallel instances
bryphe-coder Feb 23, 2022
34a1049
Bump up test timeout
bryphe-coder Feb 23, 2022
cf05b46
Try default of 10 to avoid limit on simultaneously alive goroutines
bryphe-coder Feb 23, 2022
636c911
Increase timeout for postgres tests
bryphe-coder Feb 23, 2022
93b1ea4
Merge branch 'main' into bryphe/refactor/add-stability-workflow
bryphe-coder Feb 23, 2022
14345df
Remove unnecessary stability prefix, because we send up the workflow …
bryphe-coder Feb 23, 2022
5890ff1
Merge branch 'main' into bryphe/refactor/add-stability-workflow
kylecarbs Mar 15, 2022
ae56344
Merge branch 'main' into bryphe/refactor/add-stability-workflow
bryphe-coder Mar 17, 2022
68f6ad1
Update .github/workflows/coder-test-stability.yaml
Mar 19, 2022
0329a19
Merge branch 'main' into bryphe/refactor/add-stability-workflow
Mar 19, 2022
3dd10a4
Change branch back to main
Mar 19, 2022
98 changes: 98 additions & 0 deletions .github/workflows/coder-test-stability.yaml
@@ -0,0 +1,98 @@
# This workflow (aka The Gauntlet) is a high-iteration run of our tests,
# used to evaluate stability and shake out intermittent failures.
name: coder-test-stability
on:
  schedule:
    # Run every day around midnight Central.
    - cron: "0 6 * * *"

  pull_request:
    branches:
      - main
    paths:
      - .github/workflows/coder-test-stability.yaml
  workflow_dispatch:
    inputs:
      iterationCount:
        description: 'Iteration Count'
        required: false
        default: '10'

# Cancel in-progress runs for pull requests when developers push
# additional changes, and serialize builds in branches.
# https://docs.github.com/en/actions/using-jobs/using-concurrency#example-using-concurrency-to-cancel-any-in-progress-job-or-run
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  coder-test-stability:
    name: "test/go/stability/${{ matrix.os }}/${{ matrix.instance }}"
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os:
Review comment (Contributor):

You might want to consider adding another variable for a run number, so you can run multiple in parallel (e.g. instance: [1, 2] would let you run 2x each base OS, for a total of 6 runs happening in parallel). Since this is happening in the background, the main cost is monetary, and running more instances in parallel will help surface flakes sooner.

Reply (Contributor Author):

Sure, we can try it out. Since this is a nightly job, the performance isn't as important as for a PR gate, so it might actually be OK to run them serially. Trying it out in: e5c4e18

          - ubuntu-latest
          - macos-latest
          - windows-2022
        instance:
          - 1
          - 2
    steps:
      - uses: actions/checkout@v2

      - uses: actions/setup-go@v2
        with:
          go-version: "^1.17"

      - uses: actions/cache@v2
        with:
          # Go mod cache, Linux build cache, Mac build cache, Windows build cache
          path: |
            ~/go/pkg/mod
            ~/.cache/go-build
            ~/Library/Caches/go-build
            %LocalAppData%\go-build
          key: ${{ matrix.os }}-go-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ matrix.os }}-go-

      - run: go install gotest.tools/gotestsum@latest

      - uses: hashicorp/setup-terraform@v1
        with:
          terraform_version: 1.1.2
          terraform_wrapper: false

      - name: Test with Mock Database
        shell: bash
        env:
          GOCOUNT: ${{ github.event.inputs.iterationCount || 10 }}
          GOMAXPROCS: ${{ runner.os == 'Windows' && 1 || 2 }}
        run: gotestsum --junitfile="gotests.xml" --packages="./..." --
          -covermode=atomic -coverprofile="gotests.coverage"
          -timeout=15m -count=$GOCOUNT -race -short -failfast

      - name: Upload DataDog Trace
        if: (success() || failure()) && github.actor != 'dependabot[bot]'
        env:
          DATADOG_API_KEY: ${{ secrets.DATADOG_API_KEY }}
          DD_DATABASE: fake
          GIT_COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
        run: go run scripts/datadog-cireport/main.go gotests.xml

      - name: Test with PostgreSQL Database
        if: runner.os == 'Linux'
        env:
          GOCOUNT: ${{ github.event.inputs.iterationCount || 10 }}
        run: DB=true gotestsum --junitfile="gotests.xml" --packages="./..." --
          -covermode=atomic -coverprofile="gotests.coverage" -timeout=30m
          -count=$GOCOUNT -race -parallel=2 -failfast

      - name: Upload DataDog Trace
        if: (success() || failure()) && github.actor != 'dependabot[bot]' && runner.os == 'Linux'
        env:
          DATADOG_API_KEY: ${{ secrets.DATADOG_API_KEY }}
          DD_DATABASE: postgresql
          GIT_COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
        run: go run scripts/datadog-cireport/main.go gotests.xml
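
The review thread about the `instance` axis comes down to matrix arithmetic: GitHub Actions creates one job per combination of matrix values, so three `os` entries and two `instance` entries give six parallel jobs. A minimal sketch of widening that axis, assuming a hypothetical workflow name (`matrix-fanout-sketch`), a hypothetical job name (`fanout`), and an illustrative four-instance list, none of which are part of this PR:

```yaml
# Hypothetical sketch, not part of this PR: widening the `instance` axis.
name: matrix-fanout-sketch
on: workflow_dispatch

jobs:
  fanout:
    # One job per (os, instance) pair: 3 os x 4 instance = 12 jobs.
    name: "fanout/${{ matrix.os }}/${{ matrix.instance }}"
    runs-on: ${{ matrix.os }}
    strategy:
      # fail-fast defaults to true, which cancels the remaining matrix jobs
      # once one of them fails. Setting it to false lets every combination
      # finish and report its own result.
      fail-fast: false
      matrix:
        os:
          - ubuntu-latest
          - macos-latest
          - windows-2022
        instance: [1, 2, 3, 4]
    steps:
      - run: echo "os=${{ matrix.os }} instance=${{ matrix.instance }}"
```

The workflow in this PR does not set `fail-fast`, so it uses the default. Whether to override it depends on whether one failure per night is enough signal or each platform should report independently.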