testing: race condition #74944

@ArnoSen

Description

Go version

go version go1.24.6 darwin/arm64

Output of go env in your module/workspace:

AR='ar'
CC='clang'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='clang++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/Users/<redacted>/Library/Caches/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/<redacted>/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/s_/3pdj5n_55mn_tt4ml59h9jlh0000gp/T/go-build161577767=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/Users/<redacted>/Developer/P20457-Foundation-Inventory/go.mod'
GOMODCACHE='/Users/<redacted>/go/pkg/mod'
GONOPROXY='dev.azure.com/INGCDaaS'
GONOSUMDB='dev.azure.com/INGCDaaS'
GOOS='darwin'
GOPATH='/Users/<redacted>/go'
GOPRIVATE='<redacted>'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/<redacted>/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.24.6'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

When I run my tests, I sometimes run into a data race.
The command I run is:
go test -coverpkg=./... -coverprofile=covertest.out -covermode=atomic -count=1 ./... -race -parallel 10 -shuffle=on

What did you see happen?

The full output of the data race is here: datarace.txt.

The summary is:

==================
WARNING: DATA RACE
Read at 0x00c0001031c3 by goroutine 21:
  testing.(*common).logDepth()
      /usr/local/go/src/testing/testing.go:1053 +0x8c
  testing.(*common).log()
      /usr/local/go/src/testing/testing.go:1046 +0x80
  testing.(*common).Logf()
      /usr/local/go/src/testing/testing.go:1097 +0x58
  <private repo>/pkg/test_helpers/v2.(*TestSuite).TearDown()
     <private repo>/pkg/test_helpers/v2/suite.go:349 +0xf8
  <private repo>/integrationtest/failuretest.TestIntermittentVcenterFailure.func1()
      <private repo>/integrationtest/failuretest/intermittent_vcenter_failure_test.go:44 +0x44
  testing.(*common).Cleanup.func1()
      /usr/local/go/src/testing/testing.go:1211 +0x140
  testing.(*common).runCleanup()
      /usr/local/go/src/testing/testing.go:1445 +0x1e0
  testing.tRunner.func2()
      /usr/local/go/src/testing/testing.go:1786 +0x48
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:610 +0x5c
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:245 +0x90
  io.copyBuffer()
      /usr/local/go/src/io/io.go:431 +0x258
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:248 +0xc0
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:245 +0x90
  (above block starting with io.copyBuffer repeats a few hundred times)

Previous write at 0x00c0001031c3 by goroutine 22:
  testing.tRunner.func1()
      /usr/local/go/src/testing/testing.go:1779 +0x5dc
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:610 +0x5c
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:245 +0x90
  io.copyBuffer()
      /usr/local/go/src/io/io.go:431 +0x258
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:248 +0xc0
  net/http/internal.(*chunkedWriter).Write()
      /usr/local/go/src/net/http/internal/chunked.go:245 +0x90
  (above block starting with io.copyBuffer repeats a few hundred times)

Goroutine 21 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1851 +0x684
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2279 +0x7c
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1792 +0x180
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2277 +0x77c
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2142 +0xb68
  <private repo>/integrationtest/failuretest.TestMain()
      <private repo>/integrationtest/failuretest/setup_test.go:16 +0x150
  main.main()
      _testmain.go:61 +0x114

Goroutine 22 (finished) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1851 +0x684
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2279 +0x7c
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1792 +0x180
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2277 +0x77c
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2142 +0xb68
  <private repo>/integrationtest/failuretest.TestMain()
      <private repo>/integrationtest/failuretest/setup_test.go:16 +0x150
  main.main()
      _testmain.go:61 +0x114
==================

All repositories are private, so I can only share part of the source code. Below are excerpts of the source code referenced in the data race stack trace.

// file /pkg/test_helpers/v2/suite.go
 
// TearDown removes the resources and network created for the test suite.
func (t *TestSuite) TearDown(ctx context.Context) error {
        var finalErr error

        var numResources int

        t.mu.Lock()
        numResources = len(t.Resources)
        
        // the line below was flagged as a data race, which I do not understand
        t.logger.Logf("tearing down %d containers", numResources) // <- suite.go:349. !!! This triggers the read. The logger field is an interface that is satisfied by testing.T and consists of the methods Log and Logf.
        t.mu.Unlock()

        for _, resource := range t.Resources {
                if err := resource.Purge(ctx); err != nil {
                        finalErr = errors.Join(finalErr, err)
                }
        }

        t.logger.Log("remove network")

        if err := t.Pool.RemoveNetwork(t.Network); err != nil {
                finalErr = errors.Join(finalErr, err)
        }

        t.teardownComplete <- struct{}{}

        return finalErr
}
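
For context, the logger field used at suite.go:349 is a small interface that *testing.T satisfies through its Log and Logf methods. A minimal sketch (the name Logger is illustrative, not the actual private type):

// Logger is the subset of *testing.T that the suite logs through.
// Sketch only; the real interface in the private repo may be named differently.
type Logger interface {
        Log(args ...any)
        Logf(format string, args ...any)
}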

The above TearDown is called from t.Cleanup in the function below:

// file /integrationtest/failuretest/intermittent_vcenter_failure_test.go
func TestIntermittentVcenterFailure(t *testing.T) {

        t.Parallel()

        ctx := t.Context()

        testSuite, err := testHelpers.NewTestSuite(testHelpers.WithLogger(t))
        if err != nil {
                t.Fatal(err)
        }

        t.Cleanup(func() {
                // stop containers
                _ = testSuite.TearDown(ctx) // <- intermittent_vcenter_failure_test.go:44
        })

        // rest of the test elided
}

Goroutines 21 and 22 are started by /integrationtest/failuretest/setup_test.go:16. The source code for this is:

// file setup_test.go
package failuretest

import (
        "flag"
        "testing"
)

var noTearDown, updateGoldenFiles bool

func TestMain(m *testing.M) {
        flag.BoolVar(&noTearDown, "noteardown", false, "keep containers after finishing test")
        flag.BoolVar(&updateGoldenFiles, "update", false, "update golden files")
        flag.Parse()
        m.Run() // <- setup_test.go:16
}

What did you expect to see?

I was not expecting a data race.

I have spent quite some time trying to find a solution and to replicate the issue in an application that I can share, but I was not successful.

The two conflicting goroutines are started by testing.M.Run, which is not under my control.
I cannot manipulate the testing.T object because that is a given.
My conclusion so far is that this data race is outside of my control. Maybe it is a bug, or maybe one should not log with t.Log from a Cleanup function. However, I tried to trigger the data race in a heavily stripped-down version of my test but was not able to reproduce it.
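
For reference, the stripped-down attempt looked roughly like the sketch below (illustrative code, not the exact file I ran): a parallel test whose Cleanup logs through its own *testing.T, the same call shape as suite.go:349. Under -race it never reported anything for me.

// minimal_repro_test.go: sketch of the stripped-down reproduction attempt.
// Names and timings are illustrative; this did not reproduce the race for me.
package failuretest

import (
        "testing"
        "time"
)

func TestLogFromCleanupRepro(t *testing.T) {
        t.Parallel()

        t.Cleanup(func() {
                // Mirrors the TearDown logging path: t.Logf called from a Cleanup func.
                t.Logf("tearing down %d containers", 0)
        })

        // Stand-in for the real test body.
        time.Sleep(10 * time.Millisecond)
}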
