Skip to content

fake database locks during transactions #2473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

spikecurtis
Copy link
Contributor

@spikecurtis spikecurtis commented Jun 17, 2022

Might fix #2436 entirely, but definitely will fix in the fake db case.

There are additional bugs lurking around basically every use of InTx() in our tests that this fixes, because until this PR, transactions exposed intermediate state to other goroutines with the fake database.

Signed-off-by: Spike Curtis <spike@coder.com>
@spikecurtis spikecurtis requested a review from a team June 17, 2022 17:51
@Emyrk
Copy link
Member

Emyrk commented Jun 17, 2022

Huh TIL intermediate state was causing db issues. Makes sense though...

Can we write a unit test for this?

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a really cool idea, but I wonder at what point in this rabbit hole it makes sense to stop reinventing the wheel...?

Comment on lines +10 to +40
// reentrantLock is a lock that can be locked multiple times from the same goroutine.
//
// Go maintainers insist that this is a Bad Idea and refuse to implement it in the standard library, so let's talk about
// why we're doing it here.
//
// We want to support locking the fake database for the duration of a transaction, so that other goroutines cannot see
// uncommitted transactions. However, we also need to lock the database during queries that are not explicitly in a
// transaction. When a goroutine executing a transaction calls a query, it is already holding the lock, so attempting
// to lock a standard mutex again will deadlock. A reentrant lock neatly solves this problem.
//
// The argument I've heard around why reentrant locks are a Bad Idea points out that it indicates a problem with your
// interface, because some methods must leave the database in an inconsistent state. That criticism applies here
// because the whole reason we need transactions is sometimes individual queries leave the database in an inconsistent
// state. However valid the criticism, the assumption that it becomes the most important factor in all cases is flawed
// and, frankly, patronizing.
//
// Here we do not have the luxury of reinventing the interface, because this fake database is attempting to
// emulate another piece of software which does have this interface: postgres. Basically, the logic that enforces the
// consistency of the database resides at a higher layer than we are emulating.
//
// Some alternatives considered, but rejected:
//
// 1. create an explicit transaction type, which are serialized to a channel and then processed in order.
// * requires implementing each query function twice, once wrapping it in a transaction, and once doing the real
// work
// * cannot support recursive transactions
// 2. store whether we're in a transaction in the Context passed to the queries.
// * changes InTx(func(store) error) -> InTx(ctx, func(ctx2, store) error). Inside the transaction function,
// callers **must use** ctx2 to query. Use of other contexts, like ctx, will deadlock. Adding this tripmine
// to every use of transactions seems like a recipe for bugs that are hard to diagnose (and only show up with the
// fake database).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤌 11/10 documentation

Comment on lines +51 to +56
b := make([]byte, 64)
b = b[:runtime.Stack(b, false)]
b = bytes.TrimPrefix(b, []byte("goroutine "))
b = b[:bytes.IndexByte(b, ' ')]
n, _ := strconv.ParseUint(string(b), 10, 64)
return n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬

At what point does it make more sense to just dispense with the fake database entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you and me both, but we do need a suite of tests that run fast, and our postgres tests are slow.

CLI tests, by and large, shouldn't be calling the server at all, and instead be mocking out the codersdk. That will make them simpler (because you don't need a bunch of preamble to get the DB into the state you want), and super fast.

Coderd tests can and should mock out the database interface for most cases.

We still will want a couple integration tests that go all the way from the CLI -> API -> Database, but there will only be a few and we could use postgres for that.

Once we're in that state, we could consider killing the fake database. I'll repeat this in another issue.

@johnstcn johnstcn requested a review from a team June 17, 2022 18:06
@Emyrk
Copy link
Member

Emyrk commented Jun 17, 2022

Can we just make the mutex an interface, and when calling InTx() just pass this dummy mutex forward?

func (q *fakeQuerier) InTx(fn func(database.Store) error) error {
        q.mutex.Lock()
        defer q.mutex.Unlock()
	return fn(&fakeQuerier{
                // This mutex Lock()/Unlock() do nothing
		mutex:           &notAMutex{},
                // Same reference to all data
		fakeQuerierData: q.fakeQuerierData,
	})
}

Comment on lines +37 to +40
// * changes InTx(func(store) error) -> InTx(ctx, func(ctx2, store) error). Inside the transaction function,
// callers **must use** ctx2 to query. Use of other contexts, like ctx, will deadlock. Adding this tripmine
// to every use of transactions seems like a recipe for bugs that are hard to diagnose (and only show up with the
// fake database).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably implement this with a ruleguard rule and it would not require the reentrant lock

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our InTx currently doesn't require a context param, I think it's better to keep it that way.

I think it's easier just to make a wrapping struct that can contain some state. So store.IsInTx() could return true/false for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started doing that, and that was the point as which I said, a reentrant lock is easier.

The ruleguard helps if it actually catches your problem (impossible to get to 100% on that one), and you remember to run it. It's natural when faced with test failures to want to fix them before you start worrying about things like linting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sqldb currently keeps the "is in a transaction" state on itself. See #2470

I'd rather copy that than use ctx to pass the "is in a tx" state

@dwahler
Copy link
Contributor

dwahler commented Jun 17, 2022

Can we just make the mutex an interface

Seconded. I guess this approach has the slight downside of requiring a bit of code churn to add an extra layer of indirection (everything that currently accesses fields of q would have to be updated to use q.fakeQuerierData).

On the other hand, I think it makes it easier to add support for rolling back transactions. The simple, stupid approach would be to clone the entire inner fakeQuerierData value, pass the clone to fn, and then copy it back to the original if the transaction didn't abort.

@Emyrk
Copy link
Member

Emyrk commented Jun 17, 2022

Seconded. I guess this approach has the slight downside of requiring a bit of code churn to add an extra layer of indirection (everything that currently accesses fields of q would have to be updated to use q.fakeQuerierData).

If you anonymously embed, the fields can be accessed directly. So no need to update accesses.

type fakeQuerier struct {
	mutex Mutex
	*fakeQuerierData
}

On the other hand, I think it makes it easier to add support for rolling back transactions. The simple, stupid approach would be to clone the entire inner fakeQuerierData value, pass the clone to fn, and then copy it back to the original if the transaction didn't abort.

This might get tricky with deletes and things. It might not be too bad to merge states, but that is going to require some code to handle modifies vs deletes vs adds. Probably some reflect magic based on ID field for most structs.

@spikecurtis
Copy link
Contributor Author

Closing in favor of #2478 (thanks @Emyrk for the suggestion!)

@github-actions github-actions bot deleted the spike/2436_fake_db_transaction_isolation branch February 4, 2023 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

database race on create build / acquire job
5 participants