-
Notifications
You must be signed in to change notification settings - Fork 887
fake database locks during transactions #2473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Spike Curtis <spike@coder.com>
…_transaction_isolation
Huh TIL intermediate state was causing db issues. Makes sense though... Can we write a unit test for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a really cool idea, but I wonder at what point in this rabbit hole it makes sense to stop reinventing the wheel...?
// reentrantLock is a lock that can be locked multiple times from the same goroutine. | ||
// | ||
// Go maintainers insist that this is a Bad Idea and refuse to implement it in the standard library, so let's talk about | ||
// why we're doing it here. | ||
// | ||
// We want to support locking the fake database for the duration of a transaction, so that other goroutines cannot see | ||
// uncommitted transactions. However, we also need to lock the database during queries that are not explicitly in a | ||
// transaction. When a goroutine executing a transaction calls a query, it is already holding the lock, so attempting | ||
// to lock a standard mutex again will deadlock. A reentrant lock neatly solves this problem. | ||
// | ||
// The argument I've heard around why reentrant locks are a Bad Idea points out that it indicates a problem with your | ||
// interface, because some methods must leave the database in an inconsistent state. That criticism applies here | ||
// because the whole reason we need transactions is sometimes individual queries leave the database in an inconsistent | ||
// state. However valid the criticism, the assumption that it becomes the most important factor in all cases is flawed | ||
// and, frankly, patronizing. | ||
// | ||
// Here we do not have the luxury of reinventing the interface, because this fake database is attempting to | ||
// emulate another piece of software which does have this interface: postgres. Basically, the logic that enforces the | ||
// consistency of the database resides at a higher layer than we are emulating. | ||
// | ||
// Some alternatives considered, but rejected: | ||
// | ||
// 1. create an explicit transaction type, which are serialized to a channel and then processed in order. | ||
// * requires implementing each query function twice, once wrapping it in a transaction, and once doing the real | ||
// work | ||
// * cannot support recursive transactions | ||
// 2. store whether we're in a transaction in the Context passed to the queries. | ||
// * changes InTx(func(store) error) -> InTx(ctx, func(ctx2, store) error). Inside the transaction function, | ||
// callers **must use** ctx2 to query. Use of other contexts, like ctx, will deadlock. Adding this tripmine | ||
// to every use of transactions seems like a recipe for bugs that are hard to diagnose (and only show up with the | ||
// fake database). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤌 11/10 documentation
b := make([]byte, 64) | ||
b = b[:runtime.Stack(b, false)] | ||
b = bytes.TrimPrefix(b, []byte("goroutine ")) | ||
b = b[:bytes.IndexByte(b, ' ')] | ||
n, _ := strconv.ParseUint(string(b), 10, 64) | ||
return n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😬
At what point does it make more sense to just dispense with the fake database entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you and me both, but we do need a suite of tests that run fast, and our postgres tests are slow.
CLI tests, by and large, shouldn't be calling the server at all, and instead be mocking out the codersdk. That will make them simpler (because you don't need a bunch of preamble to get the DB into the state you want), and super fast.
Coderd tests can and should mock out the database interface for most cases.
We still will want a couple integration tests that go all the way from the CLI -> API -> Database, but there will only be a few and we could use postgres for that.
Once we're in that state, we could consider killing the fake database. I'll repeat this in another issue.
Can we just make the mutex an interface, and when calling func (q *fakeQuerier) InTx(fn func(database.Store) error) error {
q.mutex.Lock()
defer q.mutex.Unlock()
return fn(&fakeQuerier{
// This mutex Lock()/Unlock() do nothing
mutex: ¬AMutex{},
// Same reference to all data
fakeQuerierData: q.fakeQuerierData,
})
} |
// * changes InTx(func(store) error) -> InTx(ctx, func(ctx2, store) error). Inside the transaction function, | ||
// callers **must use** ctx2 to query. Use of other contexts, like ctx, will deadlock. Adding this tripmine | ||
// to every use of transactions seems like a recipe for bugs that are hard to diagnose (and only show up with the | ||
// fake database). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably implement this with a ruleguard rule and it would not require the reentrant lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our InTx
currently doesn't require a context param, I think it's better to keep it that way.
I think it's easier just to make a wrapping struct that can contain some state. So store.IsInTx()
could return true/false for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started doing that, and that was the point as which I said, a reentrant lock is easier.
The ruleguard helps if it actually catches your problem (impossible to get to 100% on that one), and you remember to run it. It's natural when faced with test failures to want to fix them before you start worrying about things like linting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sqldb currently keeps the "is in a transaction" state on itself. See #2470
I'd rather copy that than use ctx
to pass the "is in a tx" state
Seconded. I guess this approach has the slight downside of requiring a bit of code churn to add an extra layer of indirection (everything that currently accesses fields of On the other hand, I think it makes it easier to add support for rolling back transactions. The simple, stupid approach would be to clone the entire inner |
If you anonymously embed, the fields can be accessed directly. So no need to update accesses. type fakeQuerier struct {
mutex Mutex
*fakeQuerierData
}
This might get tricky with deletes and things. It might not be too bad to merge states, but that is going to require some code to handle modifies vs deletes vs adds. Probably some reflect magic based on |
Might fix #2436 entirely, but definitely will fix in the fake db case.
There are additional bugs lurking around basically every use of
InTx()
in our tests that this fixes, because until this PR, transactions exposed intermediate state to other goroutines with the fake database.