-
Notifications
You must be signed in to change notification settings - Fork 746
Safe database swap #2122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Safe database swap #2122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @m04f, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the reliability of database swap operations by implementing a robust, transactional-like mechanism. It directly addresses a potential data loss issue by ensuring that if a new database fails to be properly integrated, the system can automatically recover the original, functional database, thereby preventing data loss and maintaining system availability.
Highlights
- Safe Database Swap Implementation: The core
Swap
method inSwappableDB
has been re-engineered to prevent data loss during database replacement operations. - Transactional Swap with Rollback: The old database is now temporarily backed up to a
.bak
file before the new database is moved into its place. If the new database fails to open after being moved, the system attempts to automatically restore the original database from this backup, ensuring data integrity. - New Error Type for Data Loss: A specific
ErrDbDataLost
error has been introduced to clearly indicate when both the primary swap operation and the subsequent recovery attempt of the original database have failed, signaling an unrecoverable state. - Comprehensive Recovery Testing: A new test case,
Test_SwapRecoveryOnFailure
, has been added. This test specifically validates the new recovery mechanism by simulating a scenario where a new database file passes initial SQLite header validation but fails during the actual database opening, ensuring the rollback logic is robust.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a safer database swap mechanism to prevent data loss during the operation. The new implementation backs up the existing database before attempting to swap, and restores it if the swap fails. The changes are well-reasoned and include a thorough test case for the recovery path. My review includes suggestions to improve error handling by using fmt.Errorf
with %w
for better error inspection, and to make the new test more robust and portable.
db/swappable_db.go
Outdated
if err := os.Rename(s.db.Path(), oldDbPath); err != nil { | ||
return fmt.Errorf("failed to rename old database: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db/swappable_db.go
Outdated
if err != nil { | ||
return fmt.Errorf("open SQLite file failed: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db/swappable_db.go
Outdated
return fmt.Errorf("open SQLite file failed: %s", err) | ||
if err := replaceDb(path, fkConstraints, walEnabled); err != nil { | ||
if recoveryErr := replaceDb(oldDbPath, oldFkConstraints, oldWalEnabled); recoveryErr != nil { | ||
return fmt.Errorf("%w: failed to swap database (%v) and failed to recover original database (%v)", ErrDbDataLost, err, recoveryErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Test_SwapRecoveryOnFailure tests that the Swap function recovers the original database | ||
// when the swap operation fails after the old database has been moved. | ||
func Test_SwapRecoveryOnFailure(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test uses syscall.Mkfifo
, which is not available on Windows. As noted in the PR description, this will cause the test to fail on Windows platforms. To ensure portability of the test suite, it's best to skip this test on Windows.
You can do this by adding a check at the beginning of the test function:
import "runtime"
// ... in Test_SwapRecoveryOnFailure ...
if runtime.GOOS == "windows" {
t.Skip("Skipping test on Windows because it uses syscall.Mkfifo")
}
Alternatively, you can use a build tag //go:build !windows
at the top of the file.
os.WriteFile(swapPath, sqliteHeader, 0666) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error returned by os.WriteFile
is not checked. If this write fails, the test might fail in a confusing way later on. It's better to handle this error.
Since this runs in a separate goroutine, you can't use t.Fatalf
directly. A common pattern is to use a channel to send the error back to the main test goroutine, which can then fail the test.
For example:
errCh := make(chan error, 1)
go func() {
errCh <- os.WriteFile(swapPath, sqliteHeader, 0666)
}()
// ... perform swap ...
// After swap, check for error from goroutine
if err := <-errCh; err != nil {
t.Fatalf("writing to fifo failed: %v", err)
}
defer os.Remove(swapPath) | ||
|
||
// Avoid blocking the thread | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this? Why a goroutine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing to a fifo blocks until the other end is opened for reading. Without the goroutine, the test would hang indefinitely at the write, since Swap() (which triggers the read attempt) hasn't been called yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Well, as per my other comment if we can improve this function such that if the passed in database doesn't open we just put the original DB back in, it might be OK to remove the SQLite check.
swapPath := mustTempPath() | ||
defer os.Remove(swapPath) | ||
|
||
err = syscall.Mkfifo(swapPath, 0666) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we modify Swap so that it always leaves the original database place if the swap fails then we can remove the SQLite validity check from the implementation. That was only there to try to protect us because Swap() was a one-way operation. But if we fix that we can remove the check.
Currently, the Swap method works by renaming the existing database files to a backup location, then renaming the new database into place, and finally attempting to open it. If opening the new database fails, it attempts to restore the original files. However, this backup-and-restore mechanism is itself fallible -- for example, the rename of WAL might fail after the database file has been renamed, potentially leading to data loss. A potentially safer approach would be to open and validate the new database file before performing any file renames. This would guarantee that we only touch the original database files if we’re confident the new database is valid. However, this would mean that the active database path would change after the swap. Would it be acceptable for the database path to change in this case? Based on a quick review of Swap's call sites, it doesn't seem like anything depends on the path staying the same, but I wanted to confirm. |
No, you can't change the database path. It's a core assumption of rqlite that the path, once set, is fixed. |
This PR implements safe database swap functionality to address the data loss issue described in #2103.
Testing Note
The recovery test uses
syscall.Mkfifo()
to create a file that passes initial SQLite header validation but fails during database opening. This simulates a swap failure scenario, though I think won't run on Windows platforms. (This is the only way I had in mind to pass the IsValidSQLiteFile guard)Closes #2103