Safe database swap #2122

m04f · 2025-06-22T04:43:56Z

This PR implements safe database swap functionality to address the data loss issue described in #2103.

Testing Note

The recovery test uses syscall.Mkfifo() to create a file that passes initial SQLite header validation but fails during database opening. This simulates a swap failure scenario, though I think won't run on Windows platforms. (This is the only way I had in mind to pass the IsValidSQLiteFile guard)

Closes #2103

gemini-code-assist

Summary of Changes

Hello @m04f, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability of database swap operations by implementing a robust, transactional-like mechanism. It directly addresses a potential data loss issue by ensuring that if a new database fails to be properly integrated, the system can automatically recover the original, functional database, thereby preventing data loss and maintaining system availability.

Highlights

Safe Database Swap Implementation: The core Swap method in SwappableDB has been re-engineered to prevent data loss during database replacement operations.
Transactional Swap with Rollback: The old database is now temporarily backed up to a .bak file before the new database is moved into its place. If the new database fails to open after being moved, the system attempts to automatically restore the original database from this backup, ensuring data integrity.
New Error Type for Data Loss: A specific ErrDbDataLost error has been introduced to clearly indicate when both the primary swap operation and the subsequent recovery attempt of the original database have failed, signaling an unrecoverable state.
Comprehensive Recovery Testing: A new test case, Test_SwapRecoveryOnFailure, has been added. This test specifically validates the new recovery mechanism by simulating a scenario where a new database file passes initial SQLite header validation but fails during the actual database opening, ensuring the rollback logic is robust.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a safer database swap mechanism to prevent data loss during the operation. The new implementation backs up the existing database before attempting to swap, and restores it if the swap fails. The changes are well-reasoned and include a thorough test case for the recovery path. My review includes suggestions to improve error handling by using fmt.Errorf with %w for better error inspection, and to make the new test more robust and portable.

gemini-code-assist · 2025-06-22T04:46:12Z

db/swappable_db.go

+	if err := os.Rename(s.db.Path(), oldDbPath); err != nil {
+		return fmt.Errorf("failed to rename old database: %s", err)


It's a good practice to use %w format verb when wrapping errors with fmt.Errorf[^1]. This preserves the original error in the error chain, allowing callers to inspect it using errors.Is or errors.As.

return fmt.Errorf("failed to rename old database: %w", err)

gemini-code-assist · 2025-06-22T04:46:12Z

db/swappable_db.go

+		if err != nil {
+			return fmt.Errorf("open SQLite file failed: %s", err)


Using %w to wrap this error will improve error handling capabilities, as it preserves the underlying error for inspection[^1].

return fmt.Errorf("open SQLite file failed: %w", err)

gemini-code-assist · 2025-06-22T04:46:12Z

db/swappable_db.go

-		return fmt.Errorf("open SQLite file failed: %s", err)
+	if err := replaceDb(path, fkConstraints, walEnabled); err != nil {
+		if recoveryErr := replaceDb(oldDbPath, oldFkConstraints, oldWalEnabled); recoveryErr != nil {
+			return fmt.Errorf("%w: failed to swap database (%v) and failed to recover original database (%v)", ErrDbDataLost, err, recoveryErr)


Consider using %w to wrap the errors here to preserve the error chain[^1].

return fmt.Errorf("%w: failed to swap database (%v) and failed to recover original database (%v)", ErrDbDataLost, err, recoveryErr)

gemini-code-assist · 2025-06-22T04:46:12Z

db/swappable_db_test.go

+// Test_SwapRecoveryOnFailure tests that the Swap function recovers the original database
+// when the swap operation fails after the old database has been moved.
+func Test_SwapRecoveryOnFailure(t *testing.T) {


This test uses syscall.Mkfifo, which is not available on Windows. As noted in the PR description, this will cause the test to fail on Windows platforms. To ensure portability of the test suite, it's best to skip this test on Windows.

You can do this by adding a check at the beginning of the test function:

import "runtime" // ... in Test_SwapRecoveryOnFailure ... if runtime.GOOS == "windows" { t.Skip("Skipping test on Windows because it uses syscall.Mkfifo") }

Alternatively, you can use a build tag //go:build !windows at the top of the file.

gemini-code-assist · 2025-06-22T04:46:12Z

db/swappable_db_test.go

+		os.WriteFile(swapPath, sqliteHeader, 0666)
+	}()


The error returned by os.WriteFile is not checked. If this write fails, the test might fail in a confusing way later on. It's better to handle this error.

Since this runs in a separate goroutine, you can't use t.Fatalf directly. A common pattern is to use a channel to send the error back to the main test goroutine, which can then fail the test.

For example:

errCh := make(chan error, 1) go func() { errCh <- os.WriteFile(swapPath, sqliteHeader, 0666) }() // ... perform swap ... // After swap, check for error from goroutine if err := <-errCh; err != nil { t.Fatalf("writing to fifo failed: %v", err) }

otoolep · 2025-06-22T13:46:18Z

db/swappable_db_test.go

+	defer os.Remove(swapPath)
+
+	// Avoid blocking the thread
+	go func() {


Why this? Why a goroutine?

Writing to a fifo blocks until the other end is opened for reading. Without the goroutine, the test would hang indefinitely at the write, since Swap() (which triggers the read attempt) hasn't been called yet.

I see. Well, as per my other comment if we can improve this function such that if the passed in database doesn't open we just put the original DB back in, it might be OK to remove the SQLite check.

otoolep · 2025-06-22T13:48:02Z

db/swappable_db_test.go

+	swapPath := mustTempPath()
+	defer os.Remove(swapPath)
+
+	err = syscall.Mkfifo(swapPath, 0666)


If we modify Swap so that it always leaves the original database place if the swap fails then we can remove the SQLite validity check from the implementation. That was only there to try to protect us because Swap() was a one-way operation. But if we fix that we can remove the check.

m04f · 2025-06-22T19:27:15Z

Currently, the Swap method works by renaming the existing database files to a backup location, then renaming the new database into place, and finally attempting to open it. If opening the new database fails, it attempts to restore the original files. However, this backup-and-restore mechanism is itself fallible -- for example, the rename of WAL might fail after the database file has been renamed, potentially leading to data loss.

A potentially safer approach would be to open and validate the new database file before performing any file renames. This would guarantee that we only touch the original database files if we’re confident the new database is valid. However, this would mean that the active database path would change after the swap.

Would it be acceptable for the database path to change in this case? Based on a quick review of Swap's call sites, it doesn't seem like anything depends on the path staying the same, but I wanted to confirm.

otoolep · 2025-06-23T14:24:01Z

No, you can't change the database path. It's a core assumption of rqlite that the path, once set, is fixed.

m04f added 2 commits June 22, 2025 07:28

Add atomic database swap

f05205b

Add SQLite DB swap recovery test

b55803c

gemini-code-assist bot reviewed Jun 22, 2025

View reviewed changes

m04f added 2 commits June 22, 2025 15:43

correctly rename and delete database files

af68939

fix database swap error handling

5da1d1f

otoolep reviewed Jun 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Safe database swap #2122

Safe database swap #2122

Uh oh!

m04f commented Jun 22, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jun 22, 2025

Uh oh!

gemini-code-assist bot Jun 22, 2025

Uh oh!

gemini-code-assist bot Jun 22, 2025

Uh oh!

gemini-code-assist bot Jun 22, 2025

Uh oh!

gemini-code-assist bot Jun 22, 2025

Uh oh!

otoolep Jun 22, 2025

Uh oh!

m04f Jun 22, 2025

Uh oh!

otoolep Jun 22, 2025

Uh oh!

otoolep Jun 22, 2025

Uh oh!

m04f commented Jun 22, 2025

Uh oh!

otoolep commented Jun 23, 2025

Uh oh!

Uh oh!

		if err := os.Rename(s.db.Path(), oldDbPath); err != nil {
		return fmt.Errorf("failed to rename old database: %s", err)

		if err != nil {
		return fmt.Errorf("open SQLite file failed: %s", err)

Safe database swap #2122

Are you sure you want to change the base?

Safe database swap #2122

Uh oh!

Conversation

m04f commented Jun 22, 2025

Testing Note

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

otoolep Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

m04f Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

otoolep Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

otoolep Jun 22, 2025

Choose a reason for hiding this comment

Uh oh!

m04f commented Jun 22, 2025

Uh oh!

otoolep commented Jun 23, 2025

Uh oh!

Uh oh!