FIX: Ensure copy_data callbacks run even when all rows are skipped #33002

Merged

gschlager merged 2 commits into main from fix/mt/copy-data-callbacks on Jun 2, 2025

Conversation

s3lase (Contributor) commented May 30, 2025

Currently, if a "copy" batch of an import step results in all rows being skipped, the `after_commit_of_skipped_rows` callback is never triggered. This happens because the callback is nested inside a block that only runs when at least one row is inserted.

This change ensures the DB copy operation returns both inserted and skipped rows, allowing the caller to respond appropriately in either case.
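
A minimal sketch of the resulting control flow (`after_commit_of_skipped_rows` is the callback named in this PR; everything else below is made up for illustration):

```ruby
# Illustration only: copy_data's real signature and internals differ.
def copy_data(rows)
  rows.partition { |row| !row[:duplicate] } # => [inserted, skipped]
end

def after_commit_of_skipped_rows(rows)
  puts "handling #{rows.size} skipped rows"
end

# Even when every row in the batch is skipped, the callback now fires,
# because the copy operation reports skipped rows alongside inserted ones.
inserted_rows, skipped_rows = copy_data([{ duplicate: true }, { duplicate: true }])
after_commit_of_skipped_rows(skipped_rows) if skipped_rows.any?
# => handling 2 skipped rows
```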

Currently, if a "copy" batch of an import step results in all rows being skipped,
the `after_commit_of_skipped_rows` callback is never triggered.
This happens because the callback is nested inside a block that only runs
when at least one row is inserted.

This change ensures the DB copy operation returns both inserted and skipped rows,
allowing the caller to respond appropriately in either case.
s3lase requested a review from a team as a code owner on May 30, 2025 01:33
github-actions bot added the migrations-tooling label on May 30, 2025
s3lase requested a review from gschlager on May 30, 2025 01:41
gschlager pushed a commit:

Avoids creating a Hash object for each row.
This also optimizes the column mapping in `copy_data`: a while loop and some caching make it faster.
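
A hedged sketch of what that shape of optimization can look like (the data and variable names are invented; the actual `copy_data` internals differ):

```ruby
source_names   = [:id, :name, :email]            # columns in the source rows
target_names   = [:id, :email]                   # columns being copied
column_indexes = target_names.map { |n| source_names.index(n) } # cached once

rows   = [[1, "a", "a@example.com"], [2, "b", "b@example.com"]]
mapped = Array.new(column_indexes.size)          # reused buffer, no per-row Hash

i = 0
while i < rows.length                            # while avoids a block call per row
  row = rows[i]
  j = 0
  while j < column_indexes.length
    mapped[j] = row[column_indexes[j]]
    j += 1
  end
  p mapped                                       # => [1, "a@example.com"], then [2, "b@example.com"]
  i += 1
end
```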
gschlager (Member) commented

Oh, good catch. It looks like yielding skipped rows in the enumerator can't be avoided, so that's fine. But `fetch_rows` is a hot path and potentially enumerates millions of rows; creating a Hash for each row seems wasteful. I pushed a commit with a solution that I'm more comfortable with.
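
For context, a sketch of the allocation difference being discussed (both shapes are hypothetical, not the real `fetch_rows`):

```ruby
columns = [:id, :name]
result  = [[1, "a"], [2, "b"]] # stand-in for a DB driver result set

# One Hash allocated per row: wasteful when enumerating millions of rows.
result.each { |values| p columns.zip(values).to_h }

# No per-row Hash: yield the driver's array and index columns by position.
result.each { |values| p values }
```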

s3lase (Contributor Author) commented Jun 1, 2025

Yes, you’re right, instantiating a new hash for each row isn’t ideal for that path. I started off wanting to avoid enumerating the skipped rows entirely like you were doing, but most of the solutions I came up with ended up either breaking the existing batching implementation for skipped rows or taking things in a totally different direction.

I also considered marking skipped rows in the existing row hash with something like `row[:skip] = true`, but I was concerned about potentially clashing with a future `skip` column. Your use of `:"$skip"` to get around this is a pretty cool workaround.
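
A small sketch of that marker trick (`:"$skip"` is from this thread; the surrounding code is invented):

```ruby
# A plain :skip key could collide with a real "skip" column someday, while
# "$" never appears in a symbolized column name, so :"$skip" is safe.
rows = [
  { id: 1, name: "a" },
  { id: 2, name: "b", :"$skip" => true },
]

inserted, skipped = rows.partition { |row| !row[:"$skip"] }
p inserted # => [{:id=>1, :name=>"a"}]
p skipped  # => [{:id=>2, :name=>"b", :"$skip"=>true}]
```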

Comment on lines +18 to +19:

```ruby
inserted_rows = []
skipped_rows = []
```
s3lase (Contributor Author) commented

In the same spirit of micro-optimizing, what do you think about pre-allocating these too? We know they won't be larger than `COPY_BATCH_SIZE`.
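
For illustration, one way to pre-size these in CRuby (an assumption about technique, not necessarily what was committed):

```ruby
COPY_BATCH_SIZE = 1_000 # stand-in value; the real constant is defined elsewhere

# Array.new(n).clear yields an empty array whose backing buffer was already
# grown to n slots, so appends up to COPY_BATCH_SIZE avoid reallocation.
# (This relies on a CRuby implementation detail: Array#clear keeps capacity.)
inserted_rows = Array.new(COPY_BATCH_SIZE).clear
skipped_rows  = Array.new(COPY_BATCH_SIZE).clear
```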

gschlager (Member) commented

Good idea

gschlager merged commit a48f33f into main on Jun 2, 2025
5 checks passed
gschlager deleted the fix/mt/copy-data-callbacks branch on June 2, 2025 21:07
Labels

migrations-tooling: PR which includes changes to migrations tooling

2 participants