-
Notifications
You must be signed in to change notification settings - Fork 8.6k
FIX: Ensure copy_data
callbacks run even when all rows are skipped
#33002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, if a "copy" batch of an import step results in all rows being skipped, the `after_commit_of_skipped_rows` callback is never triggered. This happens because the callback is nested inside a block that only runs when at least one row is inserted. This change ensures the DB copy operation returns both inserted and skipped rows, allowing the caller to respond appropriately in either case.
Avoids creating a Hash object for each row. This also optimized the column mapping in `copy_data` because a while loop and some caching makes it faster.
Oh, good catch. It looks like it can't be avoided to yield skipped rows in the enumerator, so that's fine. But |
Yes, you’re right, instantiating a new hash for each row isn’t ideal for that path. I started off wanting to avoid enumerating the skipped rows entirely like you were doing, but most of the solutions I came up with ended up either breaking the existing batching implementation for skipped rows or taking things in a totally different direction. I also considered marking skipped rows in the existing row hash with something like |
inserted_rows = [] | ||
skipped_rows = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same spirit of micro-optimizing, what do you think about pre-allocating these too? We know they won't be larger than COPY_BATCH_SIZE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
Currently, if a batch "copy" of an import step results in all rows being skipped, the
after_commit_of_skipped_rows
callback is never triggered. This happens because the callback is nested inside a block that only runs when at least one row is inserted.This change ensures the DB copy operation returns both inserted and skipped rows, allowing the caller to respond appropriately in either case.