-
Notifications
You must be signed in to change notification settings - Fork 21.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch up all touch
calls made in a transaction.
#18824
Conversation
Greatly reduces the number of database round-trips made when `touch` is called on a lot of records. Fixes rails#18606.
@@ -499,6 +499,7 @@ def except(adapter_names_to_exclude) | |||
t.string :name | |||
t.column :updated_at, :datetime | |||
t.column :happy_at, :datetime | |||
t.column :sad_at , :datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I'll fix this.
(Please correct me if I am wrong). As far as I can tell the way you track this, is holding a reference to all records that call Post.transaction do
100_000.times do
Post.last.touch
end
end |
end | ||
end | ||
|
||
klass.unscoped.where(klass.primary_key => records.sort).update_all(changes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed one change I forgot to make, that I have to make before this PR is merged: increment the optimistic locking column for all records if locking is enabled. (See https://github.com/rails/rails/blob/master/activerecord/lib/active_record/persistence.rb#L478).
@arthurnn You are correct. I was concerned about this as well. It should be noted that Rails already does this within a transaction for all records that have In the case of touching, I've have a proposed fix but I think it could be implemented in a separate PR if it becomes a problem. The proposal is to set a limit to the number of records that are held in the buffer, and flush them to the DB if we reach that number of records. If necessary, it could even be an option to The only issue I can think of for such a mechanism is dealing with rollbacks consistently. |
* Use `class << self`. * Make private methods public, marked with :nodoc:. * Fix spaces in schema.rb.
Restoring a hidden comment that's still relevant: I noticed one change I forgot to make, that I have to make before this PR is merged: increment the optimistic locking column for all records if locking is enabled. (See https://github.com/rails/rails/blob/master/activerecord/lib/active_record/persistence.rb#L478). |
end | ||
|
||
state.updated klass, attrs, records | ||
records.each { |record| record._run_touch_callbacks } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
records.each(&:_run_touch_callbacks)
Thanks for your work on this. With all the Anyone else has an opinion on this? |
I agree with @egilburg. It's a bit spread through the code. But it should be done in a separate PR. |
That's an interesting proposal, @egilburg. It's a tricky thing, trying to figure out the right way to organize the If there ends up being agreement that the |
Using benchmark-ips in Ruby 2.2, I found that: 1. Adding records to a SortedSet is just as fast as adding them to a Set. 2. Calling `sorted_set.to_a` and iterating over the results is 337 times faster than calling `set.sort` and iterating over those results. (42,180 vs. 125 iterations per second.) The memory usage is the same, because both methods create a temporary array.
Just as with a `touch` that occurs outside a transaction, this code will update the optimistic locking column but will not fail if the column was changed by another instance. When updating instances in batch, it's impossible to use the lock_version to determine whether another instance has modified the record.
I have fixed the issue I raised about incrementing the optimistic locking column. The fix is in the |
end | ||
|
||
protected | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✂️ newline and indent under the privacy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lovely work on this. I'll let others help with the detailed review, but this will be a great addition to Rails. I've seen more than my fair share of crazy touch cascades. |
|
||
delegate :add_record, to: :state | ||
|
||
# Are we currently executing in a delay_touching block? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✂️ this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# touched records with the outer transaction's touched records. | ||
def merge_transactions | ||
num_states = states.length | ||
states[num_states - 2].merge!(current_state) if num_states > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to states[-2].merge!(current_state) if delay_touching?
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I missed the [-2]
. I'll fix that.
I can't use if delay_touching?
because that will return true even when there is only one state on the stack. I need to make sure there are two or more states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right 👍
I've applied all your recommendations except the ones I pushed back on. 🤘 |
@BMorearty Consider that we may not need touch batching, but just touch coalescing, to alleviate the primary pains with The big pains are deadlocks and lock contention on parent records. Since We can nail both of these by marking records when they're touched, then performing the actual touch (the same one AR would immediately issue) before commit. Here's a sketch from a plugin form of this: https://gist.github.com/jeremy/84134ad08f2a137aa1ef This ensures that the record gets an updated timestamp immediately, that repeated touches are coalesced into a single touch per record, and that touches are performed in a predictable order at the end of the transaction. It depends on couple of AR improvements: before_commit #18903, commit callbacks with manual record enrollment #18904, and touching with an explicit These improvements equip us with a toolbox to make a straightforward implementation of touch coalescing and go on to reuse the same tools to do other kinds of activity coalescing within AR, plugins, and apps. What do you think about starting with touch coalescing only, then expanding to batching as a second exploration that builds on that? |
@jeremy Thanks for the feedback. Here are my thoughts.
These were not the big pains we experienced on a Rails project last summer that prompted me to write the activerecord-delay_touching gem, on which this PR is based. We didn't run across lock contention. For us, the big pain was the large amount of time it would take to touch a lot of records. There was a huge cascade of touches. Some of them would be fixed with your proposal. For example, when updating a lot of Variants (this was a Spree-based project), the associated Product was touched many times. Your proposal would reduce that to one touch for the Product. However, Spree has a complex model that also causes it to touch multiple objects in the same table. One example: a Product can belong to multiple Categories. When the Product is touched, all associated Categories are touched. Anyway, that's the background on why I wrote this the way I did, not only coalescing touches on a single record but batching the touches that occur on multiple records in a table. |
We could coalesce many updates to one model type at the end as well. I don't think that precludes that. When we mark "this needs updating", we could mark that as |
I think the main simplification is to move to a mark-and-bulk-execution-at-the-end model. |
@jeremy Your gist is certainly simpler than the code in this PR. Part of the reason for the complexity in the PR was to handle nested transactions as they're documented--by which I mean, a rollback from an inner Another, more important requirement that cause a bit of complexity was that side-effects (more touches) from If we eliminate requirements like those, and eliminate the ability to coalesce all touches in a single table, then yes, the code can be simpler, as the gist shows. I'm certain we don't want to eliminate the side-effects requirement, though, right? Because then (Or we could still coalesce touches in a single table, as @dhh suggested, but that would re-introduce some of the complexity in this PR. E.g., go back to keying off not only the class, but also the columns that have to be touched. And either live with the timestamps having different values in memory than they have on the DB, or update memory and the DB together, as I did.) |
@BMorearty Re. performance, the Spree benchmark definitely show the best of batching: small number of distinct models (2), large number of distinct parent/grandparent records to touch (150). Coalescing offers no savings when we aren't re-touching common parent/grandparent records. In Russian-doll hierarchies, the # of distinct touchable records is much smaller. Updating N child records touches the same parent and grandparent record N times. Coalescing reduces the Russian-doll case to 2 updates, same as batching. Possible examples in Spree are updating order line item quantities (1 order touch) or backing up a viewable's assets (1 viewable touch). Re. transaction sync, |
Once isn't enough, right? We'd need to bubble it all the way up. I think my use of the word "batch" in the title of this PR may have caused some confusion about its purpose. I just want to clarify that the primary purpose of this code--the reason I wrote it--is the same as the primary purpose you described: the |
Right, I mean "once" in the "when" sense, ha! We're on the same page re. purpose 😁 On Wednesday, February 11, 2015, Brian Morearty notifications@github.com
|
What's the next step for this? @BMorearty is it clear how to move forward with the before_commit approach? |
Yes, @dhh, I think it's clear, but it sounds like this approach is completely different from the approach in the gem I wrote. I originally volunteered because I had already written the code and it had been working in production for a while (and it was an improvement over the current behavior of touches). If someone else wants to run with this new approach, I'm totally fine with that. By the way, I'm still not convinced the before_commit approach--as written in the gist--will correctly bubble up all side-effect touches to the Nth degree. It would need to be tested for that. |
I understand, Brian. Thanks for your work on this so far. Let’s open the issue up again to other contributors to take a swing at the before_commit approach. If that doesn’t pan out, I’m sure we’ll be back on this train ;)
|
👍 |
@arthurnn is going to give it a go on the before_commit approach. We'll reconvene in a new PR when it's ready. |
@BMorearty thanks, Brian. I hope you enjoyed the reviews 😄 |
Greatly reduces the number of database round-trips made when
touch
is called on a lot of records in a transaction. It does so by delaying thetouch
calls until just before the transaction is committed. At that point, it consolidates the touches into the smallest possible number of SQL UPDATEs by (a) only touching each record once, even if it was touched repeatedly in the block, and (2) touching multiple records of a single table simultaneously, using an IN clause.after_touch
calls are also deferred until right after the database updates occur, just before the commit.The original code is from the activerecord-delay_touching gem, which has been in production on multiple sites for a bit over six months.
Some benchmarks:
I wrote a sample app where
Person has_many :pets
andPet belongs_to :person, touch: true
. So each time a pet is updated, the person who owns it is touched. I made a person who has 25 pets. Then I ran this code in Rails 5 with and without this change:The result was 52 iterations per second without this change and 91 iterations per second with it.
One of the production sites where activerecord-delay_touching has been in use is a Spree site. To benchmark this change, @mtuckergd created 6 products. He then added 75 variants to each product. He benchmarked code that processes changes to all of the products' variants and updates the master stock count for each of them. The Spree models that got saved were
Variant
,StockItem
, andStockMovement
. Three runs were with delay touching disabled, and three were with it enabled. Results:That’s a 51% improvement to the average processing time.
Tests are included for edge cases such as rolling back a nested transaction, both with and without
:requires_new
.Fixes #18606.