Skip to content

Regression in bulk insert behavior #163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ajm188 opened this issue Apr 17, 2017 · 33 comments
Closed

Regression in bulk insert behavior #163

ajm188 opened this issue Apr 17, 2017 · 33 comments

Comments

@ajm188
Copy link

ajm188 commented Apr 17, 2017

The following query should be able to be executed in bulk:

INSERT INTO users (id, name) VALUES (1, 'foo'), (2, 'bar') /* insert multiple users in one query */

With mysqlclient 1.3.6, this query is emitted as a bulk insert.

I upgraded mysqclient to 1.3.9, and suddenly, this query is emitted as two individual inserts.

@ajm188
Copy link
Author

ajm188 commented Apr 17, 2017

A git bisect shows 49e401b to be the commit that introduced this regression

@asottile
Copy link

@methane should we revert that commit? This is a pretty bad regression preventing us from upgrading

@methane
Copy link
Member

methane commented Apr 20, 2017

Why can't you just remove comment?

@asottile
Copy link

We use comments to uniquely identify queries such that it can be known from the server where the (problematic) query came from. It's incredibly useful for detecting "where did this come from and who and why and how can we stop it".

Comments are valid sql, refusing to support them seems not the best idea.

@methane
Copy link
Member

methane commented Apr 20, 2017

executemany() don't refuse it.
It just use normal execute().

@asottile
Copy link

Correct it doesn't refuse it, but it turns a bulk insert (1.3.8) which was a single fast query into thousands (millions even!). This kills the db server.

@methane
Copy link
Member

methane commented Apr 20, 2017

execute() sends the query to MySQL. It should support all valid SQL as possible.
On the other hand, executemany() try to rebuild SQL. It can't support all valid SQL unless having full parser.
It must not break SQL. It should fallback to execute() if it can be false positive.

@asottile
Copy link

understood. This used to work but was broken in a patch release. This works correctly with mysql-python also. It's pretty simple to fix as well (as the two related pull requests demonstrate) -- I'm unsure why you are so adamant on not accepting a simple fix to a pretty clear regression.

@methane
Copy link
Member

methane commented Apr 20, 2017

Because it may cause false positive and break query.

@asottile
Copy link

You already break query (this issue)!

If there's a demonstrated test with the fix will you accept a small pull request?

@asottile
Copy link

This is an almost identical (in idea) PR which was accepted: #154

Why the difference of opinion here?

@methane
Copy link
Member

methane commented Apr 20, 2017

Because I don't think ;?\s* can cause false positive.

@methane
Copy link
Member

methane commented Apr 20, 2017

#161 may have false positive. While I didn't test it, I feel it can match entire /* xxx */ ON DUPLICATE KEY x=x+1 /* zzz */

#162 may have false positive too. INSERT INTO x (a, b) values (%s, %s); INSERT INTO y (c, d) values (%s, %s);

Most important problem is maintenance cost. Both of PyMySQL and MySQL-python had been unmaintained for years. And I'm not active enough too.
I think focusing low level, simplicity, stability and performance is very important for project continuity.

Regular expression is one of part which makes project maintenance difficult.
I don't want to pay my time to review complex regular expression which may cause false positive.
But :?\s* looks OK to me because it is short and it doesn't use .* or other dangerous expression which may cause false positive.

You can override cursor and implement your own executemany().
Or you can move your comment to other place like INSERT INTO table /* comment here */ (c1, c2) values (%s, %s).

@asottile
Copy link

If I craft a regular expression which provably cannot match nested comments would you accept a pull request?

@asottile
Copy link

It's also bad practice on github to outright close pull requests without suggesting improvements. I understand you don't want to maintain this project but that doesn't mean we have to be rude to each other :)

@asottile
Copy link

Here's a regex which does not match too much:

/\*.*?\*/

I'll make a PR with this instead

@asottile
Copy link

Err sorry, need a little bit more complicated (but still pretty readable):

/\*(?:[^*]|(?:\*+([^*/])))*\*+/

  1. start with /*
  2. any number of these:
    • any number of non * characters
    • any number of * not followed by a /

@ajm188
Copy link
Author

ajm188 commented Apr 20, 2017

For what it's worth, we can use re.X (aliased to re.VERBOSE) which would allow us to add comments to different parts of the regular expression. This would help add clarity to what is already (in my opinion) a very complex regex. docs here

@methane
Copy link
Member

methane commented Apr 21, 2017

If I craft a regular expression which provably cannot match nested comments would you accept a pull request?

If it's:

  • obvious, easy to maintain, everyone can read and understand completely.
  • zero false positive.

@methane
Copy link
Member

methane commented Apr 21, 2017

It's also bad practice on github to outright close pull requests without suggesting improvements. I understand you don't want to maintain this project but that doesn't mean we have to be rude to each other :)

I don't have any idea improving such pull request.
And opening pull requests which doesn't have chance to merge make hard to maintain project.

@methane
Copy link
Member

methane commented Apr 21, 2017

Err sorry, need a little bit more complicated (but still pretty readable):

Easy to understand is very important.

Current regex is cleary don't support last comment.
Your suggestion is unclear what type of comment is accepted.

@methane
Copy link
Member

methane commented Apr 21, 2017

executemany() is very special feature. The API is not designed for SQL rewriting.
(DB-API 2.0 is not written only for MySQL)

So I think user should adjust their SQL to match the regular expression.
So the regular expression should be very easy to everyone who use .executemany().

My plan is make the regex more strict, zero false positive.
And implement another fallback which is efficient than multiple .execute() call.
So I won't accept any complex regex which doesn't reduce false positive.

@methane
Copy link
Member

methane commented Apr 21, 2017

Additionally, I'm making difference between PyMySQL and mysqlclient smaller.
This is important to reduce maintenance cost too.

Since this project is GPL, please send pull request to PyMySQL.

@asottile
Copy link

I've updated #165 and opened PyMySQL/PyMySQL#566

@methane
Copy link
Member

methane commented Apr 27, 2017

I saw it. But I dislike such complex regex only for trailing comment. I don't want merge it.

I have more safer idea about executemany(). Something like:

def execute_many(self, query, params):
    batched_query = ";\n".join(self.mogrify(query, p)) for p in params)
    return self.execute(batched_query)

But I don't have time to verify the idea. Especially, I want to know how
performance worce. 3x is acceptable but 30x is not acceptable.

If you really want trailing comma support, could you try it?

@asottile
Copy link

The regex is dead simple (much much simpler than the parentheses one right above it!), it looks more complicated because of the non capturing groups. Without them it is much more readable:

/\*([^*]|(\*+([^*/])))*\*+/

@ajm188
Copy link
Author

ajm188 commented Apr 29, 2017

@methane trailing comment is becoming quite a common pattern in SQL querying, specifically for the purposes of annotating where in source code a query came from.

For example, here is a ruby gem whose sole feature is to add these sorts of annotations to queries. It's developed and maintained by basecamp (owned by Rails creator DHH), and they've been using it in production since 2012. Notably,

When we’re trying to improve a slow query, or identify a customer problem, we never have to go digging to understand where the query came from—it’s just right there. This comes in handy in development, support, and operations .... If you combine this with something like pt-query-digest, you end up with a powerful understanding of how each Rails action interacts with MySQL.

Supporting query annotations (via comments) seems perfectly valid to me, and there's plenty of prior art for supporting this in other ORMs.

@methane
Copy link
Member

methane commented Apr 29, 2017

In general, yes. But executemany() is very special purpose method.
Please fork and maintenance it by yourself.

@ajm188
Copy link
Author

ajm188 commented Apr 29, 2017

Can you explain to me how executemany() is "very special purpose"?

I genuinely don't feel like it needs to be.

@methane
Copy link
Member

methane commented Apr 29, 2017

execute() is general, executemany() is special. Is there any question about that???
I don't have enough English skill to answer such type of question.

@methane
Copy link
Member

methane commented Apr 29, 2017

If I accept this, why not heading comment? comment between INSERT and opening (?
comments between values?

executmany() builds "bulk insert" query from template.
Fundamentally, this is not job of driver. It's job of ORM or other SQL utilities.

My design is queries matching simple pattern are convert to "bulk insert" query.
And I plan to implement fallback using multi statement. When I (or some other
volunteer) implement it, this issue will be fixed.

But keeping the pattern minimum is my design decision.
There are no chance to accept pull request making the pattern complicated.

Instead, you can make your library for building query.

I have no time and energy to talk about this issue anymore.

@methane methane closed this as completed Apr 29, 2017
@ajm188
Copy link
Author

ajm188 commented Apr 29, 2017

Your argument is a tautology. "executemany is special because it is special"

If I accept this why not heading comment? comment between INSERT and opening (? comments between values?

Yes!!! These are all valid SQL queries that can be executed in bulk!! Supporting these would be fantastic for users of your library. I am happy to help add support for these.

keeping the pattern minimum is my design decision

I would agree with you, were it not for two points:

  1. As @asottile has previously stated, the pattern is already complicated.
  2. The pattern is fundamentally incorrect, and can be shown to be broken with trivial examples.

@methane
Copy link
Member

methane commented Apr 30, 2017

Your argument is a tautology. "executemany is special because it is special"

execute() executes SQL passed. It driver's job and it support trailing comment.
On same point of view, executemany() supports trailing comment comment.
Driver don't know SQL. It only knows escaping rule.

On the other hand, building "bulk insert" is not driver's job. It needs knowledge of SQL.
It is implemented as just a bonus. There are no guarantee like "all SQL are supported."

Yes!!! These are all valid SQL queries that can be executed in bulk!! Supporting these would be fantastic for users of your library. I am happy to help add support for these.

I hate it!! That's what I don't want to maintain!!!
You can create your own library to build SQL which I won't maintain.

As @asottile has previously stated, the pattern is already complicated.

Current complicity is needed to avoid false positive. Avoiding false positive is important to use
fallback safely.

The pattern is fundamentally incorrect, and can be shown to be broken with trivial examples.

NO! It's not broken. It just doesn't support all insert SQL.
It only support very limited pattern by design, internationally.

But I'm tired to talking with you. I'm considering completely remove the feature.
I want to reduce maintenance cost, and this thread is very huge cost to me.

@PyMySQL PyMySQL locked and limited conversation to collaborators Apr 30, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants