Skip to content

the "SET NAMES" change in mysqlclient 1.4.4 seems to break a case on MySQL 8.0 #383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zzzeek opened this issue Aug 11, 2019 · 11 comments
Closed

Comments

@zzzeek
Copy link

zzzeek commented Aug 11, 2019

So it's not clear if this is actually a regression, because apparently PyMySQL has had this problem for awhile, but it just popped up on mysqlclient 1.4.4. Re-adding the "SET NAMES" mentioned in the changelog seems to fix it. I think MySQL 8.0 has a problem when SET NAMES is not called so if pymysql/mysqlclient won't add it back in I will likely have to add it to SQLAlchemy.

Test case:

import MySQLdb

conn = MySQLdb.connect(
    user="scott", passwd="tiger", host="mysql80", db="test", charset="utf8mb4"
)

cursor = conn.cursor()
# uncomment to fix issue
#cursor.execute("SET NAMES utf8mb4")

for name in [
    "engineers",
    "managers",
]:
    cursor.execute("DROP TABLE IF EXISTS %s" % name)


cursor.execute(
    """
CREATE TABLE engineers (
    person_id INTEGER NOT NULL,
    PRIMARY KEY (person_id)
)
"""
)

cursor.execute(
    """
CREATE TABLE managers (
    person_id INTEGER NOT NULL,
    manager_name VARCHAR(50),
    PRIMARY KEY (person_id)
)
"""
)

cursor.execute("""
    SELECT engineers.person_id, CAST(null AS CHAR(50)) AS manager_name
    FROM engineers
    UNION
    SELECT managers.person_id, managers.manager_name FROM managers
""")


print(cursor.fetchall())

this is using MySQL 8.0.16 running in Centos docker:

MySQLdb._exceptions.OperationalError: (1271, "Illegal mix of collations for operation 'UNION'")

It doesn't like the "CAST(NULL AS CHAR(50))" in the UNION. suddenly. adding back the SET NAMES makes it go away. only on MySQL 8, not on 5.7, 5.6, mariadb, etc.

Sorry this issue kind of sucks.

@methane
Copy link
Member

methane commented Aug 15, 2019

I suppose you are using old (< MySQL 8.0) client library to connect MySQL 8.0.
The default collation of the "utf8mb4" encoding is changed in MySQL 8.0, from utf8mb4_general_ci to utf8mb4_0900_ai_ci.

MySQL CLI will produce a same error when CLI is 5.7 and Server is 8.0.
I don't think this is a bug of driver or ORM. This is just a MySQL's spec.

For example, we use utf8mb4_bin collation often when creating columns.
In such tables, SET NAMES utf8mb4 help nothing because utf8mb4_bin is neither
utf8mb4_general_ci nor utf8mb4_0900_ai_ci.

When user wants to specify collation per-column basis, the query is wrong.
The user should use CAST(null AS CHAR(50)) collate utf8mb4_0900_ai_ci instead.

On the other hand, when user wants to use single collation in whole the application,
they can use SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci'.

@methane methane closed this as completed Aug 15, 2019
@zzzeek
Copy link
Author

zzzeek commented Aug 16, 2019

I suppose you are using old (< MySQL 8.0) client library to connect MySQL 8.0.
The default collation of the "utf8mb4" encoding is changed in MySQL 8.0, from utf8mb4_general_ci to utf8mb4_0900_ai_ci.

I can confirm that building against MySQL 8.0 client library, the error does not occur. However, using pymysql, the error does occur. in all cases, SQLAlchemy's emitting of "SET NAMES" resolves the issue all around.

MySQL CLI will produce a same error when CLI is 5.7 and Server is 8.0.
I don't think this is a bug of driver or ORM. This is just a MySQL's spec.

For example, we use utf8mb4_bin collation often when creating columns.
In such tables, SET NAMES utf8mb4 help nothing because utf8mb4_bin is neither
utf8mb4_general_ci nor utf8mb4_0900_ai_ci.

I am mostly concerned about the default case. For all the MySQLs until 8.0, no COLLATE is shown for any CREATE TABLE:

Server version: 10.3.9-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> create table foo (data varchar(30));
Query OK, 0 rows affected (0.028 sec)

MariaDB [test]> show create table foo;
+-------+------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                   |
+-------+------------------------------------------------------------------------------------------------+
| foo   | CREATE TABLE `foo` (
  `data` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+------------------------------------------------------------------------------------------------+
1 row in set (0.005 sec)

above is mariadb 10.3. Same behavior on MySQL 5.7, there's no COLLATION shown when I didn't specify one.

For MySQL 8.0, there is:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 431423
Server version: 8.0.16 MySQL Community Server - GPL

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table foo(data varchar(30));
Query OK, 0 rows affected (0.05 sec)

mysql> show create table foo;
+-------+---------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                              |
+-------+---------------------------------------------------------------------------------------------------------------------------+
| foo   | CREATE TABLE `foo` (
  `data` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

the utf8mb4_0900_ai_ci collation looks to be all new in mysql 8.0.

The older databases all seem to set collation_connection to "utf8_general_ci". MySQL 8, using the 8 client, sets it to "latin1_swedish_ci", which seems to have nothing to do with anything. Yet using the MySQL 8 client against server 8, the above "1271" error is not generated no matter what I do; the best I can do is specify collation in the SELECT and get "ERROR 1253 (42000): COLLATION 'latin1_swedish_ci' is not valid for CHARACTER SET 'utf8mb4'". I don't know why I don't get that error if I don't specify the collation, since that's the default for collation_connection.

Per MySQL 8 docs at https://dev.mysql.com/doc/refman/8.0/en/set-names.html, calling SET NAMES will set collation_connection to the default collation for that charset.

As far as why one would not want to call "SET NAMES", and allow the case where these settings are not desirable as something that can be changed separately, it's not clear. It would appear that mysqlclient and its predecessor python-mysql have in fact been calling SET NAMES on every connection for 15 years, and this has not been a problem. Can you confirm this is the case ?

When user wants to specify collation per-column basis, the query is wrong.
The user should use CAST(null AS CHAR(50)) collate utf8mb4_0900_ai_ci instead.

With all due respect, that SQL is ridiculous. The database should not need an obtuse COLLATE specification to match a NULL char column in a union. Also, this is not "the user", this is a SQL expression that's hardcoded in the SQLlchemy ORM and has to work across all databases, which it does. Databases like SQLite, Postgresql, SQL Server all have character collations as well as support for the "collate" keyword on a per-column basis yet none of them have such a requirement. MySQL's behavior is not reasonable, so as long as my call to "SET NAMES" is not going to cause any negative side effects and it resolves the need for this case, it will stay in. I would imagine that if the user is emitting CREATE TABLE with different kinds of collations, there might be issues, however IIUC, mysqlclient/python-mysql have emitted SET NAMES for years so we're still doing the best we can do here.

On the other hand, when user wants to use single collation in whole the application,
they can use SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci'.

AFAICT this occurs implicitly:

mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
1 row in set (0.00 sec)

I would only ask that the mysqlclient project appreciate that the removal of SET NAMES appears to be a major backwards incompatible change and this should have been called 1.5.

@methane
Copy link
Member

methane commented Aug 17, 2019

AFAICT this occurs implicitly:

Only when they want to use default collation. If they (including me) want to use other collation, they can not omit it.
And if the driver sends "SET NAMES" while connection but user want to use other collation, the user need to send "SET NAMES" twice.

I would only ask that the mysqlclient project appreciate that the removal of SET NAMES appears to be a major backwards incompatible change and this should have been called 1.5.

I don't think this change is "major" and I don't want to call it 1.5 only for this change.
This is just the same behavior to mysql cli.

@zzzeek
Copy link
Author

zzzeek commented Aug 17, 2019

AFAICT this occurs implicitly:

Only when they want to use default collation. If they (including me) want to use other collation, they can not omit it.
And if the driver sends "SET NAMES" while connection but user want to use other collation, the user need to send "SET NAMES" twice.

that works for me as long as the vast majority of users who know nothing about this don't get weird collation errors all the sudden. mysql 8 has been causing a LOT of "suddenly dosen't work" types of situations, it has a lot of regressions like this and this

I would only ask that the mysqlclient project appreciate that the removal of SET NAMES appears to be a major backwards incompatible change and this should have been called 1.5.

I don't think this change is "major" and I don't want to call it 1.5 only for this change.
This is just the same behavior to mysql cli.

IMO if mysqlclient / python-mysql have been sending "SET NAMES " for about 10-15 years that would make this a major behavioral change. the state of the DB connection is now different when it first connects.

@methane
Copy link
Member

methane commented Aug 17, 2019

IMO if mysqlclient / python-mysql have been sending "SET NAMES " for about 10-15 years that would make this a major behavioral change. the state of the DB connection is now different when it first connects.

It doesn't make sense unless this change affects MySQL server running from 10-15 years ago.
Most of bug fixes are behavioral chnage at some point.

The trouble is caused by not the change in this project. The trouble is caused by change in MySQL 8.0 and affects most mysql clients.

@zzzeek
Copy link
Author

zzzeek commented Aug 17, 2019

OK just trying to get some truth on the best practice for this, it seems like calling SET NAMES has no real downside since it's what essentially happens when using a compatible client. at least that's what https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html#charset-connection-client-configuration seems to say.

@methane
Copy link
Member

methane commented Aug 17, 2019

it's what essentially happens when using a compatible client. at least that's what https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html#charset-connection-client-configuration seems to say.

Note that It is happens only when "charset" command is used. --default-character-set=koi8r does not cause "SET NAMES 'koi8r'", like charset="koi8r" in this project.

@methane
Copy link
Member

methane commented Oct 19, 2021

FYI, mysqlclient v2.2 will send SET NAMES for new connection always again.
See #504 and #509.

@zzzeek
Copy link
Author

zzzeek commented Oct 19, 2021

I just re-read this whole issue to re-familiarize, then i was like wow what did @yonran say to change your mind. I guess time has just passed. well in any case we're setting it also, not sure if that's wasteful. we could alternatetively just check it first but i doubt that really changes the net effect at all.

@methane
Copy link
Member

methane commented Oct 20, 2021

Not only @yonran changed my mind. I reconsider this issue too.

And I saw two Stackoverflow questions that @yonran mentioned in his issue. These questions may be caused by --character-set-client-handshake option. That is silly configuration, but some users use it without knowing how connection charset is determined.

Sending SET NAMES always solves these issues:

  • MySQL 5.x vs 8
  • MySQL vs MariaDB
  • --character-set-client-handshake

I still don't like adding one round trip for every connection, because I don't like long living connections.
But many MySQL/MariaDB users can not solve these issues by themselves. I don't want to support them. Support cost is much higher than one round trip cost.

@zzzeek
Copy link
Author

zzzeek commented Oct 20, 2021

thats right! sometimes it's worth it just to have people not asking you all the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants