Skip to content

Error when trying to execute "LOAD DATA LOCAL INFILE" using Python Processes #426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aldeirm2 opened this issue Feb 17, 2016 · 19 comments · Fixed by #431
Closed

Error when trying to execute "LOAD DATA LOCAL INFILE" using Python Processes #426

aldeirm2 opened this issue Feb 17, 2016 · 19 comments · Fixed by #431

Comments

@aldeirm2
Copy link

I have a multiprocess application that processes large volumes of data in parallel and at the end loads CSV files into my DB.

The code starts off by creating n number of Python Processes . In each of these processes, I create a separate connection and cursor which I try to use to load this data however I keep getting an error with: pymysql.err.InternalError: Packet sequence number wrong - got 3 expected 2

Sample code here but obviously you would need to put in connection details etc:

from multiprocessing import Process
import pymysql

class RawFileExtractor(Process):

    def __init__(self, file_location):
        Process.__init__(self, daemon=True)
        self.file_location = file_location

    def run(self):
        self.load_csv()

    def load_csv(self):

        mysql_connection = pymysql.connect(user=user, password=password, host=host, database=database,
                                           max_allowed_packet=128000000, connect_timeout=100000, local_infile=True)
        mysql_cursor = mysql_connection.cursor(pymysql.cursors.DictCursor)

        mysql_cursor.execute("LOAD DATA LOCAL INFILE '{file_location}' INTO TABLE c210.table_name "
                             "FIELDS TERMINATED BY '\t' "
                             "LINES TERMINATED BY '\n';".format(file_location=self.file_location))
        mysql_connection.commit()
        mysql_connection.close()


processes = []
file_locations = ["/location/one.csv", "/location/two.csv"]
for i, file_location in enumerate(file_locations):
    processes[i] = RawFileExtractor(file_location=file_location)
    processes[i].start()

Of course my actual script/processes do a lot more than just load the data but this should be sufficient to replicate the issue.

I also know that this isn't a problem from the actual DB as I can load files without any problem using the python MySQL connector tool but I don't want to use it as it's buggy.

Full error below:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 986, in _write_bytes
    self.socket.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1457, in send_data
    conn.write_packet(chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 936, in write_packet
    self._write_bytes(data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 988, in _write_bytes
    raise err.OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 986, in _write_bytes
    self.socket.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1326, in _read_load_local_packet
    sender.send_data()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1462, in send_data
    conn.write_packet(b'')
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 936, in write_packet
    self._write_bytes(data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 988, in _write_bytes
    raise err.OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/Users/maldeiri/raw_data_processing/raw_file_extractor.py", line 109, in run
    "LINES TERMINATED BY '\n';".format(file_location=self.delta_stream_load_file_name))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 146, in execute
    result = self._query(query)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 296, in _query
    conn.query(q)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 819, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1001, in _read_query_result
    result.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1290, in read
    self._read_load_local_packet(first_packet)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1328, in _read_load_local_packet
    self.connection._read_packet()  # skip ok packet
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 952, in _read_packet
    (packet_number, self._next_seq_id))
pymysql.err.InternalError: Packet sequence number wrong - got 3 expected 2

If you need any further information please let me know.

Python version: 3.4
PyMYSQL version: 0.7.1
OS: MAC

@methane
Copy link
Member

methane commented Feb 20, 2016

In each of these processes, I create a separate connection and cursor

If it's true, muti-process must not affect.
Can you reproduce the issue in single-process?

@methane
Copy link
Member

methane commented Feb 20, 2016

Your first exception is broken pipe.
Your server may killed your connection while sending data.
Is there anything in your server log?

@aldeirm2
Copy link
Author

Mhh I'm getting the exact same error from a single process.

If I run the exact same statement/code from the mysql connector it works without any problems so I don't know how it can be a server side issue.

There are also nothing to note in the server logs.

Any ideas?

@methane
Copy link
Member

methane commented Feb 23, 2016

Could you post error log from single process?
It may be more readable than multi process error log.

@methane
Copy link
Member

methane commented Feb 23, 2016

There are also nothing to note in the server logs.

Do you use --log-warnings option?
https://dev.mysql.com/doc/refman/5.6/en/communication-errors.html

@aldeirm2
Copy link
Author

Here is the log from the single process:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 986, in _write_bytes
    self.socket.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1457, in send_data
    conn.write_packet(chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 936, in write_packet
    self._write_bytes(data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 988, in _write_bytes
    raise err.OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 986, in _write_bytes
    self.socket.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1326, in _read_load_local_packet
    sender.send_data()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1462, in send_data
    conn.write_packet(b'')
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 936, in write_packet
    self._write_bytes(data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 988, in _write_bytes
    raise err.OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/maldeiri/testCode/test_pymysql_load.py", line 9, in <module>
    cursor.execute("LOAD DATA LOCAL INFILE '/data/d1/delta_West_insert.csv' INTO TABLE c210.delta_stream FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';")
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 146, in execute
    result = self._query(query)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 296, in _query
    conn.query(q)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 819, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1001, in _read_query_result
    result.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1290, in read
    self._read_load_local_packet(first_packet)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1328, in _read_load_local_packet
    self.connection._read_packet()  # skip ok packet
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 952, in _read_packet
    (packet_number, self._next_seq_id))
pymysql.err.InternalError: Packet sequence number wrong - got 3 expected 2

Process finished with exit code 1

The only thing I can see in the logs is the query:

3223 Query  LOAD DATA LOCAL INFILE '/data/West_insert.csv' INTO TABLE west_stream FIELDS TERMINATED BY '    ' LINES TERMINATED BY '
'
            2 Query COMMIT

I tried setting the log-warnnings to the highest value but I can't see anything there in the logs

@methane
Copy link
Member

methane commented Feb 23, 2016

Which log file did you see?

mysql> select @@log_error;
+--------------------------+
| @@log_error              |
+--------------------------+
| /var/log/mysql/error.log |
+--------------------------+

@aldeirm2
Copy link
Author

I'm using RDS on AWS, so I checked both the general/mysql-general.log which just has the queries and error/mysql-error.log which is completely empty.

@methane
Copy link
Member

methane commented Feb 23, 2016

Have you checked max_allowed_packet?

@aldeirm2
Copy link
Author

Yes, the file csv file is about 50MB and I have max_allowed_packet set to 250MB.

As I mentioned it's odd that the exact same load statement on the same database works using other libraries but just fails with pymysql. I did try select, update and insert statements with pymysql though and they all worked, it's just the load data local infile command that fails.

@methane
Copy link
Member

methane commented Feb 23, 2016

Which max_allowed_packet? client-side or server side?
max_allowed_packet can be smaller than file size.
What happens when using conn = pymysql.connect(..., max_allowed_packet=16*1024)?

@aldeirm2
Copy link
Author

Very very interesting :o

So I had it set to 250MB on the server side (and I still didn't change that) and max_allowed_packet=128000000 on the client side, but when I tried your suggestion of max_allowed_packet=16*1024 on the client it worked!

Why is that? Is there a cap on the max_allowed_packet value on the client side?

@methane
Copy link
Member

methane commented Feb 23, 2016

max_allowed_packet may be bigger than max size of packet.
MySQL's max packet size is 2<<24-1 byte (16MB-1byte).

max_allowed_packet option of PyMySQL was added to limit packet size smaller than 16MB. (see #317).
So this is a bug of PyMySQL which doesn't care about max_allowed_packet >= 16MB.
When you set max_allowed_packet larger than 16MB, PyMySQL produces broken packet.

@aldeirm2
Copy link
Author

Yes this causes a new error for me. I remember why I set the max_allowed_packet so high, as I have some very large SQL statements that I call and they are > 16MB.

Setting the max_allowed_packet solves the load problem but now I get a broken pipe error for my select statements.

@methane
Copy link
Member

methane commented Feb 23, 2016

You can set large max_allowed_packet to server and pass max_allowed_packet=16*1024*1024-1 to PyMySQL.

@aldeirm2
Copy link
Author

That's odd. After changing the max_allowed_packet to 16*1024, my queries are getting a broken pipe error. When I increase the max_allowed_packet my select statements works fine

`Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 986, in _write_bytes
    self.socket.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 921, in _bootstrap_inner
    self.run()
  File "/Users/maldeiri/raw_data_processing/sql_retriever.py", line 22, in run
    self.mysql_cursor.execute(self.sql_statement)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 146, in execute
    result = self._query(query)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/cursors.py", line 296, in _query
    conn.query(q)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 818, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 1047, in _execute_command
    self.write_packet(sql[:chunk_size])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 936, in write_packet
    self._write_bytes(data)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymysql/connections.py", line 988, in _write_bytes
    raise err.OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))"

@methane
Copy link
Member

methane commented Feb 23, 2016

Yes. setting max_allowed_packet to 16KB breaks your queries larger than 16KB.
I hope #431 will fix it. Setting max_allowed_packet to 16*1024*1024-1 is workaround for current version.

@methane
Copy link
Member

methane commented Feb 23, 2016

I'm very sleepy since it's 3:30 am in Japan.
It's too late to wait Travis's output and make new release in today.

I hope I have time to make new release in this week.
Sorry for inconvenient and thank you for your detailed report.

@aldeirm2
Copy link
Author

You certainly don't need to be apologizing. You're supporting this library during your own time and I'm sure everyone appreciates it.

Thank you for your help, very appreciated.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants