File IO issues #547

Closed
silmeth opened this issue Feb 25, 2019 · 7 comments

@silmeth
Contributor

silmeth commented Feb 25, 2019

There are a few differences between the way CPython handles file IO and the way RustPython does.

  1. RustPython does not currently offer any close() methods, so opened files get leaked and can be closed only by importing the os module and invoking os.close(file.raw.fileno) (or os.close(file.buffer.raw.fileno) for files in text mode).
  2. RustPython currently doesn’t offer text writing mode (which is the default for writing in CPython), so in CPython:
    >>> file = open('/tmp/tst', 'w')
    >>> file
    <_io.TextIOWrapper name='/tmp/tst' mode='w' encoding='UTF-8'>
    one can try to open a file for text writing in RustPython with wt mode, but then writing fails:
    >>>>> file = open('/tmp/tst', 'wt')
    >>>>> file.write('\ntest string')
    Traceback (most recent call last):
      File <stdin>, line 0, in <module>
    AttributeError: 'TextIOWrapper' object has no attribute 'write'
    and files for writing are by default opened in binary mode.
  3. RustPython’s fileno is a property, while in CPython it is a method returning the underlying file descriptor.
  4. CPython doesn’t rely on the fileno() value for file interactions. Instead it uses an internal file handle, and fileno() just returns that handle as a Python int, so one can overwrite fileno and the file object will still refer to the original file:
    >>> file = open('/tmp/test', 'wb')
    >>> file.write(b'writing to the original file\n')
    29
    >>> file.raw.fileno = lambda: 1  # stdout descriptor
    >>> file.write(b'again writing to the original file\n')
    35
    >>> 
    while in RustPython, changing the fileno property changes the underlying file:
    >>>>> file = open('/tmp/test', 'wb')
    >>>>> file.write(b'writing to the original file\n')
    29
    >>>>> file.raw.fileno = 1  # stdout descriptor
    >>>>> file.write(b'this will appear in the stdout instead\n')
    this will appear in the stdout instead
    39
    >>>>> 
  5. Some file attributes in CPython are protected from modification, e.g.:
    >>> file = open('/tmp/test', 'wb')
    >>> file.raw.mode = 'rb'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: attribute 'mode' of '_io.FileIO' objects is not writable

And then there are some attributes missing (e.g. RustPython files do not have a mode attribute at all).

The first three issues seem easy to fix, but issues 4 and 5 need changes in the way RustPython handles objects (they mostly behave like regular Python objects, but need a custom payload and properties protected from being overwritten).
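A minimal pure-Python sketch of the behaviour that items 4 and 5 ask for, assuming a hypothetical class named SketchRawFile (it is not RustPython or CPython code). The descriptor is kept as an internal payload that write() uses directly, fileno() only reports it, and mode is a property without a setter. In this sketch the payload is still reachable as _fd; a native implementation would presumably hide it entirely.

```python
import os
import tempfile


class SketchRawFile:
    """Hypothetical stand-in for a raw file object (illustration only)."""

    def __init__(self, path, mode="wb"):
        # The descriptor is the "payload"; every IO call below uses it directly.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self._mode = mode

    @property
    def mode(self):
        # A property without a setter: assigning to it raises AttributeError.
        return self._mode

    def fileno(self):
        # Only reports the descriptor; write() never calls this method.
        return self._fd

    def write(self, data):
        return os.write(self._fd, data)

    def close(self):
        if self._fd >= 0:
            os.close(self._fd)
            self._fd = -1


def demo():
    """Repeat the experiments from items 4 and 5 against the sketch."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    raw = SketchRawFile(path)
    try:
        raw.write(b"first\n")
        raw.fileno = lambda: 1  # shadow the method, as in item 4
        raw.write(b"second\n")  # still goes to the original file
        try:
            raw.mode = "rb"  # as in item 5
            mode_protected = False
        except AttributeError:
            mode_protected = True
    finally:
        raw.close()
    with open(path, "rb") as check:
        content = check.read()
    os.remove(path)
    return content, mode_protected
```

Calling demo() returns the file content and a flag saying whether the assignment to mode was rejected; both writes should land in the temporary file rather than on stdout.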

@silmeth
Contributor Author

silmeth commented Feb 27, 2019

As to 2. – it seems that io.TextIOWrapper in CPython cannot be created with an object that does not implement readable(), writable() and seekable() methods (and a closed attribute), so we should check for them when creating the TextIOWrapper ourselves. That means that BufferedReader and BufferedWriter need to implement them.
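A rough sketch of such a check, assuming a hypothetical helper named missing_wrapper_requirements. The list of required names comes only from the observation above, not from the CPython source, so treat it as an assumption.

```python
import io

# Names taken from the observation above; the helper itself is hypothetical.
REQUIRED_METHODS = ("readable", "writable", "seekable")
REQUIRED_ATTRIBUTES = ("closed",)


def missing_wrapper_requirements(buffer):
    """Return the names a candidate buffer lacks, in a stable order."""
    missing = [name for name in REQUIRED_METHODS
               if not callable(getattr(buffer, name, None))]
    missing += [name for name in REQUIRED_ATTRIBUTES
                if not hasattr(buffer, name)]
    return missing


class Incomplete:
    """Example object that only implements readable()."""

    def readable(self):
        return True
```

A wrapper constructor could call this first and raise an error listing the missing names. io.BytesIO() passes the check, while Incomplete() is reported as lacking writable, seekable and closed.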

Interestingly, TextIOWrapper relies on the values these methods return at initialization and disregards later changes: if the wrapped object returns True from writable() at initialization, it will be treated as writable until closed. The wrapper also has some internal buffering, see:

>>> import io
>>> class PseudoFile:
...   def write(self, buf):
...     print(f'printing buffer: {buf.decode("utf-8")}')
...     return 4
...   def read(self):
...     return b'a'
...   def flush(self):
...     print('flushing')
...   def readable(self):
...     return True
...   def writable(self):
...     res = self._num == 0
...     self._num += 1
...     return res
...   def seekable(self):
...     return False
...   def __init__(self):
...     self.closed = False
...     self._num = 0
... 
>>> 
>>> pseudoWrapper = io.TextIOWrapper(PseudoFile())
>>> pseudoWrapper.buffer.writable()
False
>>> pseudoWrapper.write('noël')
5
>>> pseudoWrapper.write('noël')
5
>>> pseudoWrapper.flush()
printing buffer: noëlnoël
flushing

Also, TextIOWrapper seems to always return the length of the string written, counted in Unicode code points: the string noël has length 5 here, even though it is 4 graphemes encoded in 6 bytes, because it uses two code points for ë (e followed by a combining diaeresis). It doesn’t care that the wrapped writer always reports that it has written only 4 bytes (see def write(self, buf) in PseudoFile), which would produce a corrupt, invalid UTF-8 string here. It doesn’t try to re-write the not-yet-written bytes, and doesn’t raise any exception on the wrong number.
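The counts above can be checked with a short sketch. It assumes the string in the session was typed in decomposed form (e plus U+0308), which is the only way noël comes out as 5 code points and 6 bytes. The helper names are made up for illustration.

```python
import unicodedata

# Decomposed spelling: "e" followed by U+0308 COMBINING DIAERESIS.
decomposed = "noe\u0308l"
# Precomposed spelling: a single U+00EB code point for the accented letter.
composed = unicodedata.normalize("NFC", decomposed)


def describe(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


def truncated_is_valid(text, byte_count):
    """Check whether the first byte_count UTF-8 bytes still decode cleanly."""
    try:
        text.encode("utf-8")[:byte_count].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Cutting the decomposed string after 4 bytes splits the two-byte combining mark, which is why a writer that really stopped after 4 bytes would leave invalid UTF-8 behind.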

@coolreader18
Member

@silmeth is this resolved now? There have been a fair amount of io module changes since this was opened, but I'm not sure if they addressed what you discussed here.

@silmeth
Copy link
Contributor Author

silmeth commented Jul 3, 2019

It seems it is not. There is some progress (e.g. files are now opened in text mode by default, not in binary mode as before), but still:

(and I believe the other two are also still true but didn’t test them, they are lower priority anyway, and the last one is not IO-specific)

Run on master, commit f34f16b2cc0422116f9044df1f4818bcc6941a4c:

% RUSTPYTHONPATH=~/Projekty/RustPython/Lib cargo run --release
    Finished release [optimized] target(s) in 0.18s
     Running `target/release/rustpython`
Welcome to the magnificent Rust Python 0.0.1 interpreter 😱 🖖
>>>>> # Correct text mode (with TextIOWrapper):
>>>>> file = open('/tmp/tst', 'w')
>>>>> file
<TextIOWrapper object at 0x560ef58f43e0>
>>>>> 
>>>>> # No writing for text files:
>>>>> file.write('\ntest string')
Traceback (most recent call last):
  File "<stdin>", line 0, in <module>
AttributeError: "'TextIOWrapper' object has no attribute 'write'"
>>>>> 
>>>>> # No closing:
>>>>> file.close()
Traceback (most recent call last):
  File "<stdin>", line 0, in <module>
AttributeError: "'TextIOWrapper' object has no attribute 'close'"
>>>>> file.buffer.close()
Traceback (most recent call last):
  File "<stdin>", line 0, in <module>
AttributeError: "'BufferedWriter' object has no attribute 'close'"
>>>>> file.buffer.raw.close()
Traceback (most recent call last):
  File "<stdin>", line 0, in <module>
AttributeError: "'FileIO' object has no attribute 'close'"
>>>>> 
>>>>> # fileno is still a property, not a method:
>>>>> file.buffer.raw.fileno
3

Compare with CPython 3.7:

% python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('/tmp/tst', 'w')
>>> file
<_io.TextIOWrapper name='/tmp/tst' mode='w' encoding='UTF-8'>
>>> file.write('\ntest string')
12
>>> file.buffer.close()
>>> file.close()
>>> file.buffer.raw.fileno
<built-in method fileno of _io.FileIO object at 0x7f4eb195fc60>
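The manual checks from the two sessions above could be collected into one script and run under either interpreter. This is only a sketch: the function name probe_text_file_io and the result keys are invented here, and the expected values are the CPython ones shown above.

```python
import os
import tempfile


def probe_text_file_io():
    """Run the checks from the sessions above; return a dict of booleans."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    results = {}
    try:
        f = open(path, "w")
        try:
            # Text-mode write should return the number of characters written.
            results["write_returns_count"] = f.write("\ntest string") == 12
            # fileno should be a method on every layer, not a plain int.
            results["fileno_is_method"] = callable(f.buffer.raw.fileno)
            results["fileno_consistent"] = f.fileno() == f.buffer.raw.fileno()
        finally:
            f.close()
        # Closing the wrapper should close the buffered and raw layers too.
        results["all_layers_closed"] = (
            f.closed and f.buffer.closed and f.buffer.raw.closed
        )
    finally:
        os.remove(path)
    return results
```

Under CPython every value in the returned dict is True. An interpreter without write() or close() on these objects would raise AttributeError instead, as in the RustPython session above.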

@youknowone
Member

It seems close is now implemented. Can someone check whether this issue is done or not?

@dralley
Contributor

dralley commented Dec 6, 2020

Step 2 works:

Welcome to the magnificent Rust Python 0.1.2 interpreter 😱 🖖
>>>>> file = open('/tmp/tst', 'wt')
>>>>> file.write('\ntest string')
12
>>>>> 

Writing without an explicit text-mode flag works

Welcome to the magnificent Rust Python 0.1.2 interpreter 😱 🖖
>>>>> file = open('/tmp/tst', 'w')
>>>>> file.write('\ntest string')
12

Closing buffers etc. works

>>>>> file.buffer
<_io.BufferedWriter name='/tmp/tst'>
>>>>> file.buffer.close()
>>>>> file.buffer.raw
<FileIO object at 0x56000ba4ebb0>
>>>>> file.buffer.raw.close()

fileno is a method now rather than a property

>>>>> file.buffer.raw.fileno
<bound method of <FileIO object at 0x56000ba4ebb0>>

The behavior of file.raw in both implementations is completely different from what was originally documented in this issue, and is now identical between CPython and RustPython. It appears that in newer versions of Python you cannot access the raw buffer at all, so none of the modifications in steps 4 and 5 are possible.

>>>>> file.raw
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'TextIOWrapper' object has no attribute 'raw'
>>> file.raw
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.TextIOWrapper' object has no attribute 'raw'

The Python documentation says:

raw

The underlying raw stream (a RawIOBase instance) that BufferedIOBase deals with. This is not part of the BufferedIOBase API and may not exist on some implementations.

Given all of this, I think the issue can be closed.
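For reference, a small sketch of how the layers nest in CPython 3, which may explain the difference seen above: the original steps 4 and 5 opened the file with 'wb', where the returned object has a raw attribute, while the sessions in this comment opened it in text mode, where the raw stream sits one level down at file.buffer.raw. The helper name describe_layers is made up for illustration.

```python
import os
import tempfile


def describe_layers(mode):
    """Return (layer type names from outermost to innermost, has .raw)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as f:
            layers = []
            layer = f
            while layer is not None:
                layers.append(type(layer).__name__)
                # TextIOWrapper exposes .buffer, buffered objects expose .raw.
                if hasattr(layer, "buffer"):
                    layer = layer.buffer
                elif hasattr(layer, "raw"):
                    layer = layer.raw
                else:
                    layer = None
            return layers, hasattr(f, "raw")
    finally:
        os.remove(path)
```

Under CPython, describe_layers("w") reports three layers and no raw attribute on the outer object, while describe_layers("wb") reports two layers with raw present.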

@dralley
Contributor

dralley commented Dec 6, 2020

@youknowone ^

@youknowone
Member

Thank you for confirming!
