Skip to content

io.BufferReader.read() returns None #80050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
steverpalmer mannequin opened this issue Jan 31, 2019 · 8 comments
Open

io.BufferReader.read() returns None #80050

steverpalmer mannequin opened this issue Jan 31, 2019 · 8 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@steverpalmer
Copy link
Mannequin

steverpalmer mannequin commented Jan 31, 2019

BPO 35869
Nosy @vadmium, @steverpalmer
Superseder
  • bpo-13322: The io module doesn't support non-blocking files
  • Files
  • read2.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-01-31.15:21:50.213>
    labels = ['type-bug', '3.7', 'expert-IO', 'docs']
    title = 'io.BufferReader.read() returns None'
    updated_at = <Date 2019-02-01.08:36:11.123>
    user = 'https://github.com/steverpalmer'

    bugs.python.org fields:

    activity = <Date 2019-02-01.08:36:11.123>
    actor = 'martin.panter'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'IO']
    creation = <Date 2019-01-31.15:21:50.213>
    creator = 'steverpalmer'
    dependencies = []
    files = ['48091']
    hgrepos = []
    issue_num = 35869
    keywords = []
    message_count = 3.0
    messages = ['334630', '334632', '334655']
    nosy_count = 3.0
    nosy_names = ['docs@python', 'martin.panter', 'steverpalmer']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = '13322'
    type = 'behavior'
    url = 'https://bugs.python.org/issue35869'
    versions = ['Python 3.7']

    Linked PRs

    @steverpalmer
    Copy link
    Mannequin Author

    steverpalmer mannequin commented Jan 31, 2019

    class io.BufferedIOBase states "In addition, those methods [read(), readinto() and write()] can raise BlockingIOError if the underlying raw stream is in non-blocking mode and cannot take or give enough data; unlike their RawIOBase counterparts, they will never return None."

    However, class.io.BufferedReader (inheriting from io.BufferedIOBase) *does* return None in this case. Admittedly, io.BufferedReader does says it is overriding the inherited method, but I'm surprised that change in behaviour declared for buffered objects, is reverted to the RarIOBase behaviour on a more specific class.

    The attached file (a little long - sorry), simulates a slow non-blocking raw file, which it wraps in a BufferReader to test the behaviour defined in BufferedIOBase.

    @steverpalmer steverpalmer mannequin added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Jan 31, 2019
    @steverpalmer
    Copy link
    Mannequin Author

    steverpalmer mannequin commented Jan 31, 2019

    The description of read in io.BufferedReader.read function states "Read and return size bytes, or if size is not given or negative, until EOF or if the read call would block in non-blocking mode." This does mention the non-block mode scenario, but I can't parse this sentence.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 1, 2019

    This is covered by bpo-13322. There are a few other BufferedReader methods that contradict the documentation for non-blocking mode.

    A while ago I posted a patch to change the implementation to match the documentation, but nobody reviewed it or gave their opinion. These days I would prefer to just documentat the reality: the methods might raise an exception rather than returning None, or perhaps no particular behaviour at all is expected in general in the non-blocking case. But I don’t spend much time on Python now, so you might have to find someone else to move this forward.

    Regarding the entry for “BufferedReader.read”, it would make more sense to remove the last “if”: “if ‘size’ is not given . . ., until EOF, or the ‘read’ call would block”. But I don’t think even that is complete, because it should also say the read stops and returns short in the non-blocking case even when “size” is positive.

    @vadmium vadmium added docs Documentation in the Doc dir topic-IO labels Feb 1, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 27, 2025
    This synchronizes the documentation with the current implementation,
    with particular focus on behavior around non-blocking streams.
    cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 27, 2025
    Synchronize `io.BufferedReader` and `io.BufferedIOBase` documentation
    with the current implementation. PArticular focus on behavior around
    non-blocking streams.
    cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 27, 2025
    Synchronize `io.BufferedReader` and `io.BufferedIOBase` documentation
    with the current implementation. PArticular focus on behavior around
    non-blocking streams.
    cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 27, 2025
    Synchronize `io.BufferedReader` and `io.BufferedIOBase` documentation
    with the current implementation. Focused on behavior around non-blocking
    streams.
    @cmaloney
    Copy link
    Contributor

    cmaloney commented Mar 7, 2025

    I looked at changing this and found three cases reading through the BufferedReader code:

    1. BufferedReader.read(size=10) underlying raw.read(size=10) returns None
    2. BufferedReader.read() without size, raw implements readall() which returns None
    3. BufferedReader.read() without size, raw doesn't have readall so uses a loop that calls raw.read() which returns None.

    All three of these cases consistently return None from the BufferedReader. Definitely could change that behavior, but worry that in the 6 years since this issue was opened more code exists that expects None than have implemented and tested behavior for a BlockingIOError.

    Could potentially add a new mode/flag which would cause it to raise, but Blocking I/O behavior is not well specified in the io library above the "Raw I/O" layer (https://peps.python.org/pep-3116/#non-blocking-i-o). Could definitely be evolved, but I think that would need first a Python Ideas discussion.

    I do think the documentation should be updated to include the None return to reduce surprise. I'm actively working on that.

    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 21, 2025
    …ythonGH-130653)
    
    (cherry picked from commit e1f8914)
    
    Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
    gpshead pushed a commit that referenced this issue May 21, 2025
    …H-130653) (#134444)
    
    gh-80050: Update BufferedReader.read docs around non-blocking (GH-130653)
    (cherry picked from commit e1f8914)
    
    Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
    gpshead pushed a commit to gpshead/cpython that referenced this issue May 21, 2025
    …cking (pythonGH-130653)
    
    (cherry picked from commit e1f8914)
    
    Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
    gpshead added a commit that referenced this issue May 21, 2025
    …H-130653) (#134445)
    
    (cherry picked from commit e1f8914)
    
    Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
    @cmaloney
    Copy link
    Contributor

    cmaloney commented Jun 4, 2025

    @vadmium , @steverpalmer the docs are now updated to match implementation behavior, can this be closed?

    @steverpalmer
    Copy link
    Contributor

    In short - maybe.

    I understand why the preference is not to change the behaviour of the code given its flying hours and exposure. It may well be that some code out in the wild has been written around the actual behaviour rather than the previously described behaviour. The corollary to this though is that the description can become complex and there needs to be caution in offering simplified summaries.

    (I'm quoting from https://docs.python.org/3/library/io.html#io.BufferedIOBase)

    The summary description in io.BufferedIOBase states:

    In addition, if the underlying raw stream is in non-blocking mode, when the system returns would block write() will raise BlockingIOError with BlockingIOError.characters_written and read() will return data read so far or None if no data is available.

    The detailed description in io.BufferedIOBase.read(...) states:

    Note: When the underlying raw stream is non-blocking, implementations may either raise BlockingIOError or return None if no data is available. io implementations return None.

    The first quote seems to suggest that, in the case under consideration, the read function will return None is no data is available, but the later quote extends this to allow that it might alternatively raise a BlockingIOError, but not by the io module. The two quotes taken together might be confusing.

    I wondered about editing the first quote to match the second, but that felt like duplication and too much detail for a summary. It also seems superfluous to change the first quote (summary) to refer to the second quote (detail). I then wondered about removing the first quote completely - what does it really add? I suspect that it was originally written about a perfectly reasonable, arguably better design intention, but as we now know this intention was carried inconsistently to the implementation. Oddly, the one thing that the first quote does add, not included in v3.12 of the document and that isn't detailed later in the function detail, is that the io.BufferedIOBase.write function BlockingIOError exception includes the charaters_written field.

    My suggestion, then. is to delete the first summary quote entirely; the actual behaviour being described is too complex to simplify in a summary. I would also suggest that the io.BufferedIOBase.write function be extended to reference the BlockingIOError.charaters_written field (as it has been in io.BufferedWriter.write). Perhaps:

    When in non-blocking mode, a BlockingIOError with BlockingIOError.characters_written is raised if the data needed to be written to the raw stream but it couldn’t accept all the data without blocking.

    Nevertheless, the new text is better than the v3.12 text which is simply incorrect. Additionally. the reference from io.BufferedReader.read back to io.BufferedIOBasse.read is also better. If I am being pedantic, please feel free to ignore me.

    @cmaloney
    Copy link
    Contributor

    cmaloney commented Jun 5, 2025

    re: write, slowly working my way through different docs functions :). FileIO and RawIOBase have some corner cases for just .read that don't currently match the docs (gh-129011), but also the behavior has not precisely matched the docs since the first release of the io code shipped over 15 years ago as far as I can tell in my repository spelunking. Without general agreement on what the code actually does, and what is an understandable, precise, and concise way of documenting it I don't see rewriting more being valuable. To me need to get consensus on one as a model for more :).

    re: BufferedIOBase.read I get it's annoying to write both behaviors in the detailed doc still. Unfortunately things which followed the docs as they were before may exist, and there is nothing CPython can really do about that; Python doesn't enforce abstract API / interface shapes in that way. The note is purely a "the old docs said this, and others may have implemented it". Definitely get frustrating, but removing entirely (pretending the docs never said that) doesn't seem right to me nor does including in the main body of the doc (CPython implementation never did that). If you're using CPython and reading CPython's docs, the main paragraph behavior says what the code actually does, the notes give you details on how to be extra defensive in your programming to handle all possible cases that may exist in the Python ecosystem. My hope is the common case (you're just using read with a normal file and blocking I/O) is the very first thing documented and the rest is "skippable" details. For someone new to Python not documenting read() at all when people get it commonly from open(filename, 'rb') doesn't seem like enough... Could maybe drop blocking section entirely, but it's behavior which is used/relied upon today (even by code in CPython for specific cases).

    In any case, glad you find better. Will experiment some more on improving further.

    @steverpalmer
    Copy link
    Contributor

    Thank you for your consideration of my comments. Best of luck with improving the documentation. I'm happy if you want to close this issue. Cheers.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life docs Documentation in the Doc dir topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants