gh-127647: Add typing.Reader and Writer protocols #127648

srittau · 2024-12-05T18:55:08Z

Issue: Add simple Reader and Writer protocols #127647

📚 Documentation preview 📚: https://cpython-previews--127648.org.readthedocs.build/

srittau · 2024-12-05T19:00:46Z

A few design considerations:

I used the names Reader and Writer to match existing ABC names in typing/collections.abc. Alternatives are Readable and Writable, but I think they are used more rarely for these kind of (pseudo-)protocols. SupportsX would work for Writer, but not for Reader, since the latter supports multiple methods. (Any we maybe want to use SupportsRead etc. at some point for tighter protocols.)
I would prefer these protocols to be in io, where they fit better thematically, but that would require importing typing with unforseeable (performance) implications for such a foundational module.
I deliberated using tighter protocols, but for ease of use reasons – and since they are mostly an alternative to using IO et al. – I went with larger protocols for now.
Seekable (with methods seek and tell) would be an interesting addition as well, but I think this should wait for easy protocol composition.

srittau · 2024-12-05T19:24:01Z

https://cpython-previews--127648.org.readthedocs.build/en/127648/library/typing.html#abcs-and-protocols-for-working-with-i-o

Lib/typing.py

picnixz

I'm a bit sad that we can't use covariance and contravariance in the protocols since a subclass could use it with an invariant type.

Lib/typing.py

Doc/library/typing.rst

Small improvements to the docstrings and signature

Doc/library/typing.rst

JelleZijlstra · 2024-12-05T20:43:28Z

Doc/library/typing.rst

+
+   Protocol for reading from a file or other input stream.
+
+   .. method:: read(size=...)


Suggested change

.. method:: read(size=...)

.. method:: read(size=..., /)

(Same for other methods)

I was unsure of how much of the signature to document. For example, would it also make sense to add argument and/or return types? I've added the slashes for now.

Doc/library/typing.rst

Lib/typing.py

AlexWaygood · 2024-12-06T11:09:38Z

I would prefer these protocols to be in io, where they fit better thematically, but that would require importing typing with unforseeable (performance) implications for such a foundational module.

io feels like a more natural home to me too. Could you not do something similar to the pseudo-protocol ABCs in collections.abc, contextlib and os.path? The approach there is:

Make the classes ABCs rather than protocols
But document them as protocols

And write __subclasshook__ methods so that they behave identically to runtime-checkable Protocols in their isinstance()/issubclass() behaviour:

cpython/Lib/contextlib.py

Lines 34 to 38 in 023b7d2

    
           @classmethod 
        
           def __subclasshook__(cls, C): 
        
               if cls is AbstractContextManager: 
        
                   return _collections_abc._check_methods(C, "__enter__", "__exit__") 
        
               return NotImplemented

And special-case the ABCs in typing.py so that the typing module treats them identically to protocols:

cpython/Lib/typing.py

Lines 1940 to 1948 in 023b7d2

    
           _PROTO_ALLOWLIST = { 
        
               'collections.abc': [ 
        
                   'Callable', 'Awaitable', 'Iterable', 'Iterator', 'AsyncIterable', 
        
                   'AsyncIterator', 'Hashable', 'Sized', 'Container', 'Collection', 
        
                   'Reversible', 'Buffer', 
        
               ], 
        
               'contextlib': ['AbstractContextManager', 'AbstractAsyncContextManager'], 
        
               'os': ['PathLike'], 
        
           }

And pretend in typeshed that they're protocols, so that type checkers treat them as protocols (I feel like python/typeshed@f554f54 missed the point there slightly)

srittau · 2024-12-06T11:14:26Z

That feels terribly hackish to me, even if there is precedence. And in my experience, hackish stuff often falls on one's feet one way or the other. But I can move it to io if that's the consensus.

AlexWaygood · 2024-12-06T11:18:05Z

This has been the way the collections.abc and contextlib ABCs have always worked. As far as I'm aware, it's worked pretty well up to now. And it doesn't actually feel like the most hackish part of collections.abc ;)

cpython/Lib/collections/__init__.py

Lines 32 to 33 in 023b7d2

    
           _sys.modules['collections.abc'] = _collections_abc 
        
           abc = _collections_abc

srittau · 2024-12-06T11:25:36Z

Well, considering that typing.is_protocol uses the _is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived from Protocol. Which contradicts the typing spec at the moment. This all seems a bit precarious to me.

AlexWaygood · 2024-12-06T11:29:03Z

I'm open to something less hacky if you can find a generalized solution that doesn't make _collections_abc depend on typing (which would significantly slow down startup for all Python applications, since _collections_abc is imported as part of Python's initialisation).

AlexWaygood · 2024-12-06T11:31:14Z

considering that typing.is_protocol uses the _is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived from Protocol. Which contradicts the typing spec at the moment.

I don't think the typing spec needs to be cognisant of the hacks we use to make things work at runtime in the stdlib! This knowledge is irrelevant for third-party users of protocols, to users of type checkers and to implementers of type checkers. It's definitely not encouraged for any external users to make use of the _is_protocol attribute.

srittau · 2024-12-06T12:01:41Z

CI seems flaky, failures are unrelated.

Doc/library/io.rst

Doc/library/typing.rst

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

AlexWaygood

This LGTM now but it might be good to have one or two simple tests covering the new code. It probably doesn't need to be much more than this (added to test_io.py)

cpython/Lib/test/test_typing.py

Lines 4260 to 4262 in a083633

    
           def test_supports_index(self): 
        
               self.assertIsSubclass(int, typing.SupportsIndex) 
        
               self.assertNotIsSubclass(str, typing.SupportsIndex)

and maybe an addition to the test_typing tests similar to this one:

cpython/Lib/test/test_typing.py

Lines 4274 to 4303 in a083633

    
           def test_collections_protocols_allowed(self): 
        
               @runtime_checkable 
        
               class Custom(collections.abc.Iterable, Protocol): 
        
                   def close(self): ... 
        
               class A: pass 
        
               class B: 
        
                   def __iter__(self): 
        
                       return [] 
        
                   def close(self): 
        
                       return 0 
        
               self.assertIsSubclass(B, Custom) 
        
               self.assertNotIsSubclass(A, Custom) 
        
               @runtime_checkable 
        
               class ReleasableBuffer(collections.abc.Buffer, Protocol): 
        
                   def __release_buffer__(self, mv: memoryview) -> None: ... 
        
               class C: pass 
        
               class D: 
        
                   def __buffer__(self, flags: int) -> memoryview: 
        
                       return memoryview(b'') 
        
                   def __release_buffer__(self, mv: memoryview) -> None: 
        
                       pass 
        
               self.assertIsSubclass(D, ReleasableBuffer) 
        
               self.assertIsInstance(D(), ReleasableBuffer) 
        
               self.assertNotIsSubclass(C, ReleasableBuffer) 
        
               self.assertNotIsInstance(C(), ReleasableBuffer)

AlexWaygood

This is great, thank you! Just a few docs nits remaining

Doc/library/io.rst

Doc/library/typing.rst

Lib/io.py

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

AlexWaygood · 2025-02-27T12:59:23Z

I'll leave it a little bit in case any of the other reviewers here have any remaining concerns before merging

AA-Turner · 2025-02-27T13:16:29Z

Lib/io.py

+    __slots__ = ()
+
+    @abc.abstractmethod
+    def read(self, size=..., /):


Sorry if this has been discussed before, but I'm unsure on the runtime use of size=... (I didn't notice this earlier in my documentation review, sorry).

Almost every other read(size) method I can find has a default of either None or -1. I also can't find another method in the stdlib with a default of ... (outside of the recently-added protocols in wsgiref.types).

Would it be better to have size=-1, to indicate that the method takes an int? I'm not sure how much we want typeshed-like practices to leak into the standard library.

Suggested change

def read(self, size=..., /):

def read(self, size=-1, /):

A

Mandating defaults is not really something you can do in a protocol. I also wouldn't want to mandate that implementors have to use a default of -1, because – as you said – some implementations use None.

Right, but this isn't a protocol -- it's an ABC, which do have defaults -- see e.g. collections.abc.Generator. All of the read() methods in io are documented as having read(size=-1, /), and given these ABCs are going into io, I think we should be consistent with that interface, or have good reason to diverge from it (& document why).

https://docs.python.org/3/library/io.html#io.RawIOBase.read

https://docs.python.org/3/library/io.html#io.BufferedIOBase.read

https://docs.python.org/3/library/io.html#io.BufferedReader.read

https://docs.python.org/3/library/io.html#io.TextIOBase.read

It's supposed to be a protocol, not an ABC. (Notwithstanding the fact that all protocols are ABCs.) It's just a protocol in the implementation for performance reasons. And using -1 as a default would give users the very wrong impression that they can use read(-1) when that may or may not actually be supported.

using -1 as a default would give users the very wrong impression that they can use read(-1) when that may or may not actually be supported.

From the documentation of io.RawIOBase.read():

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned.

I would expect the io.Reader.read() ABC/protocol to have this same guarantee, for a 'properly' implemented read() method (according to the io expecations). In the proposed documentation, we say:

Read data from the input stream and return it. If size is specified, it should be an integer, and at most size items (bytes/characters) will be read.

This forbids None (good!), but is silent on what happens should size be omitted. I still think we should use -1 instead of ..., but at the very least we should include in the documentation the contract for what happens when read() is called with no arguments.

I disagree. -1 is the default for some implementations, but the protocol should not make any default mandatory.

I'm still not sure I follow, though -- currently the ABC/protocol has the mandatory default of size=..., which is always invalid and not the correct type. -1 is valid for all interfaces specified by the io documentation, which is where this new type is being added.

I ran the following with mypy --strict and it passed, so I don't think type checkers care about default values (and as discussed, the type forbids using non-integer types):

import abc class Reader(metaclass=abc.ABCMeta): __slots__ = () @abc.abstractmethod def read(self, size: int = -1, /) -> bytes: pass class CustomReader(Reader): def read(self, size: int = -1, /) -> bytes: return b'' class CustomReaderZero(Reader): def read(self, size: int = 0, /) -> bytes: return b'' assert issubclass(CustomReader, Reader) assert issubclass(CustomReaderZero, Reader) assert isinstance(CustomReader(), Reader) assert isinstance(CustomReaderZero(), Reader) def reader_func(r: Reader) -> None: r.read() reader_func(CustomReader()) reader_func(CustomReaderZero())

I agree with Sebastian here; we should use ... because the protocol need not mandate any particular default.

-1 is valid for all interfaces specified by the io documentation

The protocol should also match other file-like classes defined elsewhere in the stdlib or even in third-party libraries. When defining a protocol it's often useful to be permissive, so that all objects that are intended to match the protocol actually match it.

srittau · 2025-03-06T09:44:58Z

Could this be merged before the next alpha next week?

bluetech · 2025-03-07T00:00:41Z

I only saw the update now, great that it's merged!

I have a few comments. I hope they're not too bothersome - I just think these protocols are important so it's worth the effort.

Can the docs include type annotations? I know that Sphinx supports it. Given that this is meant for static typing, it makes sense to me to include.

I think it is important to document the expected semantics of these methods, otherwise you don't really know what you're going to get when you accept a Reader or Writer. What I can think of:

What should happen on EOF?
Is EOF sticky?
Short reads/writes
What does it mean to omit size?
What happens on size 0?
Is size allowed to be -1 or more generally negative?
Can read return value be None? Empty?
Can write return value be 0?
Expected exceptions -- IOError?

I don't like the parameter name size, because from the caller perspective it is not exactly a size of anything. I would have preferred just n. But, I see this name is already used e.g. IOBase and consistency probably wins the argument here.

The docs say "If size is specified, it should be an integer, and at most size items (bytes/characters) will be read" and "Write data to the output stream and return the number of items (bytes/characters) written". I would drop the "(bytes/characters)" since T is not necessarily that.

The docs say "The following protocols can be used for annotating function and method arguments for simple stream reading or writing operations". I have two nitpicks:

What is annotated is the parameter, not the argument (= a value for a parameter in a specific invocation). So would change "arguments" -> "parameters".
I don't think it is just for simple operations, so I'd remove "simple".

I think T should have some bound. I think at least Sized, it's hard to interpret the return value without a notion of size.

srittau · 2025-03-09T14:01:29Z

A merged PR is probably not the best place to discuss this, but a few points:

Can the docs include type annotations?

Personally, I'd be fine with, and actually prefer if the docs used type annotations more, but I didn't want to divert from the existing style in the documentation.

I think it is important to document the expected semantics of these methods

While it would make sense to me to document some of those semantics (e.g. size must be >= 1), most are not part of this protocol. Exceptions and EOF are not concepts that are understood by those protocols.

I would drop the "(bytes/characters)" since T is not necessarily that.

It might have been clearer to say "(e.g. bytes or characters)", but I think calling out those specific types here makes the documentation clearer, as in 99% of cases it will be either of those. Otherwise the concept of "item" is a bit nebulous.

I think T should have some bound.

I don't think it's necessary to make the protocol more complex. Whoever is using the protocol will have to choose an type that they can work with.

srittau · 2025-03-09T14:11:30Z

Also in general, protocols are not a replacement for API documentation. It doesn't make too much sense to include expectations around a protocol that can't be checked statically. To document these expectations is the responsibility of the author of a function that uses these protocols, and to follow them is the responsibility of the user. These expectations can vary significantly from function to function.

bluetech · 2025-03-10T23:50:36Z

(Please feel free to ignore this if you don't want to discuss this further -- I couldn't resist posting another comment...)

If I understand correctly, you prefer a narrow approach to these protocols because 1) they can't be checked statically 2) semantics can vary between uses. My replies:

In an ideal world, it would be possible to convey all semantics in a static way. But if it's not possible, documentation is the alternative. Stated differently, the Platonic ideal of the Reader interface contains all of its (agreed) semantics, but its projection onto the Python type system is incomplete, so the documentation should try its best to fill the gap. That's how I view it at least.
Why should the semantics vary between uses? A single set of semantics has a big advantage - it reduces (N impls * M uses) coordination points to N + M. As an implementer, I can (implicitly or explicitly) follow the protocol and don't need to invent or document my semantics. As a user, I write to a single set of semantics, not needing to handle multiple sets, defensively protect against unknown ones, or document my expectations. As a "middleware" (e.g. a BufferedReader which adds buffering to any reader), I can take any generic reader. What's the advantage of allowing implementations to be inconsistent on something like how EOF is signaled?

Cf. python/cpython#127648

Add typing.Reader and Writer protocols

b45fec0

srittau requested review from JelleZijlstra and AlexWaygood as code owners December 5, 2024 18:55

bedevere-app bot mentioned this pull request Dec 5, 2024

Add simple Reader and Writer protocols #127647

Closed

bedevere-app bot added the awaiting review label Dec 5, 2024

srittau added 4 commits December 5, 2024 20:02

Add a note about Iterable

1525e05

Fix docs formatting

7867ec1

Small wording improvements

6a22a02

Simplify the docs/improve formatting

5d632a3

sobolevn reviewed Dec 5, 2024

View reviewed changes

Lib/typing.py Outdated Show resolved Hide resolved

picnixz reviewed Dec 5, 2024

View reviewed changes

Lib/typing.py Outdated Show resolved Hide resolved

Doc/library/typing.rst Outdated Show resolved Hide resolved

Doc/library/typing.rst Outdated Show resolved Hide resolved

Explicitly document the methods

4d50c2e

Small improvements to the docstrings and signature

JelleZijlstra reviewed Dec 5, 2024

View reviewed changes

Mark protocol members as abstract

1e1ea41

This was referenced Dec 6, 2024

Protocol should be documented as using ABCMeta as metaclass #127677

Open

Protocols in wsgiref.types should use @abstractmethod #127678

Open

srittau added 4 commits December 6, 2024 12:01

Add .. versionadded

56a38a0

Added slashes to documented signatures

f2c331b

Fix overindentation

6764b6a

Fix documentation of Reader.__iter__()

022acaa

Remove the @runtime_checkable flags

b86073d

AA-Turner reviewed Feb 27, 2025

View reviewed changes

Doc/library/io.rst Outdated Show resolved Hide resolved

AA-Turner reviewed Feb 27, 2025

View reviewed changes

Doc/library/typing.rst Show resolved Hide resolved

AA-Turner reviewed Feb 27, 2025

View reviewed changes

Doc/library/typing.rst Outdated Show resolved Hide resolved

srittau and others added 2 commits February 27, 2025 13:08

Fix references in docs

3b5975e

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

Split signature

96080fe

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

AlexWaygood reviewed Feb 27, 2025

View reviewed changes

srittau added 3 commits February 27, 2025 13:38

Document that Reader and Writer are generic

3723370

Add tests

76003a8

Add missing import

43e23f0

AlexWaygood approved these changes Feb 27, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Feb 27, 2025

Doc fixes

c644770

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

AA-Turner reviewed Feb 27, 2025

View reviewed changes

JelleZijlstra approved these changes Mar 1, 2025

View reviewed changes

Merge branch 'main' into typing-readable-writable

bfab2fd

JelleZijlstra merged commit c6dd234 into python:main Mar 6, 2025
39 checks passed

bedevere-app bot removed the awaiting merge label Mar 6, 2025

srittau deleted the typing-readable-writable branch March 6, 2025 16:02

srittau added a commit to srittau/typing_extensions that referenced this pull request Apr 9, 2025

Add Reader and Writer protocols

dad6e79

Cf. python/cpython#127648

srittau mentioned this pull request Apr 9, 2025

Add Reader and Writer protocols python/typing_extensions#582

Merged

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025

pythongh-127647: Add typing.Reader and Writer protocols (python#127648)

81088c4

mdboom added a commit to mdboom/cpython that referenced this pull request Apr 25, 2025

pythongh-127648: Improve Python startup time by ~12%

9fc1238

yashwanthatla mentioned this pull request Jun 21, 2025

Replication of original PR #582: Add Reader and Writer protocols yashwanthatla/typing_extensions#4

Open


		Protocol for reading from a file or other input stream.

		.. method:: read(size=...)

	def test_supports_index(self):
	self.assertIsSubclass(int, typing.SupportsIndex)
	self.assertNotIsSubclass(str, typing.SupportsIndex)

	def test_collections_protocols_allowed(self):
	@runtime_checkable
	class Custom(collections.abc.Iterable, Protocol):
	def close(self): ...

	class A: pass
	class B:
	def __iter__(self):
	return []
	def close(self):
	return 0

	self.assertIsSubclass(B, Custom)
	self.assertNotIsSubclass(A, Custom)

	@runtime_checkable
	class ReleasableBuffer(collections.abc.Buffer, Protocol):
	def __release_buffer__(self, mv: memoryview) -> None: ...

	class C: pass
	class D:
	def __buffer__(self, flags: int) -> memoryview:
	return memoryview(b'')
	def __release_buffer__(self, mv: memoryview) -> None:
	pass

	self.assertIsSubclass(D, ReleasableBuffer)
	self.assertIsInstance(D(), ReleasableBuffer)
	self.assertNotIsSubclass(C, ReleasableBuffer)
	self.assertNotIsInstance(C(), ReleasableBuffer)

Uh oh!

gh-127647: Add typing.Reader and Writer protocols #127648

gh-127647: Add typing.Reader and Writer protocols #127648

Uh oh!

Conversation

srittau commented Dec 5, 2024 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srittau commented Dec 5, 2024

Uh oh!

srittau commented Dec 5, 2024

Uh oh!

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexWaygood commented Dec 6, 2024

Uh oh!

srittau commented Dec 6, 2024

Uh oh!

AlexWaygood commented Dec 6, 2024

Uh oh!

srittau commented Dec 6, 2024

Uh oh!

AlexWaygood commented Dec 6, 2024

Uh oh!

AlexWaygood commented Dec 6, 2024

Uh oh!

srittau commented Dec 6, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexWaygood commented Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srittau Feb 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srittau commented Mar 6, 2025

Uh oh!

srittau commented Dec 5, 2024 •

edited by github-actions bot

Loading

AlexWaygood commented Feb 27, 2025 •

edited

Loading

srittau Feb 27, 2025 •

edited

Loading