Skip to content

gh-112064: fix incorrect handling of negative read sizes in HTTPResponse.read() #128270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 28, 2025

Conversation

manushkin
Copy link
Contributor

@manushkin manushkin commented Dec 26, 2024

Param value amt can be negative, e.x. -1.
In that case we read all data, with chunk separators, instead of correct reading.

P.S. example of using -1 here:
https://github.com/Textualize/rich/blob/master/rich/progress.py#L247

Param value `amt` can be negative, e.x. -1.
In that case we read all data, with chunk separators, instead of correct reading.

P.S. example of using -1 here:
https://github.com/Textualize/rich/blob/master/rich/progress.py#L247
@ghost
Copy link

ghost commented Dec 26, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Dec 26, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@manushkin
Copy link
Contributor Author

Should I create an issue/bug before that PR?

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

Should I create an issue/bug before that PR?

Yes.


Now, the prototype of HTTPResponse.read() is wrong and this is the one that should be fixed instead. If you look at the call chain, you'll see that passing amt=-1 actually leads to a lot of issues. For instance:

while (chunk_left := self._get_chunk_left()) is not None:
    if amt is not None and amt <= chunk_left:
        value.append(self._safe_read(amt))
        self.chunk_left = chunk_left - amt
        break

We have an issue with self.chunk_left = chunk_left - amt and self._safe_read. Instead, we should:

  • assert that amt is None or amt >= 0 in private methods
  • change amt = -1 to amt = None when calling read(). Namely read(-1) should be equivalent to read(None).

Btw, negative reading size means "read everything you can" (see: https://docs.python.org/3/library/io.html#io.BufferedIOBase.read).

@manushkin
Copy link
Contributor Author

I created issue: #128271

@picnixz picnixz changed the title Fixed http read_chunked for amt with negative value gh-128271: fix incorrect handling of negative read sizes in HTTPResponse.read() Dec 26, 2024
@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

Considering the tests are failing and that the implementation should handle negative reading sizes as being equivalent to None, I'm converting this PR to a draft PR. When everything is corrected, you can ask for a review from me.

@picnixz picnixz marked this pull request as draft December 26, 2024 10:11
@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

By the way, we need some tests for that. But focus on fixing the implementation first and I can help you for the tests later.

@bedevere-app
Copy link

bedevere-app bot commented Dec 26, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

  • You can forget about hypothesis tests (the failure is known)

  • You should add a NEWS entry via blurb. Something like:

    Fix incorrect handling of negative read sizes in :meth:`HTTPResponse.read
    <http.client.HTTPResponse.read>`. Patch by YOUR_NAME.

    If you don't want to, you can leave out "Patch by [...]".

  • We can now test this. You can improve BasicTest.test_chunked in test_httplib.py by adding a test case for None and -1. You can take inspiration from

    for n in range(1, 12):
        sock = FakeSocket(chunked_start + last_chunk + chunked_end)
        resp = client.HTTPResponse(sock, method="GET")
        resp.begin()
        self.assertEqual(resp.read(n) + resp.read(n) + resp.read(), expected)
        resp.close()

    where you'll need to modify n and adapt the assertEqual assertion.

@picnixz picnixz marked this pull request as ready for review December 26, 2024 13:23
@picnixz picnixz self-requested a review December 26, 2024 13:24
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is something I haven't seen in the test_chunked(), but shouldn't we set {'Transfer-Encoding': 'chunked'} as well? Just to be sure, can you locally check that resp.chunked is true? (to check that we are indeed using the method you fixed)

@illia-v
Copy link
Contributor

illia-v commented Dec 26, 2024

FYI, we've had a similar change done in urllib3 urllib3/urllib3#3356

@picnixz
Copy link
Member

picnixz commented Dec 26, 2024

Thanks for the information. We could also change our own _read1_chunked but the latter already handles negative n so... it should be fine on our side. However, @manushkin could you perhaps add some tests for read1(None) or read1(-123) as well (if there are none?) [you can also decide to make a follow-up PR for that]

@picnixz
Copy link
Member

picnixz commented Dec 27, 2024

Note: I didn't quite understand whether you confirmed or not my suggestion of keeping it _read_chunked().

I think it's probably better to keep it in _read_chunked. While making the check in read() would make the two versions (non-chunked and chunked) eventually equivalent most of the time, we would still be reading fp in terms of chunk and not in one go, doing multiple calls (I've extracted the relevant lines of _read_chunked):

while (chunk_left := self._get_chunk_left()) is not None:
    value.append(self._safe_read(chunk_left))

If the HTTP response is meant to be read by chunks, we should do it so since "read as many bytes you can in chunks" and "read as many bytes you can in one go" may not be equivalent (depending on how the server sends the data, right?)

@illia-v
Copy link
Contributor

illia-v commented Dec 27, 2024

Let me clarify my point 🙂

This PR makes read handle negative integers and None the same for the if self.chunked case.

But if the response is not chunked, there are still different code paths for negative ints and None. So to fix read fully, the read method needs to handle negative integers anyway, either like in the following diff or in its beginning to pass only positive integers or None to self._read_chunked.

diff --git a/Lib/http/client.py b/Lib/http/client.py
index 1c0332d82bd..33a858d34ae 100644                                                                                                                                                                                            
--- a/Lib/http/client.py                                                                                                                                                                                                         
+++ b/Lib/http/client.py                                                                                                                                                                                                         
@@ -472,7 +472,7 @@ def read(self, amt=None):                                                                                                                                                                                    
         if self.chunked:                                                                                                                                                                                                        
             return self._read_chunked(amt)                                                                                                                                                                                      
                                                                                                                                                                                                                                 
-        if amt is not None:                                                                                                                                                                                                     
+        if amt is not None and amt >= 0:                                                                                                                                                                                        
             if self.length is not None and amt > self.length:                                                                                                                                                                   
                 # clip the read to the "end of response"                                                                                                                                                                        
                 amt = self.length   

@picnixz
Copy link
Member

picnixz commented Dec 27, 2024

if amt is not None and amt >= 0:

This is one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt) which would delegate to BufferIOBase.read() which handles negative ones correctly. But I agree that it's probably better to use the None path.

@illia-v
Copy link
Contributor

illia-v commented Dec 27, 2024

if amt is not None and amt >= 0:

This is one is not really needed because we would bypass amt > self.length and directly call self.fp.read(amt) which would delegate to BufferIOBase.read() which handles negative ones correctly. But I agree that it's probably better to use the None path.

In most cases it's not needed indeed, but there is some logic related to raising IncompleteRead in the None path which is skipped for negative ints currently

@manushkin
Copy link
Contributor Author

Is there anything else expected of me?

@picnixz
Copy link
Member

picnixz commented Jan 2, 2025

Is there anything else expected of me?

No. At least not until a core dev has reviewed it this PR. Maybe @vadmium wants to have a look?

@vadmium
Copy link
Member

vadmium commented Jan 3, 2025

This looks okay to me

@Eclips4 Eclips4 changed the title gh-128271: fix incorrect handling of negative read sizes in HTTPResponse.read() gh-112064: fix incorrect handling of negative read sizes in HTTPResponse.read() Jan 28, 2025
Copy link
Member

@Eclips4 Eclips4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR.
Assuming that previous behavior was incorrect, I prefer to backport to 3.12 and 3.13.

@Eclips4 Eclips4 added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Jan 28, 2025
@Eclips4 Eclips4 enabled auto-merge (squash) January 28, 2025 11:31
@Eclips4 Eclips4 merged commit 4d0d24f into python:main Jan 28, 2025
46 checks passed
@miss-islington-app
Copy link

Thanks @manushkin for the PR, and @Eclips4 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 28, 2025
…TPResponse.read()` (pythonGH-128270)

The parameter `amt` of `HTTPResponse.read()`, which could be a negative integer,
has not been handled before and led to waiting for the connection to close
for `keep-alive connections`. Now, this has been fixed, and passing negative values
to `HTTPResponse().read()` works the same as passing `None` value.
(cherry picked from commit 4d0d24f)

Co-authored-by: Yury Manushkin <manushkin@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Jan 28, 2025

GH-129395 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jan 28, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 28, 2025
…TPResponse.read()` (pythonGH-128270)

The parameter `amt` of `HTTPResponse.read()`, which could be a negative integer,
has not been handled before and led to waiting for the connection to close
for `keep-alive connections`. Now, this has been fixed, and passing negative values
to `HTTPResponse().read()` works the same as passing `None` value.
(cherry picked from commit 4d0d24f)

Co-authored-by: Yury Manushkin <manushkin@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Jan 28, 2025

GH-129396 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Jan 28, 2025
Eclips4 pushed a commit that referenced this pull request Jan 28, 2025
…TTPResponse.read()` (GH-128270) (#129396)

gh-112064: Fix incorrect handling of negative read sizes in `HTTPResponse.read()` (GH-128270)

The parameter `amt` of `HTTPResponse.read()`, which could be a negative integer,
has not been handled before and led to waiting for the connection to close
for `keep-alive connections`. Now, this has been fixed, and passing negative values
to `HTTPResponse().read()` works the same as passing `None` value.
(cherry picked from commit 4d0d24f)

Co-authored-by: Yury Manushkin <manushkin@gmail.com>
Eclips4 pushed a commit that referenced this pull request Jan 28, 2025
…TTPResponse.read()` (GH-128270) (#129395)

gh-112064: Fix incorrect handling of negative read sizes in `HTTPResponse.read()` (GH-128270)

The parameter `amt` of `HTTPResponse.read()`, which could be a negative integer,
has not been handled before and led to waiting for the connection to close
for `keep-alive connections`. Now, this has been fixed, and passing negative values
to `HTTPResponse().read()` works the same as passing `None` value.
(cherry picked from commit 4d0d24f)

Co-authored-by: Yury Manushkin <manushkin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants