From 3d8b7b1bd66a5b75d70c9aa126fc73804ab5609e Mon Sep 17 00:00:00 2001 From: tbartlett0 Date: Mon, 22 Jul 2019 01:00:24 +1000 Subject: [PATCH 1/2] bpo-1346874: Add 'Expect: 100-Continue' support to httplib Previously, http.client would always send content body immediately and ignore any 100 responses. This change makes HTTPClient.request() wait for a `Continue` response if the `Expect: 100-Continue` header is set, and adds a parameter to HTTPClient.getresponse() that will cause it to return 100 responses instead of eating them. --- Doc/library/http.client.rst | 86 ++++++++++- Lib/http/client.py | 89 +++++++++--- Lib/test/test_httplib.py | 137 ++++++++++++++++++ Misc/ACKS | 1 + ...2019-07-22-00-57-26.bpo-1346874.TentEE.rst | 2 + 5 files changed, 288 insertions(+), 27 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2019-07-22-00-57-26.bpo-1346874.TentEE.rst diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst index 4e761cd39a0109..e580d0c5815593 100644 --- a/Doc/library/http.client.rst +++ b/Doc/library/http.client.rst @@ -32,7 +32,7 @@ The module provides the following classes: .. class:: HTTPConnection(host, port=None[, timeout], source_address=None, \ - blocksize=8192) + blocksize=8192, continue_timeout=2.5) An :class:`HTTPConnection` instance represents one transaction with an HTTP server. It should be instantiated passing it a host and optional port @@ -44,7 +44,10 @@ The module provides the following classes: The optional *source_address* parameter may be a tuple of a (host, port) to use as the source address the HTTP connection is made from. The optional *blocksize* parameter sets the buffer size in bytes for - sending a file-like message body. + sending a file-like message body. Finally, the optional *continue_timeout* + parameter controls how long the connection will wait for a ``100 Continue`` + response from the server before sending the body, if the request included + an ``Expect: 100-Continue`` header. For example, the following calls all create instances that connect to the server at the same host and port:: @@ -59,11 +62,14 @@ The module provides the following classes: .. versionchanged:: 3.4 The *strict* parameter was removed. HTTP 0.9-style "Simple Responses" are - not longer supported. + no longer supported. .. versionchanged:: 3.7 *blocksize* parameter was added. + .. versionchanged:: 3.9 + *continue_timeout* parameter was added. + .. class:: HTTPSConnection(host, port=None, key_file=None, \ cert_file=None[, timeout], \ @@ -306,11 +312,24 @@ HTTPConnection Objects No attempt is made to determine the Content-Length for file objects. -.. method:: HTTPConnection.getresponse() + .. versionchanged:: 3.9 + If the headers include ``Expect: 100-Continue`` and *body* is set, + the body will not be sent until either a ``100 Continue`` response is + received from the server or the :class:`HTTPConnection`'s *continue_timeout* + is reached; if a response code other than 100 is received, the body + will not be sent at all. + +.. method:: HTTPConnection.getresponse(ignore_100_continue=True) Should be called after a request is sent to get the response from the server. Returns an :class:`HTTPResponse` instance. + By default, a server response of ``100 Continue`` will be ignored and the + call will not return until a response other than code 100 is received. + Setting *ignore_100_continue* to `False` will allow a 100 response to be + returned; you will then need to call :meth:`getresponse` a second time + (after transmitting the body) to get the final response. + .. note:: Note that you must have read the whole response before you can send a new @@ -321,6 +340,10 @@ HTTPConnection Objects :class:`HTTPConnection` object will be ready to reconnect when a new request is sent. + .. versionchanged:: 3.9 + Added the *ignore_100_continue* parameter. (In prior versions + a ``Continue`` response was always ignored.) + .. method:: HTTPConnection.set_debuglevel(level) @@ -399,11 +422,17 @@ also send your request step by step, by using the four functions below. an argument. -.. method:: HTTPConnection.endheaders(message_body=None, *, encode_chunked=False) +.. method:: HTTPConnection.endheaders(message_body=None, *, encode_chunked=False, \ + expect_continue=False) Send a blank line to the server, signalling the end of the headers. The optional *message_body* argument can be used to pass a message body - associated with the request. + associated with the request. If a body is provided, setting + *expect_continue* to `True` will wait for a ``100 Continue`` response + from the server before sending the body. (This should generally be + used only when an ``Expect: 100-Continue`` header has been sent.) If no + response is received within the :class:`HTTPConnection`'s *continue_timeout*, + the body will be sent regardless. If *encode_chunked* is ``True``, the result of each iteration of *message_body* will be chunk-encoded as specified in :rfc:`7230`, @@ -425,6 +454,9 @@ also send your request step by step, by using the four functions below. Chunked encoding support. The *encode_chunked* parameter was added. + .. versionadded:: 3.9 + The *expect_continue* parameter was added. + .. method:: HTTPConnection.send(data) @@ -579,6 +611,48 @@ request using http.client:: >>> print(response.status, response.reason) 200, OK +Since version 3.9, conditional transmission of the body is supported when an +``Expect: 100-Continue`` header is set. To use this in a simple case, just +set the header, and optionally the time for which the client should wait for +a ``100 Continue`` response before sending the body regardless:: + + >>> import http.client + >>> BODY = "***filecontents***" + >>> conn = http.client.HTTPConnection("localhost", 8080, continue_timeout=1.0) + >>> conn.request("PUT", "/file", BODY, headers={'Expect': '100-Continue'}) + >>> response = conn.getresponse() + >>> # You will not see the '100' response, as it is handled internally + >>> print(response.status, response.reason) + 200, OK + +Here is a more complex example in which we manually check the response and +decide whether to send the body. This may be useful if the body must be +generated by some resource-intensive process which should be skipped if the +server will not accept it:: + + >>> import http.client + >>> conn = http.client.HTTPConnection("localhost", 8080) + >>> conn.putrequest("PUT", "/file") + >>> conn.putheader('Expect', '100-Continue') + >>> # Assuming you know in advance what the length will be + >>> # If not, you will need to encode it as chunked + >>> conn.putheader('Content-Length', '42') + >>> conn.endheaders() + >>> response = conn.getresponse(ignore_100_continue=False) + >>> print(response.status, response.reason) + 100, Continue + >>> BODY = resource_intensive_calculation() + >>> conn.send(BODY) + >>> response = conn.getresponse() + >>> print(response.status, response.reason) + 200, OK + +.. note:: + + The *continue_timeout* setting does not apply when directly using + :meth:`getresponse`, so use the above example only if you are confident + that the server respects the ``Expect: 100-Continue`` header. + .. _httpmessage-objects: HTTPMessage Objects diff --git a/Lib/http/client.py b/Lib/http/client.py index f61267e108a524..05a3b04a99b360 100644 --- a/Lib/http/client.py +++ b/Lib/http/client.py @@ -73,6 +73,7 @@ import http import io import re +import select import socket import collections.abc from urllib.parse import urlsplit @@ -293,15 +294,16 @@ def _read_status(self): raise BadStatusLine(line) return version, status, reason - def begin(self): + def begin(self, ignore_100_continue=True): if self.headers is not None: # we've already started reading the response return # read until we get a non-100 response + # (unless caller has requested return of 100 responses) while True: version, status, reason = self._read_status() - if status != CONTINUE: + if not (ignore_100_continue and status == CONTINUE): break # skip the header from the 100 response while True: @@ -813,13 +815,15 @@ def _get_content_length(body, method): return None def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, - source_address=None, blocksize=8192): + source_address=None, blocksize=8192, continue_timeout=2.5): self.timeout = timeout + self.continue_timeout = continue_timeout self.source_address = source_address self.blocksize = blocksize self.sock = None self._buffer = [] self.__response = None + self.__pending_response = None self.__state = _CS_IDLE self._method = None self._tunnel_host = None @@ -828,9 +832,10 @@ def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, (self.host, self.port) = self._get_hostport(host, port) - # This is stored as an instance variable to allow unit - # tests to replace it with a suitable mockup + # These are stored as instance variables to allow unit + # tests to replace them with a suitable mockup self._create_connection = socket.create_connection + self._select = select.select def set_tunnel(self, host, port=None, headers=None): """Set up host and port for HTTP CONNECT tunnelling. @@ -928,6 +933,10 @@ def close(self): self.sock = None sock.close() # close it manually... there may be other refs finally: + pending_response = self.__pending_response + if pending_response: + self.__pending_response = None + pending_response.close() response = self.__response if response: self.__response = None @@ -992,7 +1001,8 @@ def _read_readable(self, readable): datablock = datablock.encode("iso-8859-1") yield datablock - def _send_output(self, message_body=None, encode_chunked=False): + def _send_output(self, message_body=None, encode_chunked=False, + expect_continue=False): """Send the currently buffered request and clear the buffer. Appends an extra \\r\\n to the buffer. @@ -1004,6 +1014,23 @@ def _send_output(self, message_body=None, encode_chunked=False): self.send(msg) if message_body is not None: + if expect_continue and not self.__response: + read_ready, _, _ = self._select([self.sock], [], [], + self.continue_timeout) + if read_ready: + if self.debuglevel > 0: + response = self.response_class(self.sock, + self.debuglevel, + method=self._method) + else: + response = self.response_class(self.sock, + method=self._method) + response.begin(ignore_100_continue=False) + if response.code != CONTINUE: + # Break without sending the body + self.__pending_response = response + return + response.close() # create a consistent interface to message_body if hasattr(message_body, 'read'): @@ -1202,7 +1229,8 @@ def putheader(self, header, *values): header = header + b': ' + value self._output(header) - def endheaders(self, message_body=None, *, encode_chunked=False): + def endheaders(self, message_body=None, *, encode_chunked=False, + expect_continue=False): """Indicate that the last header line has been sent to the server. This method sends the request to the server. The optional message_body @@ -1213,7 +1241,8 @@ def endheaders(self, message_body=None, *, encode_chunked=False): self.__state = _CS_REQ_SENT else: raise CannotSendHeader() - self._send_output(message_body, encode_chunked=encode_chunked) + self._send_output(message_body, encode_chunked=encode_chunked, + expect_continue=expect_continue) def request(self, method, url, body=None, headers={}, *, encode_chunked=False): @@ -1258,20 +1287,33 @@ def _send_request(self, method, url, body, headers, encode_chunked): else: encode_chunked = False + # If the Expect: 100-continue header is set, we will try to honor it + # (if possible). We can only do so if 1) the request has a body, and + # 2) there is no current incomplete response (since we need to read + # the response stream to check if the code is 100 or not) + expect_continue = ( + body and not self.__response + and 'expect' in header_names + and '100-continue' in {v.lower() for k, v in headers.items() + if k.lower() == 'expect'} + ) + for hdr, value in headers.items(): self.putheader(hdr, value) if isinstance(body, str): # RFC 2616 Section 3.7.1 says that text default has a # default charset of iso-8859-1. body = _encode(body, 'body') - self.endheaders(body, encode_chunked=encode_chunked) + self.endheaders(body, encode_chunked=encode_chunked, + expect_continue=expect_continue) - def getresponse(self): + def getresponse(self, ignore_100_continue=True): """Get the response from the server. If the HTTPConnection is in the correct state, returns an - instance of HTTPResponse or of whatever object is returned by - the response_class variable. + instance of HTTPResponse or of whatever object is returned by the + response_class variable. The connection will wait for a response other + than code 100 ('Continue') unless ignore_100_continue is set to False. If a request has not been sent or if a previous response has not be handled, ResponseNotReady is raised. If the HTTP @@ -1302,7 +1344,10 @@ def getresponse(self): if self.__state != _CS_REQ_SENT or self.__response: raise ResponseNotReady(self.__state) - if self.debuglevel > 0: + if self.__pending_response: + response = self.__pending_response + self.__pending_response = None + elif self.debuglevel > 0: response = self.response_class(self.sock, self.debuglevel, method=self._method) else: @@ -1310,19 +1355,21 @@ def getresponse(self): try: try: - response.begin() + response.begin(ignore_100_continue=ignore_100_continue) except ConnectionError: self.close() raise assert response.will_close != _UNKNOWN - self.__state = _CS_IDLE + if response.code != 100: + # Code 100 is effectively 'not a response' for this purpose + self.__state = _CS_IDLE - if response.will_close: - # this effectively passes the connection to the response - self.close() - else: - # remember this, so we can tell when it is complete - self.__response = response + if response.will_close: + # this effectively passes the connection to the response + self.close() + else: + # remember this, so we can tell when it is complete + self.__response = response return response except: diff --git a/Lib/test/test_httplib.py b/Lib/test/test_httplib.py index 9148169cc7c2e5..6b8dbc2ed4afe5 100644 --- a/Lib/test/test_httplib.py +++ b/Lib/test/test_httplib.py @@ -5,6 +5,7 @@ import os import array import re +import select import socket import threading @@ -117,6 +118,7 @@ def __init__(self, *args): super().__init__('example.com') self.fake_socket_args = args self._create_connection = self.create_connection + self._select = self.fake_select def connect(self): """Count the number of times connect() is invoked""" @@ -126,6 +128,21 @@ def connect(self): def create_connection(self, *pos, **kw): return FakeSocket(*self.fake_socket_args) + def fake_select(self, rlist, wlist, xlist, timeout=None): + """Real select.select() won't work on our fake socket""" + # Only needs to support read select for current tests + ret = [] + for r in rlist: + assert isinstance(r, FakeSocket) + if r.data and not hasattr(r, 'file'): + ret.append(r) # Has data to read, hasn't called makefile() yet + elif getattr(r, 'file', None) and not r.file_closed: + if r.file.read(1): + r.file.seek(r.file.tell() - 1) + ret.append(r) + return ret, [], [] + + class HeaderTests(TestCase): def test_auto_headers(self): # Some headers are added automatically, but should not be added by @@ -1860,6 +1877,126 @@ def test_binary_file_body(self): self.assertEqual(b'5\r\nbody\xc1\r\n0\r\n\r\n', f.read()) +class ExpectContinueTests(TestCase): + + def test_data_sent_on_continue(self): + # Prior versions would pass this too, as they send body immediately + # and ignore the Continue, but make sure it still behaves as expected. + conn = FakeSocketHTTPConnection( + b'HTTP/1.1 100 Continue\r\n' + b'\r\n' + b'HTTP/1.1 200 OK\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + conn.request('PUT', '/', headers={'Expect': '100-continue'}, + body=b'Body content') + resp = conn.getresponse() + self.assertEqual(resp.code, 200) + self.assertIn(b'Expect: 100-continue', conn.sock.data) + self.assertIn(b'Body content', conn.sock.data) + + def test_data_not_sent_on_error(self): + conn = FakeSocketHTTPConnection( + b'HTTP/1.1 429 Too Many Requests\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + conn.request('PUT', '/', headers={'Expect': '100-continue'}, + body=b'Body content') + resp = conn.getresponse() + self.assertEqual(resp.code, 429) + self.assertIn(b'Expect: 100-continue', conn.sock.data) + self.assertNotIn(b'Body content', conn.sock.data) + + def test_body_sent_on_timeout(self): + def client_thread(port): + conn = client.HTTPConnection('localhost', port, + continue_timeout=0.75) + conn.request('PUT', '/', + headers={'Expect': '100-continue'}, + body=b'Body content') + resp = conn.getresponse() + self.assertEqual(resp.code, 200) + resp.close() + conn.close() + + with socket.socket() as sock: + sock.bind(('localhost', 0)) + sock.listen(1) + t = threading.Thread(target=client_thread, + args=(sock.getsockname()[1],)) + t.start() + conn, _ = sock.accept() + req_data = conn.recv(4096) + self.assertTrue(req_data.startswith(b'PUT / HTTP/1.1\r\n')) + self.assertIn(b'Expect: 100-continue\r\n', req_data) + self.assertIn(b'Content-Length: 12\r\n', req_data) + self.assertNotIn(b'Body content', req_data) + # Client should not send body data yet + rr, _, _ = select.select([conn], [], [], 0.5) + self.assertEqual(rr, []) + # Client should time out and send body data in another ~0.25s + rr, _, _ = select.select([conn], [], [], 0.5) + self.assertEqual(rr, [conn]) + body = conn.recv(4096) + self.assertEqual(body, b'Body content') + conn.sendall( + b'HTTP/1.1 200 OK\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + conn.close() + sock.close() + t.join() + + def test_manual_getresponse(self): + def client_thread(port): + conn = client.HTTPConnection('localhost', port, + continue_timeout=0) + conn.putrequest("PUT", "/file") + conn.putheader('Expect', '100-Continue') + conn.putheader('Content-Length', '42') + conn.endheaders() + with conn.getresponse(ignore_100_continue=False) as resp: + self.assertEqual(resp.status, 100) + conn.send(b'Go away or I shall taunt you a second time') + with conn.getresponse() as resp: + self.assertEqual(resp.status, 200) + conn.close() + + with socket.socket() as sock: + sock.bind(('localhost', 0)) + sock.listen(1) + t = threading.Thread(target=client_thread, + args=(sock.getsockname()[1],)) + t.start() + conn, _ = sock.accept() + req_data = conn.recv(4096) + self.assertTrue(req_data.startswith(b'PUT /file HTTP/1.1\r\n')) + self.assertIn(b'Expect: 100-Continue\r\n', req_data) + self.assertIn(b'Content-Length: 42\r\n', req_data) + self.assertNotIn(b'I shall taunt you', req_data) + # Client is not expected to send body data until we respond, + # regardless of continue_timeout + rr, _, _ = select.select([conn], [], [], 1.0) + self.assertEqual(rr, []) + conn.sendall(b'HTTP/1.1 100 Continue\r\n\r\n') + rr, _, _ = select.select([conn], [], [], 0.5) + self.assertEqual(rr, [conn]) + b = conn.recv(42) + self.assertEqual(b, b'Go away or I shall taunt you a second time') + conn.sendall( + b'HTTP/1.1 200 OK\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + conn.close() + sock.close() + t.join() + + + class HTTPResponseTest(TestCase): def setUp(self): diff --git a/Misc/ACKS b/Misc/ACKS index 3e6a0aca35731d..5a7219ba946ca4 100644 --- a/Misc/ACKS +++ b/Misc/ACKS @@ -103,6 +103,7 @@ Cesar Eduardo Barros Des Barry Emanuel Barry Ulf Bartelt +Tim Bartlett Campbell Barton Don Bashford Pior Bastida diff --git a/Misc/NEWS.d/next/Library/2019-07-22-00-57-26.bpo-1346874.TentEE.rst b/Misc/NEWS.d/next/Library/2019-07-22-00-57-26.bpo-1346874.TentEE.rst new file mode 100644 index 00000000000000..90513d339c7678 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-07-22-00-57-26.bpo-1346874.TentEE.rst @@ -0,0 +1,2 @@ +`http.client.HTTPConnection` now waits for a ``100 Continue`` response +before sending body content if the ``Expect: 100-Continue`` header is set. From 73908141415bf85a52656bb038d027f050b1cb6e Mon Sep 17 00:00:00 2001 From: tbartlett0 Date: Mon, 22 Jul 2019 20:02:59 +1000 Subject: [PATCH 2/2] Fix rstlint errors --- Doc/library/http.client.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst index e580d0c5815593..c9f153ec112a00 100644 --- a/Doc/library/http.client.rst +++ b/Doc/library/http.client.rst @@ -326,7 +326,7 @@ HTTPConnection Objects By default, a server response of ``100 Continue`` will be ignored and the call will not return until a response other than code 100 is received. - Setting *ignore_100_continue* to `False` will allow a 100 response to be + Setting *ignore_100_continue* to ``False`` will allow a 100 response to be returned; you will then need to call :meth:`getresponse` a second time (after transmitting the body) to get the final response. @@ -428,7 +428,7 @@ also send your request step by step, by using the four functions below. Send a blank line to the server, signalling the end of the headers. The optional *message_body* argument can be used to pass a message body associated with the request. If a body is provided, setting - *expect_continue* to `True` will wait for a ``100 Continue`` response + *expect_continue* to ``True`` will wait for a ``100 Continue`` response from the server before sending the body. (This should generally be used only when an ``Expect: 100-Continue`` header has been sent.) If no response is received within the :class:`HTTPConnection`'s *continue_timeout*, @@ -628,7 +628,7 @@ a ``100 Continue`` response before sending the body regardless:: Here is a more complex example in which we manually check the response and decide whether to send the body. This may be useful if the body must be generated by some resource-intensive process which should be skipped if the -server will not accept it:: +server will not accept it. :: >>> import http.client >>> conn = http.client.HTTPConnection("localhost", 8080)