Skip to content

Commit e3110c3

Browse files
orsenthilAdamGoldFidget-Spinner
authored
[3.8] bpo-42967: only use '&' as a query string separator (GH-24297) (#24529)
* bpo-42967: only use '&' as a query string separator (#24297) bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Éric Araujo <merwok@netwok.org> (cherry picked from commit fcbe0cb) * [3.8] bpo-42967: only use '&' as a query string separator (GH-24297) bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Éric Araujo <merwok@netwok.org>. (cherry picked from commit fcbe0cb) Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com> * Update correct version information. * fix docs and make logic clearer Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com> Co-authored-by: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
1 parent 7777ae2 commit e3110c3

File tree

10 files changed

+166
-46
lines changed

10 files changed

+166
-46
lines changed

Doc/library/cgi.rst

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -277,14 +277,16 @@ These are useful if you want more control, or if you want to employ some of the
277277
algorithms implemented in this module in other circumstances.
278278

279279

280-
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
280+
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281281

282282
Parse a query in the environment or from a file (the file defaults to
283-
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
283+
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284284
passed to :func:`urllib.parse.parse_qs` unchanged.
285285

286+
.. versionchanged:: 3.8.8
287+
Added the *separator* parameter.
286288

287-
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
289+
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")
288290

289291
Parse input of type :mimetype:`multipart/form-data` (for file uploads).
290292
Arguments are *fp* for the input file, *pdict* for a dictionary containing
@@ -303,6 +305,9 @@ algorithms implemented in this module in other circumstances.
303305
Added the *encoding* and *errors* parameters. For non-file fields, the
304306
value is now a list of strings, not bytes.
305307

308+
.. versionchanged:: 3.8.8
309+
Added the *separator* parameter.
310+
306311

307312
.. function:: parse_header(string)
308313

Doc/library/urllib.parse.rst

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -165,7 +165,7 @@ or on combining URL components into a URL string.
165165
now raise :exc:`ValueError`.
166166

167167

168-
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
168+
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
169169

170170
Parse a query string given as a string argument (data of type
171171
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
@@ -190,6 +190,9 @@ or on combining URL components into a URL string.
190190
read. If set, then throws a :exc:`ValueError` if there are more than
191191
*max_num_fields* fields read.
192192

193+
The optional argument *separator* is the symbol to use for separating the
194+
query arguments. It defaults to ``&``.
195+
193196
Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
194197
parameter set to ``True``) to convert such dictionaries into query
195198
strings.
@@ -201,8 +204,14 @@ or on combining URL components into a URL string.
201204
.. versionchanged:: 3.8
202205
Added *max_num_fields* parameter.
203206

207+
.. versionchanged:: 3.8.8
208+
Added *separator* parameter with the default value of ``&``. Python
209+
versions earlier than Python 3.8.8 allowed using both ``;`` and ``&`` as
210+
query parameter separator. This has been changed to allow only a single
211+
separator key, with ``&`` as the default separator.
212+
204213

205-
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
214+
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
206215

207216
Parse a query string given as a string argument (data of type
208217
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
@@ -226,6 +235,9 @@ or on combining URL components into a URL string.
226235
read. If set, then throws a :exc:`ValueError` if there are more than
227236
*max_num_fields* fields read.
228237

238+
The optional argument *separator* is the symbol to use for separating the
239+
query arguments. It defaults to ``&``.
240+
229241
Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
230242
query strings.
231243

@@ -235,6 +247,12 @@ or on combining URL components into a URL string.
235247
.. versionchanged:: 3.8
236248
Added *max_num_fields* parameter.
237249

250+
.. versionchanged:: 3.8.8
251+
Added *separator* parameter with the default value of ``&``. Python
252+
versions earlier than Python 3.8.8 allowed using both ``;`` and ``&`` as
253+
query parameter separator. This has been changed to allow only a single
254+
separator key, with ``&`` as the default separator.
255+
238256

239257
.. function:: urlunparse(parts)
240258

Doc/whatsnew/3.6.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2443,3 +2443,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
24432443
details, see the documentation for ``loop.create_datagram_endpoint()``.
24442444
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
24452445
:issue:`37228`.)
2446+
2447+
Notable changes in Python 3.6.13
2448+
================================
2449+
2450+
Earlier Python versions allowed using both ``;`` and ``&`` as
2451+
query parameter separators in :func:`urllib.parse.parse_qs` and
2452+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2453+
newer W3C recommendations, this has been changed to allow only a single
2454+
separator key, with ``&`` as the default. This change also affects
2455+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2456+
functions internally. For more details, please see their respective
2457+
documentation.
2458+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Doc/whatsnew/3.7.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2556,3 +2556,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
25562556
details, see the documentation for ``loop.create_datagram_endpoint()``.
25572557
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
25582558
:issue:`37228`.)
2559+
2560+
Notable changes in Python 3.7.10
2561+
================================
2562+
2563+
Earlier Python versions allowed using both ``;`` and ``&`` as
2564+
query parameter separators in :func:`urllib.parse.parse_qs` and
2565+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2566+
newer W3C recommendations, this has been changed to allow only a single
2567+
separator key, with ``&`` as the default. This change also affects
2568+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2569+
functions internally. For more details, please see their respective
2570+
documentation.
2571+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Doc/whatsnew/3.8.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2251,3 +2251,16 @@ The constant values of future flags in the :mod:`__future__` module
22512251
are updated in order to prevent collision with compiler flags. Previously
22522252
``PyCF_ALLOW_TOP_LEVEL_AWAIT`` was clashing with ``CO_FUTURE_DIVISION``.
22532253
(Contributed by Batuhan Taskaya in :issue:`39562`)
2254+
2255+
Notable changes in Python 3.8.8
2256+
===============================
2257+
2258+
Earlier Python versions allowed using both ``;`` and ``&`` as
2259+
query parameter separators in :func:`urllib.parse.parse_qs` and
2260+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2261+
newer W3C recommendations, this has been changed to allow only a single
2262+
separator key, with ``&`` as the default. This change also affects
2263+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2264+
functions internally. For more details, please see their respective
2265+
documentation.
2266+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Lib/cgi.py

Lines changed: 14 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -115,7 +115,8 @@ def closelog():
115115
# 0 ==> unlimited input
116116
maxlen = 0
117117

118-
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
118+
def parse(fp=None, environ=os.environ, keep_blank_values=0,
119+
strict_parsing=0, separator='&'):
119120
"""Parse a query in the environment or from a file (default stdin)
120121
121122
Arguments, all optional:
@@ -134,6 +135,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
134135
strict_parsing: flag indicating what to do with parsing errors.
135136
If false (the default), errors are silently ignored.
136137
If true, errors raise a ValueError exception.
138+
139+
separator: str. The symbol to use for separating the query arguments.
140+
Defaults to &.
137141
"""
138142
if fp is None:
139143
fp = sys.stdin
@@ -154,7 +158,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
154158
if environ['REQUEST_METHOD'] == 'POST':
155159
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
156160
if ctype == 'multipart/form-data':
157-
return parse_multipart(fp, pdict)
161+
return parse_multipart(fp, pdict, separator=separator)
158162
elif ctype == 'application/x-www-form-urlencoded':
159163
clength = int(environ['CONTENT_LENGTH'])
160164
if maxlen and clength > maxlen:
@@ -178,10 +182,10 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
178182
qs = ""
179183
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
180184
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
181-
encoding=encoding)
185+
encoding=encoding, separator=separator)
182186

183187

184-
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
188+
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator='&'):
185189
"""Parse multipart input.
186190
187191
Arguments:
@@ -205,7 +209,7 @@ def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
205209
except KeyError:
206210
pass
207211
fs = FieldStorage(fp, headers=headers, encoding=encoding, errors=errors,
208-
environ={'REQUEST_METHOD': 'POST'})
212+
environ={'REQUEST_METHOD': 'POST'}, separator=separator)
209213
return {k: fs.getlist(k) for k in fs}
210214

211215
def _parseparam(s):
@@ -315,7 +319,7 @@ class FieldStorage:
315319
def __init__(self, fp=None, headers=None, outerboundary=b'',
316320
environ=os.environ, keep_blank_values=0, strict_parsing=0,
317321
limit=None, encoding='utf-8', errors='replace',
318-
max_num_fields=None):
322+
max_num_fields=None, separator='&'):
319323
"""Constructor. Read multipart/* until last part.
320324
321325
Arguments, all optional:
@@ -363,6 +367,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
363367
self.keep_blank_values = keep_blank_values
364368
self.strict_parsing = strict_parsing
365369
self.max_num_fields = max_num_fields
370+
self.separator = separator
366371
if 'REQUEST_METHOD' in environ:
367372
method = environ['REQUEST_METHOD'].upper()
368373
self.qs_on_post = None
@@ -589,7 +594,7 @@ def read_urlencoded(self):
589594
query = urllib.parse.parse_qsl(
590595
qs, self.keep_blank_values, self.strict_parsing,
591596
encoding=self.encoding, errors=self.errors,
592-
max_num_fields=self.max_num_fields)
597+
max_num_fields=self.max_num_fields, separator=self.separator)
593598
self.list = [MiniFieldStorage(key, value) for key, value in query]
594599
self.skip_lines()
595600

@@ -605,7 +610,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
605610
query = urllib.parse.parse_qsl(
606611
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
607612
encoding=self.encoding, errors=self.errors,
608-
max_num_fields=self.max_num_fields)
613+
max_num_fields=self.max_num_fields, separator=self.separator)
609614
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
610615

611616
klass = self.FieldStorageClass or self.__class__
@@ -649,7 +654,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
649654
else self.limit - self.bytes_read
650655
part = klass(self.fp, headers, ib, environ, keep_blank_values,
651656
strict_parsing, limit,
652-
self.encoding, self.errors, max_num_fields)
657+
self.encoding, self.errors, max_num_fields, self.separator)
653658

654659
if max_num_fields is not None:
655660
max_num_fields -= 1

Lib/test/test_cgi.py

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -53,12 +53,9 @@ def do_test(buf, method):
5353
("", ValueError("bad query field: ''")),
5454
("&", ValueError("bad query field: ''")),
5555
("&&", ValueError("bad query field: ''")),
56-
(";", ValueError("bad query field: ''")),
57-
(";&;", ValueError("bad query field: ''")),
5856
# Should the next few really be valid?
5957
("=", {}),
6058
("=&=", {}),
61-
("=;=", {}),
6259
# This rest seem to make sense
6360
("=a", {'': ['a']}),
6461
("&=a", ValueError("bad query field: ''")),
@@ -73,8 +70,6 @@ def do_test(buf, method):
7370
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
7471
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
7572
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
76-
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
77-
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
7873
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
7974
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
8075
'cuyer': ['r'],
@@ -201,6 +196,30 @@ def test_strict(self):
201196
else:
202197
self.assertEqual(fs.getvalue(key), expect_val[0])
203198

199+
def test_separator(self):
200+
parse_semicolon = [
201+
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
202+
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
203+
(";", ValueError("bad query field: ''")),
204+
(";;", ValueError("bad query field: ''")),
205+
("=;a", ValueError("bad query field: 'a'")),
206+
(";b=a", ValueError("bad query field: ''")),
207+
("b;=a", ValueError("bad query field: 'b'")),
208+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
209+
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
210+
]
211+
for orig, expect in parse_semicolon:
212+
env = {'QUERY_STRING': orig}
213+
fs = cgi.FieldStorage(separator=';', environ=env)
214+
if isinstance(expect, dict):
215+
for key in expect.keys():
216+
expect_val = expect[key]
217+
self.assertIn(key, fs)
218+
if len(expect_val) > 1:
219+
self.assertEqual(fs.getvalue(key), expect_val)
220+
else:
221+
self.assertEqual(fs.getvalue(key), expect_val[0])
222+
204223
def test_log(self):
205224
cgi.log("Testing")
206225

Lib/test/test_urlparse.py

Lines changed: 46 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -32,16 +32,10 @@
3232
(b"&a=b", [(b'a', b'b')]),
3333
(b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
3434
(b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
35-
(";", []),
36-
(";;", []),
37-
(";a=b", [('a', 'b')]),
38-
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
39-
("a=1;a=2", [('a', '1'), ('a', '2')]),
40-
(b";", []),
41-
(b";;", []),
42-
(b";a=b", [(b'a', b'b')]),
43-
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
44-
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
35+
(";a=b", [(';a', 'b')]),
36+
("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
37+
(b";a=b", [(b';a', b'b')]),
38+
(b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
4539
]
4640

4741
# Each parse_qs testcase is a two-tuple that contains
@@ -68,16 +62,10 @@
6862
(b"&a=b", {b'a': [b'b']}),
6963
(b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
7064
(b"a=1&a=2", {b'a': [b'1', b'2']}),
71-
(";", {}),
72-
(";;", {}),
73-
(";a=b", {'a': ['b']}),
74-
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
75-
("a=1;a=2", {'a': ['1', '2']}),
76-
(b";", {}),
77-
(b";;", {}),
78-
(b";a=b", {b'a': [b'b']}),
79-
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
80-
(b"a=1;a=2", {b'a': [b'1', b'2']}),
65+
(";a=b", {';a': ['b']}),
66+
("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
67+
(b";a=b", {b';a': [b'b']}),
68+
(b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
8169
]
8270

8371
class UrlParseTestCase(unittest.TestCase):
@@ -884,10 +872,46 @@ def test_parse_qsl_encoding(self):
884872
def test_parse_qsl_max_num_fields(self):
885873
with self.assertRaises(ValueError):
886874
urllib.parse.parse_qs('&'.join(['a=a']*11), max_num_fields=10)
887-
with self.assertRaises(ValueError):
888-
urllib.parse.parse_qs(';'.join(['a=a']*11), max_num_fields=10)
889875
urllib.parse.parse_qs('&'.join(['a=a']*10), max_num_fields=10)
890876

877+
def test_parse_qs_separator(self):
878+
parse_qs_semicolon_cases = [
879+
(";", {}),
880+
(";;", {}),
881+
(";a=b", {'a': ['b']}),
882+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
883+
("a=1;a=2", {'a': ['1', '2']}),
884+
(b";", {}),
885+
(b";;", {}),
886+
(b";a=b", {b'a': [b'b']}),
887+
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
888+
(b"a=1;a=2", {b'a': [b'1', b'2']}),
889+
]
890+
for orig, expect in parse_qs_semicolon_cases:
891+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
892+
result = urllib.parse.parse_qs(orig, separator=';')
893+
self.assertEqual(result, expect, "Error parsing %r" % orig)
894+
895+
896+
def test_parse_qsl_separator(self):
897+
parse_qsl_semicolon_cases = [
898+
(";", []),
899+
(";;", []),
900+
(";a=b", [('a', 'b')]),
901+
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
902+
("a=1;a=2", [('a', '1'), ('a', '2')]),
903+
(b";", []),
904+
(b";;", []),
905+
(b";a=b", [(b'a', b'b')]),
906+
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
907+
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
908+
]
909+
for orig, expect in parse_qsl_semicolon_cases:
910+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
911+
result = urllib.parse.parse_qsl(orig, separator=';')
912+
self.assertEqual(result, expect, "Error parsing %r" % orig)
913+
914+
891915
def test_urlencode_sequences(self):
892916
# Other tests incidentally urlencode things; test non-covered cases:
893917
# Sequence and object values.

0 commit comments

Comments
 (0)