gh-69753: Add Bytes Object Support to Shlex #22657

HassanAbouelela · 2020-10-12T00:42:08Z

Adds support for bytes objects in the shlex module.

https://bugs.python.org/issue25567

Issue: shlex.quote doesn't work on bytestrings #69753

Adds checking and conversions for bytes objects, to allow them to be used with shlex.

Adds tests that cover bytestrings in the shlex module.

Allows split to return a list of bytes, given a byte input string.

github-actions · 2020-12-17T00:29:45Z

This PR is stale because it has been open for 30 days with no activity.

JelleZijlstra · 2021-04-28T02:42:07Z

Lib/test/test_shlex.py

+        safeunquoted = string.ascii_letters + string.digits + '@%_-+=:,./'
+        unsafe = '"`$\\!'
+
+        self.assertEqual(shlex.quote(b''), b"''")


Would be good to test some bytestrings with non-ASCII characters, to make sure that they're handled correctly.

I added the missing unicode characters from the testQuote test, and changed the encoding to utf-8 to account for them. Added in bfb76be.

Please add all characters that are supported on POSIX (not just three of them):

unicode_chars = ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ' 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')

(I'll let you CC the chars since I am not sure about my own CC)

Adds the unicode sample characters from the `testQuote` test to the bytes version.

MaxwellDupre

All shlex tests run OK.
The example in the bug report also looks OK.

serhiy-storchaka

I see several problems here:

If bytes is passed to shlex(), shouldn't iteration produce bytes rather than str?
If shlex() accepts str, bytes and text files, shouldn't it also support binary files?
join() returns str for a sequence of str and bytes for a sequence of bytes, unless the sequence is empty, in which case it always returns str. The problem is that an empty sequence of str is indistinguishable from an empty sequence of bytes.
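
The third point can be illustrated with a minimal sketch. `join_any` below is a hypothetical stand-in, not the PR's actual `join()`, and it skips quoting entirely; it only assumes that the return type is chosen by inspecting the items, which is what makes the empty case ambiguous.

```python
def join_any(split_command):
    """Hypothetical join() that picks its return type from the first item."""
    items = list(split_command)
    if not items:
        # Nothing to inspect: [] looks the same whether the caller meant
        # "no str arguments" or "no bytes arguments", so str is an arbitrary pick.
        return ""
    sep = b" " if isinstance(items[0], bytes) else " "
    return sep.join(items)  # quoting omitted to keep the sketch short


print(repr(join_any(["a", "b"])))    # 'a b'
print(repr(join_any([b"a", b"b"])))  # b'a b'
print(repr(join_any([])))            # '' even if the caller works in bytes
```

A caller that concatenates the result with other bytes would hit a TypeError only in the empty case, which is easy to miss in testing.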

HassanAbouelela · 2024-07-18T16:31:43Z

@serhiy-storchaka Thanks for your review.
Some of this information may be inaccurate (it has been four years haha), but I'll try my best:

If bytes is passed to shlex(), shouldn't iteration produce bytes rather than str?

The focus of this PR was to get bytes objects to work with shlex.quote, and along the way I added support for the other standalone functions as well. These support bytes completely and return bytes when required (conversions aside). Support in the shlex class was added in a limited capacity, to provide at least basic functionality without requiring a larger rewrite. There is no reason it can't be extended afterwards; however, the use cases for such features, and the need for them, should probably be discussed first.

If shlex() accepts str, bytes and text files, shouldn't it also support binary files?

Probably out of scope for this bug/pull-request.

join() returns str for a sequence of str and bytes for a sequence of bytes, unless the sequence is empty, in which case it always returns str. The problem is that an empty sequence of str is indistinguishable from an empty sequence of bytes.

~~Can you provide a reproduction, or clarification on sequence?~~

>>> isinstance("", bytes)
False
>>> isinstance(b"", bytes)
True
>>> "" == b""
False

HassanAbouelela · 2024-07-18T16:35:49Z

On your last point, I misunderstood, and see what you mean now. I don't see a possible fix here, though, or a use case for it.

picnixz · 2024-07-21T15:38:31Z

Lib/shlex.py

@@ -22,6 +22,10 @@ def __init__(self, instream=None, infile=None, posix=False,
                 punctuation_chars=False):
        if isinstance(instream, str):
            instream = StringIO(instream)
+        elif isinstance(instream, bytes):
+            # convert byte instreams to string
+            instream = StringIO(instream.decode("ascii", "surrogateescape"))


Why would you only support ASCII strings? On POSIX platforms, you should also support additional characters (see how wordchars is extended).
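
To make the concern concrete, here is a small sketch of how the `ascii` codec behaves with the `surrogateescape` handler used in the hunk above. It uses only built-in codec behaviour; the sample value is an arbitrary choice for illustration.

```python
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding as ASCII with surrogateescape never fails: each byte >= 0x80
# becomes a lone surrogate in the range U+DC80..U+DCFF.
text = raw.decode("ascii", "surrogateescape")
print(ascii(text))  # 'caf\udcc3\udca9'

# The inverse only works if the same error handler is used when encoding.
assert text.encode("ascii", "surrogateescape") == raw

strict_failed = False
try:
    text.encode("ascii")  # strict mode, as a plain .encode("ascii") would use
except UnicodeEncodeError:
    strict_failed = True
print("strict encode failed:", strict_failed)
```

So non-ASCII input survives the decode step as surrogates rather than as real characters, and it can only be turned back into bytes if the encode step uses the same handler.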

picnixz · 2024-07-21T15:40:46Z

Lib/shlex.py

-    return list(lex)
+
+    if isinstance(s, bytes):
+        return [i.encode("ascii") for i in lex]


Again, this should not be restricted to ASCII characters only.
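
One possible shape for a split that is not limited to ASCII is sketched below. `split_bytes` and its `encoding` parameter are hypothetical names, not part of the PR or of `shlex`; the sketch assumes UTF-8 by default and relies on `surrogateescape` in both directions so that undecodable bytes round-trip unchanged.

```python
import shlex


def split_bytes(s, comments=False, posix=True, encoding="utf-8"):
    """Hypothetical bytes counterpart of shlex.split()."""
    # Decode leniently so arbitrary bytes never raise on the way in.
    text = s.decode(encoding, "surrogateescape")
    tokens = shlex.split(text, comments=comments, posix=posix)
    # Encode with the same handler so escaped bytes are restored exactly.
    return [t.encode(encoding, "surrogateescape") for t in tokens]


print(split_bytes(b"split words"))
print(split_bytes("café 'naïve test'".encode("utf-8")))
print(split_bytes(b"\xff\xfe raw"))  # invalid UTF-8 still round-trips
```

Whether the encoding should be a parameter, or fixed, is a design question this sketch does not settle.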

picnixz · 2024-07-21T15:44:48Z

Lib/shlex.py



 def join(split_command):
    """Return a shell-escaped string from *split_command*."""
-    return ' '.join(quote(arg) for arg in split_command)
+    if len(split_command) == 0:


This behaviour should be documented, saying that a string will be returned instead of bytes for an empty sequence of bytes. Or maybe add a separate function which only accepts bytes inputs (less code, no warnings, and more efficient).
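
As a rough sketch of the separate-function alternative: `quote_bytes` and `join_bytes` are hypothetical names, and the unsafe-character pattern is an assumption modelled on the regex-based check in the str version of `shlex.quote`. The quoting expression itself is the one from the PR's diff.

```python
import re

# Assumption: same "safe" set as the str implementation. Bytes patterns use
# ASCII-only character classes, so every byte >= 0x80 counts as unsafe.
_find_unsafe_bytes = re.compile(rb"[^\w@%+=:,./-]").search


def quote_bytes(s):
    """Hypothetical bytes-only counterpart of shlex.quote()."""
    if not s:
        return b"''"
    if _find_unsafe_bytes(s) is None:
        return s
    # Use single quotes, and put single quotes into double quotes:
    # $'b is quoted as '$'"'"'b'
    return b"'" + s.replace(b"'", b"'\"'\"'") + b"'"


def join_bytes(split_command):
    """Hypothetical bytes-only counterpart of shlex.join()."""
    return b" ".join(quote_bytes(arg) for arg in split_command)


print(join_bytes([b"echo", b"a b", b"$'b"]))
print(join_bytes([]))  # b'' with no type ambiguity
```

Because the function only ever deals with bytes, the empty sequence naturally yields `b''` and no mixed-type warnings are needed.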

picnixz · 2024-07-21T15:47:17Z

Lib/shlex.py

+    cleaned = []
+    warned = False
+
+    for command in split_command:


I think it is too complicated. The str-version assumes that the objects are all strings, namely that split_command is an iterable of strings.

picnixz · 2024-07-21T15:49:00Z

Lib/shlex.py

+        # use single quotes, and put single quotes into double quotes
+        # the string $'b is then quoted as '$'"'"'b'
+        return b"'" + s.replace(b"'", b"'\"'\"'") + b"'"
+


Suggested change

picnixz · 2024-07-21T15:51:21Z

Lib/test/test_shlex.py

@@ -171,6 +171,10 @@ def testSplitPosix(self):
        """Test data splitting with posix parser"""
        self.splitTest(self.posix_data, comments=True)

+    def testSplitBytes(self):
+        """Test byte objects splitting"""
+        self.assertEqual(shlex.split(b"split words"), [b"split", b"words"])


The coverage is insufficient. Use self.splitTest, as in the string case, but with bytes inputs.

In addition, use more than just ASCII characters: also include those that are supported on POSIX platforms.
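
A data-driven test along these lines might look like the sketch below. It does not reuse the real `self.splitTest` or `posix_data` from test_shlex.py; the `cases` list is a small made-up dataset, and `split_bytes` is a hypothetical stand-in that would be replaced by the real bytes-aware `shlex.split` once it exists.

```python
import shlex
import unittest


def split_bytes(s, comments=False):
    """Hypothetical stand-in for a bytes-aware shlex.split()."""
    text = s.decode("utf-8", "surrogateescape")
    return [t.encode("utf-8", "surrogateescape")
            for t in shlex.split(text, comments=comments)]


class SplitBytesTest(unittest.TestCase):
    # Made-up inputs covering quoting, escapes, comments and non-ASCII text.
    cases = [
        "foo bar",
        "'foo bar' baz",
        'foo "bar baz"',
        r"foo\ bar baz",
        "áéíóú 'ßàáâ ãäå'",
        "foo # a comment\nbar",
    ]

    def test_bytes_split_matches_str_split(self):
        for text in self.cases:
            with self.subTest(text=text):
                # The str result, encoded, is the expected bytes result.
                expected = [t.encode("utf-8")
                            for t in shlex.split(text, comments=True)]
                actual = split_bytes(text.encode("utf-8"), comments=True)
                self.assertEqual(actual, expected)
```

With the stand-in in place the comparison is trivially true; it only becomes a meaningful check once `split_bytes` is swapped for an independent implementation.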

picnixz · 2024-07-21T15:54:10Z

Lib/test/test_shlex.py

+        safeunquoted = string.ascii_letters + string.digits + '@%_-+=:,./'
+        unsafe = '"`$\\!'
+
+        self.assertEqual(shlex.quote(b''), b"''")


Please add all characters that are supported on POSIX (not just three of them):

unicode_chars = ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ' 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')

(I'll let you CC the chars since I am not sure about my own CC)

picnixz · 2024-07-21T15:55:20Z

Lib/test/test_shlex.py

@@ -358,6 +382,46 @@ def testJoinRoundtrip(self):
                resplit = shlex.split(joined)
                self.assertEqual(split_command, resplit)

+    def testJoinBytes(self):


Use the same dataset as the string tests, especially the entries with quotation marks or unsafe characters.

picnixz · 2024-07-21T15:56:42Z

Misc/NEWS.d/next/Library/2020-10-12-03-46-25.bpo-25567.xgfgij.rst

@@ -0,0 +1 @@
+Add support for bytes objects in the shlex module.


Suggested change

-Add support for bytes objects in the shlex module.
+Add support for :class:`bytes` objects to the :mod:`shlex` module.

You should probably also explain in the docs that join() returns an empty string (str, not bytes) for an empty sequence. This is important. Or introduce a separate function for bytes objects, which I think would be preferable because you wouldn't have all those warnings to handle.

python-cla-bot · 2025-04-18T09:50:10Z

The following commit authors need to sign the Contributor License Agreement:

47495861+HassanAbouelela@users.noreply.github.com

HassanAbouelela added 2 commits October 12, 2020 03:36

Add Bytes Support to shlex

d5242f7

Adds checking and conversions for bytes objects, to allow them to be used with shlex.

Add Tests for shlex

f90f33b

Adds tests that cover bytestrings in the shlex module.

the-knights-who-say-ni added the CLA signed label Oct 12, 2020

bedevere-bot added the awaiting review label Oct 12, 2020

HassanAbouelela added 2 commits October 12, 2020 03:46

Add News Blurb

ab48481

Return Bytes from Split

a00ce01

Allows split to return a list of bytes, given a byte input string.

github-actions bot added the stale Stale PR or inactive for long period of time. label Dec 17, 2020

JelleZijlstra reviewed Apr 28, 2021


HassanAbouelela added 3 commits April 30, 2021 01:23

Merge branch 'master' into bpo-25567-shlex-bytestrings

571d7c9

Adds Non-Ascii Characters To Shlex Bytes Tests

bfb76be

Adds the unicode sample characters from the `testQuote` test to the bytes version.

Merge branch 'main' into bpo-25567-shlex-bytestrings

473c81d

HassanAbouelela force-pushed the bpo-25567-shlex-bytestrings branch from 8f847e8 to 473c81d Compare September 1, 2021 19:18

MaxwellDupre approved these changes Feb 3, 2022


bedevere-bot added awaiting core review and removed awaiting review labels Feb 3, 2022

Merge branch 'main' into bpo-25567-shlex-bytestrings

f8f2c06

ezio-melotti removed the CLA signed label Jul 13, 2022

github-actions bot removed the stale Stale PR or inactive for long period of time. label Jul 30, 2022

JonasThiem mannequin mentioned this pull request Jul 12, 2023

shlex.quote doesn't work on bytestrings #69753

Open

serhiy-storchaka self-requested a review July 15, 2024 17:08

serhiy-storchaka reviewed Jul 18, 2024


picnixz requested changes Jul 21, 2024


picnixz changed the title ~~bpo-25567: Add Bytes Object Support to Shlex~~ gh-69753: Add Bytes Object Support to Shlex Jul 21, 2024
