Skip to content

re.sub confusion between count and flags args #56166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mindauga mannequin opened this issue Apr 29, 2011 · 30 comments
Closed

re.sub confusion between count and flags args #56166

mindauga mannequin opened this issue Apr 29, 2011 · 30 comments
Assignees
Labels
3.7 (EOL) end of life topic-regex type-feature A feature request or enhancement

Comments

@mindauga
Copy link
Mannequin

mindauga mannequin commented Apr 29, 2011

BPO 11957
Nosy @rhettinger, @terryjreedy, @vstinner, @ericvsmith, @jwilk, @ezio-melotti, @merwok, @serhiy-storchaka
Dependencies
  • bpo-23591: enum: Add Flags and IntFlags
  • Files
  • patch_11957
  • re_keyword_only.patch
  • re_check_flags_type.patch
  • re_deprecate_positional_count.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = None
    created_at = <Date 2011-04-29.18:27:10.198>
    labels = ['expert-regex', 'type-feature', '3.7']
    title = 're.sub confusion between count and flags args'
    updated_at = <Date 2017-04-14.18:55:29.189>
    user = 'https://bugs.python.org/mindauga'

    bugs.python.org fields:

    activity = <Date 2017-04-14.18:55:29.189>
    actor = 'jwilk'
    assignee = 'ezio.melotti'
    closed = False
    closed_date = None
    closer = None
    components = ['Regular Expressions']
    creation = <Date 2011-04-29.18:27:10.198>
    creator = 'mindauga'
    dependencies = ['23591']
    files = ['30821', '37067', '44825', '44826']
    hgrepos = []
    issue_num = 11957
    keywords = ['patch']
    message_count = 27.0
    messages = ['134806', '134820', '134830', '135371', '135386', '135391', '136657', '143520', '186784', '186825', '186832', '186844', '186856', '192416', '230223', '230224', '230226', '230236', '230237', '230238', '230358', '230375', '277386', '277398', '291655', '291659', '291677']
    nosy_count = 13.0
    nosy_names = ['rhettinger', 'terry.reedy', 'vstinner', 'eric.smith', 'jwilk', 'ezio.melotti', 'eric.araujo', 'mrabarnett', 'python-dev', 'mindauga', 'serhiy.storchaka', 'mmilkin', 'umi']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue11957'
    versions = ['Python 3.6', 'Python 3.7']

    Linked PRs

    @mindauga
    Copy link
    Mannequin Author

    mindauga mannequin commented Apr 29, 2011

    re.sub don't substitute not ASCII characters:

    Python 2.7.1 (r271:86832, Apr 15 2011, 12:11:58) Arch Linux

    >>import re

    >>a=u'aaa'
    >>print re.search('(\w+)',a,re.U).groups()
    (u'aaa')
    >>print re.sub('(\w+)','x',a,re.U)
    x

      BUT:
    

    >>a=u'ąąą'
    >>print re.search('(\w+)',a,re.U).groups()
    (u'\u0105\u0105\u0105')
    >>print re.sub('(\w+)','x',a,re.U)
    ąąą

    @ericvsmith
    Copy link
    Member

    The 4th parameter to re.sub() is a count, not flags.

    @ezio-melotti
    Copy link
    Member

    Since this has been reported already several times (see e.g. bpo-11947), and it's a fairly common mistake, I think we should do something to avoid it.

    A few possibilities are:

    1. add a warning in the doc;
    2. make count and flag keyword-only argument (raising a deprecation warning in 3.3 and actually change it later);
    3. change the regex flags to some object that can be distinguished from ints and raise an error when a flag is passed to count;

    @ezio-melotti ezio-melotti changed the title re.sub problem with unicode string re.sub confusion between count and flags args Apr 30, 2011
    @terryjreedy
    Copy link
    Member

    I like the idea of an internal REflag class with __new__, __or__, and __repr__==str. Str(re.A|re.L) might print as
    "REflag: re.ASCII | re.IGNORE"
    If it is *not* an int subclass, any attempt to use or mix with an int would raise. I checked and the doc only promises that flags can be or'ed. An __and__ method might be added if it were thought that people currently use & to check for flags set, though that is not currently promised.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented May 6, 2011

    Something like "<re.Flag ASCII | IGNORE>" may be more Pythonic.

    @terryjreedy
    Copy link
    Member

    Agreed, if we go that route.

    @rhettinger rhettinger self-assigned this May 14, 2011
    @merwok
    Copy link
    Member

    merwok commented May 23, 2011

    I’d favor 1) or 2) over 3). Ints are short and very commonly used for flags.

    @ezio-melotti
    Copy link
    Member

    See also bpo-12888 for an error in the stdlib caused by this.

    @ezio-melotti ezio-melotti added the type-feature A feature request or enhancement label Apr 10, 2013
    @mmilkin
    Copy link
    Mannequin

    mmilkin mannequin commented Apr 13, 2013

    I like option #2, and I was thinking of working on it today, poke me if anyone has a problem with this.

    @mmilkin
    Copy link
    Mannequin

    mmilkin mannequin commented Apr 13, 2013

    There is no sane way to issue a warning without changing the signature and we don't want to change the signature without issuing a deprecation warning for the function, so sadly option 3 is the only way for this to work, (Im going to not touch this till ENUMS are merged in.)

    @ezio-melotti
    Copy link
    Member

    Can't you use *args and **kwargs and then raise a deprecation warning if count and/or flags are in args?
    Even if enums are merged in, there might still be issues depending on their implementation.

    @mmilkin
    Copy link
    Mannequin

    mmilkin mannequin commented Apr 13, 2013

    We could do that but we would be changing the signature before adding the warning

    @ezio-melotti
    Copy link
    Member

    The change would still be backwards compatible (even though inspect.signature and similar functions might return something different). Note that I'm not saying that's the best option, but it should be doable.

    @umi
    Copy link
    Mannequin

    umi mannequin commented Jul 6, 2013

    Please see my patch, I have changed flags to be instances of IntEnum and added a check to re.sub, re.subn and re.split. The patch contains some tests. This solution also allowed me to discover several bugs in the standard library, and I am going to create tickets for them shortly.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 29, 2014

    New changeset 767fd62b59a9 by Victor Stinner in branch 'default':
    Issue bpo-11957: Explicit parameter name when calling re.split() and re.sub()
    https://hg.python.org/cpython/rev/767fd62b59a9

    @vstinner
    Copy link
    Member

    I suggest to make the 2 last parameters of re.sub(), re.subn() and re.split() parameters as keyword-only. It will break applications using count and maxsplit parameters as index parameters, but it's easy to fix these applications if they want to support also Python 3.5.

    I checked Python 2.6: the name of the maxsplit and count parameters didn't change. So it's possible to write code working on Python 2.6-3.5 if the parameter name is explicitly used:

    • re.sub("a", "a", "a", count=1)
    • re.subn("a", "a", "a", count=1)
    • re.split("a", "a", maxsplit=1)

    The flags parameter was added to re.sub(), re.subn() and re.split() functions in Python 2.7:

    See my attached re_keyword_only.patch:

    • sub(), subn(): count and flags become keyword-only parameters
    • split(): maxsplit and flags become keyword-only parameters

    @vstinner
    Copy link
    Member

    Confusion between count/maxplit and count parameters is common, duplicated issues:

    See also issue bpo-13385 which proposed an explicit "re.NOFLAGS flag".

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your patch Valentina. But it makes flags combinations not pickleable.

    >>> import re, pickle
    >>> pickle.dumps(re.I|re.S, 3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <enum 'SubFlag'>: attribute lookup SubFlag on sre_constants failed
    >>> pickle.dumps(re.I|re.S, 4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <enum 'SubFlag'>: attribute lookup BaseFlags.__or__.<locals>.SubFlag on sre_constants failed

    And I'm afraid that creating new class in the "|" operator can affect performance.

    @serhiy-storchaka
    Copy link
    Member

    As for 767fd62b59a9, I doubt that changing positional arguments to keyword argumennts in tests is justified. This can hide a bug.

    @serhiy-storchaka
    Copy link
    Member

    And again about patch_11957. I afraid that testing isinstance(count, sre_constants.BaseFlags) on every re.sub() call will hit performance too.

    @ezio-melotti
    Copy link
    Member

    I agree about 767fd62b59a9, there should be tests for args passed both by position and as keyword args.

    Serhiy, do you think the enum solution is worth pursuing, or is it better to just turn those args to keyword-only (after a proper deprecation process)?

    @serhiy-storchaka
    Copy link
    Member

    I think that the enum solution is worth pursuing, and that we need general
    class which represents the set of specified named flags. I'm working on
    implementation of enum.IntFlags.

    @Zac-HD
    Copy link
    Contributor

    Zac-HD commented Aug 8, 2023

    I'd like to propose that we merge Serhiy's second patch, which deprecates passing count and maxsplit arguments as positional arguments.

    If necessary, I'd be happy to write or organize a PR.

    @hauntsaninja
    Copy link
    Contributor

    Thanks for bumping this, I'd forgotten about it. As you can tell by my previous comments, I think this is something we need to do. Just fixed another one of these a couple weeks ago. I'll open a PR with Serhiy's patch

    hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Aug 8, 2023
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 8, 2023
    …e functions
    
    Deprecate passing optional arguments maxsplit, count and flags in
    module-level functions re.split(), re.sub() and re.subn() as positional.
    They should only be passed by keyword.
    @serhiy-storchaka
    Copy link
    Member

    It is an old patch and it required polishing.

    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 8, 2023
    …e functions
    
    Deprecate passing optional arguments maxsplit, count and flags in
    module-level functions re.split(), re.sub() and re.subn() as positional.
    They should only be passed by keyword.
    hauntsaninja pushed a commit that referenced this issue Aug 16, 2023
    …tions (#107778)
    
    Deprecate passing optional arguments maxsplit, count and flags in
    module-level functions re.split(), re.sub() and re.subn() as positional.
    They should only be passed by keyword.
    tchaikov added a commit to tchaikov/scylladb that referenced this issue Sep 27, 2024
    since Python 3.13, passing count to `re.sub()` as positional argument
    has been deprecated. and when runnint `test.py` with Python 3.13, we
    have following warning:
    
    ```
    /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument
      args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n"))
    ```
    
    see also python/cpython#56166
    
    in order to silence this distracting warning, let's pass
    `count` using kwarg.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    xemul pushed a commit to scylladb/scylladb that referenced this issue Sep 30, 2024
    since Python 3.13, passing count to `re.sub()` as positional argument
    has been deprecated. and when runnint `test.py` with Python 3.13, we
    have following warning:
    
    ```
    /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument
      args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n"))
    ```
    
    see also python/cpython#56166
    
    in order to silence this distracting warning, let's pass
    `count` using kwarg.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    
    Closes #20859
    paul-elliott-arm added a commit to paul-elliott-arm/mbedtls-framework that referenced this issue Oct 25, 2024
    Python 13 requires count to be passed as a kwarg rather than a
    positional arg because of potential confusion with flags, see
    python/cpython#56166
    
    Signed-off-by: Paul Elliott <paul.elliott@arm.com>
    paul-elliott-arm added a commit to paul-elliott-arm/mbedtls-framework that referenced this issue Nov 13, 2024
    Python 13 requires count to be passed as a kwarg rather than a
    positional arg because of potential confusion with flags, see
    python/cpython#56166
    
    Signed-off-by: Paul Elliott <paul.elliott@arm.com>
    tchaikov added a commit to tchaikov/scylladb that referenced this issue Dec 31, 2024
    since Python 3.13, passing count to `re.sub()` as positional argument
    has been deprecated. and when runnint `test.py` with Python 3.13, we
    have following warning:
    
    ```
    /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument
      args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n"))
    ```
    
    see also python/cpython#56166
    
    in order to silence this distracting warning, let's pass
    `count` using kwarg.
    
    this change was created in the same spirit of c3be4a3.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    tchaikov added a commit to tchaikov/scylladb that referenced this issue Dec 31, 2024
    since Python 3.13, passing count to `re.sub()` as positional argument
    has been deprecated. and when runnint `test.py` with Python 3.13, we
    have following warning:
    
    ```
    /home/kefu/dev/scylladb/./test.py:1540: DeprecationWarning: 'count' is passed as positional argument
      args.modes = re.sub(r'.* List configured modes\n(.*)\n', r'\1',
    ```
    
    see also python/cpython#56166
    
    in order to silence this distracting warning, let's pass
    `count` using kwarg.
    
    this change was created in the same spirit of c3be4a3.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    avikivity pushed a commit to scylladb/scylladb that referenced this issue Jan 6, 2025
    since Python 3.13, passing count to `re.sub()` as positional argument
    has been deprecated. and when runnint `test.py` with Python 3.13, we
    have following warning:
    
    ```
    /home/kefu/dev/scylladb/./test.py:1540: DeprecationWarning: 'count' is passed as positional argument
      args.modes = re.sub(r'.* List configured modes\n(.*)\n', r'\1',
    ```
    
    see also python/cpython#56166
    
    in order to silence this distracting warning, let's pass
    `count` using kwarg.
    
    this change was created in the same spirit of c3be4a3.
    
    Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
    
    Closes #22085
    aarongable pushed a commit to chromium/chromium that referenced this issue Feb 25, 2025
    …/third_party/crashpad/update.py:231: DeprecationWarning: 'count' is passed as positional argument
      readme_content_new = re.sub(
    
    https://docs.python.org/3/library/re.html:
    > Deprecated since version 3.13: Passing count and flags as positional
    > arguments is deprecated. In future Python versions they will be
    > keyword-only parameters.
    
    python/cpython#56166,
    python/cpython#107778,
    https://docs.python.org/3/whatsnew/3.13.html#:~:text=gh%2D66543.)-,re%3A,-Deprecate%20passing%20the
    
    Change-Id: Id7909960a107163753ccceb5ebe66c639fe02833
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6297851
    Commit-Queue: Peter Boström <pbos@chromium.org>
    Auto-Submit: Mark Mentovai <mark@chromium.org>
    Reviewed-by: Peter Boström <pbos@chromium.org>
    Cr-Commit-Position: refs/heads/main@{#1424251}
    hubot pushed a commit to chromium/crashpad that referenced this issue Feb 25, 2025
    https://docs.python.org/3/library/re.html:
    > Deprecated since version 3.13: Passing count and flags as positional
    > arguments is deprecated. In future Python versions they will be
    > keyword-only parameters.
    
    python/cpython#56166,
    python/cpython#107778,
    https://docs.python.org/3/whatsnew/3.13.html#:~:text=gh%2D66543.)-,re%3A,-Deprecate%20passing%20the
    
    Change-Id: Ibda1394927062dd9e0353e2b9d643b510070deb6
    Reviewed-on: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/6298792
    Reviewed-by: Leonard Grey <lgrey@chromium.org>
    Commit-Queue: Mark Mentovai <mark@chromium.org>
    gm4sl pushed a commit to backtrace-labs/crashpad that referenced this issue Apr 1, 2025
    https://docs.python.org/3/library/re.html:
    > Deprecated since version 3.13: Passing count and flags as positional
    > arguments is deprecated. In future Python versions they will be
    > keyword-only parameters.
    
    python/cpython#56166,
    python/cpython#107778,
    https://docs.python.org/3/whatsnew/3.13.html#:~:text=gh%2D66543.)-,re%3A,-Deprecate%20passing%20the
    
    Change-Id: Ibda1394927062dd9e0353e2b9d643b510070deb6
    Reviewed-on: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/6298792
    Reviewed-by: Leonard Grey <lgrey@chromium.org>
    Commit-Queue: Mark Mentovai <mark@chromium.org>
    millerdev added a commit to dimagi/commcare-hq that referenced this issue Apr 8, 2025
    DeprecationWarning: 'maxsplit' is passed as positional argument
    
    python/cpython#56166
    millerdev added a commit to dimagi/commcare-hq that referenced this issue Apr 8, 2025
    DeprecationWarning: 'maxsplit' is passed as positional argument
    
    python/cpython#56166
    millerdev added a commit to dimagi/commcare-hq that referenced this issue Apr 8, 2025
    DeprecationWarning: 'maxsplit' is passed as positional argument
    
    python/cpython#56166
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life topic-regex type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants