Skip to content

Fractions instantiation revisited #72902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wm75 mannequin opened this issue Nov 16, 2016 · 14 comments
Open

Fractions instantiation revisited #72902

wm75 mannequin opened this issue Nov 16, 2016 · 14 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@wm75
Copy link
Mannequin

wm75 mannequin commented Nov 16, 2016

BPO 28716
Nosy @rhettinger, @mdickinson, @scoder, @stevendaprano, @serhiy-storchaka, @jdemeyer, @eric-wieser, @wm75, @MojoVampire
Files
  • fractions.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-11-16.18:06:27.650>
    labels = ['3.8', 'type-bug', 'library']
    title = 'Fractions instantiation revisited'
    updated_at = <Date 2019-08-05.09:04:19.247>
    user = 'https://github.com/wm75'

    bugs.python.org fields:

    activity = <Date 2019-08-05.09:04:19.247>
    actor = 'jdemeyer'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-11-16.18:06:27.650>
    creator = 'wolma'
    dependencies = []
    files = ['45507']
    hgrepos = []
    issue_num = 28716
    keywords = ['patch']
    message_count = 13.0
    messages = ['280977', '280981', '280995', '280997', '280998', '280999', '305611', '310470', '313208', '313741', '313753', '313786', '349037']
    nosy_count = 9.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'scoder', 'steven.daprano', 'serhiy.storchaka', 'jdemeyer', 'Eric.Wieser', 'wolma', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28716'
    versions = ['Python 3.8']

    Linked PRs

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Nov 16, 2016

    I've spent a bit of time lately trying to optimize the instantiation of Fractions. This is related to bpo-22464, but instead of focusing on constructing Fractions from ints, my attempts revolve around improving instantiation from strings, floats and decimals.
    I'm attaching a patch with all my changes for discussion, but here's an overview of what's in it:

    CHANGES TO INSTANTIATION FROM STRINGS:

    • isinstance checking for str is performed before the more expensive check for numbers.Rational (which has to go through abc)

    • instantiation from strings doesn't use a regex pattern anymore for parsing; this is faster in many cases (and never slower) than the current version

    • while reimplementing string parsing I added PEP-515 support (this is the only behavior change in the patch)

    combined this gives me the following performance changes for instantiation of Fractions from different arguments (profiled with 1,000,000 random inputs each):

    'x/y'-type of strings:
    ----------------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 12.162 0.000 27.778 0.000 fractions.py:84(new)
    new:
    1000000 9.619 0.000 12.428 0.000 frc.py:68(new)

    scientific notation (e.g., '1.234E-7'):
    ---------------------------------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 18.209 0.000 37.341 0.000 fractions.py:84(new)
    new:
    1000000 15.509 0.000 21.252 0.000 frc.py:68(new)

    integer strings (e.g. '123456'):
    --------------------------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 11.272 0.000 26.201 0.000 fractions.py:84(new)
    new:
    1000000 9.347 0.000 12.425 0.000 frc.py:68(new)

    from other Fractions (to test effect on instantiation of numbers.Rational):
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 4.708 0.000 10.093 0.000 fractions.py:84(new)
    new:
    1000000 4.835 0.000 10.572 0.000 frc.py:68(new)

    from int subclass (as another numbers.Rational):
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 3.446 0.000 8.044 0.000 fractions.py:84(new)
    new:
    1000000 3.795 0.000 8.836 0.000 frc.py:68(new)

    SUMMARY of this part
    ====================

    + very significant speedup for instatiation from strings of different forms with (near) negligible effects on instantiation from numbers.Rational.

    + correct parsing of PEP-515-like number strings

    • possibly slower error bubbling with invalid input (untested)

    CHANGES TO INSTANTIATION FROM FLOAT AND DECIMAL:

    • no explicit check for float and decimal in standard constructor; instead, simply try to call as_integer_ratio as last resort (even after checking for numbers.Rational)

    • speed up alternate from_float and from_decimal constructor class methods by rearranging type checks and making use of _normalize flag

    • import decimal only on demand (when using from_decimal)

    Resulting performance changes:

    standard constructor with float:
    --------------------------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 4.362 0.000 12.452 0.000 fractions.py:84(new)
    new:
    1000000 6.693 0.000 16.020 0.000 frc.py:68(new)

    from_float:
    -----------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 3.146 0.000 17.769 0.000 fractions.py:193(from_float)
    new:
    1000000 2.544 0.000 7.591 0.000 frc.py:205(from_float)

    standard constructor with decimal:
    --------------------------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 4.672 0.000 20.795 0.000 fractions.py:84(new)
    new:
    1000000 7.097 0.000 24.526 0.000 frc.py:68(new)

    from_decimal:
    -------------
    ncalls tottime percall cumtime percall filename:lineno(function)
    old:
    1000000 5.054 0.000 34.473 0.000 fractions.py:207(from_decimal)
    new:
    1000000 2.942 0.000 16.013 0.000 frc.py:220(from_decimal)

    SUMMARY of this part:

    • standard constructor becomes a bit slower for floats and Decimals
      specialized class methods become a lot faster (for Decimal the benchmarks are based on _pydecimal - with the C implementation the percent differences would be even larger)
    • eliminates decimal from regularly imported modules
    • allows Fraction instantiation from duck-typing classes that provide as_integer_ratio

    I hope at least some of this is interesting.

    @wm75 wm75 mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 16, 2016
    @serhiy-storchaka
    Copy link
    Member

    Profiling give you only approximate results. In normal execution the effect of your changes can be opposite. Could you please provide benchmarks without using profiling?

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Nov 16, 2016

    sure, I just happened to have the profiling available since I used it to optimize things. Here's similar microbenchmarks using perf:

    STRINGS
    =======

    $ python -m perf timeit -s "from fractions import Fraction" "Fraction('1.23456e-7')"
    .....................
    Median +- std dev: 17.0 us +- 0.3 us
    
    $ python -m perf timeit -s "from frc import Fraction" "Fraction('1.23456e-7')"
    .....................
    Median +- std dev: 8.95 us +- 0.16 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction" "Fraction('234/567')"
    .....................
    Median +- std dev: 12.6 us +- 0.1 us
    
    $ python -m perf timeit -s "from frc import Fraction" "Fraction('234/567')"
    .....................
    Median +- std dev: 5.45 us +- 0.16 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction" "Fraction('123456')"
    .....................
    Median +- std dev: 12.4 us +- 0.6 us
    
    $ python -m perf timeit -s "from frc import Fraction" "Fraction('123456')"
    .....................
    Median +- std dev: 5.77 us +- 0.12 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction; f=Fraction(3/4)" "Fraction(f)"
    .....................
    Median +- std dev: 4.36 us +- 0.06 us
    
    $ python -m perf timeit -s "from frc import Fraction; f=Fraction(3/4)" "Fraction(f)"
    .....................
    Median +- std dev: 4.59 us +- 0.07 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction" -s "class myInt(int): pass" -s "i=myInt(123456)" "Fraction(i)"
    .....................
    Median +- std dev: 4.04 us +- 0.07 us
    
    $ python -m perf timeit -s "from frc import Fraction" -s "class myInt(int): pass" -s "i=myInt(123456)" "Fraction(i)"
    .....................
    Median +- std dev: 4.27 us +- 0.06 us

    FLOATS
    ======

    $ python -m perf timeit -s "from fractions import Fraction" "Fraction(1.23456e-7)"
    .....................
    Median +- std dev: 6.30 us +- 0.28 us
    
    $ python -m perf timeit -s "from frc import Fraction" "Fraction(1.23456e-7)"
    .....................
    Median +- std dev: 8.64 us +- 0.13 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction" "Fraction.from_float(1.23456e-7)"
    .....................
    Median +- std dev: 8.68 us +- 0.14 us
    
    $ python -m perf timeit -s "from frc import Fraction" "Fraction.from_float(1.23456e-7)"
    .....................
    Median +- std dev: 3.37 us +- 0.17 us

    DECIMALS (using C implementation this time)
    ===========================================

    $ python -m perf timeit -s "from fractions import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction(d)".....................
    Median +- std dev: 6.95 us +- 0.21 us
    
    $ python -m perf timeit -s "from frc import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction(d)"
    .....................
    Median +- std dev: 8.43 us +- 0.17 us
    
    
    $ python -m perf timeit -s "from fractions import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction.from_decimal(d)"
    .....................
    Median +- std dev: 11.6 us +- 0.2 us
    
    $ python -m perf timeit -s "from frc import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction.from_decimal(d)"
    .....................
    Median +- std dev: 4.14 us +- 0.28 us

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 16, 2016

    Might it make sense to make instantiation from numbers.Rational duck typing based as well? Just try to get the numerator and denominator attributes, on AttributeError, skip it and move on. Unless there is some concern that a non-Rational type might have both attributes and not intend them to mean it's a Rational?

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 16, 2016

    Similarly, type checking for int might be replaced with calling operator.index on the input and handling the TypeError; that way, anything that has declared itself logically int-like is handled without explicit type checking at all.

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Nov 16, 2016

    No, I don't think the numeric tower ABC should be replaced by duck-typing. One of the very reasons the fractions module exists is that it showcases how to use the numeric tower. If you want a class to be picked up as a Rational it should be registered as such. Pretty much the same goes for integer-like things. The direct type check for ints only exists to speed up the most obvious case, but anything integer-like should be registered as a numbers.Integral and it will just work with fractions. Beautiful stuff!

    This is not the case for as_integer_ratio because that method is not defined in the numeric tower and this is why I think duck-typing could be of interest here (I'm not sure though whether its worth the somewhat decreased performance). My reasoning is that the standard constructor is anyway slower for floats and ints than the specialized classmethods for the two so people who really care about speed here (probably not too many) can just use the classmethods.

    @eric-wieser
    Copy link
    Mannequin

    eric-wieser mannequin commented Nov 5, 2017

    allows Fraction instantiation from duck-typing classes that provide as_integer_ratio

    This would allow the numpy np.floating types to take part in Fraction conversion as well, which would be great.

    As far as I can tell, Decimal already follows this duck-typing, so it would be a shame for other modules not to.

    Perhaps an num, den = operator.rational(x) is needed to unify this code across the places that coerce rational numbers.

    @scoder
    Copy link
    Contributor

    scoder commented Jan 23, 2018

    Not sure if it's relevant for this specific change, but here's a benchmark that you could use for Fractions: bpo-22458

    @scoder
    Copy link
    Contributor

    scoder commented Mar 4, 2018

    Just FYI and as further motivation, I reimplemented this dedicated parser for quicktions (in Cython, so the timings and speedups are not comparable).

    scoder/quicktions@cc034e0

    I was able to get another quite visible improvement by caching the values of "10 ** shift" for 0 <= shift < 58 in a tuple. Higher values are very unlikely in practice, and the memory size of a tuple with 58 values gives a nice multiple of the usual cache line size. (I originally stored 64 values in my commit but then cut it down later.)

    scoder/quicktions@c20add5

    I suspect that the difference won't be as big for the Python implementation, but it still seems worth a try.

    The overall speedup that I got, compared to the initial regex implementation, is 50-70%.

    [regex] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("153456/789344")'
    200000 loops, best of 5: 1.19 usec per loop
    [native] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("153456/789344")'
    500000 loops, best of 5: 593 nsec per loop

    [regex] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("15.3456789E+4")'
    100000 loops, best of 5: 2.3 usec per loop
    [native] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("15.3456789E+4")'
    500000 loops, best of 5: 667 nsec per loop

    It could be even higher if I additionally moved the int() integer parsing into Cython. Might try that at some point. But that's also not something that concerns the implementation in CPython.

    @scoder scoder added 3.8 (EOL) end of life and removed 3.7 (EOL) end of life labels Mar 4, 2018
    @mdickinson
    Copy link
    Member

    [Eric Wieser]

    This would allow the numpy np.floating types to take part in Fraction conversion as well, which would be great.

    I'm confused: as far as I can tell, the NumPy floating-point types don't implement as_integer_ratio. (Except for np.float64, which subclasses float and so inherits is_integer_ratio from it.) Or are you suggesting that if they were to implement as_integer_ratio at some point in the future, then they'd become compatible with Fraction?

    @eric-wieser
    Copy link
    Mannequin

    eric-wieser mannequin commented Mar 13, 2018

    Are you suggesting that if they were to implement as_integer_ratio at some point in the future, then they'd become compatible with Fraction

    Yes, exactly. Conversely, there's little gain in implementing it until Fraction supports calling it.

    @rhettinger
    Copy link
    Contributor

    FYI, adding int.as_integer_ratio() is being taken care in a separate issue: https://bugs.python.org/issue33073 . I've set it aside for a beginning core-dev mentee to work on because it is simple and self-contained. Pretty much all the related work can be done here.

    @jdemeyer
    Copy link
    Contributor

    jdemeyer commented Aug 5, 2019

    See https://discuss.python.org/t/pep-3141-ratio-instead-of-numerator-denominator/2037/24?u=jdemeyer for a proposal to define a new dunder __ratio__ (instead of as_integer_ratio) for this.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    skirpichev added a commit to skirpichev/cpython that referenced this issue May 1, 2025
    @skirpichev skirpichev self-assigned this May 1, 2025
    @skirpichev skirpichev added type-feature A feature request or enhancement performance Performance or resource usage and removed type-bug An unexpected behavior, bug, or error 3.8 (EOL) end of life labels May 1, 2025
    @skirpichev
    Copy link
    Member

    @wm75, here is a pr to speedup from_*() methods: #133251.

    Let me know if you wish to continue your work on the Fraction constructor. Or I can prepare a patch and mention you as co-author.

    IIUIC, at this point relevant parts of your proposal are: 1) move "str" case up (over numbers.Rational), 2) reimplement string parsing without regexps.

    Below my simple benchmarks. The patch1 just do 1), the patch2 (included below) - solves also 2).

    Benchmark ref patch1 patch2
    Fraction(myint) 5.29 us 6.71 us: 1.27x slower 6.67 us: 1.26x slower
    Fraction('123') 17.8 us 14.9 us: 1.19x faster 9.31 us: 1.91x faster
    Fraction('1/3') 18.3 us 15.3 us: 1.20x faster 9.17 us: 1.99x faster
    Fraction('1.2e-3') 24.8 us 21.9 us: 1.13x faster 14.3 us: 1.73x faster
    Fraction('-.2') 22.2 us 19.3 us: 1.15x faster 13.5 us: 1.65x faster
    Geometric mean (ref) 1.08x faster 1.54x faster
    # bench.py
    import pyperf
    from fractions import Fraction as F
    from numbers import Integral
    
    runner = pyperf.Runner()
    s = 'Fraction'
    class myint:
        numerator = 123
        denominator = 1
        def __int__(self):
            return 123
        def __repr__(self):
            return "myint"
    Integral.register(myint)
    for v in [myint(), "123", "1/3", "1.2e-3", "-.2"]:
        r = s + '(' + repr(v) + ')'
        runner.bench_func(r, F, v)
    diff --git a/Lib/fractions.py b/Lib/fractions.py
    index fa722589fb..55d1113b4a 100644
    --- a/Lib/fractions.py
    +++ b/Lib/fractions.py
    @@ -53,20 +53,6 @@ def _hash_algorithm(numerator, denominator):
         result = hash_ if numerator >= 0 else -hash_
         return -2 if result == -1 else result
     
    -_RATIONAL_FORMAT = re.compile(r"""
    -    \A\s*                                  # optional whitespace at the start,
    -    (?P<sign>[-+]?)                        # an optional sign, then
    -    (?=\d|\.\d)                            # lookahead for digit or .digit
    -    (?P<num>\d*|\d+(_\d+)*)                # numerator (possibly empty)
    -    (?:                                    # followed by
    -       (?:\s*/\s*(?P<denom>\d+(_\d+)*))?   # an optional denominator
    -    |                                      # or
    -       (?:\.(?P<decimal>\d*|\d+(_\d+)*))?  # an optional fractional part
    -       (?:E(?P<exp>[-+]?\d+(_\d+)*))?      # and optional exponent
    -    )
    -    \s*\Z                                  # and optional whitespace to finish
    -""", re.VERBOSE | re.IGNORECASE)
    -
     
     # Helpers for formatting
     
    @@ -238,11 +224,6 @@ def __new__(cls, numerator=0, denominator=None):
                     self._denominator = 1
                     return self
     
    -            elif isinstance(numerator, numbers.Rational):
    -                self._numerator = numerator.numerator
    -                self._denominator = numerator.denominator
    -                return self
    -
                 elif (isinstance(numerator, float) or
                       (not isinstance(numerator, type) and
                        hasattr(numerator, 'as_integer_ratio'))):
    @@ -252,31 +233,56 @@ def __new__(cls, numerator=0, denominator=None):
     
                 elif isinstance(numerator, str):
                     # Handle construction from strings.
    -                m = _RATIONAL_FORMAT.match(numerator)
    -                if m is None:
    +                fraction_literal = numerator
    +                num, _, denom = fraction_literal.partition('/')
    +                try:
    +                    num = num.strip()
    +                    denom = denom.strip()
    +                    if _ == '':
    +                        # Numerator-only form allows for optional fractional
    +                        # and exponent parts.
    +                        # Partition the fraction literal on the exponent sign,
    +                        # while ignoring leading and trailing whitespace
    +                        # and the case of the exponent sign.
    +                        num, _, exp = num.replace('e', 'E').partition('E')
    +                        if _ and not exp:
    +                            # No exponent notation without value!
    +                            raise ValueError
    +                        num, _, decimal = num.partition('.')
    +                        shift = 0
    +                        if decimal:
    +                            # Zero-pad integer-less fraction literals.
    +                            if num == '' or num == '+':
    +                                num = '0'
    +                            elif num == '-':
    +                                num += '0'
    +                            # Joining literal parts by underscores ensures
    +                            # that result strings can only be parsed as ints
    +                            # if there parts were parseable.
    +                            num = f'{num}_{decimal}'
    +                            shift -= len(decimal)-decimal.count('_')
    +                        if exp:
    +                            shift += int(exp)
    +                        numerator = int(num)
    +                        denominator = 1
    +                        if shift > 0:
    +                            numerator *= 10 ** shift
    +                        elif shift < 0:
    +                            denominator = 10 ** -shift
    +
    +                    elif num and denom and num[-1].isdigit() and denom[0].isdigit():
    +                        numerator = int(num)
    +                        denominator = int(denom)
    +                    else:
    +                        raise ValueError
    +                except ValueError:
                         raise ValueError('Invalid literal for Fraction: %r' %
    -                                     numerator)
    -                numerator = int(m.group('num') or '0')
    -                denom = m.group('denom')
    -                if denom:
    -                    denominator = int(denom)
    -                else:
    -                    denominator = 1
    -                    decimal = m.group('decimal')
    -                    if decimal:
    -                        decimal = decimal.replace('_', '')
    -                        scale = 10**len(decimal)
    -                        numerator = numerator * scale + int(decimal)
    -                        denominator *= scale
    -                    exp = m.group('exp')
    -                    if exp:
    -                        exp = int(exp)
    -                        if exp >= 0:
    -                            numerator *= 10**exp
    -                        else:
    -                            denominator *= 10**-exp
    -                if m.group('sign') == '-':
    -                    numerator = -numerator
    +                                     fraction_literal)
    +
    +            elif isinstance(numerator, numbers.Rational):
    +                self._numerator = numerator.numerator
    +                self._denominator = numerator.denominator
    +                return self
     
                 else:
                     raise TypeError("argument should be a string or a Rational "

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants