Skip to content

gh-87790: support thousands separators for formatting fractional part of floats #125304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 25, 2025

Conversation

skirpichev
Copy link
Contributor

@skirpichev skirpichev commented Oct 11, 2024

>>> f"{123_456.123_456:_._f}"  # Whole and fractional
'123_456.123_456'
>>> f"{123_456.123_456:_f}"    # Integer component only
'123_456.123456'
>>> f"{123_456.123_456:._f}"   # Fractional component only
'123456.123_456'
>>> f"{123_456.123_456:.4_f}"  # with precision (._4f is illegal now)
'123456.123_5'
>>> f"{123_456.123_456:.4,f}"  # with comma
'123456.123,5'

Notes for reviewers.

  • This is a simplest version, using _PyUnicode_InsertThousandsGrouping() like for the integer part. But perhaps we should insert digits in the fractional part, starting from the least significand digit instead, i.e. last example will print 123456.123_5 instead. Second commit puts separators, iterating from the least significand digit of fractional part.
  • The issue thread has several suggestions for grouping: e.g. 3 or 5. 3 was chosen.
  • Maybe grouping length should be configurable (e.g. ._3f - for grouping in three digits).
  • Heh, just for completeness: maybe break compatibility (see this)? I.e. _f will be "use underscore's both in the integer and the fractional part".
  • Also comma as a separator? (Perhaps, it's better to do in a separate pr.)

📚 Documentation preview 📚: https://cpython-previews--125304.org.readthedocs.build/

…floats

```pycon
>>> f"{123_456.123_456:_._f}"  # Whole and fractional
'123_456.123_456'
>>> f"{123_456.123_456:_f}"    # Integer component only
'123_456.123456'
>>> f"{123_456.123_456:._f}"   # Fractional component only
'123456.123_456'
>>> f"{123_456.123_456:.4_f}"  # with precision
'123456.1_235'
```
@skirpichev skirpichev force-pushed the fmt-frac-grouping-87790 branch from b648865 to 5858d8c Compare October 11, 2024 09:04
@skirpichev skirpichev marked this pull request as ready for review October 11, 2024 09:46
Copy link
Contributor

@rruuaanng rruuaanng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@skirpichev skirpichev changed the title gh-87790: support underscore for formatting fractional part of floats gh-87790: support underscore/comma for formatting fractional part of floats Nov 16, 2024
@skirpichev skirpichev changed the title gh-87790: support underscore/comma for formatting fractional part of floats gh-87790: support thousands separators for formatting fractional part of floats Nov 16, 2024
@skirpichev skirpichev requested a review from vstinner November 16, 2024 04:01
@skirpichev
Copy link
Contributor Author

(JFR, review request on d.p.o: https://discuss.python.org/t/71229)


/* Determine the grouping, separator, and decimal point, if any. */
if (get_locale_info(format->type == 'n' ? LT_CURRENT_LOCALE :
format->thousands_separators,
format->frac_thousands_separator,
Copy link
Contributor

@nineteendo nineteendo Nov 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with the locale? At the very least I would reject specifying the separator manually:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'af_za')
>>> print(f"{123_456.123_456:.12,n}")
123.456,123,456 # <- 2 decimal points

Would it be a breaking change to do this?

>>> print(f"{123_456.123_456:.12n}")
123.456,123.456

Copy link
Contributor Author

@skirpichev skirpichev Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just reject separators with the n type.

Edit:

Would it be a breaking change to do this?

Yes, this will be a compatibility break.

IMO, without a legacy - it would be better to use thousands separators both for integer and fractional part. Then we could do same with n type formatting.

Edit2:

Maybe we should try something more closer to the original implementation? I.e. with a distinct optional frac_grouping option, which can have _ value. If present:

  1. for floating-point format types that means: "apply current settings for thousands separators both to integer and fractional part";
  2. for n type that means: "apply current locale settings for thousands separators both to integer and fractional part".

This is less flexible (you can't put separators to the fractional part only), but will more consistently interact with locale.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to use the underscore separator with n format:

>>> format(1234, "_n")
ValueError: Cannot specify '_' with 'n'.

So I agree that we should reject separators with the n type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See invalid_thousands_separator_type() in parse_internal_render_format_spec().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner, this check already added. Unfortunately, that means with n type we can have locale-dependent separators only for integer part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, should we allow different separators? So far it's permitted:

>>> f"{122222.22222:_.7,f}"
'122_222.222,220,0'


/* Determine the grouping, separator, and decimal point, if any. */
if (get_locale_info(format->type == 'n' ? LT_CURRENT_LOCALE :
format->thousands_separators,
format->frac_thousands_separator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to use the underscore separator with n format:

>>> format(1234, "_n")
ValueError: Cannot specify '_' with 'n'.

So I agree that we should reject separators with the n type.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the multiple updates.

@vstinner
Copy link
Member

@serhiy-storchaka: Would you mind to review this change?

@@ -312,7 +312,7 @@ non-empty format specification typically modifies the result.
The general form of a *standard format specifier* is:

.. productionlist:: format-spec
format_spec: [[`fill`]`align`][`sign`]["z"]["#"]["0"][`width`][`grouping_option`]["." `precision`][`type`]
format_spec: [[`fill`]`align`][`sign`]["z"]["#"]["0"][`width`][`grouping_option`]["." `precision` [`grouping_option`]][`type`]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate. More accurate would be ["." `precision` [`grouping_option`] | "." `grouping_option`] or ["." (`precision` [`grouping_option`] | `grouping_option`)] or ["." [`precision`][`grouping_option`]] with addition that either precision or grouping_option should be specified after ".".

Copy link
Contributor Author

@skirpichev skirpichev Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

.. productionlist:: format-spec
   format_spec: [`options`][`width_and_precision`][`type`]
   options: [[`fill`]`align`][`sign`]["z"]["#"]["0"]
   fill: <any character>
   align: "<" | ">" | "=" | "^"
   sign: "+" | "-" | " "
   width_and_precision: [`width_with_grouping`][`precision_with_grouping`]
   width_with_grouping: [`width`][`grouping_option`]
   precision_with_grouping: "." (`precision`[`grouping_option`] | `grouping_option`)
   width: `~python-grammar:digit`+
   grouping_option: "_" | ","
   precision: `~python-grammar:digit`+
   type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g"
       : | "G" | "n" | "o" | "s" | "x" | "X" | "%"

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant [...] in precision_with_grouping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

pos++;
}

while (pos<end && Py_ISDIGIT(PyUnicode_READ(kind, data, pos))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the if block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, which if?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this the way we will go, the implementation LGTM.

@skirpichev
Copy link
Contributor Author

@vstinner, @serhiy-storchaka should we ask someone else to review this pr?

@vstinner vstinner merged commit f39a07b into python:main Feb 25, 2025
42 checks passed
@vstinner
Copy link
Member

I merged your PR, thanks.

@skirpichev skirpichev deleted the fmt-frac-grouping-87790 branch February 25, 2025 16:05
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…l part of floats (python#125304)

```pycon
>>> f"{123_456.123_456:_._f}"  # Whole and fractional
'123_456.123_456'
>>> f"{123_456.123_456:_f}"    # Integer component only
'123_456.123456'
>>> f"{123_456.123_456:._f}"   # Fractional component only
'123456.123_456'
>>> f"{123_456.123_456:.4_f}"  # with precision
'123456.1_235'
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants