RFC: doc: extensions: Explain how printf/seq handle precision #7641

Open
wants to merge 1 commit into
base: main
from

Conversation

drinkcat
Contributor

@drinkcat drinkcat commented Apr 3, 2025

There are some differences in behaviour vs GNU coreutils; explain what those are.


RFC, I'd like to get opinions on this. This is basically how uutils will behave after #7631. I think it's better to just say we support arbitrary precision, rather than trying to downgrade accuracy to emulate 64/80/128-bit float behavior.

In particular, for seq, GNU coreutils intentionally gives no guarantees in terms of precision (https://www.gnu.org/software/coreutils/manual/coreutils.html#seq-invocation), so I feel it's OK to do better.

> Be careful when using seq with outlandish values: otherwise you may see surprising results, as seq uses floating point internally. For example, on the x86 platform, where the internal representation uses a 64-bit fraction, the command:
> `seq 1 0.0000000000000000001 1.0000000000000000009`
> outputs 1.0000000000000000007 twice and skips 1.0000000000000000008.

There is nothing mentioned in printf about precision, so I think providing arbitrary precision is also fair. Hexadecimal floating point is a bit complicated, so worth mentioning what architecture-specific behavior we picked.
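
One reason hexadecimal floating point output can differ between implementations is that the notation is not unique: the same value can be written with different leading digits and exponents. The sketch below only illustrates that arithmetic fact. The helper `hex_float_value` is hypothetical and is not part of uutils or GNU coreutils, and the three spellings are examples chosen for illustration, not a statement of what either tool prints.

```rust
/// Hypothetical helper: value of a hexadecimal float given as
/// - `mantissa`: all hex digits read as one integer (point removed),
/// - `frac_hex_digits`: how many of those digits sat after the point,
/// - `exp2`: the binary exponent written after `p`.
fn hex_float_value(mantissa: u64, frac_hex_digits: i32, exp2: i32) -> f64 {
    // Each hex digit after the point is worth 4 binary places.
    (mantissa as f64) * 2.0_f64.powi(exp2 - 4 * frac_hex_digits)
}

fn main() {
    let a = hex_float_value(0x18, 1, 0); // 0x1.8p+0
    let b = hex_float_value(0xc, 0, -3); // 0xcp-3
    let c = hex_float_value(0x3, 0, -1); // 0x3p-1

    // All three spellings denote exactly the same number.
    assert_eq!(a, 1.5);
    assert_eq!(a, b);
    assert_eq!(b, c);
    println!("{a} {b} {c}");
}
```

Since every spelling is valid, a script that parses such output has to accept all of them anyway, which is why documenting the chosen normalization matters more than matching one platform.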

(there are quite a few issues related to this, #7186, #5759, #6244 at least)
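
To make the seq example from the GNU manual concrete, here is a minimal sketch of why a binary float loses such a small increment while exact decimal arithmetic does not. It is an illustration only: `seq_scaled` is a hypothetical helper that uses a scaled `u128` as a stand-in for an arbitrary-precision decimal, and it is not the uutils implementation.

```rust
/// Hypothetical helper: generate first..=last in steps of `step`, where all
/// three are integers scaled by 10^scale (a stand-in for an exact decimal).
fn seq_scaled(first: u128, step: u128, last: u128, scale: u32) -> Vec<String> {
    assert!(step > 0, "a zero step would never terminate");
    let unit = 10u128.pow(scale);
    let mut out = Vec::new();
    let mut v = first;
    while v <= last {
        // Integer part, then the fractional part zero-padded to `scale` digits.
        out.push(format!("{}.{:0width$}", v / unit, v % unit, width = scale as usize));
        v += step;
    }
    out
}

fn main() {
    // With f64, 1e-19 is far below the spacing of values near 1.0,
    // so adding it changes nothing.
    assert_eq!(1.0_f64 + 1e-19, 1.0_f64);

    // With exact scaled integers, every step is distinct.
    let unit = 10u128.pow(19);
    for line in seq_scaled(unit, 1, unit + 9, 19) {
        println!("{line}");
    }
}
```

With exact arithmetic the sequence contains both 1.0000000000000000007 and 1.0000000000000000008 exactly once.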

github-actions bot commented Apr 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@drinkcat
Contributor Author

drinkcat commented Apr 3, 2025

@tertsdiepraam @RenjiSann @jfinkels @sylvestre FYI (trying to figure out who touched floating point code and might be interested ,-))

Member

@tertsdiepraam tertsdiepraam left a comment

I think it's a great idea to document these differences!

I think my position would be that these differences are acceptable, since they are correct and also any script that parses these numbers would need to support the formats that we print.

Additionally, I think that our behaviour is more portable, which is a nice bonus.

However, the arbitrary precision might be a performance problem? Have you checked that? If that is negligible, I feel this is acceptable.

@@ -97,3 +176,5 @@ Similar to the proc-ps implementation and unlike GNU/Coreutils, `uptime` provide
## `base32/base64/basenc`

Just like on macOS, `base32/base64/basenc` provides `-D` to decode data.

[^1] https://en.cppreference.com/w/c/io/fprintf
Member

I don't think this is the best source for that. Generally, we should refer to GNU docs or the POSIX specification.

You might find the info on one of these pages:

Contributor Author

Thanks! I think this has what we need: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05

(GNU docs reword the same thing, and I think C99 is also very similar in wording)

format specified, etc.), so its output will be more correct than GNU coreutils for
some inputs (e.g. small fractional increments where GNU coreutils uses `long double`).

The only limitation is that the position of the decimal point is stored in a i64,
Member

Suggested change
- The only limitation is that the position of the decimal point is stored in a i64,
+ The only limitation is that the position of the decimal point is stored in a `i64`,

Contributor Author

Fixed thanks.
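
For readers unfamiliar with the limitation discussed in this thread, here is a rough sketch of what "the position of the decimal point is stored in an `i64`" implies. It assumes a representation of the form coefficient times 10^(-scale). The `Decimal` type below is hypothetical, and its `i128` coefficient is only a stand-in for an arbitrary-size integer; this is not the actual uutils data structure.

```rust
/// Hypothetical decimal: value = digits * 10^(-scale).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Decimal {
    digits: i128, // stand-in for an arbitrary-size integer coefficient
    scale: i64,   // position of the decimal point
}

impl Decimal {
    /// Multiply exactly; returns None if the coefficient stand-in or the
    /// scale would overflow.
    fn checked_mul(self, other: Decimal) -> Option<Decimal> {
        Some(Decimal {
            digits: self.digits.checked_mul(other.digits)?,
            scale: self.scale.checked_add(other.scale)?,
        })
    }
}

fn main() {
    let a = Decimal { digits: 15, scale: 1 }; // 1.5
    let b = Decimal { digits: 2, scale: 3 }; // 0.002
    // 1.5 * 0.002 = 0.0030, i.e. 30 * 10^-4.
    assert_eq!(a.checked_mul(b), Some(Decimal { digits: 30, scale: 4 }));

    // The scale is the only bounded part: pushing it past i64::MAX fails.
    let tiny = Decimal { digits: 1, scale: i64::MAX };
    assert_eq!(tiny.checked_mul(Decimal { digits: 1, scale: 1 }), None);
}
```

In practice a scale near `i64::MAX` corresponds to numbers with around 9.2e18 decimal places, so the bound is unlikely to matter for real inputs.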

There are some differences in behaviour vs GNU coreutils; explain
what those are.
@drinkcat
Contributor Author

drinkcat commented Apr 4, 2025

> I think it's a great idea to document these differences!
>
> I think my position would be that these differences are acceptable, since they are correct and also any script that parses these numbers would need to support the formats that we print.

Thanks! Yeah, I also suppose that anybody who uses seq with such small increments would not really rely on the exact values. Not that I'm too clear why you'd actually call GNU seq with such values in the first place, given how imprecise it is...

It's perhaps a bit more arguable with printf, e.g. if somebody relied on printf to understand precision losses with some not-really-well-specified floating point format... But that seems a bit... unlikely as well.

> Additionally, I think that our behaviour is more portable, which is a nice bonus.

Right. And in a way, it's almost acceptable that a coreutils->uutils update would cause subtle changes somewhat similar to an x86->arm update?

> However, the arbitrary precision might be a performance problem? Have you checked that? If that is negligible, I feel this is acceptable.

It's fine. For printf it obviously doesn't matter as we only deal with a few numbers at most.

For seq, we're in the same performance ballpark for floating-point values (maybe 20% faster). The GNU implementation has a fast path for positive integers; I could get within ~10% of it with this PR: #7564 (otherwise we're 15-20 times slower, but that's not only because of the added precision).
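
As a rough idea of what an integer fast path can look like, the sketch below increments the decimal digits in place instead of doing general arithmetic and re-formatting each value. This is an assumption about one possible technique, with hypothetical function names and a fixed step of 1; it is not taken from GNU coreutils or from #7564.

```rust
/// Increment an ASCII decimal number in place, e.g. "129" -> "130".
fn increment_ascii_decimal(digits: &mut Vec<u8>) {
    for d in digits.iter_mut().rev() {
        if *d == b'9' {
            *d = b'0'; // carry into the next digit to the left
        } else {
            *d += 1;
            return;
        }
    }
    // Every digit was 9: the number grows by one digit ("99" -> "100").
    digits.insert(0, b'1');
}

/// Hypothetical fast path: non-negative integers with a step of 1.
fn seq_fast(first: u64, last: u64) -> Vec<String> {
    let mut digits = first.to_string().into_bytes();
    let mut out = Vec::new();
    for _ in first..=last {
        out.push(String::from_utf8(digits.clone()).expect("buffer holds ASCII digits"));
        increment_ascii_decimal(&mut digits);
    }
    out
}

fn main() {
    for line in seq_fast(8, 11) {
        println!("{line}");
    }
}
```

Most increments touch only the last digit, so the cost per line is dominated by the output itself rather than by number formatting.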


And now that I'm looking into #7475, it's interesting to see similar precision questions (GNU timeout rounds very small duration values to 0, which has a totally different meaning...)

github-actions bot commented Apr 4, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@jfinkels
Collaborator

jfinkels commented Apr 8, 2025

I agree, it seems sensible to me to maintain the arbitrary precision and document the differences. You've documented them clearly here!

@drinkcat drinkcat marked this pull request as ready for review April 22, 2025 16:14