sort -f produces a different output than GNU #5849

Closed
oss-fuzz-robot opened this issue Jan 17, 2024 · 5 comments

@oss-fuzz-robot

OSS-Fuzz has found a bug in this project. Please see https://oss-fuzz.com/testcase?key=5184835421274112 for details and reproducers.

This issue is mirrored from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65860 and will auto-close if the status changes there.

If you have trouble accessing this report, please file an issue at https://github.com/google/oss-fuzz/issues/new.

@sylvestre
Contributor

failed with:


Running test ["sort", "-f"]
	Input: ["-f"]
	Rust stdout: (IÃ7Ã7LWPcp8(pzZ9Ã
	0Y5iRJ63yz
	3ÃM
	H7ÃJn8Cx1L(C
	nr4ÃYzÃ
	R(cHY1E
	U
	w
	Z
	zI2muK
	
	GNU stdout: 0Y5iRJ63yz
	3ÃM
	H7ÃJn8Cx1L(C
	(IÃ7Ã7LWPcp8(pzZ9Ã
	nr4ÃYzÃ
	R(cHY1E
	U
	w
	Z
	zI2muK
	
	Diff=
	-(IÃ7Ã7LWPcp8(pzZ9Ã
	 0Y5iRJ63yz
	 3ÃM
	 H7ÃJn8Cx1L(C
	+(IÃ7Ã7LWPcp8(pzZ9Ã
	 nr4ÃYzÃ
	 R(cHY1E
	 U
	 w
	 Z
	 zI2muK
	
	Discrepancy detected: stdout differs
	

@sylvestre changed the title from "OSS-Fuzz issue 65860" to "sort -f produces a different output than GNU" on Jan 23, 2024
@cakebaker
Contributor

You can simplify it to:

$ printf "A\n(C\nb\n" | sort -f
A
b
(C
$ printf "A\n(C\nb\n" | cargo run sort -f
(C
A
b

@ohanf

ohanf commented Jan 27, 2024

I took a brief look at this, admittedly without actually running any code from this crate, and I believe this is a locale issue. I did read through the code and believe the issue lies in how cmp_chars determines sort order. The reference docs for the GNU version have a footnote about collation that states:

If you use a non-POSIX locale (e.g., by setting LC_ALL to ‘en_US’), then sort may produce output that is sorted differently than you’re accustomed to. In that case, set the LC_ALL environment variable to ‘C’.

I tested this on my machine and was able to produce both sorting orders with GNU sort just by changing the locale:

echo -e "a\n(c\nb\nd" | LC_ALL=C sort
(c
a
b
d

vs

echo -e "a\n(c\nb\nd" | LC_ALL=en_US.UTF-8 sort
a
b
(c
d

Based on this (very limited) character set, it looks like Rust is using the C locale and the GNU test is not? I did a quick search on locales in Rust, and there seems to have been some recent discussion about them, although it focused more on international comparisons, and I am unsure whether that is in scope here. I also found a crate that can, in theory, load locales; a quick search indicated that the standard library does not respect them, which a quick playground test seemed to confirm.
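
To illustrate the C-locale ordering, here is a minimal Rust sketch. It is not the actual uutils implementation, and `fold_sort` is a hypothetical name used only for this example. It folds ASCII lowercase to uppercase, as `-f` does, and then compares raw bytes. Since `(` is byte 0x28 and `A` is byte 0x41, `(C` sorts before `A`, which matches the uutils output in the simplified reproduction.

```rust
/// Sort lines the way a C-locale `sort -f` would for ASCII input:
/// fold lowercase to uppercase, then compare byte by byte.
/// Hypothetical helper for illustration only, not uutils code.
fn fold_sort(mut lines: Vec<&str>) -> Vec<&str> {
    lines.sort_by(|a, b| {
        let ka = a.bytes().map(|c| c.to_ascii_uppercase());
        let kb = b.bytes().map(|c| c.to_ascii_uppercase());
        // Lexicographic comparison of the folded byte sequences.
        ka.cmp(kb)
    });
    lines
}

fn main() {
    let sorted = fold_sort(vec!["A", "(C", "b"]);
    // '(' (0x28) is smaller than 'A' (0x41), so "(C" comes first.
    println!("{:?}", sorted); // prints ["(C", "A", "b"]
}
```

A locale-aware collation such as en_US.UTF-8 gives punctuation less weight than letters, which is why GNU sort placed `(C` after `b` in that locale.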

Hopefully this is helpful :)

@cakebaker
Contributor

Yes, you are right, uutils sort uses the C locale (there is no localization support yet), whereas GNU sort uses whatever locale is defined. I don't understand, though, why the fuzzer doesn't use the C locale, because we call env::set_var("LC_COLLATE", "C"); before running GNU sort. In #5889 I changed it to LC_ALL; maybe that will fix the issue.
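
For reference, a sketch of pinning the locale on the child process only. This is an assumption-laden illustration, not the fuzzer's actual code: `gnu_sort_in_c_locale` is a hypothetical helper, and it uses `Command::env` rather than the `env::set_var` call mentioned above. One possible reason `LC_COLLATE` alone might not be enough (not confirmed in this thread) is that POSIX gives `LC_ALL` precedence over the individual `LC_*` variables, so an inherited `LC_ALL` would override it.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Build a `sort -f` command whose locale is pinned to C.
/// Hypothetical helper for illustration; assumes a `sort` binary on PATH.
fn gnu_sort_in_c_locale() -> Command {
    let mut cmd = Command::new("sort");
    cmd.arg("-f")
        // LC_ALL takes precedence over LC_COLLATE and LANG, so setting it
        // here overrides any value inherited from the parent environment.
        .env("LC_ALL", "C")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped());
    cmd
}

fn main() -> std::io::Result<()> {
    let mut child = gnu_sort_in_c_locale().spawn()?;
    // The stdin handle is dropped at the end of this statement,
    // which closes the pipe and lets `sort` see end of input.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(b"A\n(C\nb\n")?;
    let output = child.wait_with_output()?;
    print!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```

Setting the variable on the `Command` keeps the change scoped to the spawned process instead of mutating the environment of the whole test harness.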

@oss-fuzz-robot
Author

OSS-Fuzz has closed this bug. Please see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65860 for details.
