From: GNU bug Tracking System
Subject: bug#49239: closed (Unexpected results with sort -V)
Date: Sun, 13 Feb 2022 05:32:02 +0000
Your message dated Sat, 12 Feb 2022 21:31:33 -0800 with message-id <80ac3d45-b23f-7730-f9dc-e2c86136a29a@cs.ucla.edu> and subject line Re: bug#49239: Unexpected results with sort -V has caused the debbugs.gnu.org bug report #49239, regarding Unexpected results with sort -V, to be marked as done.

(If you believe you have received this mail in error, please contact help-debbugs@gnu.org.)

-- 
49239: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=49239
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: Unexpected results with sort -V
Date: Sun, 27 Jun 2021 00:04:53 +0200

Hi,

I found some unexpected results with sort -V. I hope this is the correct place to send a bug report to [1].

They are caused by a bug in filevercmp inside gnulib, specifically in the function match_suffix. I assume it should, as documented, match a file ending as defined by this regex:

/(\.[A-Za-z~][A-Za-z0-9~]*)*$/
However, I found two cases where this does not happen:
1) Two consecutive dots. It is not checked if the character after a dot is a dot. This results in nothing being matched in a case like "a..a", even though it should match ".a" according to the regex.
Testcase: printf "a..a\na.+" | sort -V # a..a should be before a.+ I think
2) A trailing dot. If there is no additional character after a dot, it is still matched (e.g. for "a." the . is matched).
Testcase: printf "a.\na+" | sort -V # I think a+ should be before a.
Additionally, I noticed that filevercmp ignores all characters after a NUL byte.
This can be seen here: printf "a\0a\na" | sort -Vs
Otherwise, sort seems to take NUL bytes into account (that's why the --stable flag is necessary in the above example). Is this the expected behavior?
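The NUL behavior follows from the interface. A function that takes NUL-terminated strings, as filevercmp (const char *, const char *) does, stops at the first NUL. A comparison that is also given each line's length can treat NUL as an ordinary byte. The sketch below shows only that difference, using plain byte order rather than version order. mem_lexcmp is a hypothetical helper, not a Gnulib or coreutils function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-aware comparison: NUL is just another byte.
   Compares the common prefix with memcmp, then orders a shorter line
   before a longer line that starts with it.  */
static int
mem_lexcmp (const char *a, size_t alen, const char *b, size_t blen)
{
  assert (a != NULL && b != NULL);
  size_t n = alen < blen ? alen : blen;
  int r = memcmp (a, b, n);
  if (r != 0)
    return r;
  return (alen > blen) - (alen < blen);
}
```

With the three-byte line "a\0a", strcmp ("a\0a", "a") returns 0 because it stops at the NUL. mem_lexcmp ("a\0a", 3, "a", 1) returns a positive value because the longer line sorts after its prefix.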
Finally, I wanted to ask whether it is expected behavior for filevercmp to fall back on strcmp when it can't find any other difference, at least from the perspective of sort.
This means that the --stable flag for sort has no effect in combination with --version-sort (well, except if the input contains NUL bytes, as mentioned above :)
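A toy comparator shows why a strcmp fallback defeats --stable. The sketch below assumes a hypothetical primary key, value_cmp, that compares digit strings by numeric value and ignores leading zeros, so "1" and "01" tie. This is meant to mimic how a version comparison can consider two different lines equal. If that key is wrapped with a strcmp tie-break, distinct lines never compare equal, so a stable sort has no ties left to preserve. All names here are illustrative, not Gnulib or coreutils APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical primary key: compare digit strings by numeric value,
   ignoring leading zeros, so "1" and "01" compare equal.  */
static int
value_cmp (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  while (*a == '0')
    a++;
  while (*b == '0')
    b++;
  size_t la = strlen (a), lb = strlen (b);
  if (la != lb)
    return la < lb ? -1 : 1;    /* more digits means a larger value */
  return strcmp (a, b);         /* same length: digit order = value order */
}

/* The same key with a strcmp tie-break, analogous to the old fallback.  */
static int
value_cmp_fallback (const char *a, const char *b)
{
  int r = value_cmp (a, b);
  return r != 0 ? r : strcmp (a, b);
}

/* Stable insertion sort: elements that compare equal keep input order.  */
static void
stable_sort (const char **v, size_t n, int (*cmp) (const char *, const char *))
{
  for (size_t i = 1; i < n; i++)
    {
      const char *x = v[i];
      size_t j = i;
      while (j > 0 && cmp (v[j - 1], x) > 0)
        {
          v[j] = v[j - 1];
          j--;
        }
      v[j] = x;
    }
}
```

Sorting {"1", "01", "2"} with value_cmp keeps "1" before "01" (input order). With value_cmp_fallback, "01" always comes before "1", whatever the input order.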
I've attached a rather simple patch (including a test) that fixes 1) and 2); I hope it's right.
Have a nice day,
Michael
diff.txt
Description: Text document
--- End Message ---
--- Begin Message ---
Subject: Re: bug#49239: Unexpected results with sort -V
Date: Sat, 12 Feb 2022 21:31:33 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 6/28/21 10:54, Kamil Dudka wrote:
> You are right. The matching algorithm was not implemented correctly and the patch you attached fixes it.

I looked into Bug#49239 and found some more places where the documentation disagreed with the code. I installed the attached patches into Gnulib and Coreutils, respectively, which should bring the two into agreement and should fix the bugs that Michael reported, albeit in a different way than his proposed patch. Briefly:

* The code didn't allow file name suffixes to be the entire file name, but the documentation did. Here I went with the documentation. I could be talked into the other way; it shouldn't matter much either way.

* The code did the preliminary test (without suffixes) using strcmp; the documentation said it should use version comparison. Here I went with the documentation.

* As Michael mentioned, sort -V mishandled NUL. I fixed this by adding a Gnulib function filenvercmp that treats NUL as just another character.

* As Michael also mentioned, filevercmp fell back on strcmp if version sort found no difference, which meant sort's --stable flag was ineffective. I fixed this by not having filevercmp fall back on strcmp.

* I fixed the two-consecutive-dot and trailing-dot bugs Michael mentioned by rewriting the suffix finder to not have that confusing READ_ALPHA state variable, and to instead implement the regular expression's nested * operators in the usual way with nested loops.

Thanks, Michael, for reporting the problem. I'm boldly closing the Coreutils bug report as fixed.

0001-filevercmp-fix-several-unexpected-results.patch
Description: Text Data
0001-sort-fix-several-version-sort-problems.patch
Description: Text Data
--- End Message ---
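Paul's last bullet describes replacing the READ_ALPHA state variable with nested loops that mirror the regex's nested * operators. The following is a minimal sketch in that spirit, written from the regex alone. It is not the code in the attached patch, and the names suffix_start, is_alpha_tilde and is_alnum_tilde are hypothetical. The outer loop walks the name. When it sees a '.' followed by a letter or '~', it starts (or continues) a candidate suffix, and an inner loop consumes the [A-Za-z0-9~]* tail. Any other character, including a second '.' or a trailing '.', discards the candidate. Because the whole name may match, ".bashrc" is its own suffix, consistent with Paul's first bullet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* [A-Za-z~], using explicit ASCII ranges so the result is locale-independent.  */
static bool
is_alpha_tilde (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '~';
}

/* [A-Za-z0-9~] */
static bool
is_alnum_tilde (unsigned char c)
{
  return is_alpha_tilde (c) || (c >= '0' && c <= '9');
}

/* Hypothetical suffix finder: return a pointer to the leftmost position
   in S from which the rest of S matches (\.[A-Za-z~][A-Za-z0-9~]*)*$,
   or to the terminating NUL if only the empty suffix matches.  */
static const char *
suffix_start (const char *s)
{
  const char *p = s;
  const char *match = NULL;     /* start of the current run of components */

  assert (s != NULL);
  while (*p)
    {
      if (p[0] == '.' && is_alpha_tilde (p[1]))
        {
          /* One more \.[A-Za-z~] component; remember where the run began.  */
          if (match == NULL)
            match = p;
          p += 2;
          /* Inner loop: the [A-Za-z0-9~]* tail of this component.  */
          while (is_alnum_tilde (*p))
            p++;
        }
      else
        {
          /* Anything else (e.g. "..", ".1", or a trailing ".") breaks the
             run, so no suffix can start at or before this character.  */
          match = NULL;
          p++;
        }
    }
  return match != NULL ? match : p;
}
```

With this sketch, "a..a" yields ".a", "a." and "a.+" yield the empty suffix, "foo.tar.gz" yields ".tar.gz", and "x.a.1.b" yields ".b".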