From 553d6d6c95af2fe89ec93558b46e699122c5deca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Thu, 11 Jul 2019 15:34:17 +0100 Subject: [PATCH] doc: adjustments to version sorting docs * doc/sort-version.texi: ... --- doc/sort-version.texi | 71 ++++++++++++++++++++++++--------------------------- 1 file changed, 33 insertions(+), 38 deletions(-) diff --git a/doc/sort-version.texi b/doc/sort-version.texi index e5f8fb7..e2a4283 100644 --- a/doc/sort-version.texi +++ b/doc/sort-version.texi @@ -1,4 +1,4 @@ -@c GNU Verion-sort ordering documentation +@c GNU Version-sort ordering documentation @c Copyright (C) 2019 Free Software Foundation, Inc. @@ -19,10 +19,10 @@ @node Version sort overview @section Version sort overview -@dfn{version sort} ordering (and simiarly, @dfn{natural sort} +@dfn{version sort} ordering (and similarly, @dfn{natural sort} ordering) is a method to sort items such as file names and lines of text in an order that feels more natural to people, when the text -contain a mixture of letters and digits. +contains a mixture of letters and digits. Standard sorting usually does not produce the order that one expects because comparisons are made on a character-by-character basis. @@ -38,13 +38,13 @@ a13 a13 a2 a120 @end example -version sort funtionality in GNU coreutils is available in the @samp{ls -v}, +version sort functionality in GNU coreutils is available in the @samp{ls -v}, @samp{ls --sort=version}, @samp{sort -V}, @samp{sort --version-sort} commands. -@node Using version sort in GNU Coreutils -@subsection Using version sort in GNU Coreutils +@node Using version sort in GNU coreutils +@subsection Using version sort in GNU coreutils Two GNU coreutils programs use version sort: @command{ls} and @command{sort}. @@ -113,8 +113,8 @@ In coreutils this algorithm was slightly modified to work on more general input such as textual strings and file names (see @ref{Differences from the official Debian Algorithm}).
-In other contextes, such as other programs and other programming -languages, a similar sorting funtionality is called +In other contexts, such as other programs and other programming +languages, a similar sorting functionality is called @uref{https://en.wikipedia.org/wiki/Natural_sort_order,natural sort}. @@ -125,34 +125,29 @@ Currently there is no standard for version/natural sort ordering. That is: there is no one correct way or universally agreed-upon way to order items. Each program and each programming language can decide its -own ordering algorithm and call it ’natural sort’ (or other various +own ordering algorithm and call it 'natural sort' (or other various names). -Therefore there is no point in complaining about incorrect sorting -order or unexpected results: Coreutils’ version sort order is not -incorrect, it might just differ from other similarly named -implementation, or differ from personal expectations. - See @ref{Other version/natural sort implementations} for many examples of differing sorting possibilities, each with its own rules and variations. -If you do suspect a bug in coreutils’ implementation of version-sort, +If you do suspect a bug in coreutils' implementation of version-sort, see @ref{Reporting bugs or incorrect results} on how to report them. @node Implementation Details @section Implementation Details -GNU Coreutils’ version sort algorithm is based on +GNU coreutils' version sort algorithm is based on @uref{https://www.debian.org/doc/debian-policy/ch-controlfields.html#version, -Debian’s versioning scheme}, specifically on the "upstream version" +Debian's versioning scheme}, specifically on the "upstream version" part. This section describe the ordering rules. The next section (@ref{Differences from the official Debian Algorithm}) describes some differences between GNU coreutils -implementation and Debian’s official algorithm. +implementation and Debian's official algorithm. 
@node Version-sort ordering rules @@ -287,7 +282,7 @@ $ sort -n input4 $ sort -V input4 Numeric sort (@samp{sort -n}) treats the entire string as a single numeric value, and compares it to other values. For example, @code{8.1}, @code{8.10} and -@code{8.100} are numerically equivalent, and are ordered together. Simiarly, +@code{8.100} are numerically equivalent, and are ordered together. Similarly, @code{8.49} is numerically smaller than @code{8.5}, and appears before first. Version sort (@samp{sort -v}) first breaks down the string into digits and @@ -301,7 +296,7 @@ remaining digits are compared numerically (@code{1} and @code{01}) - which are numerically equivalent. Hence, @code{8.01} and @code{8.1} are grouped together. -Simiarly, comparing @code{8.5} to @code{8.49} - the @samp{@code{8}} +Similarly, comparing @code{8.5} to @code{8.49} - the @samp{@code{8}} and @samp{@code{.}} parts are identical, then the numeric values @code{5} and @code{49} are compared. The resulting @code{5} appears before @code{49}. @@ -354,7 +349,7 @@ $ touch 1.0.5_src.tar.gz 1.0%zzzzz.gz The same reasoning applies to the following example: The character @samp{@code{.}} has ASCII value 46, and is smaller than slash -characeter @samp{@code{/}} ASCII value 47: +character @samp{@code{/}} ASCII value 47: @example $ cat input5 @@ -431,7 +426,7 @@ and is listed first in the sorted output. The remaining lines (@code{1}, @code{1%}, @code{1.2}, @code{1~}) follow similar logic: The digit part is extracted (1 for all strings) -and compares identical. The following extracted parts for the remainig +and compares identical. The following extracted parts for the remaining input lines are: empty part, @code{%}, @code{.}, @code{~}. Tilde sorts before all others, hence the line @code{1~} appears next. @@ -475,14 +470,14 @@ value 37 is smaller, hence @samp{@code{a%}} is listed before @samp{@code{aα}}. 
@section Differences from the official Debian Algorithm The GNU coreutils' version sort algorithm differs slightly from the -official Debian algorith, in order to accomodate more general usage +official Debian algorithm, in order to accommodate more general usage and file name listing. @node Minus/Hyphen @samp{-} and Colons @samp{:} characters @subsection Minus/Hyphen @samp{-} and Colons @samp{:} characters -In Debian’s version string syntax the version consists of three parts: +In Debian's version string syntax the version consists of three parts: @code{[epoch:]upstream_version[-debian_revision]} (@code{epoch} and @code{debian_revision} are optional). @@ -504,7 +499,7 @@ If epoch is not present, colons @samp{:} are not allowed. If these parts are present, hyphen and/or colons can appear only onces in valid Debian version strings. -In GNU Coreutils such restrictions are not reasonable (a filename can +In GNU coreutils such restrictions are not reasonable (a file name can have many hyphens, a line of text can have many colons). As a result, in GNU coreutils hyphens and colons are treated exactly @@ -530,10 +525,10 @@ With Debian's @command{dpkg} they will be listed as @code{ab-cd} first and For further technical details see @uref{https://bugs.gnu.org/35939,bug35939}. -@node Additional hard-coded priorities In GNU coreutils’ version sort -@subsection Additional hard-coded priorities In GNU coreutils’ version sort +@node Additional hard-coded priorities In GNU coreutils' version sort +@subsection Additional hard-coded priorities In GNU coreutils' version sort -In GNU coreutils’ version sort algorithm, he following items have +In GNU coreutils' version sort algorithm, the following items have special priority and sort earlier than all other characters (listed in order); @@ -574,7 +569,7 @@ the ordering rules are the same. 
@node Special handling of file extensions @subsection Special handling of file extensions -GNU coreutils’ version sort algorithm implements specialized handling +GNU coreutils' version sort algorithm implements specialized handling of file extensions (or strings that look like file names with extensions). @@ -591,7 +586,7 @@ letter or tilde, followed by one or more letters, digits, or tildes @code{(\.[A-Za-z~][A-Za-z0-9~]*)*}). @item -If the strings contains suffixes, the sufffixes are temporarily +If the strings contains suffixes, the suffixes are temporarily removed, and the strings are compared without them (using the @ref{Version-sort ordering rules,algorithm,algorithm} above). @@ -600,7 +595,7 @@ If the suffix-less strings are identical, the suffix is restored and the entire strings are compared. @item -If the suffix-les strings differ, the result is returned and the +If the non-suffixed strings differ, the result is returned and the suffix is effectively ignored. @end enumerate @@ -704,12 +699,12 @@ being first. A real-world example would be listing files such as: @file{gcc_10.fc9.tar.gz} -and @file{gcc_10.8.12.7rc2.fc9.tar.bz2}: Debian’s algorithm would list +and @file{gcc_10.8.12.7rc2.fc9.tar.bz2}: Debian's algorithm would list @file{gcc_10.8.12.7rc2.fc9.tar.bz2 first}, while @samp{ls -v} will list @file{gcc_10.fc9.tar.gz} first. These priorities make sense for @samp{ls -v}: -versioned files will be listed in a more natural order. +Versioned files will be listed in a more natural order. For @samp{sort -V} these priorities might seem arbitrary. 
However, because the sorting code is shared between the ls and sort program, @@ -774,7 +769,7 @@ ab-cd abb abb ab-cd @end example -To illustrate the differnt handling of file extension: (see @ref{Special +To illustrate the different handling of file extension: (see @ref{Special handling of file extensions}): @example @@ -797,7 +792,7 @@ output of @samp{ls -v} or @samp{sort -V}), please first check the following: @enumerate @item -Is the result consistant with Debian’s own ordering (using @command{dpkg}, see +Is the result consistent with Debian's own ordering (using @command{dpkg}, see @ref{Comparing two strings using Debian's algorithm}) ? If it is, then this is not a bug - please do not report it. @@ -809,7 +804,7 @@ then this is not a bug - please do not report it. @item If you have a question about specific ordering which is not explained here, please write to coreutils@@gnu.org, and provide concrete yet -concise example that will helps us help you. +concise example that will help us help you. @item If you still suspect a bug which is not explained by the above, please @@ -869,7 +864,7 @@ function to compare two directory entries (despite the names, they are not identical to GNU coreutils' version sort ordering). @item -Using Debian’s sorting algorithm in: +Using Debian's sorting algorithm in: @itemize @item @@ -901,7 +896,7 @@ Debian's code which performs the @code{upstream_version} comparison: version.c}. @item -GNULIB code (used by GNU Coreutils) which performs the version comparison: +GNULIB code (used by GNU coreutils) which performs the version comparison: @uref{https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c, filevercmp.c}. @end itemize -- 2.9.3