bug-coreutils

Re: Support in sort for human-readable numbers


From: Jim Meyering
Subject: Re: Support in sort for human-readable numbers
Date: Sun, 04 Jan 2009 11:35:37 +0100

"Vitali Lovich" <address@hidden> wrote:
> I've read the proposed patches that have been batted around on the mailing
> list (after coming up with my own implementation :D of course).  My proposed
> solution is less generic, but I believe more robust, than the other
> approaches.
>
> I've laid out my reasoning below, and I've also filed it on Launchpad as
> bug 313152 <https://bugs.launchpad.net/bugs/313152> to track this issue.  The
> patch is against 6.10 instead of trunk mainly because I was too lazy to get
> the build-system set up on Ubuntu.  That being said, I'm pretty sure the
> patch should still work against the trunk.  In any case, if it's necessary,
> I could also do the diff against the trunk.
>
> Code review?
> What would I need to do to get this mainlined (aside from adding the
> documentation changes)?
>
> REASONING:
>
> One of my major assumptions is that all the numbers are well formatted.
>
> In other words, there's an explicit demarcation in the number line (at least
> internal to the input being sorted) after which the suffix increases and the
> number starts again near 0.  For instance, if M represents 1050 Kilobytes,
> then there's no 1051K - it's represented as 1.001M or something along those
> lines.  Again, this would only rely on the input being internally consistent
> - sort needs no knowledge or hints of what those suffixes represent.

Thanks for the patch and for writing up your assumptions.

The above requirement is key... and perhaps too restrictive.
I.e., it makes it sound like your sort could mishandle
input that mixes sizes printed by du -h and du --si runs,
not to mention numbers generated manually or by other tools.
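
To make the concern concrete, here is a minimal Python sketch.  It assumes a
comparison that orders by suffix first and by number second, which is only
one reading of the approach and not the patch itself.  The names suffix_key
and to_bytes, the reduced suffix table, and the two sample values are
hypothetical.

```python
# Rank of each suffix; only a few suffixes are listed to keep the sketch short.
RANK = {"": 0, "K": 1, "M": 2, "G": 3}

def suffix_key(text):
    """Sort key that looks at the suffix first and the number second."""
    suffix = text[-1] if text[-1] in RANK else ""
    number = float(text[:-1] if suffix else text)
    return (RANK[suffix], number)

def to_bytes(text, base):
    """Actual size, given the base (1000 or 1024) the writer used."""
    rank, number = suffix_key(text)
    return number * base ** rank

a = "1000K"  # assumed to come from a 1024-based listing
b = "1.0M"   # assumed to come from a 1000-based listing

# The suffix-first key puts a before b ...
suffix_order = suffix_key(a) < suffix_key(b)
# ... yet a is the larger of the two once the bases are taken into account.
byte_order = to_bytes(a, 1024) < to_bytes(b, 1000)
```

Here suffix_order is True while byte_order is False, so the two orderings
disagree as soon as the input stops being internally consistent.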

However, this assumption might be acceptable (other opinions welcome),
on the condition that the code behind this option diagnoses any violation.
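
As a rough illustration of what "diagnoses any violation" could mean, the
Python sketch below flags inputs whose number reaches a cut-off even though
a larger suffix was available.  The cut-off of 1024, the suffix set, and the
function name find_violations are all assumptions made for this sketch; a
1000-based listing would need a different limit, and the real check would
live in sort's C code.

```python
import re

# Hypothetical input grammar: sign, digits, optional fraction, optional
# one-letter suffix.  The real grammar is still to be agreed on.
NUMBER = re.compile(r"^([+-]?\d+(?:\.\d+)?)([KMGTPEZY]?)$")

def find_violations(values, limit=1024.0, largest="Y"):
    """Return the inputs that cannot be parsed or that break the assumption.

    An input breaks the assumption when its number reaches `limit` even
    though a larger suffix was available to express it.
    """
    bad = []
    for text in values:
        match = NUMBER.match(text.strip())
        if match is None:
            bad.append(text)      # not a number in this grammar at all
            continue
        number = abs(float(match.group(1)))
        suffix = match.group(2)
        if number >= limit and suffix != largest:
            bad.append(text)      # should have used the next suffix up
    return bad
```

With the default limit, find_violations(["1.5K", "1051K", "2M"]) reports
only "1051K".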

One of the first tasks for getting such an option into upstream is
to describe and reach agreement on what the input grammar should be.
I.e., is the "Gi" suffix allowed?  What about "GB" and "GiB"?
If "Gi" is allowed, is it treated differently from "G"?
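
To give the discussion something to point at, here is one possible grammar
written as a Python regular expression.  It is only a strawman: it accepts
"G", "Gi", "GB" and "GiB" and reports which form was seen, and it leaves
open whether "Gi" should then compare differently from "G".  The pattern and
the name parse_size are hypothetical.

```python
import re

SIZE = re.compile(r"""
    ^(?P<number>[+-]?\d+(?:\.\d+)?)    # decimal number, no exponent
    (?:
        (?P<prefix>[KMGTPEZY])         # magnitude letter
        (?P<binary>i)?                 # "i" as in "Gi", only after a letter
    )?
    (?P<unit>B)?$                      # optional trailing "B" as in "GB"
""", re.VERBOSE)

def parse_size(text):
    """Return (number, prefix, has_i, has_B), or None if text is rejected."""
    match = SIZE.match(text)
    if match is None:
        return None
    return (
        match.group("number"),
        match.group("prefix") or "",
        match.group("binary") is not None,
        match.group("unit") is not None,
    )
```

Under this strawman, parse_size("7GiB") gives ("7", "G", True, True), while
"5i" and "1e3" are both rejected.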

As to what else would be required, see the guidelines in HACKING.
E.g., you'd need to add many tests of this new feature.

> Also, there can be no exponential numbers when in this mode mainly because
> it's unclear whether an `E' represents the beginning of the exponent or an
> exabyte.  Since both would be uncommon as use cases.  Exabytes are really

That sounds reasonable.

> really big right now, and exponents would be meaningless since they could
> only be used for extremely small numbers or numbers that are bigger than a Y
...



