bug-coreutils

Re: Support in sort for human-readable numbers


From: Vitali Lovich
Subject: Re: Support in sort for human-readable numbers
Date: Tue, 13 Jan 2009 19:23:06 -0500

On Tue, Jan 13, 2009 at 6:42 PM, Matthew Woehlke
<address@hidden> wrote:
>
> Vitali Lovich wrote:
>>
>> Perhaps - but for sort, at least from my thinking of how I would
>> implement this, the additional logic (at least to behave correctly on
>> all inputs) would be somewhat complicated.  Can you please explain why
>> you believe this belongs in sort and wouldn't be better served by
>> pre-processing the text before sort & post-processing it after as
>> necessary?
>
> I'd like to point out that, if you're going to require that, you've
> defeated the purpose of sort understanding human-readable numbers in the
> first place. If I have to write something more convoluted than
> 'du -sh * | sort -h', I might as well write 'sdu -s *'. (Which, in fact,
> I did. 'sdu' is a script that expects normal everything-in-bytes output,
> does a plain old 'sort -n' on it, and then uses awk to make the sizes
> human-readable.)
You are correct.  However, the implementation I posted is specifically
designed to handle du & df, as I explained in the original design
assumptions: du -sh * | sort -h works perfectly, as does sorting the
various output columns of df -h.  My question was strictly about
parsing the longer versions of those suffixes (e.g. MiB & MB): does it
make sense to support them?  When things settle down for me and I get
time, I'll post my implementation for this so people can determine
whether or not it makes sense to do this in the code.
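
For readers who want something concrete, here is a minimal Python sketch of
one way to order du-style sizes: compare the suffix first, then the numeric
part.  It is not the patch discussed in this thread; the names human_key and
SUFFIX_RANK are hypothetical, and it assumes the input always uses the
largest suffix that fits (so 1024K never appears alongside 1M).

```python
import re
from decimal import Decimal

# Hypothetical ranking: no suffix < K < M < G < T < P < E.
SUFFIX_RANK = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6}

# Leading number (optional fraction) followed by an optional one-letter suffix.
_HUMAN = re.compile(r"\s*(\d+(?:\.\d+)?)([KMGTPE]?)", re.IGNORECASE)

def human_key(line):
    """Return (suffix rank, numeric part); lines without a leading number sort first."""
    m = _HUMAN.match(line)
    if m is None:
        return (-1, Decimal(0))
    return (SUFFIX_RANK[m.group(2).upper()], Decimal(m.group(1)))

lines = ["1.5G\tvideos", "900K\tnotes", "12M\tphotos", "4.0K\tempty", "512\ttmp"]
for line in sorted(lines, key=human_key):
    print(line)
```

Because the suffix is compared before the digits, the sketch never multiplies
anything out to a byte count.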

>
> IOW, if you make people format the output anyway, you might as well just 
> forget -h and make the post-formatting be what converts from raw integer 
> sizes to human-friendly sizes. At least, that's my $0.02.
And that's a valid point about the post-formatting - perhaps another
tool that formats & converts numbers within its input would be useful.
The syntax might have to be invented (or you could just have primitive
switches for common conversions).  But the existence of such a tool
wouldn't make the -h flag in sort any less useful, since sort is a far
more popular tool (if your post-formatting tool gains popularity, then
an argument could be made for deprecating the -h flag).  Also, such a
tool may have issues because it would have to convert the string into
an actual number first, which leads to overflow & precision problems,
the question of whether the exponent 'E' should be allowed to indicate
a power-of-10 multiplier, etc.
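
The conversion problems mentioned above can be shown with a few lines of
Python.  This is only an illustration under stated assumptions: the variable
names are hypothetical, doubles stand in for whatever floating-point type a
tool might use, and E is taken to mean 2**60.

```python
# 2**53 + 1 is not representable as a double, so both strings parse to the same value.
a = float("9007199254740993")
b = float("9007199254740992")
precision_lost = (a == b)

# 16E (16 * 2**60) equals 2**64, one past the largest unsigned 64-bit integer.
UINT64_MAX = 2**64 - 1
overflows_uint64 = 16 * 2**60 > UINT64_MAX

# A generic float parser reads 'E' as an exponent marker, not as a size suffix.
as_exponent = float("2E3")

print(precision_lost, overflows_uint64, as_exponent)
```

A comparison that looks only at the suffix and the digits avoids all three
problems, because it never builds the full value.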

And again, the thing about pre-formatting/post-formatting was just a
suggestion on how to support longer suffixes.  The script to do this is
far simpler to write than the equivalent scripts needed to sort du
output correctly (and most of the time those scripts are tool-specific,
meaning it's nontrivial to port them to sort the output of df, for
instance).  Sort -h is also a far more generic solution than any custom
wrapper script (and the implementation is far more straightforward).
Also, the equivalent sed scripts are far easier to write than the
additional C code would be (because sed is meant for manipulating
text).
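
As a rough idea of what such a pre-processing step could look like, here is
a Python sketch that uses a regular-expression substitution in place of sed.
The function name shorten_suffixes and the pattern are assumptions made for
illustration; the pattern is case-sensitive and treats MB and MiB as the same
unit, which may or may not be acceptable for a given input.

```python
import re

# Assumed pattern: a number, an optional space, a unit letter, then "iB" or "B".
_LONG_SUFFIX = re.compile(r"\b(\d+(?:\.\d+)?)\s?([KMGTPE])(?:iB|B)\b")

def shorten_suffixes(line):
    """Rewrite sizes such as '1.5MiB' or '20 MB' to the short form '1.5M' / '20M'."""
    return _LONG_SUFFIX.sub(r"\1\2", line)

for line in ["1.5MiB\tphotos", "20 MB cache", "300KB\tnotes", "512B\ttmp"]:
    print(shorten_suffixes(line))
```

The shortened lines can then be handed to a sort that understands the
single-letter suffixes, and plain byte counts such as 512B pass through
unchanged.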

Vitali



