coreutils

Re: Command-line program to convert 'human' sizes?


From: Assaf Gordon
Subject: Re: Command-line program to convert 'human' sizes?
Date: Fri, 07 Dec 2012 10:07:55 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4

Thank you for your feedback.
I'm working on fixing those issues.


Some comments/questions:

Pádraig Brady wrote, On 12/06/2012 06:59 PM:
> I noticed this command will core dump:
> $ /bin/ls -l | src/numfmt --to-unit=1 --field=5
> <snip>
> so I'm thinking `numfmt` should support --header too.
> 
I'll add --header.


> The following should essentially be a noop with this data,
> but notice how the original spacing wasn't taken
> into account, and thus the alignment is broken:
> 
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to-unit=1 --field=5
> -rw-rw-r--.  1 padraig padraig 93787 Aug 23  2011 ABOUT-NLS
> -rw-rw-r--.  1 padraig padraig 49630 Dec  6 22:32 aclocal.m4
> -rw-rw-r--.  1 padraig padraig 3669 Dec  6 22:29 AUTHORS

I'm a bit wary of adding automatic/heuristic padding: it could lead to some
weird outputs, and when combined with --header it would not produce proper
output either (the header line would be skipped, but would the remaining lines
be re-padded?).

Wouldn't it be better either to require the user to specify '--padding', or to
switch from white-space to an explicit delimiter and then let "expand" turn the
tabs back into spaces?

e.g.
===
$ sed 's/  */\t/g' white-space-data.txt | \
    numfmt --field=5 --delimiter=$'\t' --to=SI | \
    expand > output
===

A bit more convoluted, but more reliable?
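The practical difference between the two modes is how fields are counted. Below is a minimal C sketch. It assumes a hypothetical find_field() helper, not the numfmt code. In white-space mode a run of blanks separates fields, so the original spacing is not recorded. With an explicit delimiter every delimiter counts, and empty fields are possible.

```c
#include <stddef.h>

/* Hypothetical helper: locate field FIELD (1-based) in LINE.
   If DELIM is '\0', fields are separated by runs of blanks (the
   white-space default); otherwise every DELIM character ends a field,
   so empty fields are possible.  Returns a pointer to the field and
   stores its length in *LEN, or returns NULL if there are too few fields. */
static const char *find_field(const char *line, int field, char delim,
                              size_t *len)
{
    const char *p = line;

    for (int n = 1; ; n++) {
        if (delim == '\0')
            while (*p == ' ' || *p == '\t')
                p++;                    /* skip the blank run before the field */
        const char *start = p;
        if (delim == '\0')
            while (*p != '\0' && *p != ' ' && *p != '\t')
                p++;
        else
            while (*p != '\0' && *p != delim)
                p++;
        if (n == field) {
            if (delim == '\0' && p == start)
                return NULL;            /* only trailing blanks were left */
            *len = (size_t) (p - start);
            return start;
        }
        if (*p == '\0')
            return NULL;                /* line ended before FIELD */
        if (delim != '\0')
            p++;                        /* step over the delimiter */
    }
}
```

For example, field 2 of "Hello:94000:world" with ':' is "94000", and field 2 of "a::b" is the empty string.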

> 
> With this the alignment is broken as before,
> but I also notice the differing width output of each number.
> 
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=SI --field=5
> -rw-rw-r--.  1 padraig padraig 94k Aug 23  2011 ABOUT-NLS
> -rw-rw-r--.  1 padraig padraig 50k Dec  6 22:32 aclocal.m4
> -rw-rw-r--.  1 padraig padraig 3.7k Dec  6 22:29 AUTHORS
> 

Again, this is the automatic padding issue. For example, "94K" vs "3.7K":
should we always pad SI/IEC output to a fixed width (e.g. " 94K", the same 4
characters as "3.7K") even if the user didn't specify padding?
This would conflict with non-whitespace delimiters... e.g.:

Hello:94000:world

Would be converted to:

Hello:<space>94K:world

which is not intuitive at all.

Or perhaps automatic padding should be enabled if and only if no delimiter is
specified (i.e. the delimiter defaults to white-space)?
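The "pad only in white-space mode" rule could look like the minimal C sketch below. It uses a hypothetical emit_field() helper. The sketch pads to the width of the field being replaced, so "93787" becomes "  94K". That choice is an assumption; a fixed width such as 4 would slot in the same way. The ls columns stay aligned only if the blanks before the field are also copied through unchanged. With an explicit delimiter no padding is added, so "Hello:94000:world" becomes "Hello:94K:world".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write NEW_TEXT (the converted number) into BUF.
   ORIG_WIDTH is the width of the field being replaced; DELIM is '\0'
   when no delimiter was given (white-space mode). */
static void emit_field(char *buf, size_t size, const char *new_text,
                       size_t orig_width, char delim)
{
    if (delim == '\0')
        /* White-space mode: right-align within the original width.
           "%*s" never truncates, so longer text is printed in full. */
        snprintf(buf, size, "%*s", (int) orig_width, new_text);
    else
        /* Explicit delimiter: no automatic padding. */
        snprintf(buf, size, "%s", new_text);
}
```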

> 
> Notice in the above I've used capital K for SI.
> I think human() from gnulib may be using k for 1000 and K for 1024.
> That's non standard and ambiguous and I see no need to do that.

> So for IEC we'd have:
> 
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=IEC --field=5
> -rw-rw-r--.  1 padraig padraig  3.6Ki Dec  6 22:29 AUTHORS
> 

I tried to use 'human_readable()' as-is, but I guess that's not sufficient.
I'll duplicate the code and modify it to avoid this issue (lower/upper case K,
and the "i" suffix).
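The suffix convention Pádraig describes can be sketched without human_readable(). It uses upper-case K for SI (powers of 1000) and a trailing "i" for IEC (powers of 1024). The helper below, format_scaled(), is hypothetical and is not gnulib's API. Rounding to nearest is an assumption; the real code may always round up. Both rounding modes give the figures in the ls examples above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE with an SI suffix (K, M, G, ...)
   or, if IEC is nonzero, an IEC suffix (Ki, Mi, Gi, ...).  Values below
   the base are printed unscaled.  Rounds to nearest and keeps one
   decimal below 10, e.g. 3669 -> "3.7K" (SI) or "3.6Ki" (IEC). */
static void format_scaled(uintmax_t value, int iec, char *buf, size_t size)
{
    static const char prefixes[] = "KMGTPEZY";
    const int max_power = (int) sizeof prefixes - 2;  /* index of 'Y' */
    const double base = iec ? 1024.0 : 1000.0;
    const char *i_suffix = iec ? "i" : "";
    double scaled = (double) value;
    int power = -1;                 /* index into prefixes; -1 = no suffix */

    while (scaled >= base && power < max_power) {
        scaled /= base;
        power++;
    }
    if (power < 0) {
        snprintf(buf, size, "%ju", value);
        return;
    }
    for (;;) {
        if (scaled < 10.0) {
            unsigned tenths = (unsigned) (scaled * 10.0 + 0.5);
            if (tenths < 100) {
                snprintf(buf, size, "%u.%u%c%s", tenths / 10, tenths % 10,
                         prefixes[power], i_suffix);
                return;
            }
            /* 9.95 and above round to 10: print as an integer instead. */
        }
        uintmax_t whole = (uintmax_t) (scaled + 0.5);
        if (whole < base || power == max_power) {
            snprintf(buf, size, "%ju%c%s", whole, prefixes[power], i_suffix);
            return;
        }
        /* Rounding reached the base (e.g. 999.7K -> 1000K), so move to
           the next prefix instead, giving 1.0M. */
        scaled /= base;
        power++;
    }
}
```

With this sketch, format_scaled(93787, 0, ...) gives "94K" and format_scaled(3669, 1, ...) gives "3.6Ki", matching the ls examples. format_scaled(11505426432, 1, ...) gives "11Gi".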

 
> Another thing I thought of there, was it would be
> good to be able to parse number formats that it can generate:

Sounds like two separate (but related) issues:

> $ echo '1,234' | src/numfmt --from=auto
> src/numfmt: invalid suffix in input '1,234': ',234'

1. Is there already a gnulib function that can accept locale-grouped values?
Can the "xstrtoXXX" functions handle that?

> $ echo '3.7K' | src/numfmt --from=auto
> src/numfmt: invalid suffix in input '3.7K': '.7K'

2. Would you recommend switching the internal representation to doubles (from
the current uintmax_t), or just adding special code to detect the decimal point
(which, as Bernhard mentioned, is also locale-dependent)?
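Here is a minimal C sketch of the "doubles" option, to make the trade-off concrete. The helper, parse_human(), is hypothetical. It skips a grouping character, accepts a decimal point, and applies an SI or IEC suffix. The separator and decimal point are passed in as parameters. Taking them from localeconv() (the thousands_sep and decimal_point members) is an assumption about how real code would do it. Digits are accumulated as an integer mantissa and divided by a power of ten at the end, so "3.7K" comes out as exactly 3700. A double holds integers exactly only up to 2^53, which is the main cost compared with uintmax_t.

```c
#include <string.h>

/* Hypothetical sketch: parse "1,234", "3.7K" or "3.6Ki" into *OUT.
   GROUP and POINT stand in for the locale's thousands separator and
   decimal point (pass '\0' as GROUP to disable grouping).
   Returns 0 on success, -1 on invalid input. */
static int parse_human(const char *s, char group, char point, double *out)
{
    static const char prefixes[] = "KMGTPEZY";
    double mantissa = 0.0;   /* all digits, ignoring the decimal point */
    double divisor = 1.0;    /* 10^(number of digits after the point) */
    int digits = 0, in_frac = 0;

    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            mantissa = mantissa * 10 + (*s - '0');
            if (in_frac)
                divisor *= 10;
            digits++;
        } else if (*s == group && !in_frac && digits > 0) {
            continue;                    /* skip a grouping separator */
        } else if (*s == point && !in_frac) {
            in_frac = 1;
        } else {
            break;                       /* start of the suffix, if any */
        }
    }
    if (digits == 0)
        return -1;

    if (*s != '\0') {
        const char *p = strchr(prefixes, *s);
        if (p == NULL)
            return -1;                   /* unknown suffix, e.g. "12x" */
        double base = (s[1] == 'i') ? 1024.0 : 1000.0;
        const char *end = (s[1] == 'i') ? s + 2 : s + 1;
        if (*end != '\0')
            return -1;                   /* trailing garbage */
        for (int power = (int) (p - prefixes) + 1; power > 0; power--)
            mantissa *= base;
    }
    *out = mantissa / divisor;
    return 0;
}
```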

> While I said before it would be better to error rather than warn
> on parse error, on consideration it's probably best to write a
> warning to stderr on parse error, and leave the original number in place.

I'll change the code accordingly. 
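The agreed behavior is to warn on stderr and keep the original text. It might look like the minimal C sketch below. The helper parse_or_warn() and the warning wording are hypothetical. Plain strtoumax() stands in for the real suffix-aware parser.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>

/* Returns 1 and stores the number in *VALUE if FIELD is a plain
   non-negative decimal integer.  Otherwise writes a warning to stderr
   and returns 0; the caller then prints FIELD unchanged instead of
   exiting. */
static int parse_or_warn(const char *field, uintmax_t *value)
{
    char *end;

    /* strtoumax() would accept leading blanks and a minus sign
       (negating the result), so require a leading digit. */
    if (field[0] >= '0' && field[0] <= '9') {
        errno = 0;
        uintmax_t v = strtoumax(field, &end, 10);
        if (*end == '\0' && errno != ERANGE) {
            *value = v;
            return 1;
        }
    }
    fprintf(stderr, "warning: invalid number '%s'; left unchanged\n", field);
    return 0;
}
```

When this returns 0, the caller would write the original field text to the output and continue with the next line.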


Regarding Bernhard's comments (from a different email):

Bernhard Voelker wrote, On 12/07/2012 03:25 AM:
> On 12/07/2012 12:59 AM, Pádraig Brady wrote:
> 
> Therefore this is my first test:
>   $ echo 11505426432 | src/numfmt
>   11505426432
> Hmm, shouldn't it be converting that to a human-readable
> number then? ;-)

From Pádraig's original specification
(http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085.html), I assumed
that the default for both "--from" and "--to" is not to scale, so one needs to
explicitly use "--to" or "--from".

But those defaults can be changed, if you prefer.

> Looking at scale_from_args: I'd favor lower-case arguments,
> i.e. "si" and "iec" instead of "SI" and "IEC".
> WDYT?

I'll change those.
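Both points fit in a small lookup: no scaling by default, and lower-case argument names. Below is a minimal C sketch. The names parse_scale(), the enum, and the "none" keyword are hypothetical. Coreutils would more likely use gnulib's argmatch module for this kind of mapping.

```c
#include <stddef.h>
#include <string.h>

enum scale_type { SCALE_NONE, SCALE_SI, SCALE_IEC, SCALE_INVALID };

/* Hypothetical sketch: map a --from/--to argument to a scaling mode.
   ARG is NULL when the option was not given, which means no scaling.
   Only the lower-case spellings are accepted. */
static enum scale_type parse_scale(const char *arg)
{
    static const struct {
        const char *name;
        enum scale_type type;
    } table[] = {
        { "none", SCALE_NONE }, { "si", SCALE_SI }, { "iec", SCALE_IEC },
    };

    if (arg == NULL)
        return SCALE_NONE;              /* option absent: do not scale */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(arg, table[i].name) == 0)
            return table[i].type;
    return SCALE_INVALID;               /* e.g. "SI" or a typo */
}
```

Accepting both spellings would only require swapping strcmp() for the POSIX strcasecmp().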


Regarding the help text and documentation:
I copied many of the texts from previous emails (the "Reformat numbers like
11505426432 to the more human-readable 11G" comes verbatim from one of Jim
Meyering's emails); all of them will need better phrasing later.


Thanks,
 -gordon






