bug-coreutils



From: Michael Weiss
Subject: bug#28152: Human readable units (-h/--human-readable vs --si) - Wrong prefix and missing unit
Date: Sat, 19 Aug 2017 21:27:02 +0200
User-agent: Mutt

IMHO, the units used in the output of df, du, ls, etc. with the
-h/--human-readable or --si options can be very misleading or ambiguous,
and in the case of -h/--human-readable they are even wrong according to
the standards.

I don't want to start a flame war over this, but I'd love it if we could
discuss it objectively, based on the official standards, and change the
output accordingly.

First of all, I hope we can agree that the current output is ambiguous
and therefore not really useful unless the exact command that generated
it is known (or at least whether --si or -h was used). IMHO this is not
desirable, and it already causes problems when such output is shared
without the command.

If we look at the standards, Wikipedia [0] provides the following table
(I've removed the JEDEC column, "Unit prefixes for semiconductor storage
capacity", as it shouldn't be relevant here):

Prefixes for multiples of bits (bit) or bytes (B)

Decimal               | Binary
Value    SI           | Value    IEC
1000     k   kilo     | 1024     Ki  kibi
1000^2   M   mega     | 1024^2   Mi  mebi
1000^3   G   giga     | 1024^3   Gi  gibi
1000^4   T   tera     | 1024^4   Ti  tebi
1000^5   P   peta     | 1024^5   Pi  pebi
1000^6   E   exa      | 1024^6   Ei  exbi
1000^7   Z   zetta    | 1024^7   Zi  zebi
1000^8   Y   yotta    | 1024^8   Yi  yobi
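To put numbers on the ambiguity, here is a small illustrative Python sketch (the helper name prefix_gap is hypothetical, not part of any tool) that prints each SI/IEC pair from the table and how much larger the binary value is than the decimal one. The two readings of the same letter drift further apart as the exponent grows: about 2.4% for k/Ki, 4.9% for M/Mi, 10.0% for T/Ti and 20.9% for Y/Yi.

```python
# Decimal symbols from the table above; position i corresponds to power i + 1.
SI_SYMBOLS = ["k", "M", "G", "T", "P", "E", "Z", "Y"]


def prefix_gap(power: int) -> float:
    """Percent by which the binary value 1024**power exceeds the decimal 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


for power, symbol in enumerate(SI_SYMBOLS, start=1):
    iec_symbol = symbol.upper() + "i"  # k -> Ki, M -> Mi, ...
    print(f"{symbol:>2} = {1000 ** power:>25}   {iec_symbol:>2} = {1024 ** power:>25}"
          f"   (+{prefix_gap(power):.1f}%)")
```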

These are the unit prefixes I'm used to, and they have the advantage of
being unambiguous and standardized.

"With the aim of avoiding ambiguity the International Electrotechnical
Commission (IEC) adopted new binary prefixes in 1998 (IEC 80000-13:2008,
formerly subclauses 3.8 and 3.9 of IEC 60027-2:2005). Each binary prefix
is formed from the first syllable of the decimal prefix with the similar
value, and the syllable "bi". The symbols are the decimal symbol, always
capitalised, followed by the letter "i". According to these standards,
kilo, mega, giga et seq. would only be used in the decimal sense, even
when referring to data storage capacities: kilobyte and megabyte would
denote one thousand and one million bytes respectively (consistent with
the metric system), while new terms such as kibibyte, mebibyte and
gibibyte, with symbols KiB, MiB and GiB, would denote 2^10, 2^20 and 2^30
bytes respectively." [1]
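The naming rule in this quote is mechanical enough to sketch in a few lines of Python. This is only an illustration, not an algorithm published by the IEC: the helper binary_prefix is hypothetical, and it approximates "first syllable" by the first two letters, a simplification that happens to hold for all eight prefixes in the table (ki, me, gi, te, pe, ex, ze, yo).

```python
# Decimal prefixes from the table above: (name, symbol, power of 1000).
SI_PREFIXES = [
    ("kilo", "k", 1), ("mega", "M", 2), ("giga", "G", 3), ("tera", "T", 4),
    ("peta", "P", 5), ("exa", "E", 6), ("zetta", "Z", 7), ("yotta", "Y", 8),
]


def binary_prefix(name: str, symbol: str, power: int) -> tuple:
    """Apply the IEC naming rule quoted above to one decimal prefix.

    Hypothetical helper; the "first syllable" is approximated by the
    first two letters, which works for these eight prefixes only.
    """
    iec_name = name[:2] + "bi"         # kilo -> kibi, exa -> exbi
    iec_symbol = symbol.upper() + "i"  # always capitalised, then "i": k -> Ki
    iec_value = 2 ** (10 * power)      # 2^10, 2^20, 2^30, ... (= 1024^power)
    return iec_name, iec_symbol, iec_value


for name, symbol, power in SI_PREFIXES:
    iec_name, iec_symbol, iec_value = binary_prefix(name, symbol, power)
    print(f"{name}byte ({symbol}B) = 1000^{power} B; "
          f"{iec_name}byte ({iec_symbol}B) = 2^{10 * power} = {iec_value} B")
```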

And last but not least, we should print the actual unit as well. In
this case all values are in bytes, which we can abbreviate as B (not as
a lowercase b, as that would mean bits). This would make the output
completely unambiguous, follow the standards and avoid any possibility
of misinterpretation.

I understand that changing such long-standing behavior will always
cause some minor problems, but delaying the change doesn't make them
magically go away. And since this change would only affect the
human-readable output, it shouldn't really break any scripts.

An example (the same file, listed once with -h and once with --si):

Old:
114M    fileA
120M    fileA
New:
114MiB  fileA
120MB   fileA
Or alternatively:
114 MiB fileA
120 MB  fileA
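A minimal Python sketch of the proposed "New" format, assuming fileA is exactly 114 MiB (119,537,664 bytes), which is consistent with both "Old" lines. The function human_size is hypothetical and is not how coreutils implements -h or --si: it always rounds up to a whole number, a simplification, since the real tools also print one decimal place for values below 10 (e.g. 1.5M).

```python
# Prefix symbols from the table above; index i means base**i.
SI_SYMBOLS = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]
IEC_SYMBOLS = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]


def human_size(n_bytes: int, binary: bool, sep: str = "") -> str:
    """Format n_bytes with an explicit prefix and the unit "B".

    binary=True uses IEC prefixes (powers of 1024, what -h computes);
    binary=False uses SI prefixes (powers of 1000, what --si computes).
    Hypothetical sketch: values are rounded up to a whole number.
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    base = 1024 if binary else 1000
    symbols = IEC_SYMBOLS if binary else SI_SYMBOLS
    # Pick the largest prefix whose multiple does not exceed n_bytes.
    i = 0
    while i < len(symbols) - 1 and n_bytes >= base ** (i + 1):
        i += 1
    # Integer ceiling division avoids floating-point rounding errors.
    shown = -(-n_bytes // base ** i)
    # Rounding up can reach a full base (e.g. 1024 KiB); move up one prefix.
    if shown >= base and i < len(symbols) - 1:
        i += 1
        shown = -(-n_bytes // base ** i)
    return f"{shown}{sep}{symbols[i]}B"


size = 114 * 1024 ** 2  # assumed size of fileA
print(f"{human_size(size, binary=True):<8}fileA")
print(f"{human_size(size, binary=False):<8}fileA")
```

Passing sep=" " gives the alternative layout ("114 MiB", "120 MB"). Always appending "B" is what makes the two lines distinguishable without knowing which option produced them.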

Links/References:
- https://en.wikipedia.org/wiki/Unit_prefix#Binary_prefixes
- https://en.wikipedia.org/wiki/Data_rate_units
- http://man7.org/linux/man-pages/man7/units.7.html
- http://man7.org/linux/man-pages/man1/numfmt.1.html
- https://debbugs.gnu.org/cgi/bugreport.cgi?bug=7176
- https://debbugs.gnu.org/cgi/bugreport.cgi?bug=18119

GNU coreutils version: 8.27
OS: GNU/Linux

Kind regards,

Michael

[0]: https://en.wikipedia.org/wiki/Unit_prefix
[1]: https://en.wikipedia.org/wiki/Unit_prefix#Binary_prefixes




