Re: du: POSIX mandating a single space instead of tab?
From: Stephane Chazelas
Subject: Re: du: POSIX mandating a single space instead of tab?
Date: Tue, 28 Apr 2015 17:50:58 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
2015-04-28 16:51:06 +0100, Pádraig Brady:
[...]
> > POSIX is already clear that anyone parsing for literal tabs is broken
> > when trying to parse du output. The only safe way to parse du output is
> > to break on all whitespace (the way awk already does). I'm 70-30 in
> > favor of changing to spaces.
>
> What about file names with leading whitespace,
> which now couldn't be split if we didn't use a single tab.
[...]
The point is that it cannot be parsed portably, because all
POSIX guarantees is that there will be at least one blank (and I
suppose the definition of blank is locale-dependent) between the
number and the file name.
Also note that tab and newline (and other blank characters in
your locale) are as valid as the space character in a file name.
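To make the problem concrete, here is a small sketch that feeds hand-written lines shaped like du -k output to the two obvious parsers. The sizes and names are made up, not captured from a real du. The point is only that POSIX allows more than one blank as the separator and allows blanks and newlines inside names, so neither parser recovers every name.

```shell
# Hand-written lines shaped like `du -k` output (made-up sizes and names).
# POSIX only promises at least one blank between size and name, so a tab
# and a run of spaces are both allowed as the separator.
#   line 1: tab separator, name " lead" (leading space)
#   line 2: two spaces, name "x"
#   lines 3-4: one entry whose name is "a", newline, "b"
sample=$(printf '4\t lead\n8  x\n12\ta\nb\n')

# Parser 1: split on runs of blanks (the awk default), take field 2.
echo '--- whitespace split ---'
printf '%s\n' "$sample" | awk '{ print "[" $2 "]" }'

# Parser 2: split on single tabs, take field 2.
echo '--- tab split ---'
printf '%s\n' "$sample" | awk -F '\t' '{ print "[" $2 "]" }'
```

The whitespace split prints [lead] for the first entry, losing the leading space; the tab split prints [] for the space-separated entry; and both report the newline-containing name as two bogus entries, [a] and [].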
If you have to parse the output of du reliably, you need to do
things like:
LC_ALL=C du -k .//*.txt ///var
Then look for those .// or /// markers in the output to see where
each file path begins; a path ends on the line before the one that
contains the next //.
Something like:
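LC_ALL=C du -k .//*.txt ///var | LC_ALL=C awk '
  # Print the entry accumulated so far, if one has been started.
  function process() {
    if (seen) {
      print "disk usage for \"" file "\": " n
    }
  }
  {
    if (offset = index($0, "//")) {
      # A "//" marks the start of a new entry: flush the previous one.
      process()
      seen = 1
      n = $1
      file = substr($0, offset + 2)
    } else {
      # No "//": this line continues a file name containing a newline.
      file = file "\n" $0
    }
  }
  END {process()}'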
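The awk program above can also be tried without a real directory tree. The sketch below wraps a variant of it in a shell function (the name parse_du is made up for illustration) and feeds it hand-written lines shaped like du -k output, including a name with a leading space and one with an embedded newline. It records with a seen flag whether an entry has been started, rather than testing NR > 1, so that input consisting of a single du line is still reported.

```shell
# parse_du is a hypothetical helper name. Paths in the input are expected
# to start with .// or /// so that "//" marks where each path begins.
parse_du() {
  LC_ALL=C awk '
    # Print the entry accumulated so far, if one has been started.
    function process() {
      if (seen) print "disk usage for \"" file "\": " n
    }
    {
      if (offset = index($0, "//")) {
        # A "//" marks a new entry: flush the previous one first.
        process()
        seen = 1
        n = $1
        file = substr($0, offset + 2)
      } else {
        # No "//": this line continues a name containing a newline.
        file = file "\n" $0
      }
    }
    END { process() }'
}

# Made-up sample: " lead.txt" (leading space), "a<newline>b.txt", and /var.
printf '4\t.// lead.txt\n8\t.//a\nb.txt\n16\t///var\n' | parse_du
```

With real data this becomes LC_ALL=C du -k .//*.txt ///var | parse_du. The .// prefix is stripped along with the marker, so those names come out relative to the current directory, while the third slash of ///var survives as the leading / of /var.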
>
> I don't think the gain is enough to break compat,
> given the greater alignment control etc. possible
> with expand(1) or numfmt(1) etc.
> I just checked an old wrapper script for du that I use,
> and see that it would be broken for example:
> http://www.pixelbeat.org/scripts/dutop
[...]
I'd tend to agree it would not be worth changing.
--
Stephane