Re: RFE: head,tail: -z, --zero-terminated
From: Pádraig Brady
Subject: Re: RFE: head,tail: -z, --zero-terminated
Date: Fri, 8 Jan 2016 22:07:06 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 08/01/16 19:04, Assaf Gordon wrote:
> Hello Pádraig and all,
>
> On 01/08/2016 11:56 AM, Pádraig Brady wrote:
> [...]
>> Possible additions to this class:
>>
>> nl (N/A as primarily text rather than record oriented)
>> numfmt (ditto)
>> expand (ditto)
>> unexpand (ditto)
>>
>
> Attached is a similarly structured patch adding -z to numfmt (it does not
> include a NEWS entry yet).
Cool. I was wondering a bit about numfmt, and on reflection it could be
useful for:
du -0 ... | numfmt -z
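For context on what -z changes in such a filter: records end in NUL rather than newline, so a du -0 record like "4096<TAB>dir<NL>name" (a path containing a newline) stays a single record. Below is a minimal sketch of that record loop, assuming POSIX getdelim(); copy_records() is a hypothetical helper for illustration, not code from the numfmt patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copy records from IN to OUT, where each record is
   terminated by DELIM ('\n' by default, '\0' under a -z style option).
   A real filter such as numfmt would transform each record before
   writing it.  Returns the number of records read.  */
static size_t
copy_records (FILE *in, FILE *out, int delim)
{
  char *line = NULL;
  size_t cap = 0;
  size_t count = 0;
  ssize_t len;

  while ((len = getdelim (&line, &cap, delim, in)) != -1)
    {
      /* Drop the terminator; with DELIM == '\0', any '\n' bytes stay
         inside the record as ordinary data.  */
      if (len > 0 && line[len - 1] == delim)
        len--;
      fwrite (line, 1, (size_t) len, out);
      putc (delim, out);   /* re-terminate, even if the last record lacked it */
      count++;
    }
  free (line);
  return count;
}
```

With DELIM set to '\n' instead, the same input is split at every newline, so a path containing a newline would turn into two records.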
> an open question:
> With -z, do embedded newlines count as whitespace/field delimiters ?
> (not sure if this applies to other programs).
>
> For example:
>
> $ printf "A B\tC\nD 1000\x00"
>
> Should the newline count as whitespace/field delimiter (since numfmt defaults
> to whitespace delimiters) ?
> If so, the "1000" should be the fifth field.
> If not, the "1000" should be in the fourth field (and "C\nD" counts as one
> field).
>
> Currently, because the numfmt code uses "isblank()", newlines DO NOT count as
> whitespace:
>
> $ printf "A B\tC\nD 1000\x00" | ./src/numfmt -z --to=si --field=4 | od -a
> 0000000 A sp B sp C nl D sp 1 . 0 K nul
> 0000015
A very good point.
I think this is not an issue for the utils in my current patch set,
but it is for field-processing utils like numfmt, sort, join and uniq
(cut delimits fields with a single character rather than a character class).
I.e., should these utils use isspace() rather than isblank()
when -z is specified? More conservatively, they should probably
use isblank(c) || c == '\n', since isspace() also matches '\r', '\v' and '\f'.
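To make the difference concrete, here is a sketch that numbers the field containing a given byte under each of the three candidate separator tests. field_at() and the sep_* helpers are hypothetical names; the code only models whitespace field splitting in which runs of separators collapse, and does not reproduce numfmt's actual parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Candidate field-separator tests (hypothetical names).  */
static int sep_blank (unsigned char c)    { return isblank (c) != 0; }
static int sep_space (unsigned char c)    { return isspace (c) != 0; }
static int sep_blank_nl (unsigned char c) { return isblank (c) || c == '\n'; }

/* Count the fields of REC (LEN bytes) that start at or before byte POS,
   where a field is a maximal run of bytes for which IS_SEP is false and
   runs of separators count as one delimiter.  When POS lies inside a
   field, this is that field's 1-based number.  */
static size_t
field_at (const char *rec, size_t len, size_t pos,
          int (*is_sep) (unsigned char))
{
  size_t field = 0;
  int in_field = 0;

  for (size_t i = 0; i < len && i <= pos; i++)
    {
      if (is_sep ((unsigned char) rec[i]))
        in_field = 0;
      else if (!in_field)
        {
          in_field = 1;   /* first byte of a new field */
          field++;
        }
    }
  return field;
}
```

For the record "A B\tC\nD 1000" and the byte offset of "1000" (8), sep_blank gives field 4 while sep_space and sep_blank_nl both give field 5. The two only diverge on '\r', '\v' or '\f': in "A\rB 7", sep_space puts "7" in field 3 but sep_blank_nl in field 2. Both isblank() and isspace() are locale-dependent; these numbers assume the default "C" locale.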
> Also,
> Two minor questions:
>
> 1. If null-terminated tests fail due to incorrect output, the log will contain:
> numfmt.pl: test z4: stdout mismatch, comparing z4.2 (expected) and z4.O
> (actual)
> Binary files z4.2 and z4.O differ
>
> This will make it hard for users to send us bug reports.
> Perhaps it's worth thinking about how to display a diff even for
> null-terminated lines (not sure how best to approach this).
Maybe we should have something like bcompare
that diffs the base64 of two files?
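A sketch of the bcompare idea, assuming the goal is a readable report rather than a byte-exact dump. Instead of diffing base64, it swaps in a different technique: split both files at NUL and print the first differing record with backslash escapes similar to od -c. first_record_diff() and put_escaped() are hypothetical names, not existing coreutils test helpers.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Write LEN bytes of S to OUT, showing backslash, newline and tab as
   \\, \n and \t, and other non-printable bytes as \ooo octal escapes.  */
static void
put_escaped (const char *s, size_t len, FILE *out)
{
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) s[i];
      if (c == '\\')
        fputs ("\\\\", out);
      else if (c == '\n')
        fputs ("\\n", out);
      else if (c == '\t')
        fputs ("\\t", out);
      else if (isprint (c))
        putc (c, out);
      else
        fprintf (out, "\\%03o", (unsigned int) c);
    }
}

/* Hypothetical "bcompare"-style helper: split WANT and GOT into
   NUL-terminated records and report the first pair that differs on OUT.
   Returns the 1-based number of that record, or 0 if the inputs match.  */
static size_t
first_record_diff (const char *want, size_t wlen,
                   const char *got, size_t glen, FILE *out)
{
  for (size_t rec = 1; wlen > 0 || glen > 0; rec++)
    {
      const char *wend = wlen ? memchr (want, '\0', wlen) : NULL;
      const char *gend = glen ? memchr (got, '\0', glen) : NULL;
      /* A record runs up to its NUL, or to end of input if unterminated.  */
      size_t wl = wend ? (size_t) (wend - want) : wlen;
      size_t gl = gend ? (size_t) (gend - got) : glen;

      if (wl != gl || memcmp (want, got, wl) != 0 || !wend != !gend)
        {
          fprintf (out, "record %zu differs:\n- ", rec);
          put_escaped (want, wl, out);
          fputs ("\n+ ", out);
          put_escaped (got, gl, out);
          putc ('\n', out);
          return rec;
        }

      /* Skip this record and its terminating NUL, if any.  */
      want += wl + (wend != NULL);
      wlen -= wl + (wend != NULL);
      got += gl + (gend != NULL);
      glen -= gl + (gend != NULL);
    }
  return 0;
}
```

Reporting per record keeps an embedded newline visible as "\n" inside a single "-"/"+" pair, which base64 output would obscure.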
> 2. In the patch for "wc", the long-form of the parameter (for getopt_long) is
> "zero" instead of "zero-terminated" - is that intentional ?
Yes, to match other uses in that "class" of programs, like basename, etc.
Anyway, -z may be moot for wc, as discussed elsewhere in the thread.
Thanks for the careful review!
Padraig.