bug#19240: cut 8.22 adds newline
From: John Kendall
Subject: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 18:41:48 +0000
Bob Proulx wrote:
> Eric Blake wrote:
>> I'll leave it to other contributors to weigh in on whether omitting
>> the final newline on output when it was missing on input is worth
>> the complexity of a change.
>
>> Pádraig Brady wrote:
>>> If we were just implementing now, I'd not output the extra '\n',
>>> but changing at this stage needs to be carefully considered,
>>> and with all the textutils, not just cut(1).
>>
>> I tend to go the opposite - producing text output, even on non-text
>> input, is more likely to be useful when piping files to other utilities
>> that don't handle non-text files as gracefully as the coreutils. But I
>> definitely agree that it is not something we change lightly.
>
> I have these thoughts and comments to make.
>
> 1. I don't "like" input file lines that don't have trailing newlines.
> It raises the question of whether the input is actually valid input.
> It feels to me like any line missing a newline is incomplete. There
> is likely to have been an error in the creation of it. Handling it
> silently feels like ignoring the error. But raising an actual error
> by exit code or by emitting a warning or error message feels too heavy
> handed. I would lean toward assuming that any incomplete input line
> is actually terminated by a newline as the lesser of the evils.
>
> 2. The suggestion for handling *fields* that do not end with a
> trailing newline differently from those that do doesn't make any sense
> to me at all. What is a field? Is the newline part of the field? I
> think not. Consider this.
>
> $ printf "one two" | awk '{print$1}'
> one
>
> $ printf "one two" | awk '{print$2}'
> two
>
> $ printf "one two\n" | awk '{print$1}'
> one
>
> $ printf "one two\n" | awk '{print$2}'
> two
>
> The newline is not part of field two. Otherwise printing it would
> result in two newlines being output for the second field.
>
> $ printf "one two" | cut -d' ' -f1
> one
>
> $ printf "one two" | cut -d' ' -f2
> two
>
> $ printf "one two\n" | cut -d' ' -f1
> one
>
> $ printf "one two\n" | cut -d' ' -f2
> two
>
> Same thing for cut. The newline is not part of any of the fields.
> The newline terminates the input line. The newline is not associated
> with any of the delimited fields contained in an input line.
>
> For byte or character operations in the utils such as head -c those
> are binary operations and should be interpreted strictly according to
> the bytes. But not for cut -c which is column based.
>
> John Kendall wrote:
>> # Solaris cut
>> $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>> 1
>> 12
>> 123
>> 1234
>> 1234
>> 1234$
>
> That is tickling non-portable behavior. I had a friend run some tests
> on HP-UX and IBM AIX and the results there were different from
> Solaris. Seems Solaris is already the unusual case.
>
> When looking, count the "1234" lines carefully, because HP-UX and
> older AIX don't process the line without a trailing newline at all.
> It is omitted there. Newer AIX appears to handle it like GNU.
>
> # uname -srm
> HP-UX B.10.20 9000/785
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> #
>
> # uname -srm
> HP-UX B.11.31 ia64
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> #
>
> # uname -s ; oslevel
> AIX
> 4.3.3.0
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> #
>
> # uname -s ; oslevel
> AIX
> 7.1.0.0
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> 1234
> #
>
> # head -1 /etc/motd ; uname -m
> Compaq Tru64 UNIX V5.0A
> alpha
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> #
>
> # uname -s
> Darwin
> # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
> 1
> 12
> 123
> 1234
> 1234
> 1234
> #
>
> Using input lines without a trailing newline is already a minefield of
> portability problems. It depends upon details of the implementation.
>
> I think what Solaris cut must be doing is processing the emission of
> characters across the line character by character. When it hits the
> input newline it knows it is done and emits a newline itself and
> starts again on a new line. When it hits EOF on the input it probably
> just stops doing anything and exits itself without printing anything
> more and therefore not emitting a newline. Likely just an accident of
> implementation.
>
> This is what makes "lines" without a newline such an unportable thing
> to count upon. It causes it to depend upon an implementation detail.
> Different implementations might do different things, and in fact
> different ones actually do. This probably isn't too widespread an
> issue or it would have come up more often. And more specific to
> porting the Solaris code, there would be similar but different
> problems if trying to use other legacy Unix platforms. Best to
> avoid the construct entirely for robust operation.
>
>> I came upon this while porting scripts from Solaris 10 to Centos 7.
>
> Can you share with us the specific construct that caused this to
> arise? I have done a lot of script porting to and from HP-UX systems
> and am curious as to the issue.
>
The construct in question is just for formatting the output
of a script that compares disc files to what's in a database.
echo "$FILE ===========================\c"| cut -c1-30
echo " matches =========="
The output on Solaris might look something like this (with
monospaced font on a terminal all the "matches" line up):
getDFL_info ================== matches ==========
transWestim_msg ============== matches ==========
selfBillDepotStoHan ========== matches ==========
addSale_invoice ============== matches ==========
buildInvoice ================= matches ==========
addInvoice =================== matches ==========
chgUnit ====================== matches ==========
updSale_invoice ============== matches ==========
The GNU output is:
getDFL_info ==================
matches ==========
transWestim_msg ==============
matches ==========
selfBillDepotStoHan ==========
matches ==========
addSale_invoice ==============
matches ==========
buildInvoice =================
matches ==========
addInvoice ===================
matches ==========
chgUnit ======================
matches ==========
updSale_invoice ==============
matches ==========
This can be rewritten, of course. (There is one corner case that
Solaris's cut handled nicely, for which I have not been able to come
up with a quick fix.)
John
> Bob
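
One possible rewrite of the formatting construct, offered only as a sketch and not taken from the thread: let printf do the truncation instead of cut, so nothing depends on how cut treats a line with no trailing newline. The helper names pad30 and report_match are hypothetical. The precision in %.30s counts bytes in POSIX printf, so names containing multibyte characters may line up differently than with cut -c. The unspecified corner case mentioned above is not addressed here.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original script).

# Print "$1" followed by '=' padding, truncated to 30 bytes, with no newline.
pad30() {
    # The padding has 30 '=' so even an empty name fills the column.
    printf '%.30s' "$1 =============================="
}

# Print one aligned report line for the name given in "$1".
report_match() {
    pad30 "$1"
    printf ' matches ==========\n'
}

report_match getDFL_info
report_match chgUnit
```

Because printf emits a newline only where the format string asks for one, the two pieces of each report line stay on the same line with any POSIX shell, and the echo "\c" escape (itself not portable) is no longer needed.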