Re: gnu sed's 'l' command behavior with -z (and without)
From: Jim Meyering
Subject: Re: gnu sed's 'l' command behavior with -z (and without)
Date: Sun, 7 Aug 2016 08:40:47 -0700

On Sat, Aug 6, 2016 at 12:02 PM, Assaf Gordon <address@hidden> wrote:
> Hello,
>
> (starting a new thread from previous discussion:
> http://lists.gnu.org/archive/html/sed-devel/2016-08/msg00000.html )
>
> regarding this:
>
>> On Aug 1, 2016, at 13:41, Jim Meyering <address@hidden> wrote:
>>
>> On Sat, Jul 30, 2016 at 11:46 PM, Assaf Gordon <address@hidden> wrote:
>>> sed: adjust line-terminator of F/l/= commands when -z is used
>>
>> In the second patch, this change
>>
>> if (width+olen >= line_len && line_len > 0) {
>> - ck_fwrite("\\\n", 1, 2, fp);
>> + ck_fwrite("\\", 1, 1, fp);
>> + ck_fwrite(&buffer_delimiter, 1, 1, fp);
>>
>> appears to change from emitting backslash-NL-continued lines to
>> backslash-NUL with -z. When using -z, do you still want to emit that
>> backslash?
>> Note that this is in code to honor sed's --line-length=N (-l) option,
>> which one can argue is not relevant with -z.
>
> I think we should output 'backslash-NUL' in such cases, unless we decide to
> make 'l' command output with '-z' mode ignore line-length limitation and
> never fold.
My first reaction was that with -z (implying machine-readable and no
line-length limitation), there should be no line splitting. Hence my
"Note ...". But perhaps it is too invasive to make the 'l' command
ignore its numeric operand when used with -z.
After reading all of this (thanks!), I agree that backslash-NUL does
make more sense, if you choose to split lines even with -z.
You're welcome to make the call. I'll be happy with your patch or with
one that does no folding with -z, albeit leaning 60:40 in favor of
your patch.
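
For anyone following along, here is a rough model of the folding
behavior in question. It is only a sketch, not sed's actual code: the
function name list_line, its signature, and the reduced escape table
(backslash, newline, and octal for other non-printable bytes) are
assumptions made for illustration. The fold test is the one from the
quoted hunk, and the delimiter is a parameter, so passing '\0' gives
the backslash-NUL fold marker and the dollar-NUL end marker.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the 'l' output loop (not sed's code).
   Writes the listing of in[0..in_len) to out and returns its byte count.
   out must have room for 6 bytes per input byte, plus 2.  */
static size_t
list_line (const char *in, size_t in_len, size_t line_len,
           char delim, char *out)
{
  size_t n = 0;      /* bytes written to out so far */
  size_t width = 0;  /* columns used on the current output line */

  for (size_t i = 0; i < in_len; i++)
    {
      char obuf[5];  /* longest escape is "\ooo" plus the NUL from sprintf */
      size_t olen;
      unsigned char c = (unsigned char) in[i];

      if (c == '\\')
        { memcpy (obuf, "\\\\", 2); olen = 2; }
      else if (c == '\n')
        { memcpy (obuf, "\\n", 2); olen = 2; }
      else if (c < 32 || c > 126)
        olen = (size_t) sprintf (obuf, "\\%03o", (unsigned) c);
      else
        { obuf[0] = (char) c; olen = 1; }

      /* Fold as in the quoted hunk: a backslash, then the delimiter.  */
      if (line_len > 0 && width + olen >= line_len)
        {
          out[n++] = '\\';
          out[n++] = delim;
          width = 0;
        }
      memcpy (out + n, obuf, olen);
      n += olen;
      width += olen;
    }

  /* End of pattern space: a dollar sign, then the delimiter.  */
  out[n++] = '$';
  out[n++] = delim;
  return n;
}
```

Calling list_line ("a\naaa", 5, 3, '\n', out) yields the four lines
"a\", "\n\", "aa\" and "a$". The no-folding alternative corresponds to
passing 0 as line_len whenever -z is in effect.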
> Without backslash-NUL for folded lines, the output will be inconsistent
> compared to regular newline output.
> For example, the following will not be equivalent:
>
> printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5' | tr '\000' '\n' |
> sed 's/\\000/\n/g'
> printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5'
>
> and vice versa:
>
> printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5' | tr '\n' '\000' |
> sed 's/\\n/\\000/g'
> printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5'
> As a side note,
>
> It seems gnu sed's 'l' command output differs from FreeBSD/MacOS's sed in
> regards to embedded newlines.
> Reading the POSIX standard, it's not clear to me which is correct (or perhaps
> both are correct). POSIX does not say that embedded newline should be
> converted to '\n'.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html:
>
> "(The letter ell.) Write the pattern space to standard output in a visually
> unambiguous form. The characters listed in XBD Escape Sequences and
> Associated Actions ( '\\', '\a', '\b', '\f', '\r', '\t', '\v' ) shall be
> written as the corresponding escape sequence; the '\n' in that table is not
> applicable. Non-printable characters not in that table shall be written as
> one three-digit octal number (with a preceding <backslash>) for each byte in
> the character (most significant byte first)."
>
>
> In practical terms, it means gnu sed prints '$<NEWLINE>' at the end of the
> printed pattern,
> while freebsd sed prints '$<NEWLINE>' at the end of every printed line.
>
> The following will demonstrate:
>
> $ printf "%s\n" aXa | freebsd-sed -n 'y/X/\n/;l'
> a$
> a$
>
> $ printf "%s\n" aXa | gnu-sed -n 'y/X/\n/;l'
> a\na$
>
> $ printf "%s\n" aaa bbb | freebsd-sed -n 'N;l'
> aaa$
> bbb$
>
> $ printf "%s\n" aaa bbb | gnu-sed -n 'N;l'
> aaa\nbbb$
I prefer GNU sed's approach.
> Adding line-folding complicates matters:
>
> $ printf "%s\n" aXaaa | COLUMNS=3 freebsd-sed -n 'y/X/\n/;l'
> a$
> aa\
> a$
>
> $ printf "%s\n" aXaaa | gnu-sed -l3 -n 'y/X/\n/;l'
> a\
> \n\
> aa\
> a$
>
> (gnu-sed ignores COLUMNS envvar, but provides '-l N' extension or 'lN'
> command-extension).
I approve of ignoring envvars :-)
> In freebsd-sed, there are only two options:
> either 'backslash-<newline>' is printed, indicating line-folding,
> or 'dollar-<newline>' is printed, indicating end-of-line.
>
> gnu-sed adds a third option: 'backslash-<n>' indicates an embedded newline in
> the pattern.
>
> That's another reason I'd like to keep printing 'backslash-NUL' with -z:
> It makes the output consistent:
> Either 'backslash-DELIMITER' or 'dollar-DELIMITER' or
> 'backslash-ESCAPE-DELIMITER' (meaning '\n' or '\000') - regardless of what
> delimiter it is.
>
> regards,
> - assaf
>
> P.S.
> This is obviously bike-shedding, as the '-z' option has been added in
> feb-2012 (commit a08590648) and it doesn't seem anyone ever complained about
> -z with 'l'.
Thanks for taking the time to write all of this.