
bug#21251: sed: POSIX and the z command


From: Stephane Chazelas
Subject: bug#21251: sed: POSIX and the z command
Date: Thu, 13 Aug 2015 15:55:20 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Last one for today ;)

The GNU sed documentation has:

`z'
     This command empties the content of pattern space.  It is usually
     the same as `s/.*//', but is more efficient and works in the
     presence of invalid multibyte sequences in the input stream.
     POSIX mandates that such sequences are _not_ matched by `.', so
     that there is no portable way to clear `sed''s buffers in the
     middle of the script in most multibyte locales (including UTF-8
     locales).

The part about the POSIX requirement is not true. The behaviour
of sed on non-text input is unspecified, so it doesn't require
that . not match a byte that is not part of a valid character.

GNU sed's (or grep's, for that matter) . (or [^[:alnum:]]...)
could just as well match every byte that doesn't otherwise form
part of a valid character (which would be a much better
behaviour IMO) and still be POSIX compliant.

That POSIX requirement is true for regexec() but not for text
utilities.
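
For reference, the behaviour the documentation describes can be
observed with a small sketch like the one below. It assumes GNU sed
(z is a GNU extension) and the usual printf, od and tr; the helper
name hex_of_result and the locale name en_US.UTF-8 are only
illustrative and may not match your system.

```shell
# Sketch only: assumes GNU sed, since z is a GNU extension.
# hex_of_result LOCALE SCRIPT  (hypothetical helper name)
# Feeds the bytes 'a', 0xff, 'b', newline to sed under the given
# locale and prints sed's output as a hex string.
hex_of_result() {
    printf 'a\377b\n' | LC_ALL="$1" sed "$2" | od -An -tx1 | tr -d ' \n'
    echo
}

# z empties the pattern space whatever it contains.
hex_of_result C 'z'

# In the C locale every byte is a character, so s/.*// also clears it.
hex_of_result C 's/.*//'

# In a UTF-8 locale (the name below is an assumption and may not be
# installed), GNU sed's . may refuse to match the 0xff byte, so part of
# the line can survive the substitution.
hex_of_result en_US.UTF-8 's/.*//'
```

Comparing the last line of output with the first two shows whether
. matched the stray byte in that locale.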

See that discussion on the Austin Group mailing list:
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/11059/focus=11098

-- 
Stephane




