emacs-bug-tracker


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#22237: closed (sed no longer removes high-ascii characters as it did formerly.)
Date: Sat, 26 Dec 2015 21:20:01 +0000

Your message dated Sat, 26 Dec 2015 13:19:07 -0800
with message-id <address@hidden>
and subject line Re: bug#22237: sed no longer removes high-ascii characters as 
it did formerly.
has caused the debbugs.gnu.org bug report #22237,
regarding sed no longer removes high-ascii characters as it did formerly.
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
22237: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22237
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: sed no longer removes high-ascii characters as it did formerly.
Date: Fri, 25 Dec 2015 06:21:41 -0600
User-agent: edbrowse/3.4.4

Well, sometimes it do and sometimes it don't.

Script started on Fri 25 Dec 2015 05:53:04 AM CS
~$ed sample
50
l
subject now that thanksgiving has come and gone\342\246$
q
~$
~$sed -i 's/[^a-z 0-9]//g' sample
~$ed sample
50
l
subject now that thanksgiving has come and gone\342\246$
q
~$
~$unsed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <address@hidden>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
~$exit

Script done on Fri 25 Dec 2015 05:59:12 AM CS



--- End Message ---
--- Begin Message ---
Subject: Re: bug#22237: sed no longer removes high-ascii characters as it did formerly.
Date: Sat, 26 Dec 2015 13:19:07 -0800

On Fri, Dec 25, 2015 at 4:21 AM, Brian Tew <address@hidden> wrote:
> Well, sometimes it do and sometimes it don't.
>
> Script started on Fri 25 Dec 2015 05:53:04 AM CS
> ~$ed sample
> 50
> l
> subject now that thanksgiving has come and gone\342\246$
> q
> ~$
> ~$sed -i 's/[^a-z 0-9]//g' sample

To remove every byte other than the ones listed in the bracket
expression, you probably want something like this instead:

  LC_ALL=C sed -i 's/[^[:alnum:] ]//g' sample

Note I've done two things: used LC_ALL=C to override your default
locale (probably a UTF-8 one), and used [:alnum:] in place of the
nonportable a-z and 0-9 ranges.
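
As a rough illustration of the suggestion above, the following sketch
rebuilds a file like the reporter's "sample" and filters it in the C
locale. The temporary directory and the "cleaned" output name are
assumptions made for the example, and it writes to a new file rather
than using -i so that it does not depend on GNU sed's in-place option.
Keep in mind that [:alnum:] also keeps uppercase letters, which the
original a-z range was not meant to.

```shell
# Recreate a 50-byte file like the reporter's "sample": ASCII text
# followed by the two stray bytes \342\246 and a newline.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample"
printf 'subject now that thanksgiving has come and gone\342\246\n' > "$sample"

# In the C locale every byte is a single character, so the negated
# bracket expression matches the two high bytes and the g flag
# removes both of them.
LC_ALL=C sed 's/[^[:alnum:] ]//g' "$sample" > "$tmpdir/cleaned"

# Show the last bytes of the result to confirm the high bytes are gone.
od -c "$tmpdir/cleaned" | tail -n 3
```

Without the g flag only the first non-matching byte on each line
would be removed.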

In general, with UTF-8-based locales, a byte sequence like your
\342\246 will match no regular expression, since it is not a valid
UTF-8 character (\342 starts a three-byte sequence, and here the
third byte is missing).
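
One way to check whether a file contains bytes that do not form valid
UTF-8 is to ask iconv to convert it from UTF-8 to UTF-8 and look at
the exit status. This is only a sketch: the check_utf8 helper and the
file names are hypothetical, and it assumes an iconv utility is
installed.

```shell
tmpdir=$(mktemp -d)

# "bad" ends in the reporter's incomplete sequence \342\246;
# "good" ends in \342\202\254, the complete encoding of U+20AC.
printf 'gone\342\246\n' > "$tmpdir/bad"
printf 'gone\342\202\254\n' > "$tmpdir/good"

# Hypothetical helper: prints "valid" if iconv accepts the whole file
# as UTF-8, "invalid" if it stops on an illegal input sequence.
check_utf8() {
  if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
    echo valid
  else
    echo invalid
  fi
}

bad_result=$(check_utf8 "$tmpdir/bad")
good_result=$(check_utf8 "$tmpdir/good")
echo "bad: $bad_result"
echo "good: $good_result"
```

A file that fails this check is a candidate for the LC_ALL=C
treatment, since a UTF-8 locale will not treat its stray bytes as
characters.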

What probably changed is that older versions of sed did not properly
handle multi-byte locales, or your other experience was using a
single-byte locale.

If you still think there is a problem with sed-4.2.2, please provide
more detail and I'll reopen this issue.


--- End Message ---
