bug#18266: handling bytes not part of the charset, and other garbage
From: Vincent Lefevre
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 10:29:16 +0200
User-agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25)
On 2014-09-11 20:26:12 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
>
> >ypig% LC_ALL=C locale charmap
> >ANSI_X3.4-1968
>
> That may be what the 'locale' command says, but bytes with the top bit on
> are considered to be valid single-byte characters. There are no encoding
> errors. So, in that sense it's not strict ASCII.
Glibc regards it as ASCII:
$ printf '\xe8' | LC_ALL=C iconv
iconv: illegal input sequence at position 0
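The two views quoted above can be compared side by side. The following is only a minimal sketch, assuming a POSIX shell with grep available and, optionally, iconv; the helper name count_char_lines is hypothetical, and the byte 0xe8 is written in octal (\350) because \x escapes are not portable in printf.

```shell
# Hypothetical helper: count the input lines that contain at least one
# character, with grep running in the C locale.
count_char_lines() {
    LC_ALL=C grep -c .
}

# A line holding only the byte 0xe8 (octal 350). If grep treats the byte as a
# valid single-byte character in the C locale, the line is counted.
printf '\350\n' | count_char_lines

# The same byte through iconv in the C locale, as in the message above.
# The check is skipped when iconv is not installed.
if command -v iconv >/dev/null 2>&1; then
    if printf '\350' | LC_ALL=C iconv >/dev/null 2>&1; then
        echo "iconv accepted the byte"
    else
        echo "iconv rejected the byte"
    fi
fi
```

With glibc's iconv the second check should report a rejection, matching the error shown above; other iconv implementations may behave differently.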
> >the current behavior breaks the sometimes used "grep ." solution
> >to match non-empty lines.
>
> "grep ." matches lines containing one or more characters. Encoding errors
> are not characters, at least, not as far as plain grep is concerned.
I just mean that "grep ." is a method recommended by some people, and
that it worked before UTF-8.
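For selecting non-empty lines without depending on how "." treats bytes that are not valid characters, one possible approach is to exclude empty lines instead of matching a character. This is a sketch under assumptions, not something proposed in the thread: the helper name nonempty_lines is hypothetical, and forcing the C locale is a choice made here so that every byte is handled as a single-byte character.

```shell
# Hypothetical helper: print every line that is not empty.
nonempty_lines() {
    # '^$' matches only lines of length zero, and -v inverts the selection,
    # so a line holding any byte at all is kept.
    LC_ALL=C grep -v '^$'
}

# Three input lines: "abc", an empty line, and a line holding only byte 0xe8
# (octal 350). The empty line is dropped; the other two are kept.
printf 'abc\n\n\350\n' | nonempty_lines | wc -l
```

The trade-off is that the output is filtered byte-wise, so the result no longer says anything about whether the kept lines are valid in the user's own locale.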
--
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)