[Nano-devel] nano and mixed encodings
From: Benno Schulenberg
Subject: [Nano-devel] nano and mixed encodings
Date: Mon, 20 Jul 2015 22:02:34 +0200
A little while ago, Mike reported some surprising search behaviour [1]
of nano when a file contains a mix of both UTF-8 and ISO-8859-1 encoded
characters: in a UTF-8 locale (which should be the default for most people
nowadays), nano will find both the "misencoded" ISO-8859-1 character and
the proper UTF-8 one.
[1] https://lists.gnu.org/archive/html/nano-devel/2009-02/msg00018.html
For example, do:
echo "0000000: 2020 c3ba 2020 c3bc 2020 fa20 20fc" | xxd -r >foo
and then open 'nano foo' and see how it shows:
ú ü � �
If you then search for ú or ü, nano will find each of them twice.
Which is strange, because you can't even see what these "misencoded"
characters are.
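To see where the two replacement characters come from, here is a minimal Python sketch (not nano's code, just an illustration) that builds the same 14 bytes as the xxd command above and decodes them as UTF-8, substituting U+FFFD for every byte that is not part of a valid sequence:

```python
# The same 14 bytes that the xxd command writes to 'foo'.
data = bytes.fromhex("2020c3ba2020c3bc2020fa2020fc")

# Decode as UTF-8; every invalid byte becomes U+FFFD (the replacement
# character), which a terminal typically renders as a question-mark glyph.
text = data.decode("utf-8", errors="replace")

print(text)
print([hex(ord(ch)) for ch in text if ch != " "])
```

The bytes c3ba and c3bc decode to ú and ü; the lone bytes fa and fc cannot start a UTF-8 sequence, so each one is replaced individually.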
In a reply, Chris suggests that nano should do what vim and emacs
do in this case. Well..., running 'vim foo' shows this:
Ãº Ã¼ ú ü
and searching for ú or ü will only find the misencoded second one.
Vim apparently autoconverts the file when it finds bytes in there
that are not valid UTF-8 and then assumes it to be ISO-8859-x.
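The fallback I am guessing at here can be sketched in a few lines of Python. This is only an assumption about the general idea (try strict UTF-8, and reinterpret the whole file as ISO-8859-1 when that fails), not a description of Vim's actual detection code; the function name decode_with_fallback is made up for the illustration:

```python
# The same 14 bytes that the xxd command writes to 'foo'.
data = bytes.fromhex("2020c3ba2020c3bc2020fa2020fc")

def decode_with_fallback(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding): strict UTF-8 if possible, else ISO-8859-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"

text, encoding = decode_with_fallback(data)
print(encoding, repr(text))
```

With such a whole-file fallback, each two-byte UTF-8 sequence comes out as two ISO-8859-1 characters, and only the lone bytes fa and fc become ú and ü, which is why a search finds each letter just once.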
Running 'emacs foo' shows this:
\303ş \303ĵ ú ü
(That is in my Esperanto locale; in other UTF-8 locales it will show
the same as vim.) Searching finds, of course, just one ú or ü.
Running 'dex foo' shows this:
ú ü <fa> <fc>
(where the <fa> and <fc> are coloured to indicate their invalidity).
This is nice: it doesn't show puzzling question marks but directly
displays the invalid bytes. It doesn't do any conversion.
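A display like this is easy to imitate. The following Python sketch (an illustration only, not dex's implementation; the function name render is hypothetical) keeps the valid characters and shows each invalid byte as its hex value in angle brackets, using Python's "surrogateescape" error handler to keep track of which bytes were invalid:

```python
# The same 14 bytes that the xxd command writes to 'foo'.
data = bytes.fromhex("2020c3ba2020c3bc2020fa2020fc")

def render(raw: bytes) -> str:
    """Show valid UTF-8 as text and each invalid byte as <xx>."""
    # surrogateescape maps an invalid byte 0xNN to the lone surrogate
    # U+DCNN, so invalid bytes stay distinguishable after decoding.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append(f"<{code - 0xDC00:02x}>")
        else:
            out.append(ch)
    return "".join(out)

print(render(data))
```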
Pico will show this:
ú ü ?
(Yes, just one question mark.) And again, searching will find,
of course, just one ú or ü.
So... if nano wants to behave like Pico, it should find only the
validly encoded ú and ü. The patch attached to the following
re-reported bug (https://savannah.gnu.org/bugs/?45579) does this.
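The difference between the two search behaviours can be shown with a small Python sketch. It is not the patch and not nano's code; both function names are hypothetical, and the "lenient" variant is just a stand-in that treats every invalid byte as the ISO-8859-1 character with the same value:

```python
# The same 14 bytes that the xxd command writes to 'foo'.
data = bytes.fromhex("2020c3ba2020c3bc2020fa2020fc")

def count_valid_matches(raw: bytes, needle: str) -> int:
    """Count matches among validly encoded characters only."""
    # Invalid bytes become lone surrogates (U+DC80..U+DCFF), which never
    # compare equal to a real character, so they cannot match.
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.count(needle)

def count_lenient_matches(raw: bytes, needle: str) -> int:
    """Count matches when invalid bytes are read as ISO-8859-1."""
    text = raw.decode("utf-8", errors="surrogateescape")
    lenient = "".join(
        chr(ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )
    return lenient.count(needle)

for needle in ("ú", "ü"):
    print(needle, count_valid_matches(data, needle),
          count_lenient_matches(data, needle))
```

For the example file, the strict count is one per letter and the lenient count is two per letter.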
Comments?
Benno