[Nano-devel] [patch] properly show invalid byte sequences in UTF-8
From: Benno Schulenberg
Subject: [Nano-devel] [patch] properly show invalid byte sequences in UTF-8
Date: Mon, 13 Apr 2015 21:49:33 +0200
Hi all,
When running, for example:
echo "0000000: 20c2 bb6f 6f6f 20c2 7878 78" | xxd -r >botched
and then opening the file 'botched' in nano (in a UTF-8 locale),
it will show:
»ooo »xxx
But the second guillemet isn't really there (if you search for it, the
first one will be the only occurrence); it is just a ghost. If you type
other multibyte characters before it, its appearance will change.
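To check that the file really contains only one guillemet, the bytes from the xxd line can be walked by hand. The sketch below is only an illustration, not code from nano: the names utf8_seq_len(), first_invalid_offset() and botched_bytes are made up for this example, and the validity check is simplified (it does not reject overlong three- and four-byte forms or surrogates).

```c
#include <stddef.h>

/* The eleven bytes written by the xxd command above. */
const unsigned char botched_bytes[] = {
    0x20, 0xc2, 0xbb, 0x6f, 0x6f, 0x6f, 0x20, 0xc2, 0x78, 0x78, 0x78
};

/* Hypothetical helper: return the length (1 to 4) of the UTF-8
 * sequence starting at s, or 0 if the bytes at s do not form one.
 * Simplified: only the lead byte and the continuation-byte pattern
 * (10xxxxxx) are checked. */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                       /* plain ASCII */
    else if (s[0] >= 0xc2 && s[0] <= 0xdf)
        len = 2;
    else if (s[0] >= 0xe0 && s[0] <= 0xef)
        len = 3;
    else if (s[0] >= 0xf0 && s[0] <= 0xf4)
        len = 4;
    else
        return 0;                       /* stray continuation or bad lead byte */

    if (n < len)
        return 0;                       /* sequence is cut off */

    for (i = 1; i < len; i++)
        if ((s[i] & 0xc0) != 0x80)
            return 0;                   /* not a continuation byte */

    return len;
}

/* Hypothetical helper: return the offset of the first byte that does
 * not start a valid sequence, or n if the whole buffer is valid. */
size_t first_invalid_offset(const unsigned char *s, size_t n)
{
    size_t pos = 0;

    while (pos < n) {
        size_t len = utf8_seq_len(s + pos, n - pos);

        if (len == 0)
            return pos;
        pos += len;
    }
    return n;
}
```

The c2 at offset 1 is followed by bb, a continuation byte, so together they form a real guillemet. The c2 at offset 7 is followed by 78 ('x'), which is not a continuation byte, so that c2 is a lone, invalid byte.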
That's because the second (and third and fourth) bytes of the preceding
multibyte character are still present in the corresponding memory
locations of the buffer variable. The patch below sets the second byte to
zero, so that only a single, and thus invalid, byte is seen, which
then gets represented as "�", the Unicode replacement character (U+FFFD).
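The mechanism can be modelled in a few lines. This is a simplified sketch and not nano's actual code: load_char(), MAX_CHAR_BYTES and the terminate flag are invented for the illustration, and it assumes a scratch buffer that is reused for every character and read up to its first NUL byte.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHAR_BYTES 4    /* hypothetical maximum bytes per character */

/* Copy the len bytes of one character into a reused scratch buffer of
 * at least MAX_CHAR_BYTES + 1 bytes whose last byte is NUL.
 * With terminate == 0 any bytes left over from an earlier, longer
 * character stay in place; with terminate != 0 the buffer is cut off
 * right after the copied bytes.
 * Returns how many bytes a reader that stops at the first NUL sees. */
size_t load_char(char *scratch, const char *src, size_t len, int terminate)
{
    memcpy(scratch, src, len);

    if (terminate)
        scratch[len] = '\0';

    return strlen(scratch);
}
```

Starting from a zeroed buffer, loading the two bytes c2 bb and then the single byte c2 without terminating leaves c2 bb in the buffer, which is the ghost guillemet. Loading the single c2 with terminate set leaves just c2, which is invalid on its own and can be shown as the replacement character.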
With the patch below, the example line is shown as:
»ooo �xxx
Index: src/winio.c
===================================================================
--- src/winio.c (revision 5195)
+++ src/winio.c (working copy)
@@ -2043,6 +2043,10 @@
char *nctrl_buf_mb = charalloc(mb_cur_max());
int nctrl_buf_mb_len, i;
+ /* Make sure an invalid sequence starter is chopped off
+ * after the first byte. */
+ null_at(&buf_mb, buf_mb_len);
+
nctrl_buf_mb = mbrep(buf_mb, nctrl_buf_mb,
&nctrl_buf_mb_len);
Benno