From: Gilbert, Brandon (Synchrony)
Subject: Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk 4.0.2
Date: Thu, 23 Aug 2018 20:35:05 +0000
Thank you. I have noticed that when I run wc -c on a record containing a special character, the byte count is 2 less than that of a record without one.
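The byte-count effect under discussion can be reproduced directly in a shell (a minimal sketch; it assumes a POSIX shell and that the script itself is saved as UTF-8, so the literal Ñ is the two-byte UTF-8 form):

```shell
# The same character occupies a different number of bytes
# depending on the encoding used to write it:
printf 'Ñ' | wc -c      # UTF-8 Ñ: 2 bytes (0xC3 0x91)
printf '\321' | wc -c   # ISO-8859-1 Ñ (octal 321 = 0xD1): 1 byte
```

Since wc -c counts bytes, not characters, a record whose special characters are single-byte Latin-1 will come out shorter than the same record in UTF-8.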
Would this indicate the multibyte encoding?

…Brandon

From: Wolfgang Laun <address@hidden>
Hi Gilbert,

Programs on a system with the locale setting en_US.UTF-8, and acting accordingly, will process Ñ and ñ encoded as \xc3\x91 and \xc3\xb1 correctly and without any complaint. If such a program is led to believe that the data is encoded according to ISO-8859-1, not much would happen, except that a single Ñ or ñ would show up as two characters.

If, however, Ñ and ñ are encoded according to ISO-8859-1 as \xd1 and \xf1, a program following en_US.UTF-8 has to report an error: in UTF-8 (a multibyte encoding), \xd1 and \xf1 announce the start of a multibyte sequence, and the required continuation bytes will generally not follow in ISO-8859-1 text.

Using /usr/bin/od to look at the "raw" data is a useful first step to see what is going on.

-W
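The od inspection can be sketched like this (assumes a POSIX shell; the filename data.txt in the commented line is illustrative, not from the thread):

```shell
# Dump the raw bytes to see which encoding the data actually uses:
printf 'Ñ' | od -An -tx1      # UTF-8 Ñ shows as:  c3 91
printf '\321' | od -An -tx1   # Latin-1 Ñ shows as: d1

# If the data turns out to be ISO-8859-1, one common remedy is to
# convert it to UTF-8 before feeding it to gawk, e.g.:
# iconv -f ISO-8859-1 -t UTF-8 data.txt > data.utf8.txt
```

Once the dump shows which byte sequences are present, the mismatch between the data's encoding and the locale becomes obvious.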