vile
Re: insert-string garbling characters


From: Thomas Dickey
Subject: Re: insert-string garbling characters
Date: Tue, 5 Jan 2021 04:23:23 -0500
User-agent: Mutt/1.10.1 (2018-07-13)

On Mon, Dec 21, 2020 at 08:13:23PM +0000, Thomas Dupond wrote:
> Dear Thomas,
> 
> Please excuse my very late response.  I applied the patch as

I'm late too (have been working on an upgrade to configure scripts...)
>>I've attached the test-scripts that I used:
>>      I is mapped to an insert (using UTF-8)
>>      J is mapped to an insert using ^Vu
>>      K sets the buffer to 8bit
>>      M uses the insert-string command.

> you described and I will try to describe concisely what I
> observed using your foo?.rc files.
> 
> For foo.rc (é):
> 
>       I: inserts \?E9
>       J: inserts é
>       M: inserts \?E9
>       K: Turns \?E9 into é and é into Ã©
> 
>       After using K; I, J and M insert é correctly.

that seems correct: after K, the current buffer is "8bit" (not UTF-8),
so that UTF-8 data which was inserted into the buffer will be shown
as its 8-bit bytes.  Also, inserts using literal strings (as done in
a map or the insert-string command) will insert the actual bytes
used in those commands.  The ^Vu mapping tries to convert the value,
so you'll see some difference between I and J.
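
To illustrate the byte-level effect described above, here is a small Python sketch. It is not part of vile and does not call vile; it assumes that "8bit" means ISO-8859-1 (Latin-1), and the helper name as_8bit is made up for this example. It shows what UTF-8 data looks like when each byte is displayed as its own 8-bit character:

```python
# Illustration only: how UTF-8 bytes look when read as ISO-8859-1 ("8bit").
# Assumption: the 8-bit encoding is Latin-1. vile itself is not involved.

def as_8bit(text: str) -> str:
    """Encode text as UTF-8, then reinterpret each byte as one Latin-1 char."""
    return text.encode("utf-8").decode("latin-1")

print(as_8bit("\u00e9"))        # U+00E9 -> bytes C3 A9
print(as_8bit("\u01e9"))        # U+01E9 -> bytes C7 A9
print(repr(as_8bit("\u21e9")))  # U+21E9 -> bytes E2 87 A9 (0x87 is a control)
```

The three inputs are the characters used in foo.rc, foo2.rc and foo3.rc, so the results can be compared with what K is reported to do in each case.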

Using ^G in vile (editing this message), I see this information:

é -> char is 0xE9 or 0351
ǩ -> char is 0x1E9 or 0751

My "hex" program shows the bytes for those two cases:

0xe9: 233 0351 0xe9 text "\351" utf8 \303\251
0x1E9: 489 0751 0x1e9 text "\001\351" utf8 \307\251

and (showing what those octal values are):

0303: 195 0303 0xc3 text "\303" utf8 \303\203
0251: 169 0251 0xa9 text "\251" utf8 \302\251
0307: 199 0307 0xc7 text "\307" utf8 \303\207
0251: 169 0251 0xa9 text "\251" utf8 \302\251

Ã -> char is 0xC3 or 0303
© -> char is 0xA9 or 0251
Ç -> char is 0xC7 or 0307
© -> char is 0xA9 or 0251
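
The same numbers can be reproduced with a short Python sketch. This is only a rough stand-in for the "hex" program (whose source is not shown here); the function name describe and the exact output layout are assumptions made for illustration:

```python
# Illustration only: approximates the decimal/octal/hex/UTF-8 columns above.
# The real "hex" program is not reproduced; the layout here is an assumption.

def describe(cp: int) -> str:
    """Return decimal, octal, hex and UTF-8 octal escapes for a code point."""
    utf8 = "".join(f"\\{b:03o}" for b in chr(cp).encode("utf-8"))
    return f"0x{cp:x}: {cp} 0{cp:o} 0x{cp:x} utf8 {utf8}"

# The code points discussed above: U+00E9, U+01E9, and the individual
# bytes of their UTF-8 encodings taken as 8-bit characters.
for cp in (0xE9, 0x1E9, 0xC3, 0xA9, 0xC7):
    print(describe(cp))
```

Each UTF-8 byte of a non-ASCII character is itself above 0x7F, which is why re-encoding those bytes as characters yields two bytes again.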
 
> For foo2.rc (ǩ):
> 
>       I: inserts ǩ
>       J: inserts ǩ
>       M: inserts ǩ
>       K: Turns ǩ into Ç©
> 
>       After using K; I and M insert Ç©, J inserts é
> 
> For foo3.rc (⇩)
> 
>       I: inserts ⇩
>       J: inserts ⇩
>       M: inserts ⇩
>       K: Turns ⇩ into â\x87©
> 
>       After using K; I and M insert â\x87©, J inserts é
> 
> Regards,
> Thomas
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, December 2, 2020 1:58 AM, Thomas Dickey <dickey@his.com> wrote:
> 
> > On Tue, Dec 01, 2020 at 04:18:48AM -0500, Thomas Dickey wrote:
> >
> > > On Mon, Nov 30, 2020 at 02:06:40PM +0000, Thomas Dupond wrote:
> > >
> > > > Thank you very much for your fast reply and this small patch.
> > > > I downloaded version 9.8u and applied the patch but now
> > > > whenever I try to insert an é interactively I only get : \?E9
> > >
> > > ouch - I'll continue investigating a fix for this (thanks)
> >
> > Here's a followup (apply after the previous patch) which seems to work.
> >
> > I've attached the test-scripts that I used:
> > I is mapped to an insert (using UTF-8)
> > J is mapped to an insert using ^Vu
> > K sets the buffer to 8bit
> > M uses the insert-string command.
> >
> > > > And this goes for every character like èéêîïôöûù they do not
> > > > appear correctly. When I switch to 8bit encoding with
> > > > `setl fk=8bit` \?E9 appears correctly as é.
> > > > On the bright side, when I use insert-string with UTF-8
> > > > encoding, the characters appear normally in the command prompt
> > > > and insert-string works as intended.
> > >
> > > halfway there :-)
> > >
> > > > I realised that you use `setl fk=8bit` and with this encoding
> > > > everything works as intended, interactive and insert-string. But
> > > > I would rather use UTF-8 than ISO-8859.
> > > > I'm sorry I cannot be of much help, I have very little
> > > > knowledge of C programming.
> > > > Regards,
> > > > Thomas Dupond
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Monday, November 30, 2020 1:59 AM, Thomas Dickey dickey@his.com wrote:
> > > >
> > > > > On Sun, Nov 29, 2020 at 04:21:24PM +0000, Thomas Dupond wrote:
> > > > >
> > > > > > Hello,
> > > > > > I'm just starting to explore vile and I ran into something I
> > > > > > cannot solve. I was trying to write a macro and while vile
> > > > > > seems to handle UTF8 really well it doesn't seem to work well
> > > > > > with the function insert-string.
> > > > > > I can insert "é" but when I use "insert-string é" it prints this
> > > > > > mess : ᅢᄅ
> > > > >
> > > > > yes... scripting hasn't been as well-tested as interactive stuff :-(
> > > > >
> > > > > > Any idea on how to solve this ?
> > > > >
> > > > > Here's a fix for the most common case (it won't handle a special
> > > > > case where the buffer is non-UTF-8), which should work for you.
> > > > > also attaching a script I used for testing, e.g.,
> > > > > ./configure --enable-trace --with-builtin-filters && make
> > > > > ./vile @foo.rc makefile
> > > > >
> > > > > > My locale is :
> > > > > > LANG=en_GB.UTF-8
> > > > > > LANGUAGE=en_GB:en
> > > > >
> > > > > ...
> > > > >
> > > > > > I'm on debian 4.19 and I compiled vile-9.8 from source with
> > > > >
> > > > > 9.8's getting a little stale - 9.8u is current.
> > > > > I put snapshots in github (but have too many concurrent things to
> > > > > polish off 9.8v)
> > > > >
> > > > > > ./configure --with-builtin-filters
> > > > > > And $cfgopts = hypertext,locale,iconv,multibyte,terminfo
> > > > > > Kind regards,
> > > > > > Thomas
> > > > >
> > > > > --
> > > > > Thomas E. Dickey dickey@invisible-island.net
> > > > > https://invisible-island.net
> > > > > ftp://ftp.invisible-island.net
> > >
> > > --
> > > Thomas E. Dickey dickey@invisible-island.net
> > > https://invisible-island.net
> > > ftp://ftp.invisible-island.net
> >
> > --
> >
> > Thomas E. Dickey dickey@invisible-island.net
> > https://invisible-island.net
> > ftp://ftp.invisible-island.net
> 
> 

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net
