UTF-8 line editing problems
Sun, 31 Oct 2004 13:36:26 +0100
(I've already reported this problem, but no solution and no answer came,
only a patch that is supposed to solve this problem but doesn't. I've also
got some new pieces of information which might help, hence I'm writing
a new mail summarizing all that I know, which might make it easier to fix.)
Readline and bash have some trouble with line editing in UTF-8 mode. I'm
using "bc" to test standalone readline without bash, and of course I'm also
testing the behavior of bash.
All the test cases described below have some common properties. Readline/bash
always tracks the line being edited and the cursor position correctly
internally: if I type blindly without looking at the screen, only listening
for whether a keypress (left or right arrow, backspace or delete) beeps, and
then check the line that is finally accepted by hitting Enter, I find that
everything's perfect. After pressing ^L, the line shown and the position of
the cursor are repaired visually and they're perfect. However, while editing
the line, the line is often printed incorrectly on the terminal and the
cursor is often shown in the wrong place.
Also please note that I'm not the kind of lame user who tries to use a UTF-8
locale inside a Latin-1 terminal or does similar bad things... :-) Whenever
I'm talking about UTF-8, the terminal is set to UTF-8 mode, and many
applications (mutt, vim, joe...) work correctly inside it.
In bc, type "á" leftarrow "í". You should see "íá" and the cursor should
stand over "á". You do actually see "íá"; however, the cursor is beyond "á",
at an incorrect position.
Now if you press "ó", you see "íáóá" on the screen instead of "íóá", and the
cursor stands beyond the four letters instead of over "á". As always, ^L
repairs the screen.
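
The numbers in this test case can be checked with a short sketch. It is only an illustration and is not taken from readline's source: it compares the cursor's byte offset in the UTF-8 encoded line with the screen column the cursor should occupy, assuming each of these accented letters is one column wide. The helper name `cursor_offsets` is made up for this example.

```python
def cursor_offsets(line, cursor_chars):
    """Return (byte_offset, column) for a cursor placed just before
    character index `cursor_chars` in `line`.

    Assumption: every character is exactly one column wide, which
    holds for the accented Latin letters used in the test case.
    """
    before = line[:cursor_chars]
    byte_offset = len(before.encode("utf-8"))  # bytes in front of the cursor
    column = len(before)                       # characters in front of the cursor
    return byte_offset, column


# After typing "á", left arrow, "í": the line is "íá", cursor over "á".
print(cursor_offsets("\u00ed\u00e1", 1))        # (2, 1)

# After additionally typing "ó": the line is "íóá", cursor over "á".
print(cursor_offsets("\u00ed\u00f3\u00e1", 2))  # (4, 2)
```

The byte offsets 2 and 4 coincide with the columns where the cursor was wrongly shown (just past "á", and past four letters). That would be consistent with a byte count being used somewhere a column count is needed, but this is a guess from the outside, not a confirmed diagnosis.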
The behavior is exactly the same with vanilla readline 5.0 and with the
five current official patches applied, even though the fifth of these
patches is supposed to solve this particular problem based on an earlier
report of mine, but it doesn't.
Now download SUSE 9.2's bash source:
and apply the bash-3.0-utf8.patch from this archive to readline.
No matter whether you use the five official patches to readline or not,
readline's behavior tested with bc becomes perfect once the SUSE patch is
applied.
From now on I'll use SUSE's patch, since without it bash misbehaves in much
the same way as standalone readline. However, SUSE's patch is still
not 100% perfect. (It is absolutely irrelevant whether or not I apply the 5
mainstream patches to readline and the 14 patches to bash 3.0.)
If I have a boring monochrome prompt, particularly this:
then line editing is now perfect both in legacy 8-bit and in UTF-8 mode.
I type "á" leftarrow "í" leftarrow "ó", and as a result, I see "óíá" and the
cursor stands over "í". Perfect.
However, I prefer having a blue prompt:
This works perfectly in legacy 8-bit mode, but misbehaves when using UTF-8.
Typing the same sequence ("á" leftarrow "í" leftarrow "ó") results in "óíá"
being displayed but the cursor incorrectly stands over "á".
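
Since only the coloured prompt triggers the problem, a small sketch of what a colour prompt adds to the width bookkeeping may be useful. The prompt string below is hypothetical (it is not the prompt from this report), and `visible_width` is an illustrative helper, not a readline function. It strips ANSI colour escape sequences and counts what is left, assuming one column per remaining character.

```python
import re

# Matches ANSI CSI sequences such as ESC [ 1 ; 34 m (colour changes).
CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def visible_width(prompt):
    """Columns a prompt occupies on screen, ignoring colour escapes.

    Assumption: every remaining character is one column wide.
    """
    return len(CSI.sub("", prompt))


plain_prompt = "$ "                     # hypothetical monochrome prompt
blue_prompt = "\x1b[1;34m$ \x1b[0m"     # hypothetical blue prompt

print(len(blue_prompt), visible_width(blue_prompt))    # 13 2
print(len(plain_prompt), visible_width(plain_prompt))  # 2 2
```

Both prompts occupy two columns, but the blue one is 13 characters long, so any display code has to keep the invisible part separate from the visible part. Whether that separation is where the UTF-8 code path goes wrong is an open question here.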
And obviously once it goes wrong, typing further accented letters and
sometimes pressing left or right arrow causes much weirder things to happen.
Finally, let me mention that in all the buggy cases I tried to show the
simplest way to trigger the bug; however, whenever I said something works
correctly, I performed a stress test, that is, randomly pressing mostly
accented letters but also some 3-byte UTF-8 characters and plain ASCII
letters, left and right arrow, backspace and delete for at least a minute,
and everything seemed to be okay. So please don't just fix the particular
bugs I've mentioned; please also perform a similar line editing stress test
before stating that the problem is fixed. I wouldn't normally give such
instructions to developers, but after seeing an official fix which doesn't
fix anything, please understand my doubts, and forgive me for this.
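
The stress test described above can be sketched as a random keystroke generator plus a tiny reference model of the line buffer. Everything here is an assumption made for illustration: the key names, the character sets, the probabilities and the function names are invented, and the sketch does not drive readline or a terminal. It only says what the line and cursor should be after a given key sequence, so that the screen can be compared against it.

```python
import random

ACCENTED = "\u00e1\u00e9\u00ed\u00f3\u00fa"  # 2-byte UTF-8 letters (á é í ó ú)
THREE_BYTE = "\u20ac\u2192"                  # 3-byte UTF-8 characters
ASCII = "abc"
EDIT_KEYS = ["LEFT", "RIGHT", "BACKSPACE", "DELETE"]


def random_keys(n, rng):
    """Yield n keystrokes, mostly accented letters."""
    for _ in range(n):
        r = rng.random()
        if r < 0.4:
            yield rng.choice(ACCENTED)
        elif r < 0.5:
            yield rng.choice(THREE_BYTE)
        elif r < 0.6:
            yield rng.choice(ASCII)
        else:
            yield rng.choice(EDIT_KEYS)


def apply_key(line, cursor, key):
    """Reference model: return the new (line, cursor) after one key.
    `cursor` is a character index with 0 <= cursor <= len(line)."""
    if key == "LEFT":
        return line, max(cursor - 1, 0)
    if key == "RIGHT":
        return line, min(cursor + 1, len(line))
    if key == "BACKSPACE":
        if cursor == 0:
            return line, cursor
        return line[:cursor - 1] + line[cursor:], cursor - 1
    if key == "DELETE":
        return line[:cursor] + line[cursor + 1:], cursor
    # Any other key is a character inserted at the cursor.
    return line[:cursor] + key + line[cursor:], cursor + 1


def simulate(keys):
    """Apply a sequence of keys to an empty line."""
    line, cursor = "", 0
    for key in keys:
        line, cursor = apply_key(line, cursor, key)
    return line, cursor


# The sequence from the prompt test: "á" leftarrow "í" leftarrow "ó".
print(simulate(["\u00e1", "LEFT", "\u00ed", "LEFT", "\u00f3"]))
```

For the sequence from the prompt test, the model ends with the line "óíá" and the cursor at character index 1, that is, over "í", which is the result described as correct. Feeding `random_keys(500, random.Random(0))` to `simulate` gives a reproducible longer run to compare against.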
Hoping for a real fix in the not so distant future... :)
- UTF-8 line editing problems,
Egmont Koblinger <=