Re: [Bug-readline] rl_point, multibyte strings, and the cursor position
From: Ulf Magnusson
Subject: Re: [Bug-readline] rl_point, multibyte strings, and the cursor position
Date: Tue, 17 Feb 2015 21:28:42 +0100

Thanks for the feedback!
The -1 comparison should be safe in practice on non-exotic systems
where the size (rank) of size_t is at least that of int, but yeah,
it's kinda pointless and stupid to leave out the cast.
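I think I'll roll the mbrtowc -2 case into the error case for now, so
that both (size_t)-1 and (size_t)-2 count as errors (a plain
wc_len < 0 test would never fire while wc_len is a size_t). It'd be
weird if mbrtowc gave -2 when handed MB_CUR_MAX bytes, but it's worth
checking for at least.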
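
For reference, here is a minimal sketch of that combined check, pulled out
into a helper. The name next_char_len and its int return convention are
made up for illustration and are not part of the code quoted below; the
only library behavior relied on is that mbrtowc returns (size_t)-1 for an
invalid sequence and (size_t)-2 for an incomplete one.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper (not from the original code): decodes one multibyte
// character from at most 'n' bytes of 's'. Returns its length in bytes,
// 0 at the terminating null, or -1 for an invalid or incomplete sequence.
static int next_char_len(const char *s, size_t n, mbstate_t *state) {
    wchar_t wc;
    size_t len = mbrtowc(&wc, s, n, state);

    // mbrtowc signals errors with (size_t)-1 (invalid sequence) and
    // (size_t)-2 (incomplete sequence). Both are very large unsigned
    // values, so compare against the cast constants; 'len < 0' is always
    // false for an unsigned type.
    if (len == (size_t)-1 || len == (size_t)-2)
        return -1;

    return (int)len;
}
```

After a -1 result the conversion state should be reset (for example with
memset) before decoding continues.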
I added control character handling by doing the following, btw:

    width += iswcntrl(wc) ? 2 : max(0, wcwidth(wc));

Guess that might catch more characters than it should, though.
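
Putting the pieces from this thread together, the whole function might look
roughly like the sketch below. Treat it as an illustration under
assumptions rather than the final code: the quoted original is cut off
after the -1 check, so the recovery for malformed input (count one column,
skip one byte, reset the shift state) is just one possible "guess", and the
max() call is spelled out since it is not a standard C function.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  // needed for the wcwidth() declaration
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

// Returns the total width (in columns) of the characters in the 'n'-byte
// prefix of the null-terminated multibyte string 's'.
static size_t strnwidth(const char *s, size_t n) {
    mbstate_t shift_state;
    size_t width = 0;
    size_t i = 0;

    // Start in the initial shift state.
    memset(&shift_state, 0, sizeof shift_state);

    while (i < n) {
        wchar_t wc;
        size_t wc_len = mbrtowc(&wc, s + i, MB_CUR_MAX, &shift_state);

        if (wc_len == 0)
            break;  // Reached the end of the string.

        if (wc_len == (size_t)-1 || wc_len == (size_t)-2) {
            // Assumption: for invalid or incomplete input, guess one
            // column, skip a single byte, and start over from the
            // initial shift state.
            memset(&shift_state, 0, sizeof shift_state);
            width += 1;
            i += 1;
            continue;
        }

        if (iswcntrl((wint_t)wc)) {
            // Two columns for control characters, as in the message.
            width += 2;
        } else {
            // wcwidth() returns -1 for non-printable characters;
            // count those as zero columns.
            int w = wcwidth(wc);
            if (w > 0)
                width += (size_t)w;
        }
        i += wc_len;
    }

    return width;
}
```

A caller would typically do setlocale(LC_ALL, "") first and then use
something like strnwidth(rl_line_buffer, rl_point) to get the cursor
column, assuming the text starts at column zero.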
I also noticed that readline outputs things like "~Z" for some (meta?)
characters. Might want to get back to that later...
/Ulf
On Tue, Feb 17, 2015 at 5:44 PM, Chet Ramey <address@hidden> wrote:
> On 2/16/15 4:52 PM, Ulf Magnusson wrote:
>> On Mon, Feb 16, 2015 at 4:43 PM, Ulf Magnusson <address@hidden> wrote:
>>> I'll try it. Thanks for the suggestion!
>>>
>>> /Ulf
>>>
>>
>> Here's what I came up with in case someone else runs into the same
>> problem. I'm sure there's more stuff to handle (not sure what to do
>> for non-printable characters for example), but it seems to handle
>> multibyte (tested using åäö's and Chinese) and combining characters
>> correctly for UTF-8 at least:
>
> This is basically what an implementation of wcswidth looks like. A couple
> of suggestions:
>
>> // Returns the total width (in columns) of the characters in the 'n'-byte
>> // prefix of the null-terminated multibyte string 's'. If 'n' is larger than
>> // the length of 's', returns the total width of the string. Suitable for
>> // calculating a cursor position.
>> //
>> // Makes a guess for malformed strings.
>> static size_t strnwidth(const char *s, size_t n) {
>>     mbstate_t shift_state;
>>     wchar_t wc;
>>     size_t wc_len;
>>     size_t width = 0;
>>
>>     // Start in the initial shift state.
>>     memset(&shift_state, '\0', sizeof shift_state);
>>
>>     for (size_t i = 0; i < n; i += wc_len) {
>>         // Extract the next multibyte character.
>>         wc_len = mbrtowc(&wc, s + i, MB_CUR_MAX, &shift_state);
>>         if (wc_len == 0)
>>             // Reached the end of the string.
>>             break;
>>         if (wc_len == -1)
>
> wc_len is a size_t, which is usually unsigned. You need to cast the -1
> to (size_t)-1. You also need to handle mbrtowc returning (size_t)-2.
>
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, ITS, CWRU address@hidden http://cnswww.cns.cwru.edu/~chet/