From: Chet Ramey
Subject: Re: international alnum characters not part of words even with locale set
Date: Thu, 2 Jan 2003 10:41:34 -0500

> The bindable functions backward-word and forward-word can be used
> to move across word boundaries. However, these functions do not
> seem to care about the current locale (LC_CTYPE).
>
> For example, I have 'export LC_CTYPE=en_GB' in my .bash_profile.
> This locale supports ISO8859-1 characters. Type 'cat lösenord'
> in bash, and press ctrl+left or whatever key that is bound to
> backward-word. The cursor will now move to the 's', because it
> thinks 'ö' is a word delimiter. This happens in bash as well as in
> other readline programs. (ö has byte-value 246.)
>
> I also have these in my .inputrc:
>
> set output-meta on
> set input-meta on
> set convert-meta off
>
> I think the problem is in chardefs.h, in the definition of NON_NEGATIVE:
>
> #define NON_NEGATIVE(c) ((unsigned char)(c) == (c))
>
> (NON_NEGATIVE is used in _rl_lowercase_p, _rl_uppercase_p,
> _rl_pure_alphabetic and ALPHABETIC in the same header file.) This
> does not take into account the fact that input-meta/meta-flag is set
> to true. Maybe this definition would be more correct:
>
> #define NON_NEGATIVE(c) (_rl_meta_flag || (unsigned char)(c) == (c))
>
> (I haven't tested it though.)

The intent of NON_NEGATIVE is to prevent buffer overflows and potential
core dumps with the ctype.h macros.

Consider a system where chars are signed, and the is* ctype.h macros are
implemented using table lookups. isprint('\200') is therefore equivalent
to isprint(-128), which is undefined: the lookup can index outside the
table and potentially cause a core dump.
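
As a rough illustration of the guard, here is a minimal sketch. It
assumes 8-bit chars; guarded_isalpha is a hypothetical helper written
for this example, not a readline function. Only the NON_NEGATIVE
definition is taken from the quoted message.

```c
#include <ctype.h>

/* Definition quoted earlier in this thread, from readline's chardefs.h. */
#define NON_NEGATIVE(c) ((unsigned char)(c) == (c))

/* Hypothetical helper (not part of readline): consult isalpha() only
   when c has a value that fits in an unsigned char. Negative values,
   such as a sign-extended byte >= 0x80, are rejected before the ctype
   call is made. */
int guarded_isalpha(int c)
{
    return NON_NEGATIVE(c) && isalpha(c) != 0;
}
```

With signed 8-bit chars, byte 246 reads back as -10, so
guarded_isalpha(-10) returns 0 without ever calling isalpha(). That
keeps the call safe, and it is also why such a byte is never treated as
alphabetic.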

It's tempting to try to work around this using simple casts to `unsigned char'
when calling the is* macros, but this mishandles EOF: in the case where
c == EOF (typically -1), (unsigned char) (c) == 255, which is a printable
character in many locales.
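
The EOF pitfall can be sketched as follows, assuming 8-bit chars and
EOF == -1. Both function names are hypothetical and exist only for this
example. The second function shows one possible way to keep EOF
distinct; it only helps when c comes from a getc()-style call. If c was
widened from a signed char, byte 0xFF is already indistinguishable from
EOF.

```c
#include <ctype.h>
#include <stdio.h>

/* Naive workaround: cast unconditionally. Every negative int, EOF
   included, is folded into the range 0..UCHAR_MAX. */
int naive_cast(int c)
{
    return (unsigned char)c;
}

/* Hypothetical alternative: treat EOF separately, then cast.
   Assumes c is either EOF or a value returned by getc() and friends. */
int isprint_eof_aware(int c)
{
    if (c == EOF)
        return 0;
    return isprint((unsigned char)c) != 0;
}
```

Under those assumptions naive_cast(-1) yields 255, so EOF would be
classified as whatever character 255 is in the current locale, while
isprint_eof_aware(EOF) returns 0.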

Your patch, while it may solve your particular problem, re-exposes the
problem NON_NEGATIVE was intended to solve.
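
To make the difference concrete, here is a small sketch comparing the
two definitions. meta_flag_stub is a hypothetical stand-in for
readline's _rl_meta_flag so that the code compiles on its own; the
reaches_ctype_* functions are likewise invented for illustration.

```c
/* Hypothetical stand-in for readline's _rl_meta_flag (set, as it would
   be with input-meta on). */
int meta_flag_stub = 1;

/* Current definition and the proposed one, as quoted in this thread. */
#define NON_NEGATIVE_ORIG(c)    ((unsigned char)(c) == (c))
#define NON_NEGATIVE_PATCHED(c) (meta_flag_stub || (unsigned char)(c) == (c))

/* Return 1 if a test of the form NON_NEGATIVE(c) && isxxx(c) would go
   on to evaluate isxxx(c), 0 if the guard stops it first. */
int reaches_ctype_orig(int c)
{
    return NON_NEGATIVE_ORIG(c) ? 1 : 0;
}

int reaches_ctype_patched(int c)
{
    return NON_NEGATIVE_PATCHED(c) ? 1 : 0;
}
```

With the flag set, reaches_ctype_patched(-128) is 1 where
reaches_ctype_orig(-128) is 0, so a negative value would be passed
straight through to the is* macro.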
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
( ``Discere est Dolere'' -- chet )
Chet Ramey, ITS, CWRU chet@po.CWRU.Edu http://cnswww.cns.cwru.edu/~chet/