Re: grep-2.10 testing
From: Bruno Haible
Subject: Re: grep-2.10 testing
Date: Mon, 21 Nov 2011 14:55:27 +0100
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )
Hi Jim,
> diff --git a/src/dfa.c b/src/dfa.c
> index e28726d..8f79508 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
> return CSET + charclass_index(ccl);
> }
>
> +/* Add this to the test for whether a byte is word-constituent, since on
> + BSD-based systems, many values in the 128..255 range are classified as
> + alphabetic, while on glibc-based systems, they are not. */
> +#ifdef __GLIBC__
> +# define octet_valid_as_wide_char(c) 1
> +#else
> +# define octet_valid_as_wide_char(c) (MBS_SUPPORT && btowc (c) != WEOF)
> +#endif
> +
> /* Return non-zero if C is a `word-constituent' byte; zero otherwise. */
> -#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
> +#define IS_WORD_CONSTITUENT(C) \
> + (octet_valid_as_wide_char(C) && (isalnum(C) || (C) == '_'))
>
This code would do the job.
However, I find the macro name 'octet_valid_as_wide_char' confusing,
because values such as 0xC3 are valid octets and also valid wide characters.
I would call this macro 'is_valid_single_byte_character' or
'is_valid_unibyte_character'. Then it's clear why it has to map 0xC3 to false
in UTF-8 encoding, where 0xC3 is only the lead byte of a multibyte sequence
and never a complete character by itself.
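For illustration, here is a minimal sketch of the patched test using the 'is_valid_unibyte_character' name. It is not the code that went into grep. It assumes only standard C and POSIX (btowc, isalnum) and drops the quoted patch's MBS_SUPPORT configuration macro and its __GLIBC__ shortcut. The helper count_high_word_bytes is hypothetical and exists only to show the effect on bytes 128..255.

```c
#include <ctype.h>
#include <wchar.h>

/* Sketch, assuming plain btowc semantics: true iff byte C forms a complete
   character on its own in the current locale.  btowc returns WEOF for bytes
   that are not valid single-byte characters, e.g. 0xC3 in UTF-8, which can
   only start a multibyte sequence.  */
#define is_valid_unibyte_character(c) (btowc (c) != WEOF)

/* Word-constituent test guarded by the unibyte check, so that a locale
   (such as some BSD ones) whose isalnum accepts bytes 128..255 cannot make
   a lone UTF-8 lead byte count as part of a word.  C must be in the range
   0..255 (isalnum requirement) and is evaluated more than once.  */
#define IS_WORD_CONSTITUENT(C) \
  (is_valid_unibyte_character (C) && (isalnum (C) || (C) == '_'))

/* Hypothetical helper: count the bytes in 128..255 that IS_WORD_CONSTITUENT
   accepts in the current locale.  In a UTF-8 locale this should be 0.  */
static int
count_high_word_bytes (void)
{
  int n = 0;
  for (int c = 128; c < 256; c++)
    if (IS_WORD_CONSTITUENT (c))
      n++;
  return n;
}
```

After setlocale (LC_ALL, "") selects a UTF-8 locale, IS_WORD_CONSTITUENT (0xC3) yields 0 regardless of what isalnum says about that byte, while ASCII letters, digits and '_' are classified as before.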
Bruno
--
In memoriam Ricardo Flores Magón
<http://en.wikipedia.org/wiki/Ricardo_Flores_Magón>