Re: grep-2.10 testing

bug-grep

From:	Bruno Haible
Subject:	Re: grep-2.10 testing
Date:	Mon, 21 Nov 2011 14:55:27 +0100
User-agent:	KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

Hi Jim,

> diff --git a/src/dfa.c b/src/dfa.c
> index e28726d..8f79508 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -1071,8 +1071,18 @@ parse_bracket_exp (void)
>    return CSET + charclass_index(ccl);
>  }
> 
> +/* Add this to the test for whether a byte is word-constituent, since on
> +   BSD-based systems, many values in the 128..255 range are classified as
> +   alphabetic, while on glibc-based systems, they are not.  */
> +#ifdef __GLIBC__
> +# define octet_valid_as_wide_char(c) 1
> +#else
> +# define octet_valid_as_wide_char(c) (MBS_SUPPORT && btowc (c) != WEOF)
> +#endif
> +
>  /* Return non-zero if C is a `word-constituent' byte; zero otherwise.  */
> -#define IS_WORD_CONSTITUENT(C) (isalnum(C) || (C) == '_')
> +#define IS_WORD_CONSTITUENT(C) \
> +  (octet_valid_as_wide_char(C) && (isalnum(C) || (C) == '_'))
> 

This code would do the job.

My only objection is that I find the macro name 'octet_valid_as_wide_char'
confusing, because values such as 0xC3 are valid octets and also valid wide
characters. I would call this macro 'is_valid_single_byte_character' or
'is_valid_unibyte_character'. Then it's clear why it has to map 0xC3 to false
in UTF-8 encoding: there, 0xC3 is only the lead byte of a multibyte sequence,
not a complete character on its own.
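
To make the naming argument concrete, here is a minimal sketch of the check
written as functions under the proposed name. It is an illustration only, not
the patch itself: it drops the __GLIBC__ shortcut and the MBS_SUPPORT guard
from the quoted diff, and the function name 'is_word_constituent' is a
hypothetical stand-in for the IS_WORD_CONSTITUENT macro.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Sketch only: function form of the macro, using the proposed name.
   C must be an unsigned char value (0..255) or EOF.
   Returns non-zero if the single byte C is, by itself, a complete
   character in the current locale's encoding.  */
static int
is_valid_unibyte_character (int c)
{
  return btowc (c) != WEOF;
}

/* Hypothetical function form of IS_WORD_CONSTITUENT from the diff,
   without the __GLIBC__ and MBS_SUPPORT special cases.  */
static int
is_word_constituent (int c)
{
  return is_valid_unibyte_character (c) && (isalnum (c) || c == '_');
}
```

The result depends on the LC_CTYPE locale selected with setlocale. In a UTF-8
locale, is_valid_unibyte_character (0xC3) is false because 0xC3 only starts a
two-byte sequence (0xC3 0xA9 encodes U+00E9, for example), while ASCII bytes
such as 'a' or '_' remain valid single-byte characters.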

Bruno
-- 
In memoriam Ricardo Flores Magón 
<http://en.wikipedia.org/wiki/Ricardo_Flores_Magón>
