help-flex

[:alpha:] class does not support isalpha characters


From: Jorge Gustavo Rocha
Subject: [:alpha:] class does not support isalpha characters
Date: Mon, 27 Oct 2003 23:55:30 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)

Hi,

I ran a simple test. Given ISO-8859-1 encoded text, I would like to match
every word with a pattern such as [[:alpha:]]+
With this rule, words containing accented characters are not recognized as whole words. I tested each unrecognized character with the C function isalpha, and each one is reported as alphabetic.
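The isalpha check described above can be sketched as a small C helper. This is only an illustration, not the test I actually ran: the function name count_alpha_in_locale is hypothetical, and a locale name such as "pt_BR.ISO-8859-1" is an assumption that depends on what is installed on the system. The cast to unsigned char matters because bytes above 0x7F are negative where char is signed, and passing a negative value to isalpha is undefined.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch to the named locale and count the bytes in
 * buf that isalpha() accepts. Returns -1 if the locale is not available.
 * Assumes a single-byte encoding such as ISO-8859-1. */
long count_alpha_in_locale(const char *locale_name, const char *buf, size_t len)
{
    long n = 0;
    size_t i;

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    for (i = 0; i < len; ++i) {
        /* Cast first: isalpha() expects an unsigned char value or EOF. */
        if (isalpha((unsigned char) buf[i]))
            ++n;
    }
    return n;
}
```

In the "C" locale only the ASCII letters count, so the nine bytes of "Conceição" yield 7. With a Portuguese ISO-8859-1 locale, if one is installed, all nine bytes would be expected to count.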

Here is my simple input (in Portuguese):
A Conceição está a tocar órgão
tendo preferência por música ácida.

Here is my scanner:
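        /* Indented lines in this section are copied verbatim into the scanner. */
        #include <stdio.h>
        #include <locale.h>
        #include <libintl.h>
        #include <ctype.h>
        int cont=0;     /* number of words matched so far */
PALAVRA [[:alnum:]]+
%option nomain
%%
{PALAVRA}       { printf("%d - %s\n", ++cont, yytext); }
" "   |
\.      ;
.       { printf("Char:%c %d\n", yytext[0], isalpha((unsigned char) yytext[0])); }
%%
int main() {
        const char *loc;
        /* Request the Portuguese locale, then print the locale taken from the environment. */
        setlocale(LC_ALL, "pt_BR");
        loc = setlocale(LC_ALL, "");
        printf("%s\n", loc ? loc : "(null)");
        yylex();
        return 0;
}
/* Called at end of input: report the word count and stop scanning. */
int yywrap() {
        printf("Words: %d\n", cont);
        return 1;
}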

The accented characters are not matched as part of words, even though isalpha(yytext[0]) reports them as alphabetic. Here are the first lines of the output:
1 - A
2 - Concei
Char:ç 1014
Char:ã 1024
3 - o
(etc)
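The split shown above (Concei, then ç and ã on their own, then o) is what any matcher produces when only ASCII letters and digits count as word characters. As a point of comparison, here is a minimal sketch in plain C that counts words with two different predicates. It makes no claim about how flex builds its character classes. The names is_ascii_alnum, is_latin1_alnum and count_words are hypothetical, and the ISO-8859-1 letter table is a simplified assumption (bytes 0xC0 to 0xFF except the multiplication sign 0xD7 and the division sign 0xF7).

```c
#include <stddef.h>

/* ASCII letters and digits only, independent of the locale. */
int is_ascii_alnum(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* ASCII alphanumerics plus the accented letters of ISO-8859-1.
 * Simplified assumption: 0xC0-0xFF except 0xD7 and 0xF7. */
int is_latin1_alnum(unsigned char c)
{
    return is_ascii_alnum(c) || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* Count maximal runs of bytes accepted by is_word_char. */
size_t count_words(const char *buf, size_t len,
                   int (*is_word_char)(unsigned char))
{
    size_t words = 0;
    size_t i;
    int in_word = 0;

    for (i = 0; i < len; ++i) {
        int w = is_word_char((unsigned char) buf[i]);
        if (w && !in_word)
            ++words;            /* a new run starts here */
        in_word = w;
    }
    return words;
}
```

On the first input line, encoded as ISO-8859-1, count_words with is_ascii_alnum reports 8 fragments, while is_latin1_alnum reports the 6 words a reader would expect.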

Is this a limitation of flex, or am I doing something wrong?
Any feedback will be appreciated.

Thank you,

Jorge
--
jorge gustavo rocha
departamento de informática
universidade do minho
4710-057 braga
portugal
N 41º33'44,5" W 8º23'40,5"
tel +351 253604470 fax +351 253604471 cel +351 919690914






