[:alpha:] classe does not support isalpha characters
From: Jorge Gustavo Rocha
Subject: [:alpha:] classe does not support isalpha characters
Date: Mon, 27 Oct 2003 23:55:30 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
Hi,
I've made a simple test.
Using an ISO-8859-1 encoded text, I would like to match every word with
something like [[:alpha:]]+
With this rule, words containing accented characters are not recognized
as whole words!
I've tested each unrecognized character with the C function isalpha, and
they are all isalpha chars.
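The per-character check can be sketched roughly as follows. This is only a minimal illustration, not the exact code I ran: the helper name count_alpha_bytes is made up for this example, and it assumes one byte per character, as in ISO-8859-1. Each byte is cast to unsigned char before being passed to isalpha, since passing a negative char value is undefined.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch): count the bytes
 * of s that isalpha() accepts in whatever locale is currently active.
 * Assumes a single-byte encoding such as ISO-8859-1. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Cast first: isalpha() is only defined for EOF and
         * values representable as unsigned char. */
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```

In the default "C" locale, count_alpha_bytes("Concei\xE7\xE3o") counts only the 7 unaccented letters. After a successful setlocale(LC_ALL, ...) to an ISO-8859-1 locale, the two accented bytes should be counted as well, which is consistent with what isalpha reports in my test.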
Here is my simple input (in Portuguese):
A Conceição está a tocar órgão
tendo preferência por música ácida.
Here is my scanner:
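%{
#include <stdio.h>   /* printf */
#include <locale.h>  /* setlocale */
#include <libintl.h>
#include <ctype.h>   /* isalpha */
int cont = 0;        /* number of words matched so far */
%}
PALAVRA [[:alnum:]]+
%option nomain
%%
{PALAVRA} { printf("%d - %s\n", ++cont, yytext); }
" " |
\. ;
. { printf("Char:%c %d\n", yytext[0], isalpha((unsigned char) yytext[0])); }
%%
int main() {
    setlocale(LC_ALL, "pt_BR");
    /* Switch to the environment's locale and print its name. */
    printf("%s\n", setlocale(LC_ALL, "") );
    yylex();
    return 0;
}
/* Called at end of input: print the word count and stop scanning. */
int yywrap() {
    printf("Words: %d\n", cont);
    return 1;
}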
The accented characters are not matched as part of words, yet
isalpha(yytext[0]) reports them as alphabetic. Here are the first lines
of the output:
1 - A
2 - Concei
Char:ç 1014
Char:ã 1024
3 - o
(etc)
Is this a flex limitation, or a limitation of mine?
Any feedback will be appreciated.
Thank you,
Jorge
--
jorge gustavo rocha
departamento de informática
universidade do minho
4710-057 braga
portugal
N 41º33'44,5" W 8º23'40,5"
tel +351 253604470 fax +351 253604471 cel +351 919690914