[Aspell-user] byte offsets vs. character offsets


From: Gitanjali Bhatia
Subject: [Aspell-user] byte offsets vs. character offsets
Date: Tue, 19 Sep 2006 09:45:53 -0700

Hi,

I am using the C interface to the aspell library to parse the incoming text that I need to spell check. This is what I am doing:

 

aspell_document_checker_process(checker, pText, -1);

while (token = aspell_document_checker_next_misspelling(checker), token.len != 0)

 

The text that I am passing in is UTF-8 encoded Vietnamese. What I am seeing is that token.offset is reported in bytes, not characters. For example, if a character before the misspelled word is 3 bytes long, the offset of the misspelled word is 2 larger than its character position. This makes it hard for me to highlight or replace the word. I looked in the aspell manual and saw that there is a byte-offsets option. I tried setting it to both true and false, but the offset is the same each time. I set it in a config file, which I load the following way:

 

AspellConfig* config = new_aspell_config();

aspell_config_replace( config, "conf", "aspell.conf" );  

 

Is there any other way to get the character offset?

 

Thanks

-Gitanjali
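
One workaround for the offset problem described above, independent of any aspell option, is to convert the byte offset into a character offset by counting the bytes that start a UTF-8 sequence. The following is a minimal sketch, assuming the text is valid UTF-8 and that "character" means a Unicode code point (decomposed Vietnamese text with combining marks would need extra handling). `utf8_char_offset` is a hypothetical helper and not part of the aspell API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of aspell): returns the number of UTF-8
 * code points that begin within the first `byte_offset` bytes of `text`.
 * Assumes `text` is valid UTF-8 and `byte_offset` does not exceed its
 * length in bytes. */
size_t utf8_char_offset(const char *text, size_t byte_offset)
{
    size_t chars = 0;
    size_t i;

    for (i = 0; i < byte_offset; i++) {
        /* Continuation bytes have the form 10xxxxxx; any other byte
         * starts a new code point. */
        if (((unsigned char)text[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

Inside the loop shown earlier, `utf8_char_offset(pText, token.offset)` would then give the character position of each misspelling, and `utf8_char_offset(pText + token.offset, token.len)` its length in characters.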

