Re: International support
From: Samphan Raruenrom
Subject: Re: International support
Date: Sun, 10 Jan 1999 07:32:57 +0700
Asger K. Alstrup Nielsen wrote:
> In Danish (and I guess the other Scandinavian languages), we form
> words by concatenating others.
> For instance, "pan cake" is written "pandekage", being composed
> of "pande" and "kage".
> Sometimes, we add an extra consonant (typically s) in between
> the two words, but there are no simple rules for this.
>
> The point is that this word construction is so common that it's
> virtually impossible to list all words in a static dictionary.
> Therefore, it makes sense to have a switch that will accept
> those words that seem to be run-together.
Very interesting!
In Thai, we don't put spaces between words at all, so
the same situation arises naturally.
A typical Thai word-segmentation algorithm (which usually
also does spell checking) uses maximal matching with
backtracking over one or more trie word lists.
My implementation is at http://www.thai.net/libinthai/
The IBM Classes for Unicode implementation is at
http://www.ibm.com/java/education/boundaries/boundaries.html
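
For illustration, maximal matching with backtracking over a trie word
list might look roughly like the sketch below. It is not the code from
either implementation linked above; the names TrieNode, build_trie,
match_lengths and segment are hypothetical, and the word list is a toy
one. At each position it tries the longest dictionary word first and
falls back to shorter ones when the rest of the text cannot be
segmented.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node of a character trie holding the word list."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    """Insert every word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def match_lengths(root: TrieNode, text: str, start: int) -> List[int]:
    """Lengths of all dictionary words beginning at text[start], longest first."""
    lengths: List[int] = []
    node = root
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.is_word:
            lengths.append(i - start + 1)
    lengths.reverse()  # maximal match: try the longest candidate first
    return lengths


def segment(root: TrieNode, text: str) -> Optional[List[str]]:
    """Split text into dictionary words, or return None if that is impossible."""
    dead_ends = set()  # positions already known to have no valid segmentation

    def helper(start: int) -> Optional[List[str]]:
        if start == len(text):
            return []
        if start in dead_ends:
            return None
        for length in match_lengths(root, text, start):
            rest = helper(start + length)
            if rest is not None:
                return [text[start:start + length]] + rest
            # Otherwise backtrack and try the next shorter word.
        dead_ends.add(start)
        return None

    return helper(0)


if __name__ == "__main__":
    trie = build_trie(["pande", "kage", "pan", "de"])
    print(segment(trie, "pandekage"))
    print(segment(trie, "pandekagex"))
```

With this toy word list, "pandekage" splits into "pande" and "kage",
while "pandekagex" yields None, which a spell checker could treat as a
possible misspelling. The sketch does not handle the extra linking
consonant mentioned for Danish; that would need additional rules.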