From: b4205072
Subject: [Savannah-hackers] submission of Book Index Generator for Thai Book - savannah.nongnu.org
Date: Sun, 05 Jan 2003 02:14:04 -0500
User-agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U; ) Gecko/20021208 Debian/1.2.7-5

A package was submitted to savannah.nongnu.org
This mail was sent to address@hidden, address@hidden


Vee Satayamas <address@hidden> described the package as follows:
License: gpl
Other License: 
Package: Book Index Generator for Thai Book
System name: thbookidx
Type: non-GNU

Description:
Book Index Generator for Thai Book automatically generates the
index at the back of a book. It requires Thai text processing.
Thai is an Asian language that puts no space between words;
spaces are used to separate sentences.
This project generates a back-of-book index based on the Salton
algorithm, which calculates the weight of each word to
determine whether the word is important enough to be an index
entry. The major task of this project, however, is
processing Thai text, which requires:

1. Word segmentation, because there is no space between
Thai words. Nowadays the effective algorithms for segmenting Thai
words are dictionary-based, but adding every word to the dictionary
is not possible, and there is quite a lot of ambiguity in determining
word boundaries. I am trying to improve this process, and it has
become a subproject, which can be found at
http://thaiwordseg.sourceforge.net/

2. Noun phrase analysis and word formation.
A back-of-book index contains not only words but also phrases,
so phrases and complex words need to be found as well.
It is also important to identify index entries and subentries.
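
The ambiguity described in item 1 can be illustrated with a small
sketch. This is not the thaiwordseg code; it is a generic greedy
longest-match segmenter plus an exhaustive enumerator, and the
four-word dictionary TOY_DICTIONARY and both function names are
assumptions made only for this example. The string "ตากลม" is a
commonly cited ambiguous case: it can be read as ตา + กลม
(eye + round) or ตาก + ลม (expose + wind).

```python
# Hypothetical toy dictionary; a real system would load a large word list.
TOY_DICTIONARY = {"ตา", "กลม", "ตาก", "ลม"}


def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there.
    Characters that start no dictionary word become single-character tokens.
    """
    max_len = max(map(len, dictionary))
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: emit one unknown character.
            tokens.append(text[i])
            i += 1
    return tokens


def all_segmentations(text, dictionary):
    """Return every way to split text entirely into dictionary words."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            for rest in all_segmentations(text[end:], dictionary):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    print(longest_match_segment("ตากลม", TOY_DICTIONARY))
    print(all_segmentations("ตากลม", TOY_DICTIONARY))
```

The greedy pass commits to one reading, while the enumerator shows
that the dictionary alone admits two, so some further criterion is
needed to choose between them.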

There are some class diagrams and a proposal for this project,
written in Thai, at
http://vivaldi.cpe.ku.ac.th/~vee/wiki.php/BookIndex
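
The word weighting step mentioned in the description can be sketched
as follows. The description does not say which formula is meant, so
this assumes tf-idf style weighting (term frequency times the log of
inverse document frequency), a scheme commonly associated with
Salton. Treating each section of the book as one "document", the
function names, and the threshold are all assumptions made for
illustration, and the input is taken to be already segmented into
words.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Compute tf * idf for every word in every document.

    documents: list of token lists, e.g. one list per book section.
    Returns one dict per document mapping word -> weight.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each word once per document it appears in.
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


def index_candidates(documents, threshold):
    """Return, sorted, the words whose best weight exceeds threshold."""
    best = {}
    for doc_weights in tf_idf_weights(documents):
        for word, weight in doc_weights.items():
            best[word] = max(best.get(word, 0.0), weight)
    return sorted(word for word, weight in best.items() if weight > threshold)


if __name__ == "__main__":
    sections = [
        ["index", "word", "word"],
        ["word", "thai", "thai"],
        ["word", "segment"],
    ]
    print(index_candidates(sections, threshold=1.0))
```

A word that occurs in every section, such as "word" above, gets
weight zero and is never proposed, while words concentrated in a few
sections score higher.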

Other Software Required:


Other Comments:





