Re: [Koha-devel] Koha 3.0, Zebra and UTF-8


From: Joshua M. Ferraro
Subject: Re: [Koha-devel] Koha 3.0, Zebra and UTF-8
Date: Thu, 23 Aug 2007 08:57:53 -0500 (CDT)

Hi Zeno,

See below ...

----- address@hidden wrote:
> As far as I know, Koha 3.0 will base the OPAC on Zebra for the largest
> sites. But as I see here
> http://lists.indexdata.dk/pipermail/zebralist/2007-May/001522.html,
> Zebra has some limits on full support of UTF-8.
> 
> Viewing the problem for Koha, where are the limits ?
> Can I input data in Latin, Arabic, Chinese, etc scripts and search
> them ?
> 
> With a mix of input scripts, do you suggest using Koha 3.0 without
> Zebra ?

There are plenty of examples of folks using Zebra to manage non-Latin-1
languages, for instance:

greek + english
russian + english
scandinavian languages + english
turkish + english

However, it is currently not possible to index more than two or three of
these simultaneously in the same document corpus, as there is a hard
limit of 256 indexable characters.
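To get a rough feel for whether a mixed-script collection would fit under
that limit, something like the Python sketch below could be used. It is
only an illustration: the function name and the sample titles are made up,
and counting case-folded letters and digits is an assumption about what
"indexable" means, not a description of how Zebra builds its character
map.

```python
LIMIT = 256  # the figure quoted in this message


def distinct_indexable_chars(documents):
    """Collect the distinct letters and digits used, after case folding.

    Hypothetical helper: treating case-folded alphanumerics as the
    "indexable" set is an assumption made for illustration only.
    """
    chars = set()
    for text in documents:
        for ch in text.casefold():
            if ch.isalnum():
                chars.add(ch)
    return chars


# Made-up sample titles in Latin, Greek and Cyrillic scripts.
corpus = ["Koha library system", "Ελληνική βιβλιοθήκη", "Русская библиотека"]
used = distinct_indexable_chars(corpus)
print(len(used), "distinct characters; within limit:", len(used) <= LIMIT)
```

Run over a full export of the bibliographic data, a count like this gives
an early hint of whether the scripts in use are likely to collide with the
limit.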

The Index Data folks are in the process of integrating the ICU Unicode
libraries into Zebra, which will give Zebra the capability to index
the full UTF-8 character set in a single document corpus, with no
restriction on indexable characters.

The ICU UTF-8 integration work will provide character normalization and
tokenization over the full UTF-8 range of characters, but it may not
provide tokenization for languages like Japanese and Korean, which can
require deep linguistic knowledge of the language and could be a lifetime
study in itself. That said, it should at minimum provide support for
languages that use whitespace as the word separator.
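As a rough picture of what normalization plus whitespace tokenization
means, here is a small Python sketch. It uses Python's standard
unicodedata module as a stand-in and is not the ICU integration itself;
the function names are made up, and stripping combining marks is just one
possible normalization choice that will not suit every language.

```python
import unicodedata


def normalize(text):
    """Decompose characters, drop combining marks, then case-fold.

    Assumption: accent stripping is acceptable for the languages involved.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def tokenize(text):
    """Split normalized text on runs of whitespace."""
    return normalize(text).split()


# Whitespace-separated scripts break into words as expected.
print(tokenize("Café Olé"))

# Text written without spaces comes back as one token, which is why
# languages like Japanese need more than whitespace splitting.
print(tokenize("図書館の本"))
```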

Note that in Koha, we can do some stemming, synonym expansion, and
article removal/stopword creation pre-index and pre-search, for the
languages that aren't directly supported in Zebra.
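The idea is that the same preprocessing runs on the record before it is
indexed and on the query before it is sent, so both sides agree. A
minimal Python sketch of article and stopword removal follows; the word
lists and the function name are invented for illustration and are not
taken from Koha's code.

```python
# Tiny illustrative stopword lists; real lists would be much longer.
STOPWORDS = {
    "en": {"the", "a", "an", "of"},
    "fr": {"le", "la", "les", "un", "une", "des"},
}


def preprocess(text, lang):
    """Case-fold, split on whitespace, and drop stopwords for `lang`.

    Hypothetical helper: unknown languages pass through unfiltered.
    """
    stop = STOPWORDS.get(lang, set())
    return [word for word in text.casefold().split() if word not in stop]


# Apply the same function before indexing and before searching.
record_terms = preprocess("The Name of the Rose", "en")
query_terms = preprocess("name of the rose", "en")
print(record_terms == query_terms)
```

Stemming and synonym expansion would slot into the same function as
further steps, as long as they are applied identically on both sides.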

Hope that answers your question without getting too technical ;-)

Cheers,

-- 
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
address@hidden |Full Demos at http://liblime.com/koha |1(888)KohaILS








