sdx-developers

[sdx-developers] Herein Thesaurus


From: Jos Snellings
Subject: [sdx-developers] Herein Thesaurus
Date: Thu, 16 Dec 2004 16:12:11 +0100

 

Hello,

[This message is in English: a copy is intended for the other members of the thesaurus group, many of whom do not understand the language of Molière]

This is about the "Herein Thesaurus" project: the Herein thesaurus group has been continuing its efforts to construct a European thesaurus on cultural heritage policy and legislation. We are reaching the point where we are working with a stable thesaurus structure that offers a few interesting possibilities. The thesaurus is compliant with the norm for international thesauri, which implies features like:

- poly-hierarchical

- support for several types of linguistic equivalences between languages

- support of any character set (use of unicode)

If you like, you can view part of it; it is presented at:

http://dev-www.european-heritage.net/sdx/herein -> select "thesaurus" on the menu

This is not fully operational yet, but you can get a good idea by searching for a descriptor "that starts with [wildcard]". The demo comes with apologies for the presentation: for instance, the display of the linguistic context needs considerable improvement, and its current look is quite poor.

For presentation and lookup this is already very useful, but until now the possibilities for using the thesaurus to perform queries in SDX document bases have been very limited, because they were restricted to a rather clumsy implementation. Injecting query extensions into a search form is hardly efficient.

Now, in order to benefit from this work to the full extent, the thesaurus needs to be integrated with SDX at a deeper level. We would therefore need a class that on the one hand works on the Herein thesaurus structure, and on the other hand matches the existing interface as closely as possible, thus minimizing the impact on the SDX project in terms of code changes.

So, "otherThesaurus" [name to be chosen] would have to fully implement the interface SDXThesaurus:

It would be a class that stands beside the existing

* public class LuceneThesaurus extends LuceneDocumentBase implements SDXThesaurus

say,

* public class otherThesaurus implements SDXThesaurus

Its methods are precisely what we need:

public Concept[] search(String query) throws SDXException

public Results expandQuery(fr.gouv.culture.sdx.search.lucene.query.Query query)

public Results expandQuery(fr.gouv.culture.sdx.search.lucene.query.Query query, String fieldName) throws SDXException

public Concept[] getRelations(Concept[] concepts)

and so on ...
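To give an idea of the shape of such a class, here is a minimal sketch in Java. It is an illustration under assumptions, not SDX code: `Concept`, `ThesaurusLike` and `OtherThesaurus` are hypothetical stand-ins defined inside the example so that it compiles without SDX, the query is reduced to a plain string, and an in-memory map takes the place of the Herein thesaurus database. A real module would implement `SDXThesaurus` with the signatures listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the SDX Concept class; only a label is kept.
final class Concept {
    final String label;
    Concept(String label) { this.label = label; }
}

// Hypothetical, simplified stand-in for SDXThesaurus: a String replaces the
// SDX Query type and SDXException is left out.
interface ThesaurusLike {
    Concept[] search(String query);
    Concept[] getRelations(Concept[] concepts);
    String[] expandQuery(String term);
}

// Sketch of the proposed class: its storage stays hidden behind the interface.
final class OtherThesaurus implements ThesaurusLike {
    // label -> labels of related concepts (assumption: replaces the database)
    private final Map<String, List<String>> relations = new LinkedHashMap<>();

    void addRelation(String from, String to) {
        relations.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        relations.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Returns the concepts whose label starts with the query, ignoring case.
    public Concept[] search(String query) {
        String prefix = query.toLowerCase();
        List<Concept> hits = new ArrayList<>();
        for (String label : relations.keySet()) {
            if (label.toLowerCase().startsWith(prefix)) {
                hits.add(new Concept(label));
            }
        }
        return hits.toArray(new Concept[0]);
    }

    // Returns the concepts directly related to any of the given concepts.
    public Concept[] getRelations(Concept[] concepts) {
        Set<String> labels = new LinkedHashSet<>();
        for (Concept c : concepts) {
            labels.addAll(relations.getOrDefault(c.label, Collections.emptyList()));
        }
        List<Concept> result = new ArrayList<>();
        for (String label : labels) {
            result.add(new Concept(label));
        }
        return result.toArray(new Concept[0]);
    }

    // Expands a term into itself plus the labels of its related concepts.
    public String[] expandQuery(String term) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        for (Concept c : getRelations(search(term))) {
            terms.add(c.label);
        }
        return terms.toArray(new String[0]);
    }
}
```

The point of the sketch is that the calling code only sees the interface: whether the relations come from a Lucene index or from another database is an internal matter of the implementing class.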

The coding effort may be no greater than extending some constants, describing the relations and so on, and adding some configuration handling at system startup, since access to the thesaurus database is handled internally by the class itself.

Of course, this is just thinking out loud, so I am checking with you first, but our proposal would be (hereafter 'we' = Herein Thesaurus Group):

- we write an SDX-compliant module, a Java class that implements SDXThesaurus and accesses a database with our structure

- we integrate this thesaurus class in an experimental SDX distribution and communicate the impact analysis to you

- based upon this, AJSLM ensures that the module keeps pace with the evolution of SDX versions

- the module is made available to all SDX-users who wish to use it

- the module can be fully integrated in a future version of SDX

This is, in a nutshell, what we would like to propose.

Of course I cannot fully speak on behalf of the whole thesaurus group, but I wrote the software, I have communicated these intentions clearly to them, and earlier this year I obtained their warm approval to pursue this approach.

I would appreciate it very much if you would be so kind as to share your thoughts about this.

Kind regards,

Jos Snellings

