Re: [Aramorph-users] XML tables


From: Ahmed El-dawy
Subject: Re: [Aramorph-users] XML tables
Date: Wed, 15 Jun 2005 11:55:16 +0300

Hello,
> <entry>w</entry> : why not "unvocalized" ?
> <voc>wa</voc> : why not "vocalized" ?
> <morpy>Pref-Wa</morpy> : why not "morphology" ? Or
> "morphological-category" to be in sync with the english docs ;-)...
> <gloss>and</gloss>
> <pos>wa/CONJ+</pos> : may be "grammatical category"...
I originally used these short names because they closely follow the
sample at LDC, and short tag names also keep the XML files small, if
that matters. Anyway, I have now changed them to be more readable. By
the way, I still don't know what (pos) means :)

> May be changed to :
> <glosses>
>   <gloss>and</gloss>
>   <gloss>by/with</gloss>
> </glosses>
Yes, you are right about this. I've changed it in the new version.
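
A minimal sketch of how the converter could do this split, assuming that
combined glosses always use "+" as the separator (as in "and + by/with")
and that no gloss contains a literal "+". The function name is only
illustrative:

```python
import xml.etree.ElementTree as ET


def glosses_element(combined_gloss):
    """Split a combined gloss such as 'and + by/with' into <gloss> children."""
    glosses = ET.Element("glosses")
    for part in combined_gloss.split("+"):
        part = part.strip()
        if part:  # ignore empty pieces left by stray separators
            ET.SubElement(glosses, "gloss").text = part
    return glosses


xml_text = ET.tostring(glosses_element("and + by/with"), encoding="unicode")
print(xml_text)
```

The same approach would presumably work for splitting the combined
<pos> value into separate elements.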

> And, of course, the Arabic words should be encoded... in Arabic.
I will do it after generating the XML files, maybe in the same program
that converts the current dictionaries to XML. By the way, there is a
problem if we convert to XML while still using the transliteration: one
of its symbols is (>), which XML already uses to close tags. We will
have to write it as &gt;.
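
As a rough illustration of the escaping step, here is a small Python
sketch using the standard library's xml.sax.saxutils. The sample word
and the helper name are placeholders, not taken from the dictionaries:

```python
from xml.sax.saxutils import escape, unescape


def to_xml_text(transliterated):
    """Escape the characters XML reserves (&, <, >) in element content."""
    return escape(transliterated)


word = ">kl"  # placeholder transliterated entry starting with '>'
escaped = to_xml_text(word)
print(escaped)

# Reading the file back should restore the original transliteration.
restored = unescape(escaped)
```

Note that an XML library which builds the elements itself (for example
xml.etree.ElementTree) escapes text when serializing, so a manual step
like this would only be needed if the files are written as plain
strings.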

> Regarding the stems dictionary, the format has to be slightly different
> because we have additional information (see
> http://www.nongnu.org/aramorph/english/dictionaries.html) :
> 
> <root>ktb</root>
> <lemmaID>katab-u_1</lemmaID>
> 
> and, maybe, a "normalised" lemma
> <lemma>katab</lemma>
> 
I know that the lemma is the one given on a line starting with two
semicolons (;;), but what is this root?
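
For the lemma part, the converter could attach each entry to the most
recent (;;) line. A sketch, assuming that entry lines are tab-separated
and that any other line starting with a semicolon is a comment; the
sample lines below are made up for illustration, not copied from the
real stems file:

```python
def group_by_lemma(lines):
    """Attach every entry line to the lemma ID from the latest ';;' line."""
    entries = []
    lemma_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(";;"):
            lemma_id = line[2:].strip()  # text after the two semicolons
        elif not line.strip() or line.startswith(";"):
            continue  # blank line or some other kind of comment
        else:
            entries.append({"lemmaID": lemma_id, "fields": line.split("\t")})
    return entries


# Illustrative input only (assumed layout: entry, vocalized form, category, gloss).
sample = [
    ";; katab-u_1\n",
    "ktb\tkatab\tPV\twrite\n",
    "ktb\tkotub\tIV\twrite\n",
]
result = group_by_lemma(sample)
for entry in result:
    print(entry["lemmaID"], entry["fields"])
```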

> Regarding the compatibility tables, something like this would be nice :
> 
See the current version (attached) and tell me what you think.
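
For reference, a converter for the compatibility tables could look
roughly like the sketch below. It assumes each line of the plain-text
table holds two whitespace-separated categories and that lines starting
with a semicolon are comments. The element names follow the layout
suggested in the quoted message; the function and parameter names are
only placeholders:

```python
import xml.etree.ElementTree as ET


def compatibility_table(lines, first_tag="prefix", second_tag="stem"):
    """Build a <compatibility-table> with one <compatibility> per pair."""
    table = ET.Element("compatibility-table")
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith(";"):
            continue  # skip blank, incomplete, and comment lines
        pair = ET.SubElement(table, "compatibility")
        ET.SubElement(pair, first_tag).text = parts[0]
        ET.SubElement(pair, second_tag).text = parts[1]
    return table


sample = ["Pref-0 FW", "Pref-0 FW-Wa", "; a comment", "Pref-0 FW-WaBi"]
table = compatibility_table(sample)
print(ET.tostring(table, encoding="unicode"))
```

Passing other tag names (for example "stem" and "suffix") would let the
same function handle the other two tables.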

----------------------------------------------------------------------------------------------------
On 6/15/05, Pierrick Brihaye <address@hidden> wrote:
> Hi,
> 
> Ahmed El-dawy wrote:
> 
> >   I've attached a proposed format for the dictionary and compatibility tables.
> > You will find two .dtd files, one for the dictionary and the other for the
> > compatibility tables. You will also find some .xml files as examples.
> > Once we agree on some xml structure I can write a small class to
> > transform dictionaries to the new format.
> 
> Fine. Here are my comments :
> 
> I start with prefix.xml :
> 
> <entry>w</entry> : why not "unvocalized" ?
> <voc>wa</voc> : why not "vocalized" ?
> <morpy>Pref-Wa</morpy> : why not "morphology" ? Or
> "morphological-category" to be in sync with the english docs ;-)...
> <gloss>and</gloss>
> <pos>wa/CONJ+</pos> : may be "grammatical category"...
> 
> Well, as you can see, I like verbose XML :-)
> 
> Also, we could use the capabilities of XML :
> 
> <gloss>and + by/with</gloss>
> 
> May be changed to :
> <glosses>
>   <gloss>and</gloss>
>   <gloss>by/with</gloss>
> </glosses>
> 
> Similarly :
> 
> <pos>wa/CONJ+bi/PREP+</pos>
> 
> May be changed to :
> <grammatical-categories>
>  <grammatical-category>wa/CONJ</grammatical-category>
>  <grammatical-category>bi/PREP</grammatical-category>
> </grammatical-categories>
> 
> And, of course, the Arabic words should be encoded... in Arabic.
> 
> Regarding the stems dictionary, the format has to be slightly different
> because we have additional information (see
> http://www.nongnu.org/aramorph/english/dictionaries.html) :
> 
> <root>ktb</root>
> <lemmaID>katab-u_1</lemmaID>
> 
> and, maybe, a "normalised" lemma
> <lemma>katab</lemma>
> 
> Regarding the compatibility tables, something like this would be nice :
> 
> <compatibility-table>
> 
>   <compatibility>
>     <prefix>Pref-0</prefix>
>     <stem>FW</stem>
>   </compatibility>
> 
>   <compatibility>
>     <prefix>Pref-0</prefix>
>     <stem>FW-Wa</stem>
>   </compatibility>
> 
>   <compatibility>
>     <prefix>Pref-0</prefix>
>     <stem>FW-WaBi</stem>
>   </compatibility>
> 
> ...
> 
> </compatibility-table>
> 
> And, of course, we may merge the 3 ones.
> 
> What do you think ?
> 
> Cheers,
> 
> --
> Pierrick Brihaye, informaticien
> Service régional de l'Inventaire
> DRAC Bretagne
> mailto:address@hidden
> +33 (0)2 99 29 67 78
> 
> 


-- 
Regards,
Ahmed Saad

Attachment: tables2.zip
Description: Zip archive

