silpa-discuss

Re: [silpa-discuss] (no subject)


From: Irshad Ahmad
Subject: Re: [silpa-discuss] (no subject)
Date: Thu, 3 Mar 2016 19:12:19 +0530 (IST)

Sorry, I missed the publication link in my previous mail.

Here it is:

http://dl.acm.org/citation.cfm?id=2824872

----- Original Message -----
From: "Irshad Ahmad" <address@hidden>
To: "silpa-discuss" <address@hidden>, "Riyaz Ahmed" <address@hidden>
Sent: Thursday, March 3, 2016 6:59:46 PM
Subject: Re: [silpa-discuss] (no subject)

Thanks for the reply,

Here are the references to my Indic transliteration systems:

Indic-Roman system: https://github.com/irshadbhat/python-irtrans 
Hindi-Urdu system: https://github.com/irshadbhat/python-hutrans
Indic-Indic system: https://github.com/irshadbhat/indic-wx-converter

The Indic-Roman system transliterates from Indic scripts to Roman and 
vice versa. It currently supports the following language pairs:
English <-> Hindi
English <-> Gujarati
English <-> Telugu

I have a better-performing system in my local repository which also works for 
Tamil, Malayalam, Kannada, Bengali, Oriya, Punjabi, Urdu, Assamese etc. I just 
haven't committed the changes yet. Please note that the system is not 
rule-based; it is built using machine learning. 

The Hindi-Urdu system transliterates between Hindi <-> Urdu. It was developed 
separately because of the huge vocabulary overlap and other similarities 
between the two languages, which make this pair a special case among Indian 
language pairs. 

The Indic-Indic system (wx-converter) is not actually a transliteration system. 
It converts Indic scripts to WX 
(https://en.wikipedia.org/wiki/WX_notation). The main idea of WX is to convert 
Indian scripts to a common representation (ASCII) and then convert this ASCII 
to Roman letters. The system works reasonably well for transliteration between 
Indic scripts because these scripts have a special property: their phonemes 
are aligned one-to-one across their Unicode tables. The only problem with this 
scheme is that some phonemes are missing in some scripts; for example, there is 
no "Va" in the Bengali script, which makes transliteration harder. This can be 
handled with some heuristics. The system is completely rule-based, but I can 
develop a machine-learning system for Indic-Indic transliteration as well. 
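
To illustrate the alignment property described above, here is a minimal Python 
sketch. It is not the wx-converter code and does not go through WX; it only 
shifts each character by the difference between the start of the source and 
target Unicode blocks. The function name shift_script, the BLOCK_START table, 
and the 'ben' language code are assumptions made for this illustration. 
Characters whose target slot is unassigned (such as "Va" for Bengali) are left 
unchanged and reported, so that a heuristic could handle them separately.

```python
import unicodedata

# Start of each script's Unicode block. The three-letter keys mirror the
# language codes used in this mail; 'ben' is an assumed code for Bengali.
BLOCK_START = {
    "hin": 0x0900,  # Devanagari
    "ben": 0x0980,  # Bengali
    "pan": 0x0A00,  # Gurmukhi
    "guj": 0x0A80,  # Gujarati
    "ori": 0x0B00,  # Oriya
    "tel": 0x0C00,  # Telugu
    "kan": 0x0C80,  # Kannada
    "mal": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 code points


def shift_script(text, src, tgt):
    """Move every character of the `src` block to the same offset in `tgt`.

    Returns (converted_text, missing), where `missing` lists source
    characters that have no assigned counterpart in the target block.
    """
    src_start, tgt_start = BLOCK_START[src], BLOCK_START[tgt]
    out, missing = [], []
    for ch in text:
        offset = ord(ch) - src_start
        if 0 <= offset < BLOCK_SIZE:
            candidate = chr(tgt_start + offset)
            if unicodedata.name(candidate, ""):
                out.append(candidate)   # aligned slot exists in target
            else:
                out.append(ch)          # gap in target script: keep source
                missing.append(ch)
        else:
            out.append(ch)              # spaces, digits, other scripts
    return "".join(out), missing


if __name__ == "__main__":
    print(shift_script("आम आदमी", "hin", "mal"))
    print(shift_script("व", "hin", "ben"))  # "Va" has no Bengali slot
```

A real converter needs more than this offset shift, since gaps like the 
Bengali "Va" and script-specific signs have to be resolved by rules.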

Here are a few examples of how this system can be used for transliteration:

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l mal --s wx
ആമ ആദമീ സേ ആജാദീ ആജ ഭീ കോസോം ദൂര ഹൈ

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l tel --s wx
ఆమ ఆదమీ సే ఆజాదీ ఆజ భీ కోసోం దూర హై

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l ori --s wx
ଆମ ଆଦମୀ ସେ ଆଜାଦୀ ଆଜ ଭୀ କୋସୋଂ ଦୂର ହୈ

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l guj --s wx
આમ આદમી સે આજાદી આજ ભી કોસોં દૂર હૈ

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l pan --s wx
ਆਮ ਆਦਮੀ ਸੇ ਆਜਾਦੀ ਆਜ ਭੀ ਕੋਸੋਂ ਦੂਰ ਹੈ

echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l kan --s wx
ಆಮ ಆದಮೀ ಸೇ ಆಜಾದೀ ಆಜ ಭೀ ಕೋಸೋಂ ದೂರ ಹೈ

Finally, I would like to share my publication, which describes the procedure 
that I used to build the first two systems.

Thanks
--
Irshad Ahmad


----- Original Message -----
From: "Irshad Ahmad" <address@hidden>
To: "silpa-discuss" <address@hidden>
Sent: Thursday, March 3, 2016 4:31:35 PM
Subject: [silpa-discuss] (no subject)

Hello,

I would like to contribute to the transliteration module of libindic. I have 
been working on transliteration for Indian languages for the past six months. 
I have already obtained some good results for transliteration between Indic 
scripts and Roman, in both directions, and for transliteration within Indic 
scripts. I also have more than 1.5 years of Python experience. I would like to 
share my ideas and propose some additional work on Indic transliteration. I 
was unable to find any mentor information for the project on the wiki page. 
Could you please let me know who will be mentoring the project so that I can 
discuss my ideas with them?

Thanks
--
Irshad Ahmad


