silpa-discuss

Re: [silpa-discuss] Machine transliteration for Indic languages


From: Irshad Ahmad
Subject: Re: [silpa-discuss] Machine transliteration for Indic languages
Date: Mon, 14 Mar 2016 16:25:09 +0530 (IST)

Dear Sir,

I would be really glad to have Riyaz Bhat as my mentor for this project.
He is my elder brother, and I have been working under his guidance
since I joined IIIT Hyderabad for my MS. He would be an ideal mentor for this
project, since we both work in the field of NLP for Indian languages. More
importantly, his availability is not an issue, since he is pursuing his PhD at
the same institute. However, I am open to a mentor of your choice as well.

Regards
--
Irshad Bhat

----- Original Message -----
From: "Santhosh Thottingal" <address@hidden>
To: "Irshad Ahmad" <address@hidden>
Cc: "silpa-discuss" <address@hidden>
Sent: Monday, March 14, 2016 2:34:33 PM
Subject: Re: [silpa-discuss] Machine transliteration for Indic languages

(edited subject)

Irshad, I read the document. You have written the project concept very
well. I think the project has a lot of potential use cases.

Do you have any mentor in mind for the project? How about
https://researchweb.iiit.ac.in/~riyaz.bhat/ ?

Santhosh


On Wed, Mar 9, 2016 at 3:42 PM, Irshad Ahmad <
address@hidden> wrote:

> Hello Sir,
>
> Please find attached my Project Idea for Transliteration Module of
> Libindic.
>
> I agree that a mapping-table approach alone would not suffice.
> I've proposed two approaches in my project idea. First, as you have already
> mentioned, we need another set of rules to take care of special language
> characteristics. Second, rather than developing exhaustive rules, we use a
> machine learning (ML) approach. Machine learning algorithms give computers
> the ability to automatically capture such patterns without being explicitly
> programmed.
>
> I suggest we keep both the rule-based and the ML systems for Indic-Indic
> transliteration and only the ML system for Indic-Roman transliteration. The
> rest of the details are in the attached PDF.
>
> I hope you like the idea.
>
> Thanks
> --
> Irshad Ahmad
>
>
> ----- Original Message -----
> From: "Santhosh Thottingal" <address@hidden>
> To: "Irshad Ahmad" <address@hidden>
> Cc: "silpa-discuss" <address@hidden>, "Riyaz Ahmed" <
> address@hidden>
> Sent: Saturday, March 5, 2016 5:10:54 PM
> Subject: Re: [silpa-discuss] (no subject)
>
> Thanks Irshad for introducing your work.
>
> > echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l mal --s wx
> > ആമ ആദമീ സേ ആജാദീ ആജ ഭീ കോസോം ദൂര ഹൈ
> [.. and other examples ..]
>
> This example illustrates one key challenge in transliteration. The output is
> wrong, although it is correct if you consider only the letter-by-letter
> transliteration. Hindi आम should be ആം in Malayalam; ആമ means tortoise. This
> difference is because of
> https://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages
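
To make the point above concrete, here is a minimal Python sketch. It is not
code from libindic, converter-indic, or the attached proposal. It assumes that
a plain mapping table can be approximated by the fixed offset between the
Devanagari and Malayalam Unicode blocks, and it adds one hypothetical rule
(word-final म becomes anusvara) that covers only the आम example from the
thread. The function names are made up for illustration.

```python
# Assumption: letter-by-letter mapping approximated by the Unicode block offset.
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
MALAYALAM_START = 0x0D00
OFFSET = MALAYALAM_START - DEVANAGARI_START

DEVANAGARI_MA = "\u092E"       # म
MALAYALAM_ANUSVARA = "\u0D02"  # ം


def naive_map(text):
    """Shift every Devanagari code point into the Malayalam block."""
    return "".join(
        chr(ord(ch) + OFFSET)
        if DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END
        else ch
        for ch in text
    )


def with_final_ma_rule(word):
    """Hypothetical rule: a word-final bare म is written as anusvara.

    This handles only the single case discussed in the thread; a real
    rule set for schwa deletion would be far larger.
    """
    if len(word) > 1 and word[-1] == DEVANAGARI_MA:
        return naive_map(word[:-1]) + MALAYALAM_ANUSVARA
    return naive_map(word)


letter_by_letter = naive_map("\u0906\u092E")       # आम, mapped naively
with_rule = with_final_ma_rule("\u0906\u092E")     # आम, with the extra rule
```

Under these assumptions the naive mapping yields ആമ, while the extra rule
yields ആം. Not every Devanagari code point has a Malayalam counterpart at the
same offset, so a real mapping table would need explicit entries.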
>
> So, along with a mapping table approach, we need another set of rules to
> take care of these special language characteristics. Tamil has fewer
> consonants, so one Tamil consonant can map to more than one consonant in
> other Indic languages depending on the context. Similarly, you will face
> this difference when converting to Tamil. Malayalam has chillu letters, the
> vowelless forms of consonants. There is a huge list of such language
> features. I believe this category can be rule based.
> There is another set of characteristics that cannot be rule based. In the
> past years, people using the existing transliteration library in libindic
> mailed me asking about name transliterations, especially from English. Names
> are one specific set, but this can be generalized to any noun. A name like
> pradeep, prathip, pratheep, pradip, pradeeb, pratib, prateep should
> transliterate the same way to Indic languages in all its spellings. This is
> also a case where you lose the one-to-one correspondence of letters and
> mapping rules fail. I think you have already thought about using machine
> learning to get this part done. I think solving this and making the
> transliteration library smart enough is a good project. I can think of
> various use cases.
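
The many-to-one relationship between spellings described above can be
illustrated with a small Python sketch. This is a hand-written normalisation
with hypothetical rewrite rules chosen to fit this one name. It is not the
machine-learning approach discussed in the thread and not part of libindic.
It only shows the kind of equivalence a learned model would have to capture
from data.

```python
# Hypothetical, hand-picked rewrite rules, applied in order.
REWRITES = [("th", "t"), ("ee", "i"), ("d", "t"), ("b", "p")]


def spelling_key(name):
    """Collapse a romanized name to a crude key shared by its variants."""
    key = name.lower()
    for old, new in REWRITES:
        key = key.replace(old, new)
    return key


variants = ["pradeep", "prathip", "pratheep", "pradip",
            "pradeeb", "pratib", "prateep"]

# All variants should collapse to a single key if the rules fit the name.
keys = {spelling_key(v) for v in variants}
```

Rules like these break quickly on other names (for example, they also merge
names that genuinely differ in d and t), which is consistent with the thread's
argument that this category does not lend itself to hand-written rules.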
>
> I must add that I have been a bit disconnected from this project and library
> for many months, or even a couple of years, because of my busy job and other
> pet projects. So I might be unaware of some progress made in this area by
> researchers or developers. I also want to make clear that I am not
> committing to mentoring this, unless you really impress me and cannot get
> anybody else :)
>
> I would suggest that you write down your project idea, including the expected
> outcome, with brief notes on the existing tools and their limitations (this
> will help you understand what is really missing, instead of doing something
> for the sake of doing it). If you do this exercise you will get a better
> understanding of planning, timeline, and challenges. Whether or not you get
> this into GSoC, it will help you realize the project by other means.
>
> Santhosh
>



-- 
Santhosh Thottingal
http://thottingal.in


