silpa-discuss

Re: [silpa-discuss] (no subject)


From: Santhosh Thottingal
Subject: Re: [silpa-discuss] (no subject)
Date: Sat, 5 Mar 2016 17:10:54 +0530

Thanks Irshad for introducing your work.

> echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin | converter-indic --l mal --s wx
> ആമ ആദമീ സേ ആജാദീ ആജ ഭീ കോസോം ദൂര ഹൈ
[.. and other examples ..]

This example illustrates one key challenge in transliteration. The output is wrong, even though it is correct if you consider only the letter-by-letter transliteration. आम in Hindi should be ആം in Malayalam; ആമ means tortoise. The difference is caused by schwa deletion: https://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages
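
To make the problem concrete, here is a rough Python sketch. It is not how converter-indic or libindic works; it only assumes that the Devanagari and Malayalam Unicode blocks are laid out in parallel, so a fixed code point offset gives the letter-by-letter result. The function names and the single word-final म rule are hypothetical, and the rule covers only the one case mentioned above.

```python
# Devanagari starts at U+0900 and Malayalam at U+0D00, with a parallel layout.
OFFSET = 0x0D00 - 0x0900
# Only shift letters and vowel signs; punctuation and digits are left alone
# in this sketch. Not every shifted code point is a sensible Malayalam letter.
FIRST, LAST = 0x0901, 0x094D

DEVANAGARI_MA = "\u092e"      # म
MALAYALAM_ANUSVARA = "\u0d02"  # ം


def naive_hin_to_mal(text):
    """Letter-by-letter transliteration by code point offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        out.append(chr(cp + OFFSET) if FIRST <= cp <= LAST else ch)
    return "".join(out)


def with_final_ma_rule(text):
    """Same mapping, plus one illustrative rule: a word-final bare म
    loses its inherent vowel and is written as anusvara in Malayalam."""
    words = []
    for word in text.split(" "):
        mal = naive_hin_to_mal(word)
        if word.endswith(DEVANAGARI_MA):
            mal = mal[:-1] + MALAYALAM_ANUSVARA
        words.append(mal)
    return " ".join(words)


print(naive_hin_to_mal("आम"))    # ആമ (tortoise)
print(with_final_ma_rule("आम"))  # ആം
```

Other word-final consonants (for example the र in दूर) would need their own rules, which is where a per-language rule set comes in.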

So, along with a mapping table approach, we need another set of rules to take care of these language-specific characteristics. Tamil has fewer consonants, so a single Tamil consonant can map to more than one consonant in other Indic languages, depending on the context. You will face the same mismatch when converting to Tamil. Malayalam has chillu letters, the vowel-less forms of consonants. There is a huge list of such language features. I believe this category can be rule based.
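
As an illustration of the Tamil case, here is a small Python sketch of a mapping table with one extra rule layer. It is an assumption-laden toy, not an existing libindic API: the names are hypothetical, it handles only four consonant groups, and it simply folds the aspirated and voiced stops of Devanagari onto the single Tamil letter of that group before shifting to the Tamil block.

```python
# Tamil block starts at U+0B80, Devanagari at U+0900.
OFFSET = 0x0B80 - 0x0900

# First consonant of the ka, Ta, ta and pa groups in Devanagari: क ट त प.
# (The ca group is left out because Tamil also has the Grantha letter ஜ.)
GROUP_HEADS = (0x0915, 0x091F, 0x0924, 0x092A)

# Rule layer: the three following letters of each group fold onto its head,
# e.g. ख ग घ all become क before the offset is applied.
FOLD = {head + i: head for head in GROUP_HEADS for i in (1, 2, 3)}


def consonant_to_tamil(ch):
    """Map one Devanagari stop consonant to Tamil (many-to-one)."""
    cp = FOLD.get(ord(ch), ord(ch))
    return chr(cp + OFFSET)


for ch in "कखगघ":
    print(ch, "->", consonant_to_tamil(ch))
```

The fold loses information, so going back from Tamil க to the right letter in another language cannot be done by a table alone; that direction needs context.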
There is another set of characteristics that cannot be rule based. In the past years, people using the existing transliteration library in libindic have mailed me asking about name transliteration, especially from English. Names are one specific set, but the problem generalizes to any noun. Spellings like pradeep, prathip, pratheep, pradip, pradeeb, pratib, prateep should all transliterate to the same word in Indic languages. This is also a case where the one-to-one correspondence of letters is lost and mapping rules fail. I think you have already thought about using machine learning to get this part done. I think solving this and making the transliteration library smart enough is a good project. I can think of various use cases.
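
A tiny Python sketch of why this is hard to do by hand: the folds below are hypothetical and were picked only to make this one name collapse to a single key. They are not from libindic, and applied to other words they would merge spellings that should stay apart, which is the reason a learned model looks more promising here.

```python
# Hand-picked spelling folds, chosen only for the example name below.
FOLDS = [("th", "t"), ("ee", "i"), ("d", "t"), ("b", "p")]

VARIANTS = ["pradeep", "prathip", "pratheep", "pradip",
            "pradeeb", "pratib", "prateep"]


def fold_name(name):
    """Reduce a romanized name to a rough key by applying each fold in order."""
    key = name.lower()
    for old, new in FOLDS:
        key = key.replace(old, new)
    return key


keys = {fold_name(v) for v in VARIANTS}
print(keys)
```

All seven spellings end up with one key here, but only because the folds were written after looking at the list.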

I must add that I have been a bit disconnected from this project and library for many months, or even a couple of years, because of my busy job and other pet projects. So I might be unaware of some progress made in this area by researchers or developers. I also want to make clear that I am not committing to mentoring this, unless you really impress me and cannot get anybody else :)

I would suggest that you write down your project idea, including the expected outcome, with brief notes on the existing tools and their limitations (this will help you understand what is really missing, instead of doing something just for the sake of doing it). If you do this exercise, you will get a better understanding of the planning, timeline, and challenges. Whether or not you get this into GSoC, it will help you realize the project by other means.

Santhosh

