I followed another testing method:
1. Created a word corpus from chapter 1 of ഒരുകുടുംബപുരാണം from sayahna.org
2. Learned the word corpus (about 1000 words)
3. Tested transliterating chapter 2 of ഒരുകുടുംബപുരാണം (about 1100 words).
4. Repeated the test with and without the stemmer
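
For anyone who wants to reproduce the comparison, the steps above can be sketched roughly as below. This is only an illustration, not the actual scripts from my repository: `accuracy`, the `transliterate` callable and the two toy hashes are hypothetical stand-ins, and the sample words are made up rather than taken from the novel. A real run would replace the hashes with calls to the transliteration engine, once with the stemmer disabled and once with it enabled.

```ruby
# Hypothetical harness: `transliterate` is any callable that maps a
# romanized (Manglish) word to its best Malayalam suggestion.
# `pairs` is a list of [manglish_input, expected_malayalam] pairs.
def accuracy(pairs, transliterate)
  return 0.0 if pairs.empty?
  correct = pairs.count { |manglish, expected| transliterate.call(manglish) == expected }
  100.0 * correct / pairs.size
end

# Toy stand-ins for the learned models (assumption, not real data):
# the "stemmed" model also knows a suffixed form of the same word.
without_stemmer = { "veedu" => "വീട്" }
with_stemmer    = without_stemmer.merge("veettil" => "വീട്ടിൽ")

# Toy test set standing in for the chapter 2 words.
test_pairs = [["veedu", "വീട്"], ["veettil", "വീട്ടിൽ"]]

baseline = accuracy(test_pairs, ->(w) { without_stemmer[w] })
stemmed  = accuracy(test_pairs, ->(w) { with_stemmer[w] })
puts format("without stemmer: %.1f%%, with stemmer: %.1f%%", baseline, stemmed)
```

Counting only the top suggestion as a hit is a design choice in this sketch; counting a hit when the expected word appears anywhere in the suggestion list would give a more lenient number.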
Saw an improvement in accuracy of 1%! The idea was that since both sets are from similar sources, some words would overlap and some would repeat with a different suffix. However, I think the improvement might be smaller than expected because I'm typing the Manglish incorrectly. I remember you mentioning some sort of "Manglish" standard. Is it available online somewhere?
The sh and Ruby scripts I used, along with the word corpus from the novel, are all in my tools repository [1].