Patch to install 4 new fuzzy matching algorithms into msgmerge

From: Steve Murphy
Subject: Patch to install 4 new fuzzy matching algorithms into msgmerge
Date: Fri, 25 Mar 2005 11:52:52 -0700


To preserve for posterity the work I've done to upgrade msgmerge with
4 new fuzzy matching algorithms, and to fix the bugs I've found, I
offer the attached patch for your consideration.

The patch is against gettext 0.14.2, and includes changes to docs,
makefiles, NEWS, and ChangeLogs. It's a bit new, but I'll send another
patch if I spot any bugs. A run over a couple hundred files has yielded
no major disasters, and all algorithms appear to function as advertised.
I'd advise a couple thousand tests before releasing it, tho!

I've already reported the bugs to this mailing list, but I also
acknowledge that patches are probably more useful than just bug reports,
and I try to provide a patch wherever I bother to complain.

My algorithms are fairly simple and straightforward, and are documented
in the msgmerge texi documentation, which seems to generate acceptable
info and man pages. I also documented the fstrcmp algorithm, as I
offered in a previous letter. But I did not touch the msginit verbiage;
I simply don't have a handle on how best to describe the dependence on
those scripts in /usr/share/wherever.

The 4 new algorithms don't usually find as many matches as fstrcmp
does, but the first 3 will pretty much find a match very close to
exact. The last of the four basically just looks up all the regular
English words individually in the compendium, and concatenates the
matched msgstrs into a single msgstr. I've had a translator or two tell
me that they liked this sort of thing, which KBabel provided, as it
saves them a dictionary lookup or two in the course of their work. All 4
checks are pretty quick; actually, in comparison to fstrcmp, LIGHTNING
fast. [Well, even a tortoise could crawl a mile before fstrcmp is done
with a big list of big files with a big compendium! ;^) ]
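
As a rough illustration of the word-by-word lookup described above (a
sketch only, not the code from the patch): the function name, the
dict-based compendium, the fallback to a lowercase lookup, and the choice
to skip words that have no entry are all assumptions made for the example.

```python
import re


def word_lookup_msgstr(msgid, compendium):
    """Build a rough msgstr by translating each word of msgid on its own.

    `compendium` is assumed to be a plain dict mapping msgid -> msgstr.
    Words without an entry are skipped (an assumption for this sketch).
    """
    words = re.findall(r"[A-Za-z]+", msgid)
    parts = []
    for word in words:
        # Try the word as written, then lowercased.
        translated = compendium.get(word) or compendium.get(word.lower())
        if translated:
            parts.append(translated)
    # Concatenate the per-word translations into one suggested msgstr.
    return " ".join(parts)


# Hypothetical single-word compendium entries.
compendium = {"open": "ouvrir", "file": "fichier", "Cancel": "Annuler"}
suggestion = word_lookup_msgstr("Open the file", compendium)
```

The result is only a hint for the translator, which is why such an entry
would still be marked fuzzy.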

To be fair to fstrcmp, tho, enough of the matches it does turn up are
helpful that I'd not remove it. The 4 algorithms I've thrown
in will helpfully prefilter out the "easy" matches for it, and save some
CPU cycles for the tough cases. YMMV!
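
The prefiltering idea can be sketched as a cheap-first cascade. This is
an assumption-laden illustration, not the patch's logic: `normalize` is a
hypothetical stand-in for the cheap matchers (the real ones are described
in the texi docs, not here), Python's `difflib.SequenceMatcher` stands in
for fstrcmp, and the 0.6 threshold is an assumed cutoff.

```python
import difflib


def normalize(text):
    # Hypothetical cheap key: lowercase, collapse whitespace,
    # and drop trailing punctuation.
    return " ".join(text.lower().split()).rstrip(".:!?")


def find_match(msgid, compendium, threshold=0.6):
    """Return (matched_msgid, how) or (None, None) if nothing qualifies."""
    # 1. An exact hit costs one dict lookup.
    if msgid in compendium:
        return msgid, "exact"
    # 2. Cheap near-exact check, done before any expensive comparison.
    key = normalize(msgid)
    for candidate in compendium:
        if normalize(candidate) == key:
            return candidate, "cheap"
    # 3. Only the leftovers pay for the slow similarity scan.
    best, best_ratio = None, threshold
    for candidate in compendium:
        ratio = difflib.SequenceMatcher(None, msgid, candidate).ratio()
        if ratio >= best_ratio:
            best, best_ratio = candidate, ratio
    if best is not None:
        return best, "fuzzy"
    return None, None


# Hypothetical compendium entries.
compendium = {"Save file": "Enregistrer le fichier",
              "Open the file now": "Ouvrir le fichier maintenant"}
```

With this ordering, a msgid such as "save  file." is resolved in step 2
and never reaches the slow scan in step 3.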

I didn't really finish the code that generates a matched msgstr, using a
combo of the msgid being used for a match, and the msgstr turned up. It
could use some (or a lot of) refinement, but it'll do for the moment.
After all, it is generating "fuzzy" msgstrs!

It's a bit prelim, but I'd appreciate any feedback on it, including
disgust, disdain, rejection --even ad hominem attacks, what the heck.


Attachment: gettext-0.14.2-fuzzyupgrade.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part
