Patch to install 4 new fuzzy matching algorithms into msgmerge
From: Steve Murphy
Subject: Patch to install 4 new fuzzy matching algorithms into msgmerge
Date: Fri, 25 Mar 2005 11:52:52 -0700
Hello--
To preserve for posterity the work I've done to upgrade msgmerge to
include 4 new fuzzy matching algorithms and to fix the bugs I've found,
I offer the attached patch for your consideration.
The patch is for 0.14.2 of gettext, and includes changes to docs,
makefiles, NEWS, and Changelogs. It's a bit new, but I'll send another
patch if I spot any bugs. A run over a couple hundred files has yielded
no major disasters, and all algorithms appear to function as advertised.
I'd advise a couple thousand tests before releasing it, tho!
I've already reported the bugs to this mailing list, but I also
acknowledge that patches are probably more useful than just bug reports,
and I try to provide a patch wherever I bother to complain.
My algorithms are fairly simple and straightforward, and are documented
in the msgmerge texi documentation. It seems to generate acceptable info
and man pages from that. I also documented the fstrcmp algorithm, as I
offered in a previous letter. But I did not touch the msginit verbiage.
I simply don't have a handle on how best to describe the dependence on
those scripts in /usr/share/wherever.
The 4 new algorithms don't usually find as many matches as fstrcmp
does, but the first 3 will pretty much find a match very close to
exact. The last of the four basically just looks up all the regular
English words individually in the compendium, and concatenates the
matched msgstrs into a single msgstr. I've had a translator or two tell
me that they liked this sort of thing, which KBabel provided, as it
saves them a dictionary lookup or two in the course of their work. All 4
checks are pretty quick-- Actually, in comparison to fstrcmp, LIGHTNING
fast. [Well, even a tortoise could crawl a mile before fstrcmp is done
with a big list of big files with a big compendium! ;^) ]
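
For readers who want a picture of the fourth check, here is a rough
sketch in Python. It is not the code from the patch (which is C inside
msgmerge); the function name, the dict-shaped compendium, and the choice
to skip words with no entry are all assumptions made for illustration.

```python
import re

def word_by_word_msgstr(msgid, compendium):
    """Look up each plain word of msgid and join the translations found.

    `compendium` is a hypothetical dict mapping single-word msgids to
    msgstrs; a real compendium is a PO file, not a dict.
    """
    # Treat runs of ASCII letters as the "regular English words".
    words = re.findall(r"[A-Za-z]+", msgid)
    pieces = []
    for word in words:
        # Try the word as written, then lowercased (an assumption).
        translated = compendium.get(word) or compendium.get(word.lower())
        if translated:
            pieces.append(translated)
    # Words with no entry are simply skipped in this sketch.
    return " ".join(pieces)

compendium = {"open": "ouvrir", "file": "fichier"}
suggestion = word_by_word_msgstr("Open the file", compendium)
```

The result is only a hint for the translator, so it would still be
marked fuzzy.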
To be fair to fstrcmp, tho, enough of the matches that it does turn up
are helpful enough that I'd not remove it. The 4 algorithms I've thrown
in will helpfully prefilter out the "easy" matches for it, and save some
CPU cycles for the tough cases. YMMV!
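
The prefiltering idea can be sketched as a two-stage lookup, again in
Python and again only as an illustration. The cheap stage below (a
case- and whitespace-insensitive comparison) is a hypothetical stand-in,
not one of the 4 algorithms in the patch, and difflib.SequenceMatcher
stands in for fstrcmp. The 0.6 threshold is just an illustrative
parameter.

```python
import difflib

def normalize(text):
    # Lowercase and collapse whitespace; a hypothetical "cheap" key.
    return " ".join(text.lower().split())

def find_match(msgid, compendium, threshold=0.6):
    """Return (msgstr, stage) using a cheap check before a costly one."""
    # Stage 1: cheap comparison that catches the "easy" matches.
    wanted = normalize(msgid)
    for known_id, msgstr in compendium.items():
        if normalize(known_id) == wanted:
            return msgstr, "cheap"
    # Stage 2: expensive similarity scoring, only for the tough cases.
    best_score, best_msgstr = 0.0, None
    for known_id, msgstr in compendium.items():
        score = difflib.SequenceMatcher(None, msgid, known_id).ratio()
        if score > best_score:
            best_score, best_msgstr = score, msgstr
    if best_score >= threshold:
        return best_msgstr, "similarity"
    return None, "none"

compendium = {"Save File": "Enregistrer le fichier"}
easy = find_match("save  file", compendium)
tough = find_match("Save Files", compendium)
```

Every msgid resolved in stage 1 never reaches the similarity loop, which
is where the time goes with a big compendium.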
I didn't really finish the code that generates a matched msgstr, using a
combo of the msgid being used for a match, and the msgstr turned up. It
could use some/a lot of refinement, but it'll do for the moment. After
all, it is generating "fuzzy" msgstrs!
It's a bit prelim, but I'd appreciate any feedback on it, including
disgust, disdain, rejection -- even ad hominem attacks, what the heck.
murf
gettext-0.14.2-fuzzyupgrade.patch
Description: Text Data
signature.asc
Description: This is a digitally signed message part