Method behind the madness?

From: Steve Murphy
Subject: Method behind the madness?
Date: Wed, 09 Mar 2005 22:59:44 -0700


I'm in the process of inserting another "fuzzy" match between the exact
match search and the fuzzy match search. I note that the provided fuzzy
matching, at least for the language I'm working with (Kinyarwanda), is
very disappointing. So, this is why I'm poking my nose around in your
code.

I note that at the higher level, the "definitions" list is stocked with
the compendium(s), and then the "def" po file is inserted as the first
entry. This seems to guarantee a few things, one of which I wasn't
expecting:

1. If the def po file has a matching msgid, and the msgstr contains
    something, it will be used, and no fuzzy match will be searched.
    This is fine and good.
2. If the def po file has a matching msgid, and the msgstr is empty,
    then, unless an exact match with something better in it is found
    later in the definitions lists, the empty msgstr ("") is used, and
    no fuzzy matches are attempted. This puzzled me. Indeed, it seems
    that the only time a fuzzy match will be attempted is if there is
    an entry in the POT (ref) which is new, and no matching msgid
    exists in the "def" po. After the PO file has the msgid, there is
    no longer any opportunity to fuzzy match it.
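
To make sure I am describing the same thing you see, here is a rough
sketch of the lookup order in points 1 and 2 as I understand it. The
struct and function names are my own simplified stand-ins, not
gettext's real types and not message_list_list_search itself, and the
"prefer a non-empty msgstr" rule is only my reading of the behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration; NOT gettext's real types. */
struct msg { const char *msgid; const char *msgstr; };
struct msg_list { const struct msg *items; size_t count; };

/* Search the lists in order for an exact msgid match.  A match with a
   non-empty msgstr is returned immediately.  Otherwise the first match
   with an empty msgstr is remembered and returned at the end, so the
   caller still sees "an exact match" and never tries a fuzzy one.  */
const struct msg *
search_lists (const struct msg_list *lists, size_t nlists, const char *msgid)
{
  const struct msg *empty_hit = NULL;

  assert (msgid != NULL);
  for (size_t i = 0; i < nlists; i++)
    for (size_t j = 0; j < lists[i].count; j++)
      {
        const struct msg *m = &lists[i].items[j];

        if (strcmp (m->msgid, msgid) != 0)
          continue;
        if (m->msgstr[0] != '\0')
          return m;             /* something better: use it */
        if (empty_hit == NULL)
          empty_hit = m;        /* remember the empty translation */
      }
  return empty_hit;             /* NULL only if the msgid is new */
}
```

With the "def" po file as the first list and a compendium as the
second, a msgid that is present but untranslated in "def" comes back
as a hit with msgstr "", which is the case that puzzled me.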

Is #2 a bug, or a feature?  Changing the code in msgmerge.c (line 935,
in function match_domain) from:

      defmsg = message_list_list_search (definitions, refmsg->msgid);
      if (defmsg)
          /* Merge the reference with the definition: take the #. and

to:

      defmsg = message_list_list_search (definitions, refmsg->msgid);
      if (defmsg && defmsg->msgstr && defmsg->msgstr[0] )
          /* Merge the reference with the definition: take the #. and

gives the behavior I would expect. Perhaps I should say, the behavior
*I* would expect.
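
To spell out the difference between the two conditions, here is a tiny
standalone sketch. The struct and the two function names are
hypothetical, made up for this mail; only the conditions themselves are
taken from the snippets above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for illustration; NOT gettext's real type. */
struct msg { const char *msgid; const char *msgstr; };

/* Original condition: any exact msgid hit is merged, even when its
   msgstr is empty, so the fuzzy search is skipped.  */
bool
use_exact_original (const struct msg *defmsg)
{
  return defmsg != NULL;
}

/* Changed condition: an exact hit is merged only when it carries a
   non-empty msgstr; otherwise the caller falls through to the fuzzy
   search.  */
bool
use_exact_changed (const struct msg *defmsg)
{
  return defmsg != NULL && defmsg->msgstr != NULL
         && defmsg->msgstr[0] != '\0';
}
```

The two functions disagree only for an entry whose msgid matches and
whose msgstr is "", which is exactly case #2.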

Now, I'm fairly new to this code, and I am unschooled in all the subtle
craftiness and complexity of the info flow in translations. If I'm
thinking wrongly about how all this should work, please, please, be
patient with me and correct my thinking! A whole nation could benefit
from your wisdom.

One other thing that I think could be considered a bug: I've read the
info manual, the man pages for msgmerge, and all the other verbiage in
the info doc, but nowhere did I find any kind of description of just
how the fuzzy matcher works. I see the code in the src directory for
the fuzzy matching, but there is really no explanation of the
algorithm, and it is not intuitively obvious (at least, not to me)
exactly what the algorithm is doing. I can kind of see how it (fstrcmp)
assigns a "value" to the "closeness" between strings, and selects the
"closest" string from the lot. But a high-level explanation in the
documentation might help shape more intelligent expectations.
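
To show the kind of explanation I have in mind, here is a small sketch
of one way a "closeness" value could be computed and the "closest"
string picked. It is only an assumption on my part: it uses a longest
common subsequence ratio, and I am not claiming this is what fstrcmp
actually does. The function names and the cutoff parameter are made up
for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of a and b, in bytes.  */
size_t
lcs_length (const char *a, const char *b)
{
  size_t n = strlen (a), m = strlen (b);
  size_t *row = calloc (m + 1, sizeof *row);
  size_t result;

  assert (row != NULL);
  for (size_t i = 1; i <= n; i++)
    {
      size_t diag = 0;          /* LCS value for (i-1, j-1) */
      for (size_t j = 1; j <= m; j++)
        {
          size_t up = row[j];   /* LCS value for (i-1, j) */
          if (a[i - 1] == b[j - 1])
            row[j] = diag + 1;
          else if (row[j - 1] > row[j])
            row[j] = row[j - 1];
          diag = up;
        }
    }
  result = row[m];
  free (row);
  return result;
}

/* Closeness in [0, 1]: 1.0 for identical strings, 0.0 when the two
   strings share no characters at all.  */
double
similarity (const char *a, const char *b)
{
  size_t total = strlen (a) + strlen (b);
  return total == 0 ? 1.0 : 2.0 * (double) lcs_length (a, b) / (double) total;
}

/* Index of the candidate closest to msgid, or -1 if none reaches the
   (hypothetical) cutoff.  */
int
closest (const char *msgid, const char **candidates, size_t count,
         double cutoff)
{
  int best = -1;
  double best_score = cutoff;

  for (size_t i = 0; i < count; i++)
    {
      double score = similarity (msgid, candidates[i]);
      if (score >= best_score)
        {
          best_score = score;
          best = (int) i;
        }
    }
  return best;
}
```

Even a few sentences at this level in the manual, describing what the
score measures and where the cutoff lies, would tell a translator what
to expect from the matcher.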

