bug-gnu-utils

Re: msgmerge speedup: fstrcmp and diffseq improvements


From: Bruno Haible
Subject: Re: msgmerge speedup: fstrcmp and diffseq improvements
Date: Mon, 15 Sep 2008 00:04:07 +0200
User-agent: KMail/1.5.4

Hi Ralf,

> gettext-tools/src/ChangeLog:
> 2008-09-14  Ralf Wildenhues  <address@hidden>
> 
>       * message.c (fuzzy_search_goal_function): New argument
>       'lower_bound'.  Rewrite to use fstrcmp_if_higher, passing it
>       a lower acceptable bound for the similarity.
>       (message_list_search_fuzzy_inner): Adjust caller.
>       * message.h (fuzzy_search_goal_function): Adjust declaration.
>       * msgl-fsearch.c (message_fuzzy_index_search): Adjust callers.
>       * msgmerge.c (definitions_search_fuzzy): Likewise.

Thanks a lot! I've applied a small variant of this. In
fuzzy_search_goal_function one has to take into account possible rounding
errors while subtracting the "advantage" value.
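
To illustrate the rounding issue, here is a minimal sketch of the bound
adjustment in isolation. The helper names adjusted_lower_bound and
passes_bound are hypothetical and exist only for this example; the factor
1.01 is the one used in the patch below.

```c
/* Hypothetical helper: compute the bound to pass to the string
   comparison, given the caller's bound and the context bonus.  */
static double
adjusted_lower_bound (double lower_bound, double bonus)
{
  /* The caller compares (weight + bonus) against lower_bound, so only
     weights >= lower_bound - bonus matter.  Subtracting 1% more than
     the bonus keeps a candidate that sits exactly on the boundary from
     being rejected because of floating-point rounding.  */
  return lower_bound - bonus * 1.01;
}

/* Hypothetical check: nonzero if a raw weight (before the bonus is
   added) is still of interest under the adjusted bound.  */
static int
passes_bound (double weight, double lower_bound, double bonus)
{
  return weight >= adjusted_lower_bound (lower_bound, bonus);
}
```

With bonus = 0.00001 and lower_bound = 0.7, a raw weight of 0.7 - 0.00001
still passes, which is the boundary case the extra margin protects.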

> FYI, the changes to msgl-fsearch.c and msgmerge.c are untested, except
> for a 'make all check' in gettext.

If some code change passes "make check" and careful code inspection, it's
ok to commit in gettext. The gettext testsuite coverage is quite good.

In summary, the time for "msgmerge af.po coreutils.pot -o /dev/null" was reduced
  - from 152 sec. before
  - to 19 sec. with this patch and the simple bound shortcut in fstrcmp, and
  - further down to 16.5 sec. with the "early abort" in diffseq.h.
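
For readers unfamiliar with the bound shortcut, here is a sketch of one
plausible form of it. It assumes the similarity is defined as
2 * matches / (len1 + len2); since at most min (len1, len2) characters can
match, the lengths alone give an upper bound. The function names are
hypothetical and this is not the code from fstrcmp.c.

```c
#include <stddef.h>
#include <string.h>

/* Upper bound on the similarity 2 * matches / (len1 + len2), computed
   from the lengths only.  */
static double
similarity_upper_bound (const char *s1, const char *s2)
{
  size_t len1 = strlen (s1);
  size_t len2 = strlen (s2);
  size_t shorter = len1 < len2 ? len1 : len2;

  if (len1 + len2 == 0)
    return 1.0;   /* two empty strings are identical */
  return 2.0 * (double) shorter / (double) (len1 + len2);
}

/* Nonzero if the expensive comparison can be skipped because the result
   could not reach lower_bound anyway.  */
static int
can_skip_comparison (const char *s1, const char *s2, double lower_bound)
{
  return similarity_upper_bound (s1, s2) < lower_bound;
}
```

For example, "ab" against "abcdef" cannot score above 2 * 2 / 8 = 0.5, so
with a lower bound of 0.6 the comparison need not run at all.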

Bruno


2008-09-14  Ralf Wildenhues  <address@hidden>
            Bruno Haible  <address@hidden>

        * message.h (fuzzy_search_goal_function): Add 'lower_bound' argument.
        * message.c (fuzzy_search_goal_function): Likewise. Use fstrcmp_bounded
        instead of fstrcmp.
        (message_list_search_fuzzy_inner): Pass fuzzy_search_goal_function the
        best weight known so far, to shortcut computations.
        * msgl-fsearch.c (message_fuzzy_index_search): Likewise.
        * msgmerge.c (definitions_search_fuzzy): Update
        fuzzy_search_goal_function calls.
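
The caller side of this change can be sketched as follows: the search loop
hands the best weight found so far to the scoring function, which is then
free to give up early on hopeless candidates. Everything here (score_fn,
toy_score, best_candidate) is a hypothetical stand-in for illustration;
toy_score is a crude prefix-based scorer, not fstrcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scorer type.  If the true score is < lower_bound, any
   value < lower_bound may be returned instead.  */
typedef double (*score_fn) (const char *candidate, const char *goal,
                            double lower_bound);

/* Toy scorer: 2 * (length of common prefix) / (sum of lengths).  */
static double
toy_score (const char *candidate, const char *goal, double lower_bound)
{
  size_t len_c = strlen (candidate);
  size_t len_g = strlen (goal);
  size_t shorter = len_c < len_g ? len_c : len_g;
  size_t common = 0;

  if (len_c + len_g == 0)
    return 1.0;
  /* Shortcut: the score cannot exceed this length-based bound.  */
  if (2.0 * (double) shorter / (double) (len_c + len_g) < lower_bound)
    return 0.0;
  while (common < shorter && candidate[common] == goal[common])
    common++;
  return 2.0 * (double) common / (double) (len_c + len_g);
}

/* Return the index of the best candidate whose score exceeds threshold,
   or -1 if there is none.  */
static ptrdiff_t
best_candidate (const char *const *candidates, size_t n,
                const char *goal, double threshold, score_fn score)
{
  double best_weight = threshold;
  ptrdiff_t best = -1;
  size_t j;

  for (j = 0; j < n; j++)
    {
      /* Pass the best weight so far as the lower bound.  */
      double weight = score (candidates[j], goal, best_weight);
      if (weight > best_weight)
        {
          best_weight = weight;
          best = (ptrdiff_t) j;
        }
    }
  return best;
}
```

The bound rises as better candidates are found, so later candidates are
rejected more cheaply.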

*** message.h   7 Oct 2007 19:35:27 -0000       1.27
--- message.h   14 Sep 2008 21:35:33 -0000
***************
*** 1,5 ****
  /* GNU gettext - internationalization aids
!    Copyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <address@hidden>
  
--- 1,5 ----
  /* GNU gettext - internationalization aids
!    Copyright (C) 1995-1998, 2000-2008 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <address@hidden>
  
***************
*** 317,326 ****
  
  
  /* The goal function used in fuzzy search.
!    Higher values indicate a closer match.  */
  extern double
         fuzzy_search_goal_function (const message_ty *mp,
!                                  const char *msgctxt, const char *msgid);
  
  /* The threshold for fuzzy-searching.
     A message is considered only if  fstrcmp (msg, given) > FUZZY_THRESHOLD.  */
--- 317,329 ----
  
  
  /* The goal function used in fuzzy search.
!    Higher values indicate a closer match.
!    If the result is < LOWER_BOUND, an arbitrary other value < LOWER_BOUND can
!    be returned.  */
  extern double
         fuzzy_search_goal_function (const message_ty *mp,
!                                  const char *msgctxt, const char *msgid,
!                                  double lower_bound);
  
  /* The threshold for fuzzy-searching.
     A message is considered only if  fstrcmp (msg, given) > FUZZY_THRESHOLD.  */
*** message.c   7 Oct 2007 19:35:27 -0000       1.32
--- message.c   14 Sep 2008 21:35:36 -0000
***************
*** 1,5 ****
  /* GNU gettext - internationalization aids
!    Copyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <address@hidden>
  
--- 1,5 ----
  /* GNU gettext - internationalization aids
!    Copyright (C) 1995-1998, 2000-2008 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <address@hidden>
  
***************
*** 531,552 ****
  
  double
  fuzzy_search_goal_function (const message_ty *mp,
!                           const char *msgctxt, const char *msgid)
  {
!   /* The use of 'volatile' guarantees that excess precision bits are dropped
!      before the addition and before the following comparison at the caller's
!      site.  It is necessary on x86 systems where double-floats are not IEEE
!      compliant by default, to avoid that msgmerge results become platform and
!      compiler option dependent.  'volatile' is a portable alternative to gcc's
!      -ffloat-store option.  */
!   volatile double weight = fstrcmp (msgid, mp->msgid);
    /* A translation for a context is a good proposal also for another.  But
       give mp a small advantage if mp is valid regardless of any context or
       has the same context as the one being looked up.  */
    if (mp->msgctxt == NULL
        || (msgctxt != NULL && strcmp (msgctxt, mp->msgctxt) == 0))
!     weight += 0.00001;
!   return weight;
  }
  
  
--- 531,567 ----
  
  double
  fuzzy_search_goal_function (const message_ty *mp,
!                           const char *msgctxt, const char *msgid,
!                           double lower_bound)
  {
!   double bonus = 0.0;
    /* A translation for a context is a good proposal also for another.  But
       give mp a small advantage if mp is valid regardless of any context or
       has the same context as the one being looked up.  */
    if (mp->msgctxt == NULL
        || (msgctxt != NULL && strcmp (msgctxt, mp->msgctxt) == 0))
!     {
!       bonus = 0.00001;
!       /* Since we will consider (weight + bonus) at the end, we are only
!        interested in weights that are >= lower_bound - bonus.  Subtract
!        a little more than the bonus, in order to avoid trouble due to
!        rounding errors.  */
!       lower_bound -= bonus * 1.01;
!     }
! 
!   {
!     /* The use of 'volatile' guarantees that excess precision bits are dropped
!        before the addition and before the following comparison at the caller's
!        site.  It is necessary on x86 systems where double-floats are not IEEE
!        compliant by default, to avoid that msgmerge results become platform and
!        compiler option dependent.  'volatile' is a portable alternative to
!        gcc's -ffloat-store option.  */
!     volatile double weight = fstrcmp_bounded (msgid, mp->msgid, lower_bound);
! 
!     weight += bonus;
! 
!     return weight;
!   }
  }
  
  
***************
*** 567,573 ****
  
        if (mp->msgstr != NULL && mp->msgstr[0] != '\0')
        {
!         double weight = fuzzy_search_goal_function (mp, msgctxt, msgid);
          if (weight > *best_weight_p)
            {
              *best_weight_p = weight;
--- 582,589 ----
  
        if (mp->msgstr != NULL && mp->msgstr[0] != '\0')
        {
!         double weight =
!           fuzzy_search_goal_function (mp, msgctxt, msgid, *best_weight_p);
          if (weight > *best_weight_p)
            {
              *best_weight_p = weight;
*** msgl-fsearch.c      7 Oct 2007 19:35:29 -0000       1.3
--- msgl-fsearch.c      14 Sep 2008 21:35:35 -0000
***************
*** 1,5 ****
  /* Fast fuzzy searching among messages.
!    Copyright (C) 2006 Free Software Foundation, Inc.
     Written by Bruno Haible <address@hidden>, 2006.
  
     This program is free software: you can redistribute it and/or modify
--- 1,5 ----
  /* Fast fuzzy searching among messages.
!    Copyright (C) 2006, 2008 Free Software Foundation, Inc.
     Written by Bruno Haible <address@hidden>, 2006.
  
     This program is free software: you can redistribute it and/or modify
***************
*** 553,559 ****
                      {
                        message_ty *mp = findex->messages[ptr->index];
                        double weight =
!                         fuzzy_search_goal_function (mp, msgctxt, msgid);
  
                        if (weight > best_weight)
                          {
--- 553,560 ----
                      {
                        message_ty *mp = findex->messages[ptr->index];
                        double weight =
!                         fuzzy_search_goal_function (mp, msgctxt, msgid,
!                                                     best_weight);
  
                        if (weight > best_weight)
                          {
***************
*** 598,604 ****
        for (j = 0; j < mlp->nitems; j++)
          {
            message_ty *mp = mlp->item[j];
!           double weight = fuzzy_search_goal_function (mp, msgctxt, msgid);
  
            if (weight > best_weight)
              {
--- 599,606 ----
        for (j = 0; j < mlp->nitems; j++)
          {
            message_ty *mp = mlp->item[j];
!           double weight =
!             fuzzy_search_goal_function (mp, msgctxt, msgid, best_weight);
  
            if (weight > best_weight)
              {
*** msgmerge.c  24 Aug 2008 01:01:23 -0000      1.60
--- msgmerge.c  14 Sep 2008 21:35:35 -0000
***************
*** 776,783 ****
        /* Choose the best among mp1, mp2.  */
        if (mp1 == NULL
          || (mp2 != NULL
!             && (fuzzy_search_goal_function (mp2, msgctxt, msgid)
!                 > fuzzy_search_goal_function (mp1, msgctxt, msgid))))
        mp1 = mp2;
      }
  
--- 776,783 ----
        /* Choose the best among mp1, mp2.  */
        if (mp1 == NULL
          || (mp2 != NULL
!             && (fuzzy_search_goal_function (mp2, msgctxt, msgid, 0.0)
!                 > fuzzy_search_goal_function (mp1, msgctxt, msgid, 0.0))))
        mp1 = mp2;
      }
  




