bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gperf-bugs] Revised gperf 2.7.2 patch


From: Bruno Haible
Subject: Re: [Gperf-bugs] Revised gperf 2.7.2 patch
Date: Mon, 18 Nov 2002 13:58:01 +0100 (CET)

On Tuesday 01 October 2002 19:53, Bruce Lilly wrote:
> In February 2002, I posted some patches to gperf 2.7.2 to address
> a number of issues:
> 
> 1. efficient case-independent matching of keywords
> 2. aoutomatic determination of optimal key positions
> 
> A revised patch is available at
> http://users.erols.com/blilly/mailparse/gperf.patch
> 
> Item #1 above has not changed.  Item #2 has been
> substantially improved; the method now used is much
> faster and handles some pathological cases that were
> not handled in the earlier patch.

Thanks for the patch. I'm adding functionality equivalent to #2 to
gperf-2.8. I couldn't use your patch because the previous gperf
sources were already very convoluted, nearly unmaintainable.  Had to
clean up the gperf sources first.

Can you explain what the distinction between
  2: key position might be useful
  3: key position uniquely identifies keywords of that length or longer
brings in practice? I mean, in either case you can have several
positions of the same type, and you don't know a priori which one to
choose. So you need some form of backtracking anyway.

Item #1 has already been rejected earlier.

> Additional issues are also handled:
> 
> 3. more efficient code when all keywords have the same length
> 4. more efficient code when the -l option is used
> 5. correct handling of automatic key position determination
>     when -n option is used
> 
> Details:
> 
> Vanilla gperf 2.7.2 generates a test like:
> 
>   if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
> 
> Clearly that is inefficient when all keywords have the
> same length (MAX_WORD_LENGTH == MIN_WORD_LENGTH). The
> patch generates the following in such cases:
> 
>    if (len == MIN_WORD_LENGTH)
> 
> which requires a single comparison rather than two.

Modern compilers generate a single comparison for this expression
anyway.

> When the -l option is used, vanilla gperf checks the
> length via lengthtable and a string comparison, then
> makes an additional check for a terminating '\0';
> that additional check is unnecessary (because of the
> lengthtable check).

When -l is given, gperf 2.7.2 uses a lengthtable access and then a
string comparison with memcmp, excluding the terminating '\0'.

> The revised automatic optimal key position determining
> code ensures that at least one key position is used if
> the -n option is specified or if all keywords have the
> same length.

That's what one would expect from an automatic key position
determination anyway.

Bruno




reply via email to

[Prev in Thread] Current Thread [Next in Thread]