bug-grep

Re: [RFC PATCH] fall back to glibc matcher if a multibyte match is found


From: Jim Meyering
Subject: Re: [RFC PATCH] fall back to glibc matcher if a multibyte match is found
Date: Fri, 30 Apr 2010 18:31:57 +0200

Paolo Bonzini wrote:

> This patch works around the performance problems that are still in
> current grep.  Red Hat will probably be using it in its own 2.6.x.
>
> For UTF-8 it should trigger only in the presence of MBCSET, e.g. [a-z]
> or [à] (and the latter case could be avoided).
>
> For other character sets all brackets, and `.' as well, will trigger it.
>
> Thoughts?
> ---
>  src/dfa.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 2bc0c0e..775943c 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -3213,6 +3213,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
>                  continue;
>                }
>
> +         if (backref)
> +              {
> +                *backref = 1;
> +                free(mblen_buf);
> +                free(inputwcs);
> +                *end = saved_end;
> +                return (char *) p;
> +              }
> +
>              /* Can match with a multibyte character (and multi character
>                 collating element).  Transition table might be updated.  */
>              s = transit_state(d, s, &p);
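The patch relies on a contract between dfaexec() and its caller: setting *backref tells the caller that the DFA's answer cannot be trusted, so the line must be rechecked with the slower glibc regex matcher. The following is a minimal, self-contained sketch of that fallback pattern under stated assumptions. It is not grep's code. fast_match() and line_matches() are hypothetical names. The fast path here handles only plain ASCII literals, and POSIX regcomp()/regexec() stands in for the glibc matcher grep actually falls back to.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fast matcher standing in for dfaexec().  It decides only
   patterns made of plain ASCII literals.  On anything it cannot decide
   cheaply (a BRE metacharacter, or a byte >= 0x80 that may start a
   multibyte character) it sets *backref and gives up, mirroring how the
   patch bails out before calling transit_state().  */
static bool
fast_match (char const *pattern, char const *line, int *backref)
{
  assert (pattern != NULL && line != NULL);
  for (char const *q = pattern; *q; q++)
    {
      unsigned char c = *q;
      if (c >= 0x80 || strchr (".[\\*^$", c) != NULL)
        {
          *backref = 1;         /* "ask the exact, slower matcher" */
          return false;
        }
    }
  /* Pure literal: a substring search gives the same answer as the BRE.  */
  return strstr (line, pattern) != NULL;
}

/* Hypothetical caller: try the fast matcher first and fall back to
   regcomp()/regexec() only when it signals that it could not decide.
   Returns 1 on match, 0 on no match, -1 if the pattern fails to compile.  */
static int
line_matches (char const *pattern, char const *line)
{
  int backref = 0;
  bool hit = fast_match (pattern, line, &backref);
  if (!backref)
    return hit;

  regex_t re;
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

For example, line_matches ("abc", "xxabcxx") is answered entirely by the fast path, while line_matches ("[a-z]x", "abx") triggers the fallback because of the bracket expression, much as MBCSET does in the patch. The real patch also frees mblen_buf and inputwcs and restores *end before returning. The sketch has no counterpart to that cleanup because it allocates nothing.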

Sounds like a good change, but please add a comment.
Can you suggest a pathologically bad example
with which we can try to come up with a performance-measuring
addition to the test suite?

I'll take a closer look next week.



