Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

bug-gnulib


From:	Paolo Bonzini
Subject:	Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters
Date:	Mon, 12 Jul 2010 15:16:58 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Thunderbird/3.0.5

On 07/12/2010 01:38 AM, Pádraig Brady wrote:

On 11/07/10 15:20, Paolo Bonzini wrote:

On 07/07/2010 03:44 PM, Pádraig Brady wrote:

Subject: [PATCH] unistr/u8-strchr: speed up searching for ASCII
characters

* lib/unistr/u8-strchr.c (u8_strchr): Use strchr() for
the single byte case as it was measured to be 50% faster
than the existing code on x86 linux.  Also add a comment
on why not to use memmem() for the moment for the multibyte case.


If p is surely a valid UTF-8 string, you can do better in general like
this.  Say [q, q+q_len) points to a UTF-8 representation of uc:

   for (; (p = strchr (p, *q)) && memcmp (p+1, q+1, q_len-1); p += q_len)
     ;

   return p;


That would be an improvement if strchr() would skip lots of p at a time,
to counter the function call overhead. However, the first byte of a multibyte
UTF-8 char is the same for a lot of characters, so I'm guessing there would
be lots of false positives in practice?

I guess it depends. Absolutely awful for Greek/Arabic/etc., probably not too bad for European languages. Also probably not too bad when searching in mixed single-/multi-byte text (e.g. code with foreign language comments).
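
For reference, here is the lead-byte search from the quoted loop written out as a complete function. This is only a sketch: the name find_utf8_seq is hypothetical and this is not the gnulib code. It assumes s is a valid, NUL-terminated UTF-8 string, q_len >= 1 and q[0] != 0. Every false positive costs one strchr call plus one memcmp, which is why text where most characters share a lead byte (e.g. Greek, where many letters start with 0xCE) is the bad case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the gnulib implementation: return a pointer
   to the first occurrence of the UTF-8 sequence [q, q + q_len) in the
   valid, NUL-terminated UTF-8 string s, or NULL if there is none.
   Assumes q_len >= 1 and q[0] != 0.  */
static const uint8_t *
find_utf8_seq (const uint8_t *s, const uint8_t *q, size_t q_len)
{
  const char *p = (const char *) s;

  /* Let strchr locate candidates by lead byte only.  */
  while ((p = strchr (p, q[0])) != NULL)
    {
      /* Lead byte matched; compare the remaining q_len - 1 bytes.  */
      if (memcmp (p + 1, q + 1, q_len - 1) == 0)
        return (const uint8_t *) p;

      /* False positive.  In valid UTF-8 the lead byte determines the
         encoded length, so this character is also q_len bytes long
         and can be skipped whole.  */
      p += q_len;
    }

  return NULL;
}
```

With s holding the bytes of "αβγ" (CE B1 CE B2 CE B3) and q holding "β" (CE B2), the first strchr hit is a false positive on α, and the second hit at offset 2 is the match.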

A lot of the startup overhead of strchr is to align to a word and multiply the sought character by 0x1010101. All these could be done only once. I wonder if a completely inlined fast strchr would be too complex to be worth the improvement...
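
To make the setup cost concrete, here is roughly what a word-at-a-time strchr does before and during its main loop. This is an illustrative sketch, not glibc's implementation: swar_strchr and has_zero_byte are hypothetical names, the word size is fixed at 32 bits, and the aligned word loads may read a few bytes past the terminating NUL, which real implementations get away with but strict ISO C does not allow. The alignment loop and the multiplication are the per-call setup that a search loop calling strchr repeatedly pays every time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((uint32_t) 0x01010101)
#define HIGHS ((uint32_t) 0x80808080)

/* Nonzero if and only if at least one byte of w is zero.  */
static inline uint32_t
has_zero_byte (uint32_t w)
{
  return (w - ONES) & ~w & HIGHS;
}

/* Hypothetical sketch of a word-at-a-time strchr.  Assumption: reading
   a whole aligned 32-bit word that contains the NUL is harmless.  */
static const char *
swar_strchr (const char *s, unsigned char c)
{
  /* Setup step 1: replicate c into every byte of a word.  */
  uint32_t pattern = ONES * c;

  /* Setup step 2: advance byte by byte until s is word-aligned.  */
  for (; (uintptr_t) s % 4 != 0; s++)
    {
      if ((unsigned char) *s == c)
        return s;
      if (*s == '\0')
        return NULL;
    }

  /* Main loop: stop at the first word containing either NUL or c.
     XOR with pattern turns bytes equal to c into zero bytes.  */
  for (;; s += 4)
    {
      uint32_t w;
      memcpy (&w, s, 4);
      if (has_zero_byte (w) || has_zero_byte (w ^ pattern))
        break;
    }

  /* Resolve the exact position within that word.  */
  for (;; s++)
    {
      if ((unsigned char) *s == c)
        return s;
      if (*s == '\0')
        return NULL;
    }
}
```

An inlined variant for the multibyte search would compute pattern once outside the loop and keep the pointer aligned between candidates, at the price of duplicating this logic in the caller.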


Paolo

