Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algo

bug-gnulib

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algo

From:	Paolo Bonzini
Subject:	Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Date:	Thu, 29 Jul 2010 11:26:32 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Thunderbird/3.0.5

On 07/29/2010 11:05 AM, Pádraig Brady wrote:

I would better recommend to use
the u8_strstr function.


I wonder could we speed that up for UTF-8
by just deferring to strstr() ?
I've not tested this so feel free to bin it.

An alternative idea: check if there is more than 1 character usingu8_mbtouc. If there is one, use the faster u8_strchr algorithm (you canjust use u8_strchr, even though that does a useless conversion back toUTF-8). If there is more than one, use strstr directly.

And this gives another opportunity: we can now define u8_strstr andu16_strstr to always return a pointer to a valid UTF-8 sequence:


/* Find the first occurrence of NEEDLE in HAYSTACK.  For UTF-8 and
   UTF-16, the returned value will never point in the middle of a
   multibyte sequence or surrogate pair.  If this cannot be satisfied,
   for example due to an invalid NEEDLE, NULL is returned.  */

Why? Because the only cases when you can get a false positive are: 1)when you have illegal sequences at the beginning of the needle, whichnow we could easily check; 2) when both the needle and haystack have thesame illegal UTF-8 in it, which I think could be ignored.

For u16_strstr you need to look for an invalid surrogate character atthe beginning of the needle.


Thoughts?  Would anybody give a shot to the above?

Paolo

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v2 5/5] unistr/u8-chr, unistr/u8-strchr: use Boyer-Moore like algorithm., (continued)
- [PATCH v2 5/5] unistr/u8-chr, unistr/u8-strchr: use Boyer-Moore like algorithm., bonzini, 2010/07/23
- Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/23
  - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/23
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/27
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/27
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/27
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/27
    - Re: ucs4_t type, Bruno Haible, 2010/07/28
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/28
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/29
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini <=
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Pádraig Brady, 2010/07/29
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
    - Message not available
    - Message not available
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/29
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Paolo Bonzini, 2010/07/29
    - Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm, Bruno Haible, 2010/07/31
    - guarantees of u8_mbtouc/u8_strmbtouc (was Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm), Paolo Bonzini, 2010/07/29
    - Re: guarantees of u8_mbtouc/u8_strmbtouc, Bruno Haible, 2010/07/31
    - Re: guarantees of u8_mbtouc/u8_strmbtouc, Paolo Bonzini, 2010/07/31
    - Re: guarantees of u8_mbtouc/u8_strmbtouc, Bruno Haible, 2010/07/31

Prev by Date: Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Next by Date: Re: Help with openpty
Previous by thread: Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Next by thread: Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Index(es):
- Date
- Thread