bug-gnulib

Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algo


From: Paolo Bonzini
Subject: Re: [PATCH v2 0/5] Speed up uNN_chr and uNN_strchr with Boyer-Moore algorithm
Date: Tue, 27 Jul 2010 20:39:09 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Thunderbird/3.0.5

On 07/27/2010 06:28 PM, Pádraig Brady wrote:
> On 27/07/10 19:14, Paolo Bonzini wrote:
>> On 07/27/2010 06:06 PM, Pádraig Brady wrote:
>>> I would suggest a new function due to the
>>> way I see this function called most often.
>>> I.E. repeatedly with the same character.
>>
>> Is this really a bottleneck?  i.e., what does u8_uctomb_aux look like in
>> the profile when you do a million u8_strchr calls on an empty string?
>
> Well it would be a bit faster,
> but mainly a bit easier to use.
> I.E. one could do stuff like:
>
>    while ((f=u8_str_u8_chr (s, "–", 3)));

Ok, that's a different use case that makes more sense. I thought you referred to something like

  char c[6];
  size_t size = u8_uctomb_aux (c, uc, sizeof c);
  ...
  while ((f=u8_str_u8_chr (s, c, size)));

This one, by contrast, is less likely to be useful.
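
For concreteness, here is a minimal self-contained sketch of the "encode once, then search for the encoded bytes repeatedly" pattern discussed above. It does not use libunistring: encode_utf8, find_encoded and count_char are hypothetical names invented for this illustration, and find_encoded is only a naive stand-in for the proposed u8_str_u8_chr, not its actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libunistring function): encode the code point
   UC as UTF-8 into BUF, which must have room for 4 bytes.  Returns the
   number of bytes written, or 0 for a surrogate or a value above 0x10FFFF.  */
static size_t
encode_utf8 (uint8_t *buf, uint32_t uc)
{
  if (uc < 0x80)
    {
      buf[0] = (uint8_t) uc;
      return 1;
    }
  if (uc < 0x800)
    {
      buf[0] = (uint8_t) (0xC0 | (uc >> 6));
      buf[1] = (uint8_t) (0x80 | (uc & 0x3F));
      return 2;
    }
  if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc <= 0xDFFF)
        return 0;
      buf[0] = (uint8_t) (0xE0 | (uc >> 12));
      buf[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[2] = (uint8_t) (0x80 | (uc & 0x3F));
      return 3;
    }
  if (uc < 0x110000)
    {
      buf[0] = (uint8_t) (0xF0 | (uc >> 18));
      buf[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      buf[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      buf[3] = (uint8_t) (0x80 | (uc & 0x3F));
      return 4;
    }
  return 0;
}

/* Hypothetical stand-in for the proposed u8_str_u8_chr: return a pointer to
   the first occurrence of the N-byte encoded character C in the
   NUL-terminated string S, or NULL if there is none.  */
static const uint8_t *
find_encoded (const uint8_t *s, const uint8_t *c, size_t n)
{
  if (n == 0)
    return s;
  for (; *s != 0; s++)
    if (*s == c[0] && strncmp ((const char *) s, (const char *) c, n) == 0)
      return s;
  return NULL;
}

/* Encode UC once, then reuse the encoded bytes for every search.  */
static size_t
count_char (const uint8_t *s, uint32_t uc)
{
  uint8_t c[4];
  size_t size = encode_utf8 (c, uc);
  size_t count = 0;

  if (size == 0)
    return 0;
  while ((s = find_encoded (s, c, size)) != NULL)
    {
      count++;
      s += size;
    }
  return count;
}
```

With these definitions, count_char applied to a UTF-8 string containing two en dashes (U+2013) would encode the character a single time and call the search twice.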

However, note that in C1X you could do

  while ((f=u8_strchr (s, u'–')));

BTW, there's an interesting difference between char32_t and ucs4_t, in that the former has "the same size, signedness, and alignment as uint_least32_t", while libunistring uses uint32_t to define the latter. I wonder if libunistring should be changed to:

1) detect _Char32_t (or uchar.h and char32_t) and use it if available,

2) use uint_least32_t if not available.

It would be a no-op everywhere except possibly for some C++ programs, and it wouldn't affect binary compatibility.
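
A rough sketch of what the two steps above could look like, assuming a C11 compiler. The names sketch_ucs4_t, SKETCH_HAVE_CHAR32_T and sketch_ucs4_bits are hypothetical, and __has_include is only a stand-in for what would more likely be a configure-time check; none of this is libunistring's actual code.

```c
#include <limits.h>
#include <stdint.h>

/* Step 1: detect <uchar.h>.  Assumption: if the header exists, it
   declares char32_t.  A real build would likely use a configure test.  */
#if defined __has_include
# if __has_include (<uchar.h>)
#  include <uchar.h>
#  define SKETCH_HAVE_CHAR32_T 1
# endif
#endif

#ifdef SKETCH_HAVE_CHAR32_T
typedef char32_t sketch_ucs4_t;
#else
/* Step 2: fall back to uint_least32_t.  */
typedef uint_least32_t sketch_ucs4_t;
#endif

/* Either way the type is unsigned and at least 32 bits wide.  */
_Static_assert ((sketch_ucs4_t) -1 > 0, "sketch_ucs4_t must be unsigned");
_Static_assert (sizeof (sketch_ucs4_t) * CHAR_BIT >= 32,
                "sketch_ucs4_t must hold 32 bits");

#ifdef UINT32_MAX
/* Where uint32_t exists, the size is unchanged, which is why the
   switch should not affect binary compatibility.  */
_Static_assert (sizeof (sketch_ucs4_t) == sizeof (uint32_t),
                "same size as uint32_t");
#endif

/* Width of the chosen type in bits, for run-time inspection.  */
static unsigned int
sketch_ucs4_bits (void)
{
  return (unsigned int) (sizeof (sketch_ucs4_t) * CHAR_BIT);
}
```

In C the result has the same size and signedness as uint32_t on platforms that provide it; only the identity of the type may differ, which is where some C++ programs could notice the change.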

Paolo


