Re: [bug-gnulib] new module proposal: strip
From: Bruno Haible
Subject: Re: [bug-gnulib] new module proposal: strip
Date: Mon, 4 Sep 2006 14:21:18 +0200
User-agent: KMail/1.9.1
Hello Davide,
The first loop looks fine, safe for multibyte locales. But in the second loop:
> > In multibyte strings you cannot "go backwards". You have to write the
> > algorithm in a way that progresses from the first to the last multibyte
> > character. (*) In this case, you can do so by moving from first to last,
> > memorizing the position of the last non-whitespace character. More
> > precisely, a pointer pointing after this character. When you have
> > reached the end of the string, you put a '\0' where the memoized pointer
> > points to, and are done.
> Done. Thanks for the suggestions.
Well, that's not what I meant. By doing "x--" you are still stepping backwards
byte by byte, which you can't safely do in a multibyte string. Also, the
total running time of the strlen calls now sums up to O(n^2) in the worst case.
What I meant is something like
  char *last_non_space = d;
  for (mbi_init(i, d, strlen(d)); mbi_avail(i); mbi_advance(i)) {
    ...
  }
  *last_non_space = '\0';
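To make the forward scan concrete, here is a minimal sketch of the same idea. It does not use the gnulib mbi_* iterator macros; it substitutes plain ISO C mbrtowc() and iswspace() so that it compiles on its own. The function name trim_trailing_forward and the handling of invalid byte sequences (count one byte as a non-whitespace character) are assumptions made for illustration, not part of the proposed module.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (illustration only): truncate S after its last
   non-whitespace character, scanning strictly from first to last.  */
static void
trim_trailing_forward (char *s)
{
  assert (s != NULL);

  /* Points just after the last non-whitespace character seen so far.  */
  char *last_non_space = s;
  /* strlen is called once, so the whole scan stays O(n).  */
  size_t remaining = strlen (s);
  mbstate_t state;
  memset (&state, 0, sizeof state);

  for (char *p = s; remaining > 0; )
    {
      wchar_t wc;
      size_t len = mbrtowc (&wc, p, remaining, &state);

      if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
        {
          /* Assumption: an invalid or incomplete sequence counts as one
             non-whitespace byte; reset the state and resynchronize.  */
          len = 1;
          memset (&state, 0, sizeof state);
          last_non_space = p + len;
        }
      else if (!iswspace ((wint_t) wc))
        last_non_space = p + len;

      p += len;
      remaining -= len;
    }

  /* Cut the string where the remembered pointer points.  */
  *last_non_space = '\0';
}
```

The pointer only ever moves forward, so no byte in the middle of a multibyte character is ever inspected on its own.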
Also, some fallback code should be provided for systems without multibyte
string functions. This fallback code is generally more efficient than the
multibyte code, but only applicable when MB_CUR_MAX == 1. Therefore in
other files (see strstr.c etc.) we generally use this template:
  do_something(...)
  {
  #if HAVE_MBRTOWC
    if (MB_CUR_MAX > 1)
      {
        ... here comes the multibyte (<wctype.h>) variant ...
      }
    else
  #endif
      {
        ... here comes the unibyte (<ctype.h>) variant ...
      }
  }
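As an illustration of how the template might be filled in, here is a small sketch with a made-up function, count_whitespace, which is not taken from strstr.c or any other gnulib file. HAVE_MBRTOWC would normally come from config.h; the fallback #define below is an assumption so that the sketch compiles standalone. The multibyte branch uses plain mbrtowc() rather than the gnulib iterator macros.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Assumption: normally defined by configure in config.h.  */
#ifndef HAVE_MBRTOWC
# define HAVE_MBRTOWC 1
#endif

/* Hypothetical example function: count the whitespace characters in S.  */
static size_t
count_whitespace (const char *s)
{
  size_t count = 0;
  assert (s != NULL);

#if HAVE_MBRTOWC
  if (MB_CUR_MAX > 1)
    {
      /* Multibyte (<wctype.h>) variant: decode one character at a time.  */
      size_t remaining = strlen (s);
      mbstate_t state;
      memset (&state, 0, sizeof state);
      while (remaining > 0)
        {
          wchar_t wc;
          size_t len = mbrtowc (&wc, s, remaining, &state);
          if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
            {
              /* Assumption: skip one byte of an invalid sequence.  */
              len = 1;
              memset (&state, 0, sizeof state);
            }
          else if (iswspace ((wint_t) wc))
            count++;
          s += len;
          remaining -= len;
        }
    }
  else
#endif
    {
      /* Unibyte (<ctype.h>) variant: every byte is a whole character.  */
      for (; *s != '\0'; s++)
        if (isspace ((unsigned char) *s))
          count++;
    }
  return count;
}
```

The unibyte branch avoids the decoding overhead entirely, which is why it is worth keeping even on systems that do have mbrtowc().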
Bruno