Re: Using libunistring for string comparisons et al
From: Ludovic Courtès
Date: Sun, 13 Mar 2011 22:30:48 +0100
User-agent: Gnus/5.110013 (No Gnus v0.13) Emacs/23.3 (gnu/linux)
Hi Mark,
Mark H Weaver <address@hidden> writes:
> Unfortunately, the alternatives are not pleasant. We have a bunch of
> bugs in our string handling functions. Currently, our case-insensitive
> string comparisons and case conversions are not correct for several
> languages including German, according to the R6RS among other things.
>
> We could easily fix these problems by using libunistring, which provides
> the operations we need, but only if we use a single string
> representation, and one that is supported by libunistring (UTF-8,
> UTF-16, or UTF-32).
I don’t think a single representation is required. For instance, you
could “upgrade” narrow strings to UTF-32 and then use libunistring on
that. That would fix case-folding for “Straße”, I guess.
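To make the "upgrade, then fold" idea concrete, here is a minimal sketch in
Python rather than C. It is an illustration only, not Guile's code: it assumes
narrow strings hold one Latin-1 byte per character, and it uses Python's
built-in str.casefold() as a stand-in for libunistring's case folding. The
helper names widen and ci_equal are made up for this example.

```python
def widen(narrow: bytes) -> str:
    """Upgrade a narrow string (assumed Latin-1, one byte per character)
    to a sequence of full code points."""
    # Latin-1 bytes map one-to-one onto code points U+0000..U+00FF,
    # so widening never fails and never changes the length.
    return narrow.decode("latin-1")


def ci_equal(a, b) -> bool:
    """Case-insensitive equality for any mix of narrow (bytes) and
    wide (str) strings."""
    wa = widen(a) if isinstance(a, bytes) else a
    wb = widen(b) if isinstance(b, bytes) else b
    # Full case folding may change the length: "ß" folds to "ss".
    return wa.casefold() == wb.casefold()


narrow = "Straße".encode("latin-1")   # fits in a narrow string
print(ci_equal(narrow, "STRASSE"))    # prints True

# A per-character lowercase comparison misses this case:
print(widen(narrow).lower() == "STRASSE".lower())   # prints False
```

The point of the sketch is that only the widening step depends on the
internal representation; the folding and comparison then operate on a single
form, which is what a wide-string-only library call would need.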
> So, our options appear to be:
>
> * Use only wide strings internally.
>
> * Reimplement several complex functions from libunistring within guile
> (string comparisons and case conversions).
>
> * Convert strings to a libunistring-supported representation, and
> possibly back again, on each operation. For example, this will be
> needed when comparing two narrow strings, when comparing a narrow
> string to a wide string, or when applying a case conversion to a
> narrow string.
>
> Our use of two different internal string representations is another
> problem. Right now, our string comparisons are painfully inefficient.
They are inefficient only in the (unlikely) case that you’re comparing a
narrow and a wide string of different lengths.
So yes, the current implementation has bugs, but I think most if not all
can be fixed with minimal changes. Would you like to look into it
for 2.0.x?
Using UTF-8 internally has problems of its own, as Mike explained, which
is why it was rejected in the first place.
Thanks,
Ludo’.